Documentation Built-In Sharding Wiki Page

Thursday, December 1, 2016

As part of today's PgConf.Asia 2016 Developer Meeting, I was asked to write a wiki page with a rough outline of how built-in sharding could be implemented. I have now created such a page. We also discussed built-in sharding at today's unconference.

View or Post Comments

Conference Thirteen Conferences in Four Months

Friday, November 18, 2016

While considering future conference attendance, I realized there are 13 dedicated Postgres conferences or PGDays in the next four months, and nicely spread worldwide:

The continent/region count is:

Unfortunately, they are not spread chronologically — 8 of the 13 are in March of 2017. You can see most of these listed on the Postgres events web page.

View or Post Comments

Presentation Postgres Videos

Thursday, October 6, 2016

Having spoken at many Postgres events over the years, I have accumulated 25+ talks, whose slides are all on my website under a Creative Commons license. Fortunately, many of these conferences have recorded my presentations. In the past, I have added video links next to each event as I found them. Unfortunately, unless website visitors chose to view event locations using the link at the top of the page, they wouldn't see the video links. Using some JavaScript trickery, all video links now appear by default, and also next to locations if you choose to view them. This should make it easier for website visitors to find my presentation videos. (If you are aware of any missing videos, please let me know.)

The community has done a great job of creating wiki pages for most of the Postgres conferences with links to the slides. Unfortunately there are no links to videos matching the slide decks. It would be great if someone would add such links. For example, the PGConf US's Youtube page has 89 videos from 2015 and 2016 conferences. It would be great if those could be linked to from the Postgres wiki. The same could be done for other conferences who publish videos.

In addition, it would be nice if presentation and video URLs were loaded into a database (maybe Postgres ) so they can be categorized and searched, e.g., it would be great to see all the performance or security slides and videos listed together on one web page. In summary, I feel we are not fully utilizing the work that has been done by conferences to disseminate the proceedings of their conferences. These high-quality proceedings are valuable resources for our growing user-base.

As a background, conference videos are very time-consuming to produce and edit, so they are often released months after the conference. Because the conference has long passed, it is often hard to publicize their release to attendees and a wider audience. I think giving these videos more prominence would benefit the community greatly.

View or Post Comments

Thoughts Analyzing Postgres Adoption

Monday, October 3, 2016

As an open source project, the Postgres community has always had great difficulty in measuring many aspects of Postgres adoption. For example, how many people use Postgres? We don't know, because people can get Postgres from so many sources, and we have no easy way to track them. Surveys tell us that Postgres is probably the fourth most popular database, but more detailed information has proven elusive. We do get detailed Postgres case studies occasionally, e.g., Yandex Mail, but these are, of course, single-user reports.

Fortunately, EnterpriseDB commissioned idc to study its own Postgres customers. The recently-released report has some valuable information, both for users considering Postgres and for Postgres support companies trying to convince users to choose Postgres.

The report only surveyed seven EnterpriseDB customers, but that is probably a representative sample of enterprises using Postgres. The study interviewed each of them and got interesting statistics about administrative overhead, deployment flexibility, and the amount of money saved. It also has some nice tables and customer quotes.

Idc will be presenting on Wednesday afternoon at the Postgres Vision conference. It might be nice to get this data repackaged for other conferences — I rarely see management-oriented, data-driven talks about why to choose Postgres. Fortunately, there is a recording (email registration required) of a webcast presenting this material.

View or Post Comments

Business Yandex Mail Completes Three-Year Migration from Oracle to Postgres

Wednesday, September 28, 2016

It is not every day you get to hear about how a big organization switched from Oracle to Postgres, but it was covered this year at PGCon in Ottawa. I just got time to watch the video (English/Russian slides) of Vladimir Borodin from Yandex Mail explaining the massive migration.

He goes into great detail about how they planned and deployed the migration of 300TB across ninety Postgres shards serving 200+ million users. I loved the line he mentioned twice, "We expected lots of problems, and there weren't." Of course, slide 41 is their wish list of features, but fortunately most of those are being worked on.

I am attending Highload++ this year in Moscow, and I am sure that Yandex will be a hot topic. The migration was also recently discussed on Hacker News. If you are interested in the challenges of deploying Postgres in large enterprises, the management-focused Postgres Vision conference in two weeks would be a good choice. I know Oleg Bartunov will be there and perhaps he will share more details. Of course, there are many other conferences this season, and several I am attending.

View or Post Comments

Documentation Postgres 9.6 Features

Wednesday, September 21, 2016

I have written a presentation covering the important features in Postgres 9.6 and some of the features we hope for in Postgres 10.

View or Post Comments

Optimizer Be Prepared for Prepare

Wednesday, June 15, 2016

Prepared statements are queries where the constants used in the query are separated from the query text. While this improves security by avoiding SQL injection attacks, it also allows repeatedly-executed queries to avoid parsing and planning overhead by executing saved generic plans that represent queries with typical constants. While generic plans don't have the advantage of being planned based on the statistics of specific constants, the avoidance of parsing and planning overhead is often more beneficial.

Before Postgres 9.2, generic plans were used for all prepared queries. In Postgres 9.2, logic was added to use a generic plan only if it has a cheaper cost after five or more executions. Unfortunately, this behavior was undocumented, causing confusion for users surprised to find the explain plans of prepared queries changing after five executions, sometimes for the worse.

After much discussion, this has been remedied by improving the Notes section of the 9.6 prepare manual page. It now explains how generic plans are created, and when they are used. I have also created an sql script that, when run through psql (output), illustrates the documented behavior.

View or Post Comments

Conference Lots-O-Travel

Tuesday, May 31, 2016

Since January, I have had the pleasure of speaking about Postgres in 15 cities: Singapore, Seoul, Tokyo, San Francisco, Los Angeles, Phoenix, St. Louis, Bloomington (Illinois), Chicago, Charlotte, New York City, Brussels, Helsinki, Moscow, and Krasnoyarsk (Siberia).

I am particularly excited about the new cities I visited in Asia, and growth there will continue in the coming months. You can see from my travel map that the two areas still lacking Postgres activity are the Middle East and Africa. Fortunately, Umair Shahid has already started on the Middle East.

In more Postgres-saturated continents, like North America and Europe, there are now several conferences per year during different months, in different cities, and with different focuses. While proprietary database companies usually have just one huge conference a year per continent, our distributed conference teams allow for smaller, more frequent, more geographically distributed conferences, which better meet the needs of our users. Smaller conferences allow for more interaction with speakers and leaders. More frequent conferences allow people to attend a conference quickly, rather than waiting eleven months for the next yearly conference. Geographically distributed conferences allow for reduced travel costs, which is particularly important for first-time attendees. I realize seeing thousands of Postgres people together is motivating, but once that wears off, the benefits of more, smaller conferences are hard to beat.

View or Post Comments

Presentation Non-Relational Postgres

Monday, April 25, 2016

Having worked on Postgres for twenty years, I have seen a lot of features added. However, I usually evaluate new features in isolation and rarely see the big picture of how they fit together. Last month I created a new talk, Non-Relational Postgres, which presents a comprehensive overview of non-relational Postgres storage.

The consistent use of @> (containment operator) listed on slide 70 is interesting. It also has good coverage of how character strings can be split apart and indexed.

View or Post Comments

Security Layers of Security

Saturday, April 9, 2016

Security can be complex, and for databases, it can be very complex because there are so many different users and access patterns. Postgres supports nine layers of security from the client application to a specific data row. I thought it would be interesting to list them, in order:

  1. listen_addresses: This controls what interfaces the server listens to. localhost listens just on the local network interface (e.g., 127.0.0.1), while * listens on all interfaces. unix_socket_directories, unix_socket_group, and unix_socket_permissions control Unix-domain sockets in a similar manner.
  2. pg_hba.conf: This file allows specification of eleven authentication methods, filtered by host name, IP address, network range, database name, user name, and ssl usage.
  3. nologin: Setting a user to nologin prevents future logins.
  4. allow_connections: Setting the database option allow_connections to false prevents future connections to that database.
  5. grant ... connect: This controls which roles can connect to the specified database.
  6. connection limit: This per-database and per-user setting limits the number of connections; a zero value prevents new connections.
  7. grant ... schema: Usage permission is required to access objects inside a schema. (By default, the public schema has usage and create permissions granted to the public role.)
  8. grant ... table: Per-command access, e.g., select, insert, is configurable per table.
  9. row security: New in Postgres 9.5, this allows row-level access control using a create policy expression.

So, the next time you can't access something, consider the security layers involved.

View or Post Comments

Performance Cpus Are Slowing Us Down

Friday, April 1, 2016

In my Database Hardware Selection Guidelines talk, I emphasized that database servers are usually limited by I/O and memory constraints, not cpu. However, on slide 24 I mentioned one use case that is cpu-bound — read-only workloads where the working set fits into ram. Over time, this distinction has gotten more pronounced with the addition of faster I/O (SSDs) and larger-memory systems. What hasn't improved much is cpu speed, though cpu core count has certainly increased.

A subset of cpu-bound workloads are data warehouse queries, where the working set fits into ram and a large percentage of time is spent in the executor. (The executor runs a state machine, a "plan", created by the optimizer.) This workload distinction is well outlined on slide 5 of a Vitesse DB talk. For queries that spend most of their time in the executor and process data already in ram, executor overhead is significant.

There were two presentations at the recent PGDay Asia conference that highlight projects designed to reduce executor overhead, leading to massive speedups for specific workloads. The more limited approach was by Kumar Rajeev Rastogi of Huawei. Their approach is to create shared object libraries at table creation time that know about the column structure of each table — this allow specific columns to be accessed rapidly by reducing row access overhead.

A more radical approach is taken by Vitesse DB (closed source). At runtime, Vitesse DB compiles optimizer-generated plans into object files that are run during execution. This avoids executor state-machine overhead. Plans that cannot be compiled are run using the normal state-machine executor. (Both projects use llvm.)

This clearly shows that there are workloads where the overhead of the executor is significant, and that groups trying to solve this problem. I think the Postgres community distribution is going to need to address this someday as well. (While parallelism and sharding allow multiple cpus to be used for a single query, they don't improve single-cpu throughput.)

View or Post Comments

Business Oracle Attacks Postgres in Russia

Tuesday, March 22, 2016

During my twenty years with Postgres, I knew the day would come when proprietary databases could no longer ignore Postgres and would start attacking us.

Well, that day has come, at least in Russia. During the past few weeks, Oracle sent a letter (Russian, English translation) to Russian partners and customers comparing Oracle favorably to Postgres as a way of cirumventing a new law favoring Russian-produced software. This is the first direct attack I have seen on Postgres, and is probably representative of the kinds of attacks we will see from other vendors and in other countries in the years to come.

The press has picked up on the news (Russian, English) and given balanced coverage. Comments on the English article were, in general, positive — I particularly liked this one. There are two Hacker News threads about it (1, 2), a community thread about it, and another community thread about Oracle RAC.

Taken together, this is an interesting time for Postgres.

View or Post Comments

Multi-host The Plan for FDW-Based Sharding

Thursday, February 25, 2016

Earlier this week I posted an email to the hackers list outlining a plan for built-in sharding. Later emails in the thread clarify many of my initial remarks. (I also posted a presentation about this a year ago.)

The emails detail the steps being taken to enhance existing Postgres features (foreign data wrappers (FDW), parallelism, partitioning) to minimize the code changes necessary to add sharding. While there is still uncertainty about the final design, what workloads it will support, and even the probability of success, steady progress is being made. As feature enhancements are completed, the path to built-in Postgres sharding will become clearer.

View or Post Comments