Bruce Momjian

Postgres Blog

This blog is about my work on the Postgres open source database, and is published on Planet PostgreSQL. PgLife allows monitoring of all Postgres community activity.

Online status:
Unread Postgres emails:
Email graphs: incoming, outgoing, unread, commits (details)

Presentation Lessons from the Online Conference Trenches

Monday, June 1, 2020

Having presented at two online conferences in the past two weeks, presenting at two this coming week, and presenting at many edb-sponsored webinars, I have learned a few things about online presentations that might be helpful for Postgres organizers, speakers, and attendees:

For Organizers: With no need for speakers and attendees to travel to online conferences, there are many more potential speakers available, and many more potential attendees than for in-person conferences. However, the technical challenges of hosting an online conference are significant because producing and consuming content can require multiple platforms that must be integrated seamlessly for a positive user experience. The content production platform, e.g. Zoom, is often different than the consumption platform, e.g. YouTube, Facebook Live. If the user experience is poor, people will leave because they are not bound to a physical location like an in-person event. Adjusting to the time zones of speakers and attendees can be complex — doing a 24-hour conference like Precona Live (organizer tips) solves many of the time zone problems, but requires moderators from many time zones. The moderator's job is much more extensive for online conferences since they control access, deal with technical problems, and manage the all-important chat channel. For online conferences, chat is the best way to promote attendee engagement. If chat is done well, user engagement during presentations can be even better than in-person conferences.

For Speakers: Just like for organizers, speakers have more technical challenges than in-person conferences — it is harder to engage the audience, more things can go wrong, and attendees can more easily leave. As a speaker, I have a checklist that I always references before each presentation:

  • Mute phone, chat, email, and upgrade notifications
  • Use a laptop on AC power with wired Ethernet, for reliability
  • Have a count-up clock to keep track of the talk duration
  • Use a headset so you don't lean toward a microphone
  • Use a presentation remote so you don't lean forward to change slides
  • Turn on your video camera to increase audience engagement
  • Make sure the lighting is good and the background is uncluttered

For attendees: Many online conferences are a mix of different technologies, not always seamlessly integrated, so anticipate that it will take time to get initially connected. Consider watching the conference on a large television, or from a tablet you can carry around. Use chat to engage with the speaker and other attendees. Feel free to switch to a more interesting presentation without guilt. When you get frustrated, consider how much time you are saving by not having to travel.


Administration Visualizing Collations

Friday, May 29, 2020

There is still significant confusion about characters sets, encodings, and collations. This is because in the real, non-digital world, we usually treat languages, their characters, and ordering as unified, but in the digital world, they are treated separately, and their distinction can be hard to visualize.

These two posted queries illustrate collation in a very creative way. The first query, SELECT chr(i) FROM generate_series(1, 50000) i ORDER BY chr(i) COLLATE "C", outputs characters in their "C" binary order, with ascii as the first 128 characters, successive groups of languages following, and ending with pictographic languages.

The second query, SELECT chr(i) FROM generate_series(1, 50000) i ORDER BY chr(i) COLLATE "en_US.utf8", outputs the same 50,000 characters in "United States English" utf8 order. The output starts with pictographic languages, not ascii. The Latin alphabet appears, but not until line 19068. What is interesting is that there are 118 symbols grouped together that look like 'a', 'a' with diacritics, or have 'a' as part of their symbol. Then 'b' appears with a group of 38 symbols that look like or use 'b', and so on through the Latin alphabet. (If you highlight a character and paste it into a search box, Google will tell you about that Unicode symbol.)

I found it interesting that it groups letters that look like Latin letters, even if the they are not from Latin alphabets and don't sound like Latin letters. Cyrillic is grouped in a section after the Latin alphabet section. These sql queries are the clearest example I have seen of collation ordering.

If I had used a different collation, instead of "United States English", there would have been a different ordering. This is why index storage is sensitive to collations, i.e., indexes with different collations store the same stings in a different order. Collation affects other things like upper/lower case processing, the ordering of query output, and certain optimizations.

Post a Comment

Administration What is an Lsn?

Wednesday, May 27, 2020

You might have seen that there is a pg_lsn data type:

test=> \dTS pg_lsn
              List of data types
   Schema   |  Name  |       Description
 pg_catalog | pg_lsn | PostgreSQL LSN datatype

Client programs pg_receivewal and pg_recvlogical have options that take lsn values, but what is an lsn? It stands for "Log Sequence Number" — it is a 64-bit value that represents a position in the write-ahead log. It is usually displayed as two 32-bit hex values, separated by a slash. For example, pg_controldata displays lsn values:

$ pg_controldata
Latest checkpoint location:           0/15AE1B8
Latest checkpoint's REDO location:    0/15AE180

So, the next time you view or need to specify a write-ahead log location, you are using an lsn.

Post a Comment

Administration Taking Snapshots of Clusters Which Use Tablespaces

Monday, May 25, 2020

Postgres already documents the ability to backup the database cluster using file system snapshots. Unfortunately, database clusters that use tablespaces often cannot use this method if the storage system doesn't support simultaneous snapshots across file systems.

However, simultaneous snapshots across file systems might not be a hard requirement for Postgres snapshot backups. It might be possible for snapshots to be non-simultaneous as long as the write-ahead log that spans the time frame between snapshots is included in the backup, and checkpoints do not happen during that time frame.

Internally, starting Postgres from a snapshot backup replays write-ahead log records to make a single file system snapshot consistent. Potentially, it could do the same for non-simultaneous snapshots of multiple file systems. However, documenting this, giving users a reliable list of steps to perform, and making sure it always works is probably too complex to justify.

Post a Comment

Administration Moving Tables, Indexes, and Tablespaces Between Clusters

Friday, May 22, 2020

Currently, it is impossible to move tables, indexes, and entire tablespaces from one cluster to another — that is because each table and index file is bound to the cluster's infrastructure because of:

  • Table and index definitions
  • pg_xact (commit/abort/in-progress transaction status records)
  • pg_multixact (used for multi-session row locking)

Fyi, you can easily move tablespaces to new directories as long as it remains in the same cluster, and move tables and indexes between tablespaces.

So, how could it be made possible? Freeze can remove references to pg_xact and pg_multixact, assuming there are no active transactions during the freeze operation. Table and index definitions can be more complex, but it certainly seems possible. This requires more research.

Post a Comment

Administration Why Pgdata Should Not Be at the Top of a Mount Point

Wednesday, May 20, 2020

This email thread is illustrative of why it is unwise to place the Postgres data directory (pgdata) at the top of a mount point. Instead, create a subdirectory under the mount point and put pgdata there. This has the advantage of avoiding possible data corruption if mounting fails, and allows more efficient use of pg_upgrade.

Post a Comment

Conference Percona Live Online

Monday, May 18, 2020

I am planning to virtually attend and present at the Percona Live Online conference tomorrow, May 19. It starts at 10am, Eastern us time, and spans 24 hours, so it covers every time zone. I am giving my Will Postgres Live Forever? presentation at noon, Eastern us time.

Attendance is free, so you might want to check it out. I saw some interesting topics on the program. I am also curious to experience a 24-hour virtual conference, though I am unlikely to remain awake that long.

Post a Comment

Administration Using Non-Login Roles

Monday, May 18, 2020

When we talk about database roles, most people immediately think of login roles, which allow people to log in. However, another user management feature is the ability to create non-login roles, formerly called groups. Non-login roles can also be assigned permissions, e.g., via grant, and can have login roles as members. Non-login roles can be even be members of other non-login roles.

What is the value of using non-login roles? They allow a group of people to be assigned as members of a non-login role, and that role can be used to abstract permission assignment. For example, if you have shop foremen, you can configure the login roles of all foremen to be members of a non-login foreman role. As people are added and removed from that staff position, they can be added/removed from the non-login role without the need to change permissions for the foreman role.

A further advantage of non-login roles, as explained in a recent email, is that Postgres can start to suffer performance problems if more than a few dozen roles are granted permission on an object. A much simpler and more manageable solution is to add users to a non-login role and assign object permissions to that non-login role.

View or Post Comments

News Draft of Postgres 13 Release Notes

Friday, May 15, 2020

I have completed the draft version of the Postgres 13 release notes, containing 181 items. The release notes will be continually updated until the final release, which is expected to be in September or October of this year. Beta testing will start in the next few weeks.

View or Post Comments

Internals Avoiding Cache Wipe, Synchronized Scans

Wednesday, May 13, 2020

Whenever you are dealing with a lot of data, it helps to cache it. Postgres does this using shared_buffers. However, one risk of caching data is that a large query that accesses a lot of data might remove frequently-accessed data from the cache; this is called cache wipe. To avoid this, Postgres limits the number of shared buffers used by data operations that are expected to access a lot of data.

Looking at C function GetAccessStrategy(), you can see there are four shared buffer access strategies. The first one, bas_normal, is used for normal scans; the rest are to avoiding cache wiping:

  • bas_bulkread is enabled for large reads and uses 256 kilobytes of shared buffers (typically 32 8kB shared buffers)
  • bas_bulkwrite: is enabled for large writes and uses 16 megabytes of shared buffers
  • bas_vacuum is for vacuum operations and uses 256 kilobytes of shared buffers

For example, function initscan() uses bas_bulkread if the scan is expected to access more than one-quarter of shared buffers. Similarly, table rewrites, create table as, and copy from use bas_bulkwrite. Bas_bulkwrite is larger because we can't discard written buffers from the cache until they are written to storage, unlike unmodified buffers which can be discarded anytime.

You might wonder, with operations using so few shared buffers, what happens if another session needs to scan the same data? Doesn't performance suffer? Well, another Postgres facility, that was developed independently, helps with this: synchronized scans. The top of syncscan.c explains it well:

When multiple backends run a sequential scan on the same table, we try to keep them synchronized to reduce the overall I/O needed. The goal is to read each page into shared buffer cache only once, and let all backends that take part in the shared scan process the page before it falls out of the cache.

Both synchronized scans and shared buffer access strategies work automatically, so most Postgres users don't even know they exist, but they do improve Postgres performance.

View or Post Comments

Internals Why Do We Freeze?

Monday, May 11, 2020

You might have seen autovacuum running, and noticed that it sometimes performs freeze operations on transaction ids (32 bits) and multi-xacts (used for multi-session row locking). The frequency of freeze operations is controlled by autovacuum_freeze_max_age and autovacuum_multixact_freeze_max_age. You can reduce the frequency of freezing if you are sure the freeze operation will complete before transaction wraparound is reached.

There is regular discussion about how freezing could be avoided, and this email does the best job of explaining the options. We could expand transaction ids to 64 bits, either on each row or perhaps with a page-level default, but pg_xact (commit/abort/in-progress transaction status records) still need cleanup. This area probably needs more thought.

View or Post Comments

Internals Postgres Internals Website

Friday, May 8, 2020

I am often asked how someone can learn more about Postgres, particularly the internals. There is the Postgres developer page, which has links to many resources, and the developer's faq item about learning about the internals. One link on that page I was not aware of is Hironobu Suzuki's very detailed website about Postgres internals. It has a lot of details I have never seen written before, so I suggest those interested should check it out.

View or Post Comments

Internals Portability's Surprising Win

Wednesday, May 6, 2020

When writing software, it is necessary to decide whether to use external facilities available in command-line tools, libraries, frameworks, and the operating system, or write the facilities yourself. Why would you write them yourself? You might be worried about adding reliance on an external facility or a facility might not have sufficient flexibility or performance.

The Postgres development team has had to make similar decisions. Fortunately, we have tended to favor reliance on common operating system interfaces, tools, and libraries, e.g., OpenSSL, bison. We have avoided reliance on external facilities that are uncommon or not well maintained.

Postgres has reproduced facilities that were commonly available in the operating system, tools, or libraries only when there was a clear benefit. Reproducing such facilities for a small benefit, like a little more control or a little more performance, is rarely wise. While relying on external facilities often makes Postgres less flexible and perhaps less performant, there are long-term benefits:

  • As external facilities improve their feature-set and performance, Postgres benefits from these improvements with little effort
  • As new demands are required of these external facilities, Postgres again benefits effortlessly

Let's be specific. In the early days of Postgres, file systems were not optimized for database storage. Fortunately, Postgres never implemented file systems on raw devices. Now that modern file systems, like ext4, give good database performance, Postgres benefits from file system improvements with almost no effort. Even when ssds started being used, the only change needed in Postgres was the ability to set random_page_cost at the tablespace level to handle databases where some tablespaces are on ssds and some are on magnetic storage.

When virtualization, cloud, containers, and container orchestration (e.g., Kubernetes) became popular, Postgres had to do almost nothing to run well on these platforms. (It is true that some of the Postgres enterprise deployment tools required repackaging and re-engineering.)

Because of this philosophy, Postgres has remained relatively light-weight compared to other relational database systems, and this has benefited Postgres in environments where nimble deployments are favored.

View or Post Comments

Performance With ... Materialized and Optimizer Control

Monday, May 4, 2020

Before Postgres 12, queries specified as common table expressions (with clauses) always behaved as optimization barriers, meaning that common table expression queries were executed independently, and were not moved to later parts of the query.

Starting in Postgres 12, if a common table expression is referenced only once, and the keyword materialized is not used, it can be moved to a place later in the query where it can be better optimized; this improves optimization possibilities. However, if the movement of common table expression queries increases the from clause table count above the geqo_threshold, Postgres will decide it can't efficiently optimize such a high table count query and will use the genetic query optimizer.

So, while the new Postgres 12 behavior of in-lining common table expressions usually increases the quality of optimized plans, in some cases it can decrease them by enabling the genetic query optimizer. In a way, in pre-Postgres 12 or with the use of materialized, the query author is doing the optimization by creating common table expressions, while in other cases, the optimizer has greater control, though even the optimizer can determine the query is too complex and fall back to less-precise genetic query optimization.

View or Post Comments

Performance Background Writes

Friday, May 1, 2020

Postgres must guarantee durability and good performance. To meet these objectives, Postgres does writes to the file system and storage in the background as much as possible. In fact, there are only two major cases where writes happen in the foreground:

  1. Write-ahead log writes happen before commits are acknowledged to the client
  2. A needed shared_buffer is dirty and must be written to storage so it can be replaced

Write-ahead log writes (#1) can be controlled using various settings. Dirty shared buffer writes (#2) that happen in the foreground are minimized if the background writer is operating efficiently. You can monitor such writes by viewing the probe buffer-write-dirty-start and buffer-write-dirty-done.

View or Post Comments

Performance Optimal Use of Ssds

Wednesday, April 29, 2020

Ssds have different performance characteristics than magnetic disks, and using them optimally isn't always clear. Ssds have several performance benefits:

  1. Very fast fsyncs
  2. Much faster random reads and writes
  3. Faster sequential reads and writes

So, if all your data is stored on ssds, you will certainly improve performance. If you are mixing ssds and magnetic disks, ideally you should use ssds in ways that give the greatest benefit. Starting with number one, putting the write-ahead log on ssds is a great way to improve fsync performance.

For number two, moving indexes to tablespaces using ssd storage can greatly improve performance because index access is usually random. When using ssds, the default value for random_page_cost should be lowered, perhaps to 1.1. This can be set at the tablespace level if there is a mix of tablespaces on ssds and magnetic disks.

For number three, it is also possible to create tablespaces on ssds for current data, and place archive data on tablespaces that use magnetic disks. By using table partitioning, a partitioned table can transparently span ssds and magnetic disk tablespaces.

View or Post Comments

Performance Does Postgres Support Compression?

Monday, April 27, 2020

I am often asked if Postgres supports compression, and my answer is always a complicated dance around what "compression" level they are asking about. There are six possible levels of database compression:

  1. single field
  2. across rows in a single page
  3. across rows in a single column
  4. across all columns and rows in a table
  5. across tables in a database
  6. across databases

Number one (single field) is currently done by toast. Number two (across rows in a single page) is a practical optimization where a compression routine blindly looks for repeating values in a page without understanding its structure. The difficulty of implementing this happens when a page is stored using its compressed length (rather than the uncompressed 8k), the page contents change, and the new contents compress less well than the previous contents. In this case, the compressed page contents would be larger and it would be very complex to fit the page into the existing space in the file. A different file layout is really required for this, so pages can be placed anywhere in the file, without affecting index access. A team is working on adding this feature using Postgres's table access method interface.

Number three (across rows in a single column) is the classic definition of a columnar database. A team is also working on that. Just like number two, this requires using a different storage layout than Postgres's default, and the table access method interface makes this possible.

Number four can be done using file system compression. Numbers five and six would be nice, but it unclear how this could be done efficiently without adding unacceptable complexity to the database.

View or Post Comments

Performance Multi-Host Technologies

Friday, April 24, 2020

There are so many multi-host technologies and it is hard to remember the benefits of each one, so I decided to create a list:

High availability: Streaming replication is the simplest way to have multiple copies of your database, ready for fail over

Read scaling: Pgpool allows replicas (slide 17) to handle a database's read-only workload

Write scaling: Sharding allows for write scaling

Partial replication: Logical replication allows partial replication

Reduce latency: Multi-master replication allows servers to be located close to users, reducing transmission latency

Hopefully this is helpful to people.

View or Post Comments

Performance Performance Goalposts

Wednesday, April 22, 2020

In talking to EnterpriseDB customers, I am often asked about the performance limits of Postgres: How many connections can it handle? How many tps?

Well, those are good questions, but it is hard to give accurate answers since so much depends on the hardware and workload. Eventually, testing of actual workloads on intended hardware has to be done, but not giving ball-park answers is unhelpful. What answer can I give? Well, I came up with this chart:

1Connections< 250direct connect
2Queries< 250pgbouncer
3Write queries< 250Pgpool with read-only replicas (slide 17)
4Write queries>= 250sharding

Earlier items use simpler architectures than later items, and are therefore preferred.

For under 250 simultaneous connections(#1), it isn't generally necessary to use any complex architecture, though using a pooler is recommended for workloads with many short-lived sessions to reduce connection startup time. Even when under this limit, performance can be limited by the number of cpus. Oltp databases typically cannot process more than 5 x cpu cores without having to time-slice among cpus. Olap uses 2 x cpu cores, or less if parallelism is used.

For over 250 simultaneous connections but under 250 simultaneous queries(#2), a connection pooler can be beneficial because it reduces the overhead of managing many open connections. For over 250 simultaneous queries but under 250 simultaneous write queries(#3), a combination of Pgpool with read queries routed to streaming replicas can be very efficient. For over 250 simultaneous write queries(#4), a sharding solution should be considered.

These numbers can vary greatly based on workload and hardware, but I think they are good starting points to consider.

View or Post Comments

Performance Fast Enough?

Monday, April 20, 2020

Everyone one wants their software to perform as fast as possible. Some people think that unless the program ends before it begins, it's too slow.

However, realistically, making something as fast as possible is not a universal good, meaning that increasing performance beyond a certain point can cause problems that far exceed the value of the improved performance. Let's give some examples:

  • Most people use high-level languages that are compiled into cpu instructions or interpreted while being run. In an ideal world, every program would be written in assembly language. (Postgres does use some assembly language code for locking.) However, writing something like a database in assembly language, though it might give slightly better performance if developers knew the behavior of every cpu, would be a huge challenge for even the most expert developers, and code maintenance and feature additions might be impossible. In general, there just isn't enough benefit to using assembly language for anything but the most discrete, performance-critical functions.
  • Java isn't often chosen for its performance, but rather its ability to allow development teams to produce complex software efficiently. Lighter-weight languages might give better performance, but they don't offer the same development efficiency as Java.
  • Sql is a heavy-weight way to request data, but it is very efficient for developers since they can express their requests in a declarative way. You could probably write a custom data storage program to run faster than sql, but the effort involved to create and maintain it would be nearly insurmountable.
  • You can even layout data in sql in more efficient ways, and sometimes it is worth it, but data maintainability, access flexibility, and storage efficiency can suffer.

The bottom line is that performance is rarely a universal good — it has to be balanced against development time, solution flexibility, and maintainability. Sometimes people suggest that Postgres should offer some "super fancy" optimization, and sometimes we can implement it, but we always have to balance development time, solution flexibility, and maintainability with improved performance.

View or Post Comments

Community No Travel

Friday, April 17, 2020

With the Coronavirus outbreak, almost all Postgres events through June have been either cancelled, rescheduled, or moved online. This has given me time to consider my past travel. I have been blessed to visit so many places, not as a tourist, but rather a guest. I see countries more as a native than as a tourist, and I have many event organizers and hosts to thank for this.

It is hard to know when on-site events will resume, but I can remember what it was like to travel roughly 90 days a year. Going to social outings at home often felt riding a train through my home town, waving from behind a train window to my friends on the platform. I would sit at home and wonder how long I would be there until I had to leave again. Now, sitting at home, the images of our family travels appearing on our kitchen slideshow show places that seem farther away than ever, and I wonder if I will ever see these places again.

I am sure many others have similar feelings, and I have hope that, someday, we will all return to the road to spend time together again.

View or Post Comments

Thoughts Database Interoperability at Risk

Monday, March 16, 2020

This article parallels Oracle's copying of the sql syntax from ibm in the late 1970's with Google's copying of the Java api. It also explains the possible impact of the current case soon to be decided by the US Supreme Court.

One thing it does not fully cover is the impact on sql-level interoperability between databases. If Oracle can claim the Java api as copyrightable, the sql language could be considered copyrightable, allowing ibm to sue all relational database vendors and users for copyright infringement. It might also allow database vendors to sue competitors when their sql-level features are copied, requiring either huge payments or the removal of interoperability syntax.

The Postgres copyright is open, so any database vendor copying Postgres syntax is fine, but Postgres copying the syntax of other databases could be problematic. Relational database interoperability has been of huge benefit to data management, and this single case could call that into question.

View or Post Comments

Presentation Databases, Containers, and the Cloud

Saturday, March 7, 2020

A few months ago, I wrote a high-level presentation about the deployment benefits of using containers and cloud infrastructure for databases. I am now ready to share this presentation. I have also added QR codes to the first and last slides of all my presentations, linking to my website.

View or Post Comments