Bruce Momjian

Postgres Blog


This blog is about my work on the Postgres open source database, and is published on Planet PostgreSQL. PgLife allows monitoring of all Postgres community activity.




Conference Submitting Talks to Conferences

Monday, November 5, 2018

Having attended many conferences, I have a few suggestions on how to submit successful conference talks. First, determine the type of conference. Then, try to submit talks that match the conference type; possible topics include:

  • New Postgres features
  • User case studies
  • Internals
  • New workloads
  • Performance
  • Application development

Of course, only some of these topics match specific types of conferences.

Second, submit multiple talks. It is very possible that someone better known than you, or someone with a better abstract, will also submit to the conference. By submitting more than one topic, you increase your chances of submitting something unique and interesting.

Third, try to stand out. Maybe you have an interesting story about your topic, or an image that captures what you are going to talk about. Try to get that in front of the people who will be deciding if your talk should be accepted. Have you presented successfully before? Find a way to mention that too.

And finally, try to submit something that will attract attendees or will leave a lasting impression on your audience. You have to think creatively to accomplish this.

Oh, and once you are accepted, the hard part begins — you have to write your slides. However, don't think that some amazing idea will come to you 30 minutes before you present your talk. Accept the unlikelihood of that and create your slides weeks before you present them. Then, if you think of a way to improve your talk in the next few weeks, you have time to improve it. Creating slides solidifies your ideas, and often improvements will come to you as you think about how you will present your talk. A word of warning — don't change your slides the day of your talk; you are more likely to confuse yourself with unfamiliar slides than to improve the content.

 


Community Users vs. Developers

Thursday, November 1, 2018

Some open source projects have a distinction between the developers of the open source software and its users. Since Postgres was originally developed in a university, and none of the university developers continued when Internet-based development started in 1996, all our active developers see themselves as stewards of code developed before we arrived. This causes a flatter organizational structure and helps to forge closer user/developer ties.

Post a Comment

Community Long-Haul Behavior

Friday, October 19, 2018

People new to the Postgres community are often confused by the deliberateness of the community on some issues, and the willingness to break backward compatibility in other areas. The source code was created 32 years ago, and many of us have been involved with Postgres for 10-20 years. With that length of involvement, we are acutely aware of the long-term consequences of our decisions. Many proprietary and open source developers don't think years in advance, but with Postgres, it is our normal behavior.

This leads to some unusual Postgres behaviors:

  • Weekly source code comment improvements
  • Yearly source code reformatting
  • Exhaustive discussions about even minor changes, particularly user-visible ones
  • API redesigns

This focus on detail often strikes new users as unusual, but it makes sense when a multi-year view is considered.

Post a Comment

Community "Get Off My Lawn"

Wednesday, October 17, 2018

As you might know, Postgres is old-school in its communication methods, relying on email for the bulk of its development discussion, bug reports, and general assistance. That's the way it was done in 1996, and we are still doing it today. Of course, some projects have moved on to github or Slack.

As a project, we try to take advantage of new technologies while retaining technologies that are still optimal, and email is arguably one of them. Email hasn't changed much since 1996, but email clients have. Previously all email was composed using a text-editor-like interface, which allowed for complex formatting and editing. New email tools, like Gmail, provide a more simplified interface, especially on mobile devices. This simplified interface is great for composing emails while commuting on a train, but less than ideal for communicating complex topics to thousands of people.

This email captures the requirements of communicating technical points to thousands of people, and the need for fine-grained composition. Basically, when you are emailing thousands of people, taking time to make the communication clear is worthwhile. However, this email explains the reality that many email tools are more tailored for effortless communication on devices with limited features.

I am not sure what the community can do to improve the situation. Crafting complex technical communication is always going to be important for this project, but the tooling available to make this happen is getting rarer.

Update: This blog post explains many of the mechanics of email lists vs more modern methods. 2018-10-17

Post a Comment

Community Three-Year Cycle

Monday, October 15, 2018

In the early years of Postgres's open source development, we focused on features that could be accomplished in a few weekends. Within a few years, we had completed many of those, and were challenged trying to accomplish big projects with a mostly volunteer workforce. Soon, however, large companies started to sponsor developers' time and we launched into big feature development again.

Currently, the calendar is our only real challenge. We have major releases every year, but many features take multiple years to fully implement. We have seen this multi-year process with:

  • Windows port
  • Point-in-time recovery
  • Json
  • Streaming replication

and are in the process of completing even more.

So, if it seems like features appear in a limited form in one release, and the next two-to-three releases finally complete the feature, it is not your imagination.

Post a Comment

Client Installing PL/v8

Friday, October 12, 2018

PL/v8 is the JavaScript server-side language for Postgres. It has been available for several years, but a change in the way Google packages the v8 language has made it burdensome for packagers to build PL/v8 packages.

Therefore, few package managers still distribute PL/v8. This is disappointing since it undercuts some of our NoSQL story. We previously supported json storage and a json-specific server-side language. This second option is effectively no longer available, and those users who are using PL/v8 will need to find alternatives. This highlights the risk of software significantly relying on other software that it does not control and cannot maintain itself.
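
For those who can still build or obtain a package, a minimal sketch of what PL/v8 provides (the function and sample value are purely illustrative, and assume the plv8 extension is installed):

CREATE EXTENSION plv8;
 
CREATE FUNCTION get_name(doc JSONB) RETURNS TEXT AS $$
        return doc.name;
$$ LANGUAGE plv8;
 
SELECT get_name('{"name": "Mazda Miata"}');
  get_name
-------------
 Mazda Miata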

Post a Comment

Client Multi-Host Libpq

Wednesday, October 10, 2018

Libpq is used by many client interface languages to communicate with the Postgres server. One new feature in Postgres 10 is the ability to specify multiple servers for connection attempts. Specifically, it allows the connection string to contain multiple sets of host, hostaddr, and port values. These are tried until one connects.
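
For example, a connection string can list several hosts, which are tried in order until one accepts the connection (the host names below are illustrative):

$ psql 'host=pg1.mydomain.com,pg2.mydomain.com port=5432,5432 dbname=test'
 
$ # the same thing in URI form
$ psql postgresql://pg1.mydomain.com:5432,pg2.mydomain.com:5432/test

Postgres 10 also added the target_session_attrs connection parameter, so a client can ask specifically for a server that accepts writes.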

NoSQL solutions have used this method of multi-host connectivity for a while, so it is good Postgres can now do it too. It doesn't have all the features of a separate connection pooler, but it also avoids the administrative and performance overhead of one, so it certainly fits a need for some environments.

Post a Comment

Administration Trigger Me Writable

Tuesday, October 9, 2018

Postgres support for hot standby servers allows read-only queries to be run on standby servers, but how are read-only sessions handled when promoting a standby server to primary? After a standby is promoted to primary, new connections are read/write, but existing connections also change to read/write:

SELECT pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
 
SHOW transaction_read_only;
 transaction_read_only
-----------------------
 on
 
\! touch /u/pg/data2/primary.trigger
 
-- wait five seconds for the trigger file to be detected
CREATE TABLE test (x INTEGER);
 
SELECT pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 f
 
SHOW transaction_read_only;
 transaction_read_only
-----------------------
 off

Notice the session did not need to disconnect and reconnect — it was promoted to read/write automatically.
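
For reference, the trigger file used above is whatever the standby's recovery.conf specifies; a minimal sketch, with an illustrative primary_conninfo host:

# recovery.conf on the standby
standby_mode = 'on'
primary_conninfo = 'host=primary.mydomain.com port=5432'
trigger_file = '/u/pg/data2/primary.trigger'

Running pg_ctl promote on the standby accomplishes the same promotion without a trigger file.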

View or Post Comments

Administration Moving Tablespaces

Wednesday, October 3, 2018

Tablespaces are designed to allow Postgres clusters to be spread across multiple storage devices. Create tablespace creates a symbolic link in the pg_tblspc directory in the cluster's data directory pointing to the newly-created tablespace directory.

Unfortunately, though there is a command to move tables and indexes between tablespaces, there is no command to move tablespaces to different directories. However, since Postgres 9.2, the process of moving tablespaces is quite simple:

  1. Record the oid of the tablespace you want to move
  2. Shut down the Postgres cluster
  3. Move the tablespace directory, either within the same file system or to a different file system
  4. Update the oid symbolic link that represents the moved tablespace to the new tablespace directory location
  5. Restart the server

Here's an example of moving a tablespace:

$ mkdir /u/postgres/test_tblspc
 
$ psql test
 
CREATE TABLESPACE test_tblspc LOCATION '/u/postgres/test_tblspc';
 
CREATE TABLE test_table (x int) TABLESPACE test_tblspc;
 
INSERT INTO test_table VALUES (1);
 
SELECT oid, * FROM pg_tablespace;
  oid  |   spcname   | spcowner | spcacl | spcoptions
-------+-------------+----------+--------+------------
  1663 | pg_default  |       10 |        |
  1664 | pg_global   |       10 |        |
 16385 | test_tblspc |       10 |        |
 
SELECT pg_tablespace_location(16385);
 pg_tablespace_location
-------------------------
 /u/postgres/test_tblspc
 
\q
 
$ pg_ctl stop
 
$ mv /u/postgres/test_tblspc /u/postgres/test2_tblspc/
 
$ cd $PGDATA/pg_tblspc/
 
$ ls -l
lrwxrwxrwx 1 postgres postgres 23 Sep  5 22:20 16385 -> /u/postgres/test_tblspc
 
$ ln -fs /u/postgres/test2_tblspc 16385
 
$ ls -l
lrwxrwxrwx 1 root root 24 Sep  5 22:25 16385 -> /u/postgres/test2_tblspc
 
$ pg_ctl start
 
$ psql test
 
SELECT * FROM test_table;
 x
---
 1
 
SELECT pg_tablespace_location(16385);
  pg_tablespace_location
--------------------------
 /u/postgres/test2_tblspc

View or Post Comments

Administration Switchover/Failover and Session Migration

Monday, October 1, 2018

I have already covered switchover and failover. In the case of failover, the old primary is offline so there are no options for migrating clients from the old primary to the new primary. If there is a switchover, there are options for client migration. Assuming you are using streaming replication, only one server can accept writes. Therefore, for client migration, you can either:

  • Force all clients on the old primary to exit, then promote the new primary
  • Wait for all clients on the old primary to exit, then promote the new primary

If you choose force, it will disrupt applications; they must be designed to handle disconnections and potentially reconfigure their sessions, e.g., session variables, cursors, open transactions. If you choose wait, how do you handle clients that want to connect while you are waiting for the existing clients to disconnect? Minor upgrades, which require a restart of the database server, have similar issues.
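
For the force option, a minimal sketch of disconnecting the remaining sessions on the old primary with pg_terminate_backend (the backend_type filter requires Postgres 10 or later):

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE backend_type = 'client backend'
      AND pid <> pg_backend_pid();

Once the sessions are gone, the standby can be promoted.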

The only clean solution is to use multi-master replication so new clients can connect to the new primary while waiting for old-primary clients to finish and disconnect. However, it is operationally expensive to support multi-master just to minimize switchover disruption.

View or Post Comments

Presentation Postgres 11 Features Presentation

Friday, September 14, 2018

Now that I have given a presentation about Postgres 11 features in New York City, I have made my slides available online.

View or Post Comments

Administration Multi-Host Pg_dump

Wednesday, September 12, 2018

You have probably looked at logical dumps as supported by pg_dump, and logical restores as supported by pg_restore or, more simply, psql. What you might not have realized are the many options for dumping and restoring when multiple computers are involved.

The simplest case is dumping and restoring on the same server:

$ pg_dump -h localhost -Fc test > /home/postgres/dump.sql
$ pg_restore -h localhost -d test < /home/postgres/dump.sql

or with a plain text dump:

$ pg_dump -h localhost -f /home/postgres/dump.sql test
$ psql -h localhost -f /home/postgres/dump.sql test

Where this gets interesting is with multiple hosts. You can:

$ # dump a remote database to your local machine
$ pg_dump -h remotedb.mydomain.com -f /home/postgres/dump.sql test
 
$ # dump a local database and write to a remote machine
$ pg_dump test | ssh postgres@remotedb.mydomain.com 'cat > dump.sql'
 
$ # dump a remote database and write to the same remote machine
$ pg_dump -h remotedb.mydomain.com test | ssh postgres@remotedb.mydomain.com 'cat > dump.sql'
 
$ # or a different remote machine
$ pg_dump -h remotedb1.mydomain.com test | ssh postgres@remotedb2.mydomain.com 'cat > dump.sql'

You also have similar restore options. I will use psql below but pg_restore works the same:

$ # dump a remote database and restore to your local machine
$ pg_dump -h remotedb.mydomain.com test1 | psql test2
 
$ # dump a local database and restore to a remote machine
$ pg_dump test | ssh postgres@remotedb.mydomain.com 'psql test'
 
$ # dump a remote database and restore to the same remote machine
$ pg_dump -h remotedb.mydomain.com test1 | ssh postgres@remotedb.mydomain.com 'psql test2'
 
$ # or a different remote machine
$ pg_dump -h remotedb1.mydomain.com test | ssh postgres@remotedb2.mydomain.com 'psql test'

As you can see, there is a lot of flexibility possible.

View or Post Comments

Administration Monitoring Complexity

Monday, September 10, 2018

I have always had trouble understanding the many monitoring options available in Postgres. I was finally able to collect all popular monitoring tools into a single chart (slide 96). It shows the various levels of monitoring: OS, process, query, parser, planner, executor. It also separates instant-in-time reporting and across-time alerting/aggregation options.

View or Post Comments

Security Signing Rows

Friday, September 7, 2018

With the rsa keys created in my previous blog entry, we can now properly sign rows to provide integrity and non-repudiation, which we did not have before. To show this, let's create a modified version of the previous schema by renaming the last column to signature:

CREATE TABLE secure_demo2 (
        id SERIAL, car_type TEXT, license TEXT, activity TEXT, 
        event_timestamp TIMESTAMP WITH TIME ZONE, username NAME, signature BYTEA);

Now, let's do the insert as before:

INSERT INTO secure_demo2
VALUES (DEFAULT, 'Mazda Miata', 'AWR-331', 'flat tire',
CURRENT_TIMESTAMP, 'user1', NULL) RETURNING *;
 id |  car_type   | license | activity  |        event_timestamp        | username | signature
----+-------------+---------+-----------+-------------------------------+----------+-----------
  1 | Mazda Miata | AWR-331 | flat tire | 2017-07-08 10:20:30.842572-04 | user1    | (null)

Now, we sign it using our private rsa key, rather than creating a random key for this:

SELECT ROW(id, car_type, license, activity, event_timestamp, username)
FROM secure_demo2
WHERE id = 1
 
-- set psql variables to match output columns
\gset
 
\set signature `echo :'row' | openssl pkeyutl -sign -inkey ~user1/.pgkey/rsa.key | xxd -p | tr -d '\n'`
 
UPDATE secure_demo2 SET signature = decode(:'signature', 'hex') WHERE id = 1;
 
SELECT * FROM secure_demo2;
 id |  car_type   | license | activity  |        event_timestamp        | username | signature
----+-------------+---------+-----------+-------------------------------+----------+-------------
  1 | Mazda Miata | AWR-331 | flat tire | 2017-07-08 10:20:30.842572-04 | user1    | \x857310...

To later verify that the data row has not been modified, we can do:

SELECT ROW(id, car_type, license, activity, event_timestamp, username), signature
FROM secure_demo2
WHERE id = 1;
                                     row                                     | signature
-----------------------------------------------------------------------------+-------------
 (1,"Mazda Miata",AWR-331,"flat tire","2017-07-08 10:20:30.842572-04",user1) | \x857310...
\gset
 
\echo `echo :'signature' | xxd -p -revert > sig.tmp`
\echo `echo :'row' | openssl pkeyutl -verify -pubin -inkey /u/postgres/keys/user1.pub -sigfile sig.tmp`
Signature Verified Successfully
 
\! rm sig.tmp

Because signature verification is done using the public key, anyone can verify that the data has not been modified. It also allows non-authors to verify that the data was created by the owner of the private key.

This and the previous two blog entries are related. The first one explained how to create and store a simple message authentication code (mac). The second one explained how to encrypt data on the client side using symmetric and public key cryptography. This blog entry shows how to do message authentication via public key signing, so anyone with access to the public key can verify authorship.

View or Post Comments

Security Client Row Access Control

Wednesday, September 5, 2018

Usually the database administrator controls who can access database data. However, it is possible for clients to completely control who can access data they add to the database, with the help of openssl.

First, let's create rsa keys for three users from the command line. We first create an rsa public/private key pair for each user in their home subdirectory and then make a copy of their rsa public key in the shared directory /u/postgres/keys:

# # must be run as the root user
# cd /u/postgres/keys
# for USER in user1 user2 user3
> do    mkdir ~$USER/.pgkey
>       chmod 0700 ~$USER/.pgkey
>       openssl genpkey -algorithm RSA -out ~$USER/.pgkey/rsa.key
>       chmod 0600 ~$USER/.pgkey/*
>       chown -R "$USER" ~$USER/.pgkey
>       openssl pkey -in ~$USER/.pgkey/rsa.key -pubout -out "$USER".pub
> done

A more sophisticated setup would include creating a certificate authority and signing certificates for each user using the created keys. This allows the certificate authority to prove that the public keys belong to the specified users.
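
As a hedged sketch of that more sophisticated setup, reusing the keys created above (the certificate subjects and file names are illustrative):

# # create a self-signed certificate authority
# openssl req -new -x509 -days 365 -nodes -subj '/CN=survey-ca' -keyout ca.key -out ca.crt
 
# # create a certificate request from user1's existing key and have the ca sign it
# openssl req -new -key ~user1/.pgkey/rsa.key -subj '/CN=user1' -out user1.csr
# openssl x509 -req -in user1.csr -CA ca.crt -CAkey ca.key -CAcreateserial -days 365 -out user1.crt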

With this in place, it is now possible to encrypt data on the client using a public key that can only be decrypted by someone with access to the matching private key. Here is an example for user user1:

# echo test4 | openssl pkeyutl -encrypt -pubin -inkey /u/postgres/keys/user1.pub |
> openssl pkeyutl -decrypt -inkey ~user1/.pgkey/rsa.key
test4

This encrypts text with user1's public key, then decrypts it with their private key.

Now, let's create a table to hold the encrypted data and add some encrypted data:

CREATE TABLE survey1 (id SERIAL, username NAME, result BYTEA);
 
\set enc `echo secret_message | openssl pkeyutl -encrypt -pubin -inkey /u/postgres/keys/user1.pub | xxd -p | tr -d '\n'`
INSERT INTO survey1 VALUES (DEFAULT, 'user1', decode(:'enc', 'hex'));
 
-- save data for the other two users
\set enc `echo secret_message | openssl pkeyutl -encrypt -pubin -inkey /u/postgres/keys/user2.pub | xxd -p | tr -d '\n'`
INSERT INTO survey1 VALUES (lastval(), 'user2', decode(:'enc', 'hex'));
 
\set enc `echo secret_message | openssl pkeyutl -encrypt -pubin -inkey /u/postgres/keys/user3.pub | xxd -p | tr -d '\n'`
INSERT INTO survey1 VALUES (lastval(), 'user3', decode(:'enc', 'hex'));

We could have placed all the user-encrypted data in the same row using a bytea array:

CREATE TABLE survey2 (id SERIAL, username NAME[], result BYTEA[]);
 
\set enc1 `echo secret_message | openssl pkeyutl -encrypt -pubin -inkey /u/postgres/keys/user1.pub | xxd -p | tr -d '\n'`
\set enc2 `echo secret_message | openssl pkeyutl -encrypt -pubin -inkey /u/postgres/keys/user2.pub | xxd -p | tr -d '\n'`
\set enc3 `echo secret_message | openssl pkeyutl -encrypt -pubin -inkey /u/postgres/keys/user3.pub | xxd -p | tr -d '\n'`
 
INSERT INTO survey2 VALUES (
        DEFAULT,
        '{user1, user2, user3}',
        ARRAY[decode(:'enc1', 'hex'), decode(:'enc2', 'hex'), decode(:'enc3', 'hex')]::bytea[]);

We could have stored the encrypted value only once by using a random password, encrypting that password with each user's public key, and storing those encrypted passwords:

CREATE TABLE survey3 (id SERIAL, result BYTEA, username NAME[], keys BYTEA[]);
 
\set key `openssl rand -hex 32`
\set enc `echo secret_message | openssl enc -aes-256-cbc -pass pass\::key | xxd -p | tr -d '\n'`
 
\set enc1 `echo :'key' | openssl pkeyutl -encrypt -pubin -inkey /u/postgres/keys/user1.pub | xxd -p | tr -d '\n'`
\set enc2 `echo :'key' | openssl pkeyutl -encrypt -pubin -inkey /u/postgres/keys/user2.pub | xxd -p | tr -d '\n'`
\set enc3 `echo :'key' | openssl pkeyutl -encrypt -pubin -inkey /u/postgres/keys/user3.pub | xxd -p | tr -d '\n'`
 
INSERT INTO survey3 VALUES (
        DEFAULT,
        :'enc',
        '{user1, user2, user3}',
        ARRAY[decode(:'enc1', 'hex'), decode(:'enc2', 'hex'), decode(:'enc3', 'hex')]::bytea[]);

To decrypt data using the first schema (one user per row, no arrays), user1 would do:

SELECT * FROM survey1 WHERE username = 'user1';
 id | username |    result
----+----------+-------------
  1 | user1    | \x80c9d0...
 
-- set psql variables to match output columns
\gset
 
-- 'cut' removes \x
\set decrypt `echo :'result' | cut -c3- | xxd -p -revert | openssl pkeyutl -decrypt -inkey ~user1/.pgkey/rsa.key`
 
SELECT :'decrypt';
    ?column?
----------------
 secret_message

The process would be similar for survey2 and survey3. Of course, this does not prevent database administrators from removing data from the database, and because public keys are used to encrypt, they could add data too. A message authentication code (mac) would allow such additions to be detected.

View or Post Comments

Security Cryptographically Authenticated Rows

Friday, August 31, 2018

When storing data in the database, there is an assumption that you have to trust the database administrator to not modify data in the database. While this is generally true, it is possible to detect changes (but not removal) of database rows.

To illustrate this, let's first create a table:

CREATE TABLE secure_demo (
        id SERIAL, car_type TEXT, license TEXT, activity TEXT, 
        event_timestamp TIMESTAMP WITH TIME ZONE, username NAME, hmac BYTEA);

The last column (hmac) is used for change detection. Let's insert a row into the table:

INSERT INTO secure_demo
VALUES (DEFAULT, 'Mazda Miata', 'AWR-331', 'flat tire',
CURRENT_TIMESTAMP, 'user1', NULL) RETURNING *;
 id |  car_type   | license | activity  |       event_timestamp        | username | hmac
----+-------------+---------+-----------+------------------------------+----------+------
  1 | Mazda Miata | AWR-331 | flat tire | 2017-07-06 20:15:59.16807-04 | user1    |

Notice that this query also returns a text representation of the inserted row, including the computed columns id and event_timestamp.

To detect row changes, it is necessary to generate a message authentication code (mac) which is generated with a secret known only to the client. It is necessary to generate the mac on the client so the secret is never transferred to the server. These psql queries update the inserted row to store the mac:

SELECT ROW(id, car_type, license, activity, event_timestamp, username)
FROM secure_demo
WHERE id = 1;
 
-- set psql variables to match output columns
\gset
 
\set hmac `echo :'row' | openssl dgst -sha256 -binary -hmac 'MY-SECRET' | xxd -p | tr -d '\n'`
 
UPDATE secure_demo SET hmac = decode(:'hmac', 'hex') WHERE id = 1;
 
SELECT * FROM secure_demo;
 id |  car_type   | license | activity  |       event_timestamp        | username |   hmac
----+-------------+---------+-----------+------------------------------+----------+-------------
  1 | Mazda Miata | AWR-331 | flat tire | 2017-07-06 20:15:59.16807-04 | user1    | \x9549f1...

To later verify that the data row has not been modified, do:

SELECT ROW(id, car_type, license, activity, event_timestamp, username), hmac
FROM secure_demo
WHERE id = 1;
                                    row                                     |   hmac
----------------------------------------------------------------------------+-------------
 (1,"Mazda Miata",AWR-331,"flat tire","2017-07-06 20:15:59.16807-04",user1) | \x9549f1...
 
\gset
 
\echo  ' E''\\\\x'`echo :'row' | openssl dgst -sha256 -binary -hmac 'MY-SECRET' | xxd -p | tr -d '\n'`''''
 E'\\x9549f10d54c6a368499bf98eaca716128c732132680424f74cb00e1d5a175b63'
 
\echo :'hmac'
 E'\\x9549f10d54c6a368499bf98eaca716128c732132680424f74cb00e1d5a175b63'

The database administrator could replace or remove the hmac value, but this would be detected. This is because computing a proper hmac requires the MY-SECRET key, which is never sent to the server.

The above solution only allows someone with access to the secret key to determine if the row has been modified, meaning only they can check the message's integrity. A more sophisticated solution would be to use a private key to sign a hash of the row value — this would allow anyone with access to the public key to check that the row has not been modified, but still only allow those with access to the private key to generate a new hmac. This would also allow for non-repudiation.

There is the risk that the row values returned by the insert do not match those that were supplied, so some client-side checks would need to be added. There is also no detection for removed rows; this is similar to the problem of trying to detect a blocked tls connection attempt.

Such a setup is clearly overkill for many databases, but there are some use-cases where data integrity guaranteed by the client, independent of the database administrator, is useful. Non-repudiation using public key infrastructure is also sometimes useful.

View or Post Comments

Security Foreign Data Wrappers and Passwords

Wednesday, August 29, 2018

Foreign data wrappers (fdw) allow data to be read and written to foreign data sources, like NoSQL stores or other Postgres servers. Unfortunately the authentication supported by fdws is typically limited to passwords defined using create user mapping. For example, postgres_fdw only supports password-based authentication, e.g., scram. Though only the database administrator can see the password, this can still be a security issue.
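
For example, a postgres_fdw setup stores the password in the user mapping (the server name, host, and credentials below are illustrative):

CREATE EXTENSION postgres_fdw;
 
CREATE SERVER remote_server FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'remotedb.mydomain.com', dbname 'test');
 
CREATE USER MAPPING FOR user1 SERVER remote_server
        OPTIONS (user 'user1', password 'secret');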

Ideally, at least some of the Postgres fdws should support more sophisticated authentication methods, particularly SSL certificates. Another option would be to allow user authentication to be sent through fdws, so the user has the same permissions on the fdw source and target. There is no technical reason fdw authentication is limited to passwords. This problem has been discussed, and it looks like someone has a plan for solving it, so hopefully it will be improved soon.

View or Post Comments

Security Certificate Revocation Lists

Monday, August 27, 2018

If you are setting up Postgres server or client TLS/SSL certificates, be sure to also configure support for a certificate revocation list (crl). This list, distributed by the certificate authority, lists certificates that should no longer be trusted.

While the crl will initially likely be empty, a time will come when a private key used by a certificate or device is exposed in an unauthorized manner, or an employee who had access to private keys leaves your organization. When that happens, you will need the ability to invalidate certificates — having that ability pre-configured will help, especially during a crisis.
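
A minimal sketch of pre-configuring crl support (the file names follow the Postgres documentation conventions; the host name is illustrative):

# postgresql.conf on the server
ssl = on
ssl_ca_file = 'root.crt'
ssl_crl_file = 'root.crl'
 
$ # on the client, libpq reads ~/.postgresql/root.crl by default, or it can be named explicitly
$ psql 'host=server.mydomain.com dbname=test sslmode=verify-full sslrootcert=root.crt sslcrl=root.crl'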

View or Post Comments

Performance Oracle Real Application Clusters (RAC)

Monday, July 9, 2018

I get asked about Oracle RAC often. My usual answer is that Oracle RAC gives you 50% of high reliability (storage is shared, mirroring helps) and 50% of scaling (CPU and memory are scaled, storage is not). The requirement to partition applications to specific nodes to avoid cache consistency overhead is another downside. (My scaling presentation shows Oracle RAC.)

I said the community is unlikely to go in the Oracle RAC direction because it doesn't fully solve a single problem and is overly complex. The community prefers to fully solve problems and to keep solutions simple.

For me, streaming replication fully solves the high availability problem and sharding fully solves the scaling problem. Of course, if you need both, you have to deploy both, which gives you 100% of two solutions, rather than Oracle RAC which gives you 50% of each.

However, I do think database upgrades are easier with Oracle RAC, and I think it is much easier to add/remove nodes than with sharding. For me, this chart summarizes it:

                         HA   Scaling  Upgrade Add/Remove
        Oracle RAC       50%     50%    easy    easy
        Streaming Rep.  100%     25%*   hard    easy
        Sharding          0%    100%    hard    hard
        
        * Allows read scaling

(I posted this to pgsql-general in 2016.)

View or Post Comments

Performance Query Planner Interview

Saturday, June 23, 2018

Software Engineering Radio has just posted a one-hour audio recording of an interview I did about the Postgres query optimizer. It is generic enough to be useful to anyone wanting to understand how relational databases optimize queries.

On an unrelated note, I am leaving soon for a 25-day European Postgres speaking tour. I am speaking at one-day conferences in Zürich, London, and Amsterdam, a user group in Frankfurt, and will be presenting at ten EnterpriseDB events across Europe.

View or Post Comments

Presentation Will Postgres Live Forever?

Thursday, June 7, 2018

I had the opportunity to present an unusual topic at this year's Postgres Vision conference: Will Postgres Live Forever? It is not something I talk about often but it brings out some interesting contrasts in how open source is different from proprietary software, and why innovation is fundamental to software usage longevity. For the answer to the question, you will have to read the slides.

View or Post Comments

News Draft of Postgres 11 Release Notes

Thursday, May 17, 2018

I have completed the draft version of the Postgres 11 release notes. Consisting of 167 items, this release makes big advances in partitioning, parallelism, and server-side transaction control via procedures. One of the more unexpected yet useful features is "Allow 'quit' and 'exit' to exit psql when used in an empty buffer".

The release notes will be continually updated until the final release, which is expected in September or October of this year.

View or Post Comments

Documentation Intermediate Certificates

Monday, January 22, 2018

I previously mentioned the importance of high quality documentation, so we are always looking for improvements. This email thread from 2013 attempted to codify the rules for how to properly use intermediate ssl/tls certificates with Postgres. At that time, our documentation was updated to recommend storing intermediate certificates with root certificates because it was unclear under what circumstances intermediate certificates are transferred to the remote server to be chained to a trusted root certificate.

During research for my four security talks, I studied certificate handling. I found certificate chain resolution rules in the verify manual page. In testing various certificate locations, I also found that Postgres follows the same rules.

Based on this testing, I realized the conclusions reached in 2013 were inaccurate, or at least incomplete. While the documented procedure worked, the more practical and recommended approach is to store intermediate certificates (created with v3_ca extensions) with leaf certificates to be sent to the remote end. (I think the requirement of using the v3_ca extension when creating intermediate certificates is what caused much of the testing confusion in the past.)

This new procedure allows short-lived leaf and intermediate certificates to be replaced at expiration time while the long-lived root certificate store remains unchanged. For example, for clients to verify the server's certificate, the server would contain the intermediate and server's leaf certificates, and clients only need root certificates, which rarely change.
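
As a hedged sketch of that layout for server certificate verification (the file names are illustrative):

$ # on the server: the leaf certificate first, then the intermediate that signed it
$ cat server_leaf.crt intermediate.crt > $PGDATA/server.crt
 
$ # on each client: only the long-lived root certificate
$ cp root.crt ~/.postgresql/root.crt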

The documentation of all supported Postgres versions has been updated to recommend this new procedure. I have also added sample scripts showing how to create root-leaf and root-intermediate-leaf certificate chains.

These changes will be distributed in the next minor Postgres releases, scheduled for next month. Until this new documentation is released, you can read the updates in the Postgres 11 docs in the server and libpq ssl sections. I am hopeful this clarified documentation will encourage people to use ssl and ssl certificate verification.

View or Post Comments

Presentation Four New Security Talks

Monday, January 15, 2018

In the past few months I have completed four new security talks, totaling 294 slides. The first and third talks explain the fundamentals of cryptography and cryptographic hardware, respectively. The second and fourth talks cover applications of these fundamentals. The second talk covers tls, including the use of ssl certificates by Postgres. The fourth covers the use of cryptographic hardware by applications, including Postgres.

View or Post Comments

Conference Video of Russian Interview

Wednesday, January 10, 2018

I just did a two-hour interview in English with the Russian Postgres user group. A video recording of the interview is online and covers questions asked by the Russian attendees.

View or Post Comments

Community Web Forums?

Friday, January 5, 2018

This email thread explores the idea of the community supporting web forums instead of, or in addition to, the email lists where the majority of community development and discussion happens. Reasons stated for not pursuing web forums included:

  • It has been tried before and failed
  • Larger web forum communities already exist, e.g., Stack Overflow
  • Established Postgres community members prefer email

Ultimately I think the Postgres community needs to do a better job of publicizing the existence of external communities that help Postgres users, e.g., Slack. As an example, the Postgres irc channel is well publicized and currently has 1,100 connected users. Also, EnterpriseDB created the Postgres Rocks web forum six months ago.

View or Post Comments

Documentation Wal and Xlog

Wednesday, January 3, 2018

Postgres isn't the best at naming things. Of course, there is the old computer saying, "There are only two hard things in Computer Science: cache invalidation and naming things." With Postgres being 31 years old and developed by several different project teams, naming can be even more inconsistent.

One naming inconsistency, which we have lived with for years, is the name of the write-ahead log. Postgres references this using the acronym wal in server variables like wal_level, but the PGDATA directory containing the write-ahead log files was called pg_xlog. In "pg_xlog," the "x" stands for "trans" which is short for "transaction", and of course "log" means "log", so "xlog" was short for "transaction log." This was also confusing because there is a clog directory which records "transaction status" information (commits, aborts). So, "xlog" or "transaction log" was already a bad name, and having it also referenced as wal just made it worse.

Postgres 10 has made the difficult change of removing references to "xlog" and "clog," instead consistently using "wal" and "xact" (the pg_xlog directory is now pg_wal, and pg_clog is now pg_xact). This email thread covers many of the gory details of what we changed and why. It isn't ideal to be changing the name of internal database objects, and it will cause some pain to those moving to Postgres 10, but future users of Postgres will have a more consistent experience of how Postgres works.
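
The rename is visible in the cluster's data directory; a quick check on a Postgres 10 cluster shows only the new names:

$ ls $PGDATA | egrep 'pg_wal|pg_xact|pg_xlog|pg_clog'
pg_wal
pg_xact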

View or Post Comments