Q&A about Hot Standby

Updated!: See below.

Here are some questions that came up from trying to use the current PostgreSQL hot standby documentation:

Q: If you set hot_standby = off after having it on, what happens?

A: This change requires a database restart on the hot standby (or replica) server. The database goes into “warm standby” mode, and you can no longer issue queries against it. You can change this right back by setting the parameter to ‘on’ and restarting again.

Q: Can you use hot standby with only a single schema or database?

A: No. Hot Standby is all-or-nothing for a particular PostgreSQL database cluster. A cluster is made up of all the databases that live in a particular $PGDATA instance, and Hot Standby is currently not capable of distinguishing between changes occurring on different particular databases or schemas.

Q: Is the process for setting up hot standby any different for empty databases vs. populated databases?

A: No. The setup process is the same – you must create a base backup.

Q: How do I bring my hot standby out of standby mode?

A: If you’re using something like the following in your recovery.conf file:

restore_command = 'cp xxxx'
standby_mode = 'on'

Change: standby_mode = 'off' and restart your hot standby postgresql instance.

Q: Where did my recovery.conf file go? (after your database came out of warm/hot standby)

A: PostgreSQL automatically changes the name of the file to recovery.done when recovery completes. This helps prevent accidents.

Q: What happens if my archive_timeout = 60 (which creates a 16mb file every minute) and I flood the database with so much activity that my standby falls behind?

A: This is possible, and you may be interested in trying Streaming Replication. However, for the majority of users, a delay in restoring data is acceptable (and possibly desirable). Eventually the standby server will catch up. You can monitor how delayed the server is using functions like txid_current_snapshot().

Q: Are schema changes (like CREATE TABLE or ALTER TABLE) replicated to the standby?

A: Yes! All changes to the database cluster are copied to the standby. This includes any DDL operations, new rows, the effects of autovacuum — any change to the data store on the master is copied to the standby.

Quick start on Hot Standby

Updated.

We could have some better end-user documentation around creating a warm or hot standby system for basic postgresql replication.

To this end, I created a Quick Start doc on the wiki, but it could use more help. Maybe we should create some setup recipes for common situations?

Also – I wrote the following script during a hot standby bugbash PDXPUG had today:


#!/bin/sh

BINPATH=/usr/local/pg90/bin
CP=/bin/cp
PGCTL=${BINPATH}/pg_ctl
PSQL=${BINPATH}/psql
INITDB=${BINPATH}/initdb

sudo mkdir -p /var/tmp/archive
sudo chown ${USER} /var/tmp/archive

${INITDB} hotstandby1

echo 'wal_level = hot_standby' >> hotstandby1/postgresql.conf
echo 'archive_mode = on' >> hotstandby1/postgresql.conf
echo "archive_command = 'cp %p /var/tmp/archive/%f'" >> hotstandby1/postgresql.conf
echo "archive_timeout = 60" >> hotstandby1/postgresql.conf
echo "port = 6543" >> hotstandby1/postgresql.conf

${PGCTL} -D hotstandby1 start -l hotstandby1.log
sleep 5

${PSQL} -p 6543 postgres -c "select pg_start_backup('backup')"
${CP} -pR hotstandby1/ hotstandby2
${PSQL} -p 6543 postgres -c "select pg_stop_backup()"
rm hotstandby2/postmaster.pid
rm hotstandby2/pg_xlog/*

echo 'hot_standby = on' >> hotstandby2/postgresql.conf
echo 'port = 6544' >> hotstandby2/postgresql.conf
echo "standby_mode = 'on'" >> hotstandby2/recovery.conf
echo "restore_command = 'cp -i /var/tmp/archive/%f %p'" >> hotstandby2/recovery.conf

${PGCTL} -D hotstandby2 start -l hotstandby2.log

* Added port specification in case you’ve already got postgres running. Added a BINPATH for custom install directories.

Starting at Emma

Today, I start at Emma.

I have some clues about what I’ll be up to – working on some big PostgreSQL databases, tearing into the infrastructure and discovering what makes their small company tick. Emma’s work with small businesses, and a focus on humane communication and consensus building completely drew me in.

And I’m looking forward to riding my bike everyday over on the office on Burnside!

PgCon 2010 – PL/Parrot, Simulated Annealing, Exclusion Constraints, Postgres-XC

PgCon this year was full of bold ideas, delivered in the quiet manner typical of the Postgres community. Talks by Jonathan Leto, Jan Urbanski and Jeff Davis all presented new features and ideas that show there is so much room yet in Postgres as a project to contribute, and innovate. I was also delighted to see Postgres-XC (touted as a “Postgres RAC”) release code, and give a great presentation on the high-level details.

Jonathan Leto presented work on PL/Parrot, along with David Fetter. Parrot is a dynamic language virtual machine, allowing implementation of multiple dynamic languages which can then share classes (from the docs: “In theory, you will be able to write a class in Perl, subclass it in Python and then instantiate and use that subclass in a Tcl program.”). The project is to embed Parrot in PostgreSQL, and eventually, implement dynamic languages inside the virtual machine. Advantages to doing this are that it will make implementing new dynamic languages in Postgres much easier, because the language implementers won’t have to learn the PL interface. Another useful feature in PL/Parrot is the implementation of a security opcode in Parrot which essentially controls access to open(), a key to implementing a secure procedural language in Postgres. (I’m sure Jonathan will correct me if I didn’t describe this properly :D)

Jan Urbanski gave a talk on join ordering via Simulated Annealing, called Replacing GEQO. The approach was pretty interesting, involved math that required me to scratch my head a bit, and the initial performance improvements for many-join queries made it seem appealing. The original -hackers posting from Jan, includes a few hairy queries from Andres Freund which confound the GEQO referenced later in the thread. Jan’s posted the code, and I’m looking forward to seeing how it develops this year.

Jeff Davis presented exclusion constraints, which are part of 9.0. He is continuing his work on temporal data types with a clever and very useful generalization of UNIQUE. UNIQUE constrains equality, while exclusion constraints allow other operators (in the most cited example, Jeff demonstrates “overlaps” in the PERIOD datatype).

Postgres-XC was officially presented and released. For efforts in Postgres clustering, releasing the code is a huge step forward toward mainstreaming work in the community on clustering. This release solidifies community work that started last year, with NTT and the support of the Japanese PostgreSQL User Group in having a clustering summit back in November 2009.

I was disappointed to miss a few talks (like hypothetical indexes, pg_statsinfo, CB’s pgMQ) but looking forward to hearing the recordings as they are published!

Lightning talks from PgCon 2010

Thanks again to all the folks that volunteered to give five minute talks during the Lightning talks session at PgCon!

Our lightning talks this year were:

  • PostgreSQL Developer Meeting in Five Minutes – Bruce Momjian
  • Slony 1 => 2.0 – Steve Singer
  • PostgreSQL and Twisted – Jan Urbanski
  • The FlyMine Project – Matthew Wakeling
  • Enhanced Tools for PostgreSQL – Tomonari Katsumata
  • Servoy – Christophe Pettus
  • Tail_n_mail – Greg Sabino Mullane
  • GSOC – Robert Treat
  • Pg Staging – Dimitri Fontaine
  • Serializable Transaction Isolation – Kevin Grittner
  • 10 ways to wreck your database – Josh Berkus

All presentations are downloadable from: http://wiki.postgresql.org/wiki/PgCon_2010_Lightning_talks

Image Copyright © 2010 Richard Guy Briggs from here and used by permission.

Need reviewers – preparing for the first commitfest for 9.1

Now is the perfect time to get involved in Postgres development!

Starting June 15, we’re going to have a “reviewfest”, as a prelude to the first commitfest in July. We’ve already got 28 patches in the queue, and all need reviewers.

Think you’re not qualified to review patches? Think again!

From the Reviewing a Patch documentation:

If you can apply a patch and you can use the new feature, you’re already qualified to start reviewing it.

We will formally kick off a reviewfest on June 15.

We will assign reviewers and get all the patches that are queued up for 9.1 reviewed while the final touches are being applied to the 9.0 release. Have a look at Review A Patch guidelines for information about how to conduct your review. We also have a mailing list to track and recruit reviewers – pgsql-rrreviewers. (The extra R’s are for ’round-robin’)

Please subscribe to the list, and post if there is a particular patch you are interested in reviewing!

PgCon Day 1: Developer meeting

Jan, talking about DDL triggers

I’m back from the developer meeting where we blasted through eight hours of presentations and discussions from key members of the PostgreSQL developer community.

One great outcome from the meeting is that WE ARE MOVING TO GIT.

There’s lots of other great stuff on that wiki notes – like a list of possible features for 9.1, plans for DDL triggers, and a continued discussion of snapshot cloning.

There was also much approval voiced for the alpha process that Peter Eisentraut championed last year.

I volunteered to steer our next “reviewfest” – kicking off June 15! And I’m trying to rope another developer into the next commitfest, which we anticipate starting on July 15. The plan is to stick with our development schedule from last year.

And I’m working with Josh Berkus to put together an open issues list for 9.0. I think I went a bit overboard, and many of the things I listed are probably resolved – hopefully committers will make quick work of the list, and we’ll be speeding along toward releasing 9.0 in June!

I missed out on the group picture, so I look forward to someone’s photo editing efforts in the near future. 🙂

Change in the air

On Tuesday, I’m headed off to PgCon, for three days of intense PostgreSQL conferencing. This conference is PostgreSQL’s major developer (and end user!) conference of the year, held in Ottawa. I’m looking forward to seeing old and new colleagues, spending a day in the developer’s meeting, and hopefully sharing some new projects.

I’m also leading the charge on the lightning talks! If you’d like to give one, find me at the conference, and we’ll see what we can do.

Finally, if you want up-to-date info about what’s happening at the conference, have a look at my twitter list for Postgres. (Right now, looks like we’re trying to track down Oleg and Teodor in the #ashcloud!)

Also, today is my last day at End Point. I’ve really enjoyed my time working for a fantastic PostgreSQL support company. I highly recommend their services – and two members of the team will be presenting at PgCon – Greg Sabino Mullane and Josh Tolley.

I’ll be taking a new position next month, and really looking forward to a few new challenges. I’m definitely staying in Portland, and continuing to work with Postgres.

If you’re headed to PgCon, find me in Ottawa and I’ll tell you all about it!

Cluster Developer Meeting recap

Cluster Developer Meeting

UPDATE: See bottom of post.

We held a PostgreSQL cluster developer meeting on Thursday, November 19, 2009 in Tokyo. About 25 people were in attendance, and seven projects presented status updates. Projects represented included pgCluster, PostgresForest, Postgres-R, Streaming Replication (slated to be included in core for version 8.5), Postgres-2 (not quite available), GridSQL, the Skype Skytools team (Londiste), Bucardo and Slony. Details of our discussions are being documented on the PostgreSQL wiki, and we’ve started a new mailing list.

The group of developers came up with a list of features that they would appreciate being implemented in Postgres soon, and we will be filling out the details in the coming weeks.

Our first milestone as a group is to create a detailed matrix of features to help users more easily navigate between the different solutions. I’m also going to arrange for a documentation sprint, dedicated to creating introductory documentation for new database administrators interested in clustering technology for Postgres. I’ll report out in December about how this work is going!

Josh has also posted a summary of the cluster meeting, and our next actions.

OpenSQLCamp was awesome!

Saturday schedule 11/14/09

Thanks to everyone who attended OpenSQLCamp this past weekend in Portland, OR! More than 100 people participated – developers, DBAs and hobbyists from all over the world. Database developers participated from PostgreSQL, MySQL, MariaDB, Drizzle, TokuDB, LucidDB, MongoDB, Cassandra, CouchDB and many more.

The great thing about these events is the opportunity to trade ideas, code and stories. One project I’m very excited about is coming from some Portland State University students and a capstone project to create a new, interactive database client that works with more than just one DBMS. Igal gave a review of non-relational datastores. We had lightning talks about: open source column store databases, a many-master replication system called Trainwreck, open source at Microsoft, how to translate between NoSQL and SQL and many more.

You can see the full list of talks and notes from sessions as people update the wiki.

Joking about NoSQL aside, I was very happy to see many non-relational database developers in attendance, sharing information and participating in interesting discussions about the data management ecosystem. One meme we were happy to spread is that every tool has a purpose and I was happy to see this tweet:

Best thing I learned at #opensqlcamp today: #nosql vs. #sql is a false duality. Different features for different problem domains.

I hope next time we can get a few more core Postgres developers to a Camp. Mark Callaghan expressed interest in a comparison of backend storage mechanisms, and several people were interested in detailed comparisons of replication strategies across many DBMSes.

Thank you to everyone who participated! (sorry I spelled your name wrong in the email, Mark. And left off your name in the list of GoDaddy road-trippers, Dan.) If you were there, please give us feedback!

We’re already looking forward to the next OpenSQL Camp. Some people thought we should do it again in Portland – and we’d be happy to host again next year! Baron also mentioned running an event in Washington, D.C.