Using logger with pg_standby

Piping logs to syslog is quite useful for automating log rotation and for forwarding many different logs to a central log server.

To that end, the command-line utility ‘logger’ is handy for piping output from utilities like pg_standby to syslog without having to add syslog support to the utility itself. As a bonus, logger ships by default with modern syslog packages.

Here’s an easy way to implement this:


restore_command = 'pg_standby -d -s 2 -t /pgdata/trigger /shared/wal_archive/ %f %p %r 2>&1 | logger -p local3.info -t pgstandby'
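For pg_standby’s output to land anywhere, your syslog daemon needs a rule matching the local3 facility. A minimal sketch for a traditional syslog.conf (the log file path here is just an example; adjust to taste):

```
# Route messages logged at facility local3 to their own file:
local3.info    /var/log/pgstandby.log
```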

Online aggregation paper from 1997 and PSU’s database reading group

A couple weeks ago, Mark Wong and I took a field trip over to the Database Reading Group at Portland State University. It’s a group of students and professors that meet weekly throughout the school year to go over research papers. The papers are picked by the participants, and vary in topic from obscure to very practical.

This week’s paper reading was led by Professor Len Shapiro, and was titled “Online Aggregation”. The paper is considered a foundational paper on SQL aggregates (like COUNT() or AVG()), and was published in 1997 by researchers from UC Berkeley and IBM. It’s a precursor to research into query parallelization and streaming databases, was awarded the SIGMOD “Test of Time” award in 2007, and is cited by over 170 other papers in the ACM archive.

The basic idea behind the paper centered on how to improve the user experience of reporting results for aggregate queries, asking how to solve three key problems (from a user’s perspective): blocking, fairness and control. Roughly: blocking is what happens when some part of the system waits and, as a result, doesn’t return results to the user. Fairness concerns whether certain types of operations prevent certain groups of data from being processed (the example given involved GROUP BY, with groups being processed one at a time). Control concerns whether or not a user can exert control over the speed of computation applied to a group (the example given was a lever that “speeds up” processing of a set).

One insight from the paper is that online aggregation should be treated differently from traditional query processing, which might favor expensive plans involving sorts so that the output is ordered. When you’re dealing with online aggregates, you prefer unordered (or, ideally, random-order) input, because your intermediate results will be more representative of the ultimate result. I guess that’s probably obvious once you think about it, but the paper provided some concrete examples.

Another interesting thought experiment involving the planner is how you pick plans that favor non-blocking, fairness and user control. None of those properties is narrowly defined, and each changes based on individual user expectations. Professor Kristin Tufte mentioned that she’d be interested in how the ideas presented in this paper would be applied today, and Professor David Maier suggested that the applications we might be most interested in involve managing Hadoop.

Prof. Maier also brought up an interesting paper involving alternating nested loop joins during a discussion about optimizing JOIN algorithms for online aggregates. Another cool thing about the paper is that it involved modifications to Postgres! Granted, it was Postgres95, which doesn’t much resemble the modern PostgreSQL 9.0. But it was nice to revisit research that used Postgres and is still relevant today.

Customizing the RPMs from pgrpms.org

To pick up where Devrim left off in customizing RPMs, here are some more tips for getting your very own RPMs built:

  • Create a VM with your favorite operating system (I’m using CentOS). I need both a 32-bit OS and a 64-bit OS, which is much easier to manage with separate, local VMs.
  • Install spectool (available here) and SVN
  • The other dependencies were: gcc glibc-devel bison flex python-devel tcl-devel readline-devel zlib-devel openssl-devel krb5-devel e2fsprogs-devel libxml2-devel libxslt-devel pam-devel
  • Edit the postgresql-$VERSION.spec file to your liking. If you’re adding patches, you need to add them in TWO places: first in the Patch#: group, and then again below where the %patch# series starts. If you’re adding an entirely new package (say, in 8.2, pg_standby in contrib), you’ll also need to add the binary (or library, or whatever) to the appropriate %files clause later in the spec file. It’s also a good idea to modify ‘Release’. Here’s a sample diff of my spec file:


--- postgresql-8.2.spec (revision 188)
+++ postgresql-8.2.spec (working copy)
@@ -74,7 +74,7 @@
Summary: PostgreSQL client programs and libraries
Name: postgresql
Version: 8.2.17
-Release: 1PGDG%{?dist}
+Release: 1test%{?dist}
License: BSD
Group: Applications/Databases
Url: http://www.postgresql.org/
@@ -95,7 +95,9 @@
Patch4: postgresql-test.patch
Patch6: postgresql-perl-rpath.patch
Patch8: postgresql-prefer-ncurses.patch
+Patch7: postgresql-pgstat-dir.patch
Patch9: postgresql-use-zoneinfo.patch
+Patch10: pg_standby.patch

Buildrequires: perl glibc-devel bison flex
Requires: /sbin/ldconfig initscripts
@@ -282,7 +284,9 @@
%patch4 -p1
%patch6 -p1
%patch8 -p1
+%patch7 -p1
%patch9 -p1
+%patch10 -p1

pushd doc
tar -zcf postgres.tar.gz *.html stylesheet.css
@@ -604,6 +608,7 @@
%{_bindir}/pg_controldata
%{_bindir}/pg_ctl
%{_bindir}/pg_resetxlog
+%{_bindir}/pg_standby
%{_bindir}/postgres
%{_bindir}/postmaster
%{_mandir}/man1/initdb.*

How have you customized RPMs using this repo? Share your .spec files!

PDXPUG Day 2010: 9.0 and all sorts of good stuff

I’m nearly ready for PDXPUG Day 2010.

We’ve got some fun talks lined up from leaders in the Postgres community who are in town for OSCON. It’s free to attend. We’ll be at the Oregon Convention Center, room D131.

Stop by sometime during the day if you’re in town!

Here’s our schedule:

10am: What’s new in 9.0 – Selena Deckelmann
11am: ORMs and Their Discontents – Christophe Pettus
1:30pm: Mining Your Logs For Fun and Profit – Josh Berkus
2:30pm: PORTAL – Dan Colish
3:30pm: An Introduction to Managing and Troubleshooting PostgreSQL on Windows – Tim Bruce
4:30pm: PostgreSQL in Brazilian Army and Air Force – Luis Dosso

6pm-10pm: Party at the Gotham Tavern!

Q&A about Hot Standby

Updated!: See below.

Here are some questions that came up from trying to use the current PostgreSQL hot standby documentation:

Q: If you set hot_standby = off after having it on, what happens?

A: This change requires a database restart on the hot standby (or replica) server. The database goes into “warm standby” mode, and you can no longer issue queries against it. You can change this right back by setting the parameter to ‘on’ and restarting again.

Q: Can you use hot standby with only a single schema or database?

A: No. Hot Standby is all-or-nothing for a particular PostgreSQL database cluster. A cluster is made up of all the databases that live in a particular $PGDATA instance, and Hot Standby currently cannot distinguish between changes occurring in different databases or schemas.

Q: Is the process for setting up hot standby any different for empty databases vs. populated databases?

A: No. The setup process is the same – you must create a base backup.

Q: How do I bring my hot standby out of standby mode?

A: If you’re using something like the following in your recovery.conf file:

restore_command = 'cp xxxx'
standby_mode = 'on'

Change: standby_mode = 'off' and restart your hot standby postgresql instance.

Q: Where did my recovery.conf file go? (after your database came out of warm/hot standby)

A: PostgreSQL automatically changes the name of the file to recovery.done when recovery completes. This helps prevent accidents.

Q: What happens if my archive_timeout = 60 (which creates a 16MB file every minute) and I flood the database with so much activity that my standby falls behind?

A: This is possible, and you may be interested in trying Streaming Replication. However, for the majority of users, a delay in restoring data is acceptable (and possibly desirable). Eventually the standby server will catch up. You can monitor how delayed the server is using functions like txid_current_snapshot().
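As a back-of-the-envelope check on how much archive volume that setting can generate: WAL segments are 16MB by default, and archive_timeout = 60 forces at most one segment switch per minute, so the worst case works out to roughly:

```shell
#!/bin/sh
# Worst-case WAL archive volume with archive_timeout = 60:
# one 16MB segment per minute, around the clock.
SEGMENT_MB=16
PER_DAY=$((SEGMENT_MB * 60 * 24))
echo "Worst case: ${PER_DAY} MB/day"   # 23040 MB, about 22.5 GB/day
```

Note that the forced switch happens even when the server is mostly idle, so this is also roughly the steady-state disk cost of a one-minute archive_timeout.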

Q: Are schema changes (like CREATE TABLE or ALTER TABLE) replicated to the standby?

A: Yes! All changes to the database cluster are copied to the standby. This includes any DDL operations, new rows, the effects of autovacuum — any change to the data store on the master is copied to the standby.

Quick start on Hot Standby

Updated.

We could use some better end-user documentation around creating a warm or hot standby system for basic postgresql replication.

To this end, I created a Quick Start doc on the wiki, but it could use more help. Maybe we should create some setup recipes for common situations?

Also – I wrote the following script during a hot standby bugbash PDXPUG had today:


#!/bin/sh
# Set up a primary (hotstandby1, port 6543) and a hot standby
# (hotstandby2, port 6544), shipping WAL through /var/tmp/archive.

BINPATH=/usr/local/pg90/bin
CP=/bin/cp
PGCTL=${BINPATH}/pg_ctl
PSQL=${BINPATH}/psql
INITDB=${BINPATH}/initdb

sudo mkdir -p /var/tmp/archive
sudo chown ${USER} /var/tmp/archive

# Initialize the primary and enable WAL archiving.
${INITDB} hotstandby1

echo 'wal_level = hot_standby' >> hotstandby1/postgresql.conf
echo 'archive_mode = on' >> hotstandby1/postgresql.conf
echo "archive_command = 'cp %p /var/tmp/archive/%f'" >> hotstandby1/postgresql.conf
echo "archive_timeout = 60" >> hotstandby1/postgresql.conf
echo "port = 6543" >> hotstandby1/postgresql.conf

${PGCTL} -D hotstandby1 start -l hotstandby1.log
sleep 5

# Take a base backup, then clean it up for use as a standby.
${PSQL} -p 6543 postgres -c "select pg_start_backup('backup')"
${CP} -pR hotstandby1/ hotstandby2
${PSQL} -p 6543 postgres -c "select pg_stop_backup()"
rm hotstandby2/postmaster.pid
rm hotstandby2/pg_xlog/*

# Configure hot standby mode and start the standby.
echo 'hot_standby = on' >> hotstandby2/postgresql.conf
echo 'port = 6544' >> hotstandby2/postgresql.conf
echo "standby_mode = 'on'" >> hotstandby2/recovery.conf
echo "restore_command = 'cp -i /var/tmp/archive/%f %p'" >> hotstandby2/recovery.conf

${PGCTL} -D hotstandby2 start -l hotstandby2.log

* Added port specification in case you’ve already got postgres running. Added a BINPATH for custom install directories.

Starting at Emma

Today, I start at Emma.

I have some clues about what I’ll be up to: working on some big PostgreSQL databases, tearing into the infrastructure and discovering what makes their small company tick. Emma’s work with small businesses, and its focus on humane communication and consensus building, completely drew me in.

And I’m looking forward to riding my bike every day over to the office on Burnside!

PgCon 2010 – PL/Parrot, Simulated Annealing, Exclusion Constraints, Postgres-XC

PgCon this year was full of bold ideas, delivered in the quiet manner typical of the Postgres community. Talks by Jonathan Leto, Jan Urbanski and Jeff Davis all presented new features and ideas that show there is still plenty of room in Postgres as a project to contribute and innovate. I was also delighted to see Postgres-XC (touted as a “Postgres RAC”) release code and give a great presentation on the high-level details.

Jonathan Leto, along with David Fetter, presented work on PL/Parrot. Parrot is a dynamic language virtual machine that allows implementations of multiple dynamic languages to share classes (from the docs: “In theory, you will be able to write a class in Perl, subclass it in Python and then instantiate and use that subclass in a Tcl program.”). The project embeds Parrot in PostgreSQL and will, eventually, implement dynamic languages inside the virtual machine. The advantage is that this makes implementing new dynamic languages in Postgres much easier, because the language implementers won’t have to learn the PL interface. Another useful feature of PL/Parrot is Parrot’s security opcode, which essentially controls access to open(), a key to implementing a secure procedural language in Postgres. (I’m sure Jonathan will correct me if I didn’t describe this properly :D)

Jan Urbanski gave a talk on join ordering via simulated annealing, called Replacing GEQO. The approach was quite interesting, involved math that required me to scratch my head a bit, and showed initial performance improvements for many-join queries that make it appealing. The original -hackers posting from Jan includes a few hairy queries from Andres Freund, referenced later in the thread, which confound GEQO. Jan has posted the code, and I’m looking forward to seeing how it develops this year.

Jeff Davis presented exclusion constraints, which are part of 9.0. He is continuing his work on temporal data types with a clever and very useful generalization of UNIQUE: UNIQUE constrains equality, while exclusion constraints allow other operators (in the most-cited example, Jeff demonstrates “overlaps” with the PERIOD datatype).

Postgres-XC was officially presented and released. For clustering efforts in Postgres, releasing the code is a huge step toward mainstreaming clustering work in the community. This release solidifies community work that started last year with NTT and the support of the Japanese PostgreSQL User Group, which hosted a clustering summit back in November 2009.

I was disappointed to miss a few talks (like hypothetical indexes, pg_statsinfo, CB’s pgMQ), but I’m looking forward to hearing the recordings as they are published!

Lightning talks from PgCon 2010

Thanks again to all the folks that volunteered to give five minute talks during the Lightning talks session at PgCon!

Our lightning talks this year were:

  • PostgreSQL Developer Meeting in Five Minutes – Bruce Momjian
  • Slony 1 => 2.0 – Steve Singer
  • PostgreSQL and Twisted – Jan Urbanski
  • The FlyMine Project – Matthew Wakeling
  • Enhanced Tools for PostgreSQL – Tomonari Katsumata
  • Servoy – Christophe Pettus
  • Tail_n_mail – Greg Sabino Mullane
  • GSOC – Robert Treat
  • Pg Staging – Dimitri Fontaine
  • Serializable Transaction Isolation – Kevin Grittner
  • 10 ways to wreck your database – Josh Berkus

All presentations are downloadable from: http://wiki.postgresql.org/wiki/PgCon_2010_Lightning_talks

Image Copyright © 2010 Richard Guy Briggs from here and used by permission.

Need reviewers – preparing for the first commitfest for 9.1

Now is the perfect time to get involved in Postgres development!

Starting June 15, we’re going to have a “reviewfest”, as a prelude to the first commitfest in July. We’ve already got 28 patches in the queue, and all need reviewers.

Think you’re not qualified to review patches? Think again!

From the Reviewing a Patch documentation:

If you can apply a patch and you can use the new feature, you’re already qualified to start reviewing it.

We will formally kick off a reviewfest on June 15.

We will assign reviewers and get all the patches that are queued up for 9.1 reviewed while the final touches are being applied to the 9.0 release. Have a look at Review A Patch guidelines for information about how to conduct your review. We also have a mailing list to track and recruit reviewers – pgsql-rrreviewers. (The extra R’s are for ’round-robin’)

Please subscribe to the list, and post if there is a particular patch you are interested in reviewing!