PostgreSQL at MySQL Users Conference: the sessions!

You’ve probably seen a few posts about this – from the CFP, to Baron’s recent pointer to the release of the schedule. And now Josh Berkus just posted a Meetup for the event, so that spurred me on for this post…

So, just to make things even easier for you, I thought I’d summarize the awesome talks we’re having at the O’Reilly MySQL Users Conference this year related to PostgreSQL.

We’re also having a Birds of a Feather session, and staffing a booth on the exhibit floor!

If you’re planning to attend, you can use my code & save 25% in addition to early registration savings: mys11fsd: http://oreil.ly/goaqst

Hope to see you there!

Hot Standby features for 9.1, just committed: Pause and Resume

On February 8th, Simon Riggs committed a couple new functions that will allow Hot Standby to be paused and resumed. You can already *read* from the Hot Standby without pausing, but you could never pause the application of changes in the past. You might want to do this if you have a very high-write-volume server, and some very expensive queries that you want to run on a slave.

Basic Recovery Control functions for use in Hot Standby. Pause, Resume,
Status check functions only. Also, new recovery.conf parameter to
pause_at_recovery_target, default on.

The basic idea is that if you have a read-only standby system, you can give it the command: pg_xlog_replay_pause() and the standby will stop applying changes. Then you can use the database in read-only mode without new changes being applied. When you’re done you can issue the command: pg_xlog_replay_resume() and proceed with applying logs.

There are some related features that I can’t wait to test out around named restore points for replay. But the ability to pause replay and run queries is just awesome.

This is a feature that Simon talked about back in 2009 at FOSDEM, and I am very excited to see it implemented.

Offering an Intro to PostgreSQL class

UPDATE: See below for pricing.

I’m working with Code Lesson to offer an Introduction to PostgreSQL class.

Code Lesson is pretty cool – it’s an online course system, and the idea is you get a couple assignments and lessons taught by me each week, and there’s a midterm and final evaluation. I love conferences, but the nice thing about an online course is you don’t have to spend an entire workday taking a tutorial at a conference, or travelling to a particular location, and you can finish assignments when it’s convenient for you.

My current working outline is:

Intro to Postgres

Hello, world!
* History of PostgreSQL project
* Features
* Basic SQL

Usage
* psql
* Drivers: Perl and Python examples
* GUIs
* Documentation

Survey of features
* Full text search
* Built-in functions
* Datatypes
* Indexes
* Transactional DDL

Community
* Mailing lists & IRC
* Asking questions
* Modules, add-ons, tools

Operations
* System and hardware
* Installation and configuration
* Maintenance and operation
* Replication

Our plan is to provide students with login access to a shared database. During the course, I’ll be available to answer questions and I’m considering making short videos to go along with the course material.

We haven’t set the price for it just yet, but should be figuring that out in the next week or so.

Anyway, if you’re interested, sign up and you’ll get an email when we set the price. I’m happy to answer any questions you have about content.

Another thing that was requested in the Hacker News thread was more advanced material. I think the advanced material falls into two categories – PostgreSQL core functionality, and administration/tuning.

Update! Pricing is set at $325/student, with a 10% discount if you register 2 or more students at the same time.

Invited lecture at Oregon State University: PostgreSQL

Oregon State University invited me to give a technical talk yesterday about PostgreSQL and open source community. It was aimed at graduate level students, and an attempt to recruit them.

It’s not in the slides, but I told a short story in the ‘but there’s a disconnect’ slide. I wanted to convey to them that both the free and open source community and the educational system are failing them when it comes to open source. I attended a conference last fall that had nearly 1000 undergraduate and graduate students. I staffed a booth about open source software, and nearly every student that dropped by couldn’t think of a single open source project they’d ever heard of. With prompting, they’d agree that Linux was probably open source.

We had a discussion about how our community manages conflict, who the big production users of PostgreSQL are, and how folks can get involved. I gave out a few tshirts and buttons, and ran into some old friends from Open Source Bridge.

GSoC Mentor Summit: Day 1

Today was the first day of the GSoC Mentor summit. I attended a few sessions and had several interesting hallway conversations with developers and leaders of projects from all over the world.

First, I attended a discussion about book sprinting, and did a recap of how our latest book sprint went (blog post to come!). We discussed the advantages of having the same group of people to two book sprints as a group, and how things seemed so much easier the second time around. We also had copies of the book that we’d hand-bound there to share. Lunch was spent chatting with Noirin and others about food, culture, travelling lots, and the hilariousness of having “Sotomayor” as a surname in Washington DC these days. Happy to have met a couple more Apache Foundation folks, and lovely to talk about names with an OpenNebula contributor. I also spent some time chatting with Greg Stark about the session on retaining students, and go over a few bits of inspiration he had for encouraging students to work on the more mundane aspects of PostgreSQL development.

Next I stopped upstairs to have a chat with Asheesh Laroia about new things he’s been up to around promoting free and open source community. He’d run a class recently to introduce new students to open source (at Penn State), and had some thoughts on what we should do next to make open source communities more welcoming. He also talked to me about Fedora Design Bounty, and how that model might be applied to other projects. Genius idea, and after reflecting on the idea a bit, maybe we could try it in pgsql-advocacy. Maybe. 🙂

I then breezed through the Chocolate session. Yum!

And went off to see about Bradley Kuhn‘s session on options for joining or starting non-profits around free software. He’s now executive director of the Software Freedom Conservancy, and was giving out great advice around picking an umbrella organization, making the right choices early about where to put money (don’t use your personal paypal account!), and notes on where to go for help if you’ve got questions about what to do next. Not sure where the notes are for that session, but I’m sure contacting him for more information about Software Freedom Conservancy if you’re interested is an option.

Then we had the great Git Migration discussion. The notes were wonderful, and it seems like many people were either considering or were in the middle of a git conversion process. Two PostgreSQL developers were there, including Magnus Hagander, whose voice wasn’t working so well. I helped out a bit by giving a rough overview of how our process had worked, and pointing people at the many resources and tools Magnus and others who worked on the conversion made available.

Afterward, I sat down for a bit with Zooko to talk about Tahoe-LAFS, which appears to be an encrypted, distributed document store database with a http interface. Sounds really cool, and I’m interested in trying it out.

Now, I’m getting ready to head off to the party for the evening. Great day!

Thoughts on PostgreSQL 9.0 release

Something I wrote for a press contact last month that I wanted to share:

We started the process toward 9.0 last year when we added new committers and invited many new people into the commitfest process (our way of getting lots of patches reviewed, approved and committed every two months). What we’ve found is that we can engage new developers by providing a clear way for them to help in small, well-defined ways.

As a group, we work really hard to recruit and maintain long-term relationships with developers. And that investment in people has paid off really well in 9.0. We have long term commitments from volunteers and independent businesses to implement features that take multiple years to see through to completion. The binary replication is a clear example of that, and we have many other projects underway that are only possible because developers trust our core development team to see them through.

It’s not the most headline-grabbing thing that we do. But it is pretty amazing that a group of people, with no central authority, “benevolent dictator” or business driving it, continue every year to produce a trustworthy, stable and feature-rich database that rivals what’s produced by the best-funded enterprises in the world.

What do you think the best part of the 9.0 release was?

The problems with copyright re-assignment

While I was in NYC (eating awesome food, riding my bike across the brooklyn bridge in the rain!), I spent time catching up with free software advocates. One issue that we talked about was copyright assignment. H Online recently published an article about this. Their description of the Linux kernel’s policy pretty much matches PostgreSQL’s policy:

Ownership of free software is a difficult area, and one that is resolved simply by the Linux kernel project. The code belongs to everyone and no-one, and the copyright for each individual piece of code belongs to the original coder, so that any future reassignment of the licence or the code for the Linux kernel requires the agreement of every other contributor.

I haven’t contributed code to projects other than PostgreSQL in a long time, but an important aspect of contribution that I used to not think very much about is copyright assignment. Now that I have spent a little time thinking about it, my preference is to contribute to projects which do not require copyright re-assignment.

Copyright came up in a conversation about dual-licensing, because it is the copyright assignment which provides the opportunity for a codebase to be re-licensed. But more important to me than the possibility of re-licensing, is the chilling effect copyright re-assignment agreements have on communities. The intent can be to be to hedge a company’s bets against contributor interference, and ultimately be able to assert complete control over a codebase. If we agree that the collaborative production of software is a social good, this type of hedging can only be seen as anti-social, and ultimately, destructive to a software community. In practice, I’ve seen projects which require contributor agreements effectively shun all non-corporate contributions, or actively engage in “ornamental sourcing“.

For a business owner who invests in free and open source software, this is an unsustainable position. The advantage of accessing source code is not just the code, but the people who know the code. And while I’m sure there are some exceptions, I doubt most people consider themselves experts in a codebase without having contributed significant patches to it.

Given all that, copyright assignment to the Free Software Foundation or to Canonical has been a contentious issue. But maybe if you have an organization which is committed in its charter to maintaining software freedom, then the copyright assignment serves a social good and gives an organization like the FSF the legal authority to pursue legal action if the terms of a license are violated.

Training lessons learned: Code dojo, whiteboards, interactivity

Training can be an incredibly boring, frustrating exercise. Often, I have friends who don’t bother to attend sessions or tutorials during conferences. Instead, they cherry-pick friends and colleagues that they can work on code, gossip or brainstorm with in the hall while others sit passively in lectures. When I think about it now, knowing this about my friends is what motivated me to start Open Source Bridge.

The PostgreSQL training I gave to Ondo State was specifically targeted at developers. I used material End Point had from previous trainings, and added few new things designed to meet the needs of fledgling database developers. The high points I wanted to hit were: schema design basics, user defined function development and highlight developer-friendly features of Postgres that they should be aware of.

One big obstacle for me was that they would all be using Windows as their primary operating system. I develop exclusively on UNIX-based platforms, and so I had to spend a little time getting re-acquainted with Windows tools. pgAdmin III was essential, and I was happy that a new version was released along with version 8.4 of Postgres.

Also, while the concepts are the same, the built-in monitoring tools for Windows are quite a bit different, and I used freely available material from my Postgres colleagues who support Windows for a couple hour tutorial on interactive troubleshooting.

When trying to explain concepts – like replication, or basic database terms – it really helps to have a whiteboard. I was working with a group of people with diverse IT backgrounds, and often, I asked individuals to try to explain in their own words various terms (like “transaction”). This helped engage the students in a way that simply stating definitions can’t. Observing their fellow students struggling with terminology helped them generate their own questions, and I saw the great results the next day – when students were able to define terms immediately, that took five minutes the day before to work through.

Finally, one important request from the client was that some time be spent mentoring developers on standards, best practices for development and coding style. To accomplish this task with fourteen students in such a limited period of time, I decided to conduct a series of coding sessions where students and I took turns at the keyboard programming as a group. We call this coding dojo, a concept built on the Coding Katas from Dave Thomas.

Overall, I prefer interactive training, where students are not only encouraged, but forced to interact with each other and the instructor.

When I sent out the CFP for MySQL Conf yesterday, lots of people asked me for suggestions on talk topics. In general, I recommend that speakers focus on a particular take-away for the audience, and mention specifically what a person sitting in is going to learn *and* apply immediately. Not every talk can be interactive, or give people chunks of code. But *every* talk should have a clear goal, and leave the audience educated. The best leave them inspired!

PostgreSQL at MySQL Users’ Conference 2011

So, I’m on the committee for MySQL Conf this year, and the committee is specifically seeking talks about PostgreSQL. The idea is to broaden the scope of the conference to include a lot of different open source database technology, including a bunch more about Postgres.

The theme of the conference is “the ecosystem and beyond”, which was chosen specifically because the open source database world has exploded and grown so much in the last three years. Below is a slide from a presentation I made last year at LinuxConf AU about the growth in free and open source (FOSS) databases:

We’ve seen a half-dozen forks of MySQL appear, exponential growth among “NoSQL” databases, and now, a powerful release from PostgreSQL. It may seem odd that the name of the conference didn’t change to reflect the change in focus – but this is the largest FOSS database conference I know off – weighing in at over 2000 people last year. Given the community that’s grown around it, I understand why they are keeping the name.

The content will still largely focus on MySQL — the core, the many forks, and the community around it. But we’ll also hear from many new, successful database projects, and definitely hear from PostgreSQL. To do that, though, I need you to submit talks!

The submission deadline for all proposals is October 25, 2010!

Topics for consideration include:

  • Innovative uses of Postgres
  • Data warehousing and BI
  • Architectures based on Postgres
  • PostGIS
  • Government + Postgres
  • [your favorite web framework] + Postgres
  • Performance and optimization
  • Security and database administration
  • “In the cloud”
  • Business and case studies

If you’ve got an idea, submit a proposal today!

Please contact me directly for feedback, help with submissions or help generating ideas. And if you’re submitting, please just drop me a line to let me know! I’d love to hear from all that are interested.

PostgreSQL 9.0.1 released, includes security fix & maintenance releases for 6 other versions

The PostgreSQL Global Development group released new maintenance versions today: 9.0.1, 8.4.5, 8.3.12, 8.2.18, 8.1.22, 8.0.26 and 7.4.30. This is the final update for PostgreSQL versions 7.4 and 8.0. There’s a security issue in there involving procedural languages, and a detailed description of the vulnerability is on our wiki. A key thing to remember is that the issue primarily affects people who use SECURITY DEFINER along with a procedural language function. PL/PgSQL is not affected, but any other procedural language with a “trusted” mode is. This includes PL/Perl, PL/tcl, PL/Python (7.4 or earlier) and others. The new versions fix issues in PL/Perl and PL/tcl. A patch for PL/PHP is currently in the works.

Most developers feel that the security issue is relatively obscure. If you aren’t using a procedural language with some mechanism for altering privileges (SET ROLE or SECURITY DEFINER, for example), you aren’t vulnerable to the security issue and can upgrade Postgres during your next regularly scheduled downtime. If you *are* vulnerable, we recommend investigating the use of the functions that may be vulnerable, and taking steps to prevent their exploitation by upgrading as soon as you can.

From the FAQ:

What is the level of risk associated with this exploit?

Low. It requires all of the following:

  • An attacker must have an authenticated connection to the database server.
  • The attacker must be able to execute arbitrary statements over that connection.
  • The attacker must have an strong knowledge of PostgreSQL.
  • Your application must include procedures or functions in an external procedural language.
  • These functions and procedures must be executed by users with greater privileges than the attacker, using SECURITY DEFINER or SET ROLE, and using the same connection as the attacker.

This was also the first release for which I generated release notes! 😀

Here was my list of interesting changes for the announcement:

  • Prevent show_session_authorization() from crashing within autovacuum processes, backpatched to all supported versions;
  • Fix connection leak after duplicate connection name errors, fix handling of connection names longer than 62 bytes and improve contrib/dblink’s handling of tables containing dropped columns, backpatched to all supported versions;
  • Defend against functions returning setof record where not all the returned rows are actually of the same rowtype, backpatched to 8.0;
  • Fix possible duplicate scans of UNION ALL member relations, backpatched to 8.2;
  • Reduce PANIC to ERROR on infrequent btree failure cases, backpatched to 8.2;
  • Add hstore(text, text) function to contrib/hstore, to support migration away from the => operator, which was deprecated in 9.0. Function support backpatched to 8.2;
  • Treat exit code 128 as non-fatal on Win32, backpatched to 8.2;
  • Fix failure to mark cached plans as transient, causing CREATE INDEX CONCURRENTLY to not be used right away, backpatched to 8.3;
  • Fix evaluation of inner side of an outer join is a sub-select with non-strict expressions in its output list, backpatched to 8.4;
  • Allow full SSL certificate verification to succeed in the case where both host and hostaddr are specified, backpatched to 8.4;
  • Improve parallel restore’s ability to cope with selective restore (-L option), backpatched to 8.4 with caveats;
  • Fix failure of “ALTER TABLE t ADD COLUMN c serial” when done by non-owner, 9.0 only.
  • Several bugfixes for join removal, 9.0 only.

If you have a look at a new tool that Robert Haas and Tom Lane commited to the repo called git_changelog, you can use it to find the commit IDs for the various features (you need the whole source tree to do it :)).

You’ll find that there are a lot of commits in these sets. We haven’t had a minor release since May 2010, so they kind of added up.

Any other changes in there you think we should have mentioned in the announcement? Let me know in the comments.

Download new versions now: