Google Summer of Code 2011 application started! Looking for mentors.

PostgreSQL is applying for GSoC again this year. We’re looking for:

* Mentors
* Project ideas

Are you a PostgreSQL community member, and would you like to mentor? Please let me know! Our application deadline is Friday, March 11, 2011, so please contact me *before* Friday.

I’ve started a wiki page: http://wiki.postgresql.org/wiki/GSoC_2011

It’s seeded with last year’s to-do lists and information. We still need to add project ideas for students.

The wiki pages for 2008 and 2010 are available, including links to the original student proposals:

http://wiki.postgresql.org/wiki/GSoC_2010
http://wiki.postgresql.org/wiki/GSoC_2008

Broken windows, broken code, broken systems

A few days ago, I asked:

I spend a lot of time thinking about the little details in systems – like the number of ephemeral ports consumed, number of open file descriptors and per-process memory utilization over time. Small changes across 50 machines can add up to a large overall change in performance.

And then, today, I saw this article:

One of the more telling comments I received was the idea that since the advent of virtualization, there’s no point in trying to fix anything anymore. If a weird error pops up, just redeploy the original template and toss the old VM on the scrap heap. Similar ideas revolved around re-imaging laptops and desktops rather than fixing the problem. OK. Full stop. A laptop or desktop is most certainly not a server, and servers should not be treated that way. But even that’s not the full reality of the situation.

I’m starting to think that current server virtualization technologies are contributing to the decline of real server administration skills.

There definitely has been a shift: “real server administration skills” are now more about packaging, software selection and managing dramatic shifts in utilization. It’s less important to know exactly how to manage sendmail’s M4 configuration, and more important to know that you should probably use postfix instead. I don’t spend much time convincing clients that they need connection pooling; I debug the connection pooler they chose.

The range of software available for web development and operations is broad. It covers which Linux distribution you select, whether or not you have vendor support, and the sheer volume of open source tools for supporting applications.

Inevitably, the industry has shifted to configuration management rather than configuring machines one at a time. And, honestly, that shift started about 15 years ago with cfengine.

Now we call this DevOps: the idea that systems management should be programmable. Mark Burgess, the author of cfengine, called this “Computer Immunology”. DevOps is a much better marketing term, but I think the core ideas remain the same: build programmatic interfaces for managing systems, and automate.
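The core idea can be sketched in a few lines: describe the state you want, and let a program converge the system toward it. This is a minimal Python illustration of an idempotent operation in that spirit. It does not reproduce cfengine’s syntax or behavior, and `ensure_line` and the file name are hypothetical.

```python
from pathlib import Path


def ensure_line(path, line):
    """Converge `path` toward a desired state in which it contains `line`.

    Returns True if the file had to be changed and False if it was already
    in the desired state, so running it repeatedly is harmless (idempotent).
    """
    p = Path(path)
    lines = p.read_text().splitlines() if p.exists() else []
    if line in lines:
        return False  # already converged; touch nothing
    lines.append(line)
    p.write_text("\n".join(lines) + "\n")
    return True


if __name__ == "__main__":
    import tempfile

    conf = Path(tempfile.mkdtemp()) / "example.conf"  # hypothetical file
    print(ensure_line(conf, "max_connections = 100"))  # True: file created
    print(ensure_line(conf, "max_connections = 100"))  # False: no-op
```

A real tool would also replace a conflicting setting, such as an existing `max_connections = 200` line, rather than append a second one. The sketch only shows the check-then-act pattern that makes repeated runs safe.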

But, back to the broken window thing! I did some searching for development and broken windows and found that in 2007, a developer talked about Broken Window Theory:

People are reluctant to break something that works, but not so much when it doesn’t. If the build is already broken, then people won’t spend much time making sure their change doesn’t break it (well, break it further). But if the build is pristine green, then they will be very careful about it.

In 2005, Jeff Atwood mentioned the original source, and said “Maybe we should be sweating the small stuff.”

That stuck with me because I admit that I focus on the little details first. I try to fix and automate where I can, but for political or practical reasons, I am often unable to make the comprehensive system changes I’d like to see.

So, given that most of us live in the real world where some things are just left undone, where do we draw the line? What do we consider a bit of acceptable street litter, and what do we consider a broken window? When is it ok to just reboot the system, and when do you really need to figure out exactly what went wrong?

This decision-making process is often the difference between a productive workday and one filled with frustration.

The strategies we use to make this choice are probably the most important aspects of system administration and devops today. There is, of course, never a single right answer for every business. But I’m sure there are some common themes.

For example:

James posted “Rules for Infrastructure” just the other day, which is a repost of the original gist. What I like about this is that they are phrased philosophically: here are the lines in the sand, and the definitions that we’re all going to agree to.

Where do you draw the line? And how do you communicate to your colleagues where the line is?

Intro to PostgreSQL class starts March 7!

Remember that class I announced about a month ago?

Well, it’s happening for real. We’re starting March 7th and running for 6 weeks. Sign up now if you want to join us for this first edition of the class.

I’m planning to do screencasts for a lot of the content, and have just started playing around with Screenflow.

The first couple of weeks are primarily about using psql and learning key features of PostgreSQL, with some history sprinkled in. The next two weeks dive into features like full text search, built-in functions, our many datatypes, indexing and transactional DDL. I’ll be surveying students as we go along so I can add detail on the key features they’re interested in. The last few weeks cover administration, maintenance and configuration. I’ll also be throwing in details about the PostgreSQL community, including the people and the best places to go for help, and hopefully some cameos from Postgres community members.

So, don’t forget to sign up today! Especially because this pudding says so:

Image courtesy of @thesethings

PL Developer Summit at PgCon, May 21!

UPDATE: We have 18 PLs! I added the ones mentioned in the comments to the list. 🙂

You’re probably aware that PostgreSQL supports a few procedural languages, with PL/PgSQL being the best known, in part because of its similarity to Oracle’s PL/SQL.

Interest in PostgreSQL Procedural Languages (PLs) has grown significantly in the last few years and so PgCon is hosting a special PL summit on Saturday May 21, 2011.

Did you know that 17 other procedural languages are currently implemented?

  1. PL/Tcl and PL/Tclu
  2. PL/Perl and PL/Perlu
  3. PL/Python and PL/Pythonu
  4. PL/Ruby
  5. PL/Java
  6. PL/Lua
  7. PL/LOLCODE
  8. PL/Js
  9. PL/Proxy
  10. PL/PHP
  11. PL/sh
  12. PL/R
  13. PL/Parrot
  14. PL/scheme
  15. PL/Perl6
  16. PL/PSM
  17. PL/XSLT

And we have at least one proprietary PL from EnterpriseDB:

We invite PL developers, PostgreSQL core hackers, those interested in future PL development and PgCon attendees interested in learning more to attend!

Before we decided to create this summit, I put together a survey for PL developers. All survey respondents wanted a summit to happen!

The most popular topics were:

  • Postgres PL Interface Improvements
  • Connecting with other PL developers
  • New features in PLs
  • Hacking together
  • State of PLs
  • Distributions and builds
  • PG9.1 extensions vs PL languages
  • Security (pl vs plu)
  • PGXN

The most popular PLs were:

  • PL/PgSQL
  • PL/Perl
  • PL/Python
  • PL/R

The summit is open to attendees of PgCon and special guests. Please RSVP and help set the agenda.

The agenda and any results of the summit will be published on the wiki.

PostgreSQL at MySQL Users Conference: the sessions!

You’ve probably seen a few posts about this, from the CFP to Baron’s recent pointer to the release of the schedule. And Josh Berkus just posted a Meetup for the event, which spurred me on to write this post…

So, just to make things even easier for you, I thought I’d summarize the awesome talks we’re having at the O’Reilly MySQL Users Conference this year related to PostgreSQL.

We’re also having a Birds of a Feather session, and staffing a booth on the exhibit floor!

If you’re planning to attend, you can use my discount code, mys11fsd, to save 25% on top of the early registration savings: http://oreil.ly/goaqst

Hope to see you there!

Hot Standby features for 9.1, just committed: Pause and Resume

On February 8th, Simon Riggs committed a couple of new functions that will allow Hot Standby to be paused and resumed. You could already *read* from a Hot Standby without pausing, but until now there was no way to pause the application of changes. You might want to do this if you have a very high-write-volume server and some very expensive queries that you want to run on a slave.

Basic Recovery Control functions for use in Hot Standby. Pause, Resume,
Status check functions only. Also, new recovery.conf parameter to
pause_at_recovery_target, default on.

The basic idea is that on a read-only standby system, you can run pg_xlog_replay_pause() and the standby will stop applying changes. You can then use the database in read-only mode without new changes being applied. When you’re done, run pg_xlog_replay_resume() and the standby will proceed with applying logs.
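If you script this from a client, make sure replay is resumed even when a query fails. Here is a minimal Python sketch. It assumes a DB-API 2.0 cursor (psycopg2, for example) connected to a 9.1 standby as a superuser. It also assumes autocommit mode, so that a failed query doesn’t leave the resume call stuck inside an aborted transaction. `replay_paused` is a hypothetical helper, not part of any library.

```python
from contextlib import contextmanager

# Function names as committed for 9.1; PostgreSQL 10 later renamed them to
# pg_wal_replay_pause() and pg_wal_replay_resume().
PAUSE_SQL = "SELECT pg_xlog_replay_pause()"
RESUME_SQL = "SELECT pg_xlog_replay_resume()"


@contextmanager
def replay_paused(cursor):
    """Pause WAL replay on a hot standby for the duration of a `with` block.

    `cursor` is any DB-API 2.0 cursor on the standby. Replay is resumed in
    a `finally` clause, so it restarts even if a query in the block raises.
    """
    cursor.execute(PAUSE_SQL)
    try:
        yield cursor
    finally:
        cursor.execute(RESUME_SQL)


# Example use with psycopg2 (the DSN and table name are hypothetical):
#
#   import psycopg2
#   conn = psycopg2.connect("host=standby dbname=app")
#   conn.autocommit = True
#   with replay_paused(conn.cursor()) as cur:
#       cur.execute("SELECT count(*) FROM big_table")
#       print(cur.fetchone())
```

Wrapping the pair in a context manager keeps the pause and resume next to each other. That makes it harder to accidentally leave a standby paused and falling further behind the primary.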

There are some related features around named restore points for replay that I can’t wait to test out. But the ability to pause replay and run queries is just awesome.

This is a feature that Simon talked about back in 2009 at FOSDEM, and I am very excited to see it implemented.

Offering an Intro to PostgreSQL class

UPDATE: See below for pricing.

I’m working with Code Lesson to offer an Introduction to PostgreSQL class.

Code Lesson is pretty cool: it’s an online course system, and the idea is that you get a couple of assignments and lessons taught by me each week, plus a midterm and a final evaluation. I love conferences, but the nice thing about an online course is that you don’t have to spend an entire workday in a conference tutorial or travel to a particular location, and you can finish assignments when it’s convenient for you.

My current working outline is:

Intro to Postgres

Hello, world!
* History of PostgreSQL project
* Features
* Basic SQL

Usage
* psql
* Drivers: Perl and Python examples
* GUIs
* Documentation

Survey of features
* Full text search
* Built-in functions
* Datatypes
* Indexes
* Transactional DDL

Community
* Mailing lists & IRC
* Asking questions
* Modules, add-ons, tools

Operations
* System and hardware
* Installation and configuration
* Maintenance and operation
* Replication

Our plan is to provide students with login access to a shared database. During the course, I’ll be available to answer questions and I’m considering making short videos to go along with the course material.

We haven’t set the price for it just yet, but should be figuring that out in the next week or so.

Anyway, if you’re interested, sign up and you’ll get an email when we set the price. I’m happy to answer any questions you have about content.

Another thing that was requested in the Hacker News thread was more advanced material. I think the advanced material falls into two categories – PostgreSQL core functionality, and administration/tuning.

Update! Pricing is set at $325/student, with a 10% discount if you register 2 or more students at the same time.

Invited lecture at Oregon State University: PostgreSQL

Oregon State University invited me to give a technical talk yesterday about PostgreSQL and the open source community. It was aimed at graduate students, and was partly an attempt to recruit them.

It’s not in the slides, but I told a short story on the ‘but there’s a disconnect’ slide. I wanted to convey to them that both the free and open source community and the educational system are failing them when it comes to open source. Last fall I attended a conference with nearly 1000 undergraduate and graduate students. I staffed a booth about open source software, and nearly every student who dropped by couldn’t name a single open source project they’d ever heard of. With prompting, they’d agree that Linux was probably open source.

We had a discussion about how our community manages conflict, who the big production users of PostgreSQL are, and how folks can get involved. I gave out a few T-shirts and buttons, and ran into some old friends from Open Source Bridge.

GSoC Mentor Summit: Day 1

Today was the first day of the GSoC Mentor summit. I attended a few sessions and had several interesting hallway conversations with developers and leaders of projects from all over the world.

First, I attended a discussion about book sprinting, and gave a recap of how our latest book sprint went (blog post to come!). We discussed the advantages of having the same group of people attend two book sprints together, and how much easier things seemed the second time around. We also had copies of the book that we’d hand-bound there to share. Lunch was spent chatting with Noirin and others about food, culture, travelling lots, and the hilariousness of having “Sotomayor” as a surname in Washington DC these days. I was happy to meet a couple more Apache Software Foundation folks, and it was lovely to talk about names with an OpenNebula contributor. I also spent some time chatting with Greg Stark about the session on retaining students, and going over a few bits of inspiration he had for encouraging students to work on the more mundane aspects of PostgreSQL development.

Next I stopped upstairs to chat with Asheesh Laroia about the new things he’s been up to around promoting free and open source communities. He’d recently run a class at Penn State to introduce new students to open source, and had some thoughts on what we should do next to make open source communities more welcoming. He also told me about the Fedora Design Bounty, and how that model might be applied to other projects. Genius idea, and after reflecting on it a bit, maybe we could try it in pgsql-advocacy. Maybe. 🙂

I then breezed through the Chocolate session. Yum!

Then I went off to Bradley Kuhn’s session on options for joining or starting non-profits around free software. He’s now executive director of the Software Freedom Conservancy, and gave great advice on picking an umbrella organization, making the right choices early about where to put money (don’t use your personal PayPal account!), and where to go for help if you’ve got questions about what to do next. I’m not sure where the notes for that session are, but if you’re interested in the Software Freedom Conservancy, contacting him directly for more information is a good option.

Then we had the great Git Migration discussion. The notes were wonderful, and it seems like many people were either considering a git conversion or in the middle of one. Two PostgreSQL developers were there, including Magnus Hagander, whose voice wasn’t working so well. I helped out a bit by giving a rough overview of how our process had worked, and by pointing people at the many resources and tools that Magnus and others who worked on the conversion made available.

Afterward, I sat down for a bit with Zooko to talk about Tahoe-LAFS, which appears to be an encrypted, distributed file store with an HTTP interface. Sounds really cool, and I’m interested in trying it out.

Now, I’m getting ready to head off to the party for the evening. Great day!

Thoughts on PostgreSQL 9.0 release

Something I wrote for a press contact last month that I wanted to share:

We started the process toward 9.0 last year when we added new committers and invited many new people into the commitfest process (our way of getting lots of patches reviewed, approved and committed every two months). What we’ve found is that we can engage new developers by providing a clear way for them to help in small, well-defined ways.

As a group, we work really hard to recruit developers and maintain long-term relationships with them. That investment in people has paid off really well in 9.0. We have long-term commitments from volunteers and independent businesses to implement features that take multiple years to see through to completion. Binary replication is a clear example of that, and we have many other projects underway that are only possible because developers trust our core development team to see them through.

It’s not the most headline-grabbing thing that we do. But it is pretty amazing that a group of people, with no central authority, “benevolent dictator” or business driving it, continue every year to produce a trustworthy, stable and feature-rich database that rivals what’s produced by the best-funded enterprises in the world.

What do you think the best part of the 9.0 release was?