The future of free and open source support models

I attended the MySQL Conference all last week, and am feeling very excited about the future of open source databases. I had many interesting discussions and met a ton of Drizzle hackers I was lucky enough to spend Friday with, digging through code.

I was talking with Paul Vallée of the Pythian Group on Thursday about Postgres and the future of enterprise support, and he showed me this great graph from indeed.com. It shows acceleration, not the raw numbers – but still, a neat graph 🙂

We discussed the issues that enterprise customers with certain types of regulatory obligations encounter — such as contractual obligations for PCI-compliant credit card storage or outsourced management of sensitive data. The standard response developers might give for this is “read the spec, and make sure you implement it properly”. But the truth is, for larger companies, that may not be enough.

So, assuming for a moment that the Postgres community would even want to address this problem as a group — could it be possible for the Postgres community to provide the legal and financial assurances that an incredibly huge corporation (ahem – Sun/Oracle) can?

The short answer for Postgres right now is “no”.

Originally, I had thought just in terms of liability, but Paul clarified:

The liability is just one component of what gives the guarantee meaning because there is a consequence to failed delivery. An SLA can also do this. As can a simple lucrative contract that can be lost, or canceled early if delivery does not take place. The key here is to ensure that the technology adopter can legitimately be confident that they are provably being responsible by adopting the platform. “I trusted” doesn’t cut it for many.

My view was that this type of agreement helps to determine who exactly is to blame (and who can be sued) in the event of a software failure. But, Paul said, “It’s more about assurance (with evidence) that obligations realistically will be met.”

I sometimes think that this system of liability and assurances is just ultimately broken. But it is a reality. So, would it be possible for us to come up with a new legal framework for community-driven software?

Paul brought up the idea of a cooperative, and that maybe such a legal entity could provide protection for individuals involved in supporting Postgres, and also shoulder some or all of the liability that a corporation using Postgres would want. I’m not sure that core developers of Postgres would join such a thing, or whether they would be allowed to given existing agreements they have with their own companies. But it is an interesting idea.

Creating a blueprint for this type of organization – hackers’ cooperatives – could be a way for truly community software to be developed across companies and among individuals in a sustainable and “trustable” way. Maybe?

Continuing this train of thought – maybe these are non-governmental organizations, whose main purpose is to create and maintain infrastructure software for the good of the world.

Funding for mid-sized free and open source projects seems to be a consistent problem. Perhaps NGOs are a fair model for us.

I am curious about what effort may have already been made in this direction. My next step will be to contact Bradley Kuhn and see if there’s something out there that might address this.

Greg’s THREE talks at PostgreSQL Conference East

Greg Sabino Mullane will be presenting three talks at PostgreSQL Conference East this weekend in Philadelphia, at Drexel University. The talks are listed on the site, and here’s what he’ll be speaking about:

Bucardo
April 5, Sunday, 10am
Bucardo is a replication system for Postgres that uses triggers to asynchronously copy data from one server to many others (master-slave) or to exchange data between two servers (master-master). We’ll look at replication in general and where Bucardo fits in among other solutions, take a look at some of its features and use cases, and discuss where it is going next. We’ll set up a running system along the way to demonstrate how it all works.

Monitoring Postgres with check_postgres.pl
April 4, Saturday, 2:30pm
What should you monitor? And how? We’ll look at the sort of things you should care about when watching over your Postgres databases, as well as ways to graph and analyze metadata about your database, with a focus on the check_postgres.pl script.

The Power of psql
April 4, Saturday 10:30am
All about everyone’s favorite Postgres utility, psql, the best command-line database interface, period. We’ll cover basic and advanced usage.

I’ve seen a few of Greg’s talks — The Magic of MVCC, Cloning an elephant and a few others. He’s a great speaker and cool guy. And he’s my boss. But I’m not just saying that because he’s my boss! Really!

He doesn’t like to brag about himself, so I’m gonna help him out. He maintains DBD::Pg, check_postgres.pl, Bucardo and has had MANY patches committed to PostgreSQL. He’s also a volunteer for the PostgreSQL sysadmins team, and specifically helps maintain the git repo box. He’s a contributor to the MediaWiki project. He’s on the board of the United States PostgreSQL Association. He’s basically awesome.

If you’re gonna be there, you should check out his talks. And if you can’t make it, here’s hoping Josh Drake records the talks and shares them with us all! 🙂

Twitter and PostgreSQL!


On pgsql-general, Doug Hunley mentioned he’d created a twitter account for pgsql-announce! Way cool.

I’d written during last PgCon about Postgres and Twitter, and I figured it was time for a new list of Postgres-related people I follow! Especially since a few people commented that Twitter was a waste of time last year 😉

If you’re on twitter (or identi.ca), and I missed you — please comment below!

Here we go (in no particular order):

  • Selena Deckelmann (me!)
  • Gabrielle Roth, member of PDXPUG, main force behind Code-N-Splode
  • Mark Wong, performance expert, leading the Portland PostgreSQL Performance Pad and associated projects to bring regular performance testing back to PostgreSQL
  • Francisco Figueiredo Jr., developer and maintainer of Npgsql, speaker, member of PostgreSQL.Br
  • Magnus Hagander, President of Pg.EU – the European Union non-profit organization dedicated to PostgreSQL and supporting user groups in the region
  • Josh Berkus, pgsql-advocacy leader, Member of the PostgreSQL core team
  • Jean-Paul Argudo, leader/member of PostgreSQL.Fr and Treasurer of Pg.EU
  • Hubert Lubaczewski, author of a great technical blog about PostgreSQL http://www.depesz.com/
  • Nikolay Samokhvalov, leader of the Moscow PostgreSQL Users Group, and consultant in Russia
  • Kristin Tufte, Postgres user, member of PDXPUG and assistant professor at Portland State University
  • Satoshi Nagayasu, member of the Japanese PostgreSQL Users Group, and spearheading meetups in Tokyo
  • Brenda Wallace, mobile gadget fetishist, Drupalista and Wellington, NZ PostgreSQL User Group wrangler
  • Isis Borges, Postgres enthusiast, works in the fashion industry in Porto Alegre, Brazil
  • Dan Langille, DBA and organizer behind PgCon
  • Michael Brewer, DBA and board member of the United States PostgreSQL Association
  • Joshua Drake, business owner, board member of the United States PostgreSQL Association
  • Fábio Telles Rodriguez, active member of PostgreSQL.Br (Brazil) and PgDay Brazil organizer. If you speak Portuguese, you can check out Planet Postgres Br here – http://planeta.postgresql.org.br/
  • Fernando Ike, member of PostgreSQL.Br
  • Ed Borasky, PhD, analytics nerd, PDXPUG member
  • Robert Treat, author of PHP and PostgreSQL book, speaker, on the board of the United States PostgreSQL Association
  • David Wheeler, contributed citext most recently to PostgreSQL, consultant, maintainer of Bricolage, formerly of I Want Sandy
  • Greg Sabino Mullane, author of Bucardo and check_postgres.pl, maintainer of DBD::Pg, recently contributed patches to psql, on the board of the United States PostgreSQL Association, my boss 🙂
  • Christophe, volunteer at OSCON for PostgreSQL booth, DBA
  • Aaron Thul, DBA, developer, speaker on PostgreSQL on Drugs 🙂
  • David Fetter, DBA, maintainer of the PostgreSQL Weekly News
  • Elein Mustain, DBA, speaker, maintainer of http://varlena.com
  • Chris May, DBA, member of PDXPUG
  • Jason Kirtland, developer, maintainer of SQLAlchemy, Pythonista
  • Josh Tolley, developer, DBA, statistics nerd, author of PL/LOLCODE and pgsnmpd
  • Erik Jones, Portland resident, Pythonista, made a cool python-based partitioning tool (pgpartitioner)
  • Nicholas Kreidberg, Nevada resident, PostgreSQL user
  • Gavin Roy, DBA, Business dude, Myyearbook.com, speaker, on the board of the United States PostgreSQL Association
  • Chris Browne, Slony maintainer
  • Douglas Hunley, creator of pgsql_announce on twitter 🙂
  • Larry Rosenman, PostgreSQL supporter, helps with DNS for PostgreSQL.org, contributor (some of the syslog* stuff in version 7.0)

Organizations:

Report from SCaLE 7x

Awesome booth volunteers Noel and Erez!

In the elevator this morning, a person asked me if I was with the SCaLE conference. He started by saying that he was really happy that we (the people attending the conference) were there, and that he hoped for Linux to be successful. And then he said, “I’m a linux supporter, but I’m a windows captive.”

Because I work every day with free software, I lose touch with people who feel trapped by their operating system. That moment in the elevator reminded me that not everyone is as lucky as I am!

This is my second year attending SCaLE. I’m just as excited as I was last year about the number of end users, systems administrators, and enthusiastic supporters of free and open source software. During Joe Brockmeier’s keynote, he asked the crowd to raise their hands if they were already contributing to an open source project, and less than 1/3 of the crowd raised their hands. Other conferences I attend seem to attract mostly people who are already contributors. I’m very happy to see SCaLE having a wider reach.

My favorite event was definitely the PostgreSQL LAPUG birds of a feather session on Saturday evening. We filled the room and had to fetch chairs from outside! Josh Berkus and Magnus Hagander provided some great slides that I used for a quick tour through new SQL programming, administrative and security features in the upcoming release of 8.4. This presentation was basically a tag team effort between Josh and me.

More than 3/4 of the people had never attended a PostgreSQL user group meeting before, and I hope to hear that they all subscribe to the mailing list and attend some meetings!

We had great traffic at the Postgres booth. There were a surprising number of people who asked about migrating from MSSQL to Postgres. Fortunately, we had at least one person with a fair amount of Windows experience at the booth (Thanks, John!). I also was grateful that many people stopped by with follow up questions about the filesystems I/O talk I gave. I really felt like it was well received, and I hope that we end up with a few new recruits to our testing.

Great show! And now I’m off to relax downstairs before I work a downtime this evening 🙂

FSM, visibility map and new VACUUM awesomeness


Heikki Linnakangas, listening as Simon Riggs sketches on the chalkboard.

Update: Heikki’s slides are here!

Heikki Linnakangas gave a presentation this past Sunday at FOSDEM about the improved free space map (FSM), which tracks unused space inside the database, and the new visibility map, a bitmap which will indicate which data pages can be skipped during a partial VACUUM. This performance enhancement will affect all users of the upcoming 8.4 software release. You can see what the new FSM implementation looked like back in October from depesz’s blog.

Despite Heikki’s modest claim during the talk that the performance tests were inconclusive, the consensus among Postgres contributors is that this feature will result in a substantial improvement in the performance of VACUUM for tables that are large, but have few UPDATEs.

The new free space map and visibility map (in 8.4) and autovacuum (enabled by default starting in version 8.3) are huge administrative usability improvements to version 8 of Postgres. Prior to version 8.1, VACUUM had to be scheduled outside of the database system. Autovacuum has been part of the core Postgres distribution for over two years, and is tunable via several global configuration parameters.

The visibility map enables partial VACUUMs — meaning that VACUUM no longer has to examine every tuple to update the FSM. The new FSM implementation eliminates two configuration parameters, effectively automating a formerly manual configuration process.

The new FSM is stored on disk in separate files inside of $PGDATA/base/, and is cached in shared_buffers. The result is that the max_fsm_* configuration parameters are gone in 8.4: Postgres is able to track and adjust this data structure without user intervention.

A few critical features of the new FSM are:

* Now a binary tree structure
* Constructed using 1 byte per heap page
* The top level shows the maximum amount of contiguous space available
* The data structure is auto-repairing and can be reconstructed from the bottom

Previously, every time that VACUUM was run, the free space map had to be reconstructed from scratch. Now, individual nodes in the map may be updated (aka “retail” updates).
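To make the tree idea concrete, here is a toy sketch in Python. It is not how Postgres lays the FSM out on disk. The `FreeSpaceTree` class, its method names, and the notion of a 0-255 "category" per page are my own illustration of the properties listed above: one byte per heap page at the leaves, the maximum bubbling up to the top, retail updates, and rebuilding from the bottom.

```python
class FreeSpaceTree:
    """Toy max-tree over per-page free-space values (one byte per heap page).

    Hypothetical illustration only; not the actual Postgres FSM format.
    """

    def __init__(self, num_pages):
        self.num_pages = num_pages
        # Round the leaf count up to a power of two so the tree is complete.
        self.leaves = 1
        while self.leaves < num_pages:
            self.leaves *= 2
        # nodes[1] is the root; leaves live at nodes[leaves : 2 * leaves].
        self.nodes = bytearray(2 * self.leaves)

    def update(self, page, category):
        """'Retail' update: set one leaf, then fix only its ancestors."""
        i = self.leaves + page
        self.nodes[i] = category  # must be in 0..255 (one byte)
        i //= 2
        while i >= 1:
            self.nodes[i] = max(self.nodes[2 * i], self.nodes[2 * i + 1])
            i //= 2

    def max_free(self):
        """The top of the tree holds the largest value of any page."""
        return self.nodes[1]

    def find_page(self, needed):
        """Return a page with at least `needed` free space, or None."""
        if self.nodes[1] < needed:
            return None
        i = 1
        while i < self.leaves:
            left = 2 * i
            i = left if self.nodes[left] >= needed else left + 1
        page = i - self.leaves
        return page if page < self.num_pages else None

    def rebuild(self):
        """Repair every inner node from the leaves upward."""
        for i in range(self.leaves - 1, 0, -1):
            self.nodes[i] = max(self.nodes[2 * i], self.nodes[2 * i + 1])


tree = FreeSpaceTree(5)
tree.update(0, 10)
tree.update(3, 200)
tree.update(4, 50)
print(tree.max_free(), tree.find_page(40))
```

The point of the structure is that changing one page touches only the handful of nodes between that leaf and the root, instead of rebuilding the whole map.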

The visibility map is a bitmap with one bit per heap page. A set bit means that every tuple on that page is visible to all transactions, so the page contains nothing for VACUUM to clean up.

Previously, when VACUUM ran, it *had* to look at every tuple in a table, because there was no information about which pages may not have been updated since the last VACUUM. With the visibility map, VACUUM will now be able to perform partial scans of table data, skipping pages which are marked as fully visible. Partial scans means fewer I/O operations for VACUUM, and happier database administrators.
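Here is a similarly rough Python sketch of the partial-scan idea. The `VisibilityMap` class and the `pages_to_vacuum` helper are names I made up for illustration, and the real implementation has to deal with locking and crash safety that this ignores. It only shows the core trick: keep one bit per heap page, clear it whenever the page is modified, and let VACUUM visit only the pages whose bit is clear.

```python
class VisibilityMap:
    """Toy bitmap: one bit per heap page (hypothetical, for illustration).

    A set bit means every tuple on the page is visible to all transactions.
    """

    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.bits = bytearray((num_pages + 7) // 8)

    def set_all_visible(self, page):
        self.bits[page // 8] |= 1 << (page % 8)

    def clear(self, page):
        """Call when a tuple on the page is inserted, updated or deleted."""
        self.bits[page // 8] &= ~(1 << (page % 8)) & 0xFF

    def is_all_visible(self, page):
        return bool(self.bits[page // 8] & (1 << (page % 8)))


def pages_to_vacuum(vm):
    """A partial VACUUM only needs the pages not marked all-visible."""
    return [p for p in range(vm.num_pages) if not vm.is_all_visible(p)]


vm = VisibilityMap(10)
for p in range(10):
    vm.set_all_visible(p)   # pretend a previous VACUUM cleaned everything
vm.clear(3)                 # an UPDATE touched page 3
vm.clear(8)                 # a DELETE touched page 8
print(pages_to_vacuum(vm))
```

For a large table with few UPDATEs, almost every bit stays set, which is exactly the case where the partial scan saves the most I/O.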

Simon Riggs just rocked my world.

I’m in Brussels for the FOSDEM conference, hanging out at the PostgreSQL booth, meeting my European colleagues, and running into friends.

PostgreSQL has a developer’s room and Simon Riggs just wrapped up a talk about Replication. I sincerely hope that the video of the talk turned out well, because it was the most inspiring and technically interesting talk I have seen in a very long time. Unfortunately, I don’t have a copy of the slides at the moment, but word is that they will be posted on the BSD wiki soon.

Simon focused on new features in 8.4 that affect file-based replication, also mentioning streaming, synchronous replication — which will not be included in 8.4, but is being actively worked on. He explained his rationale for objecting to the inclusion of the synchronous replication patches, mostly, I think, based on the complexity of the WAL archiving required as it was implemented.

Then, Simon launched into an in-depth tour of the issues and solutions brought about during his team’s work on Hot Standby. Hot Standby allows read-only queries to be made against a Postgres server that is acting as a standby in file-based replication, which the Postgres documentation describes as Point-in-Time Recovery and WAL shipping.

Simon started work on PITR-related patches about five years ago, and continues that work with others today.

One fascinating aspect of the hot standby patches is that they ultimately caused performance improvements in sub-transactions across the board – and will likely cause up to 5% improvement in that code path. There were other performance improvements, but I’ll wait for the slides to mention those. At several times during the talk, Simon pointed out features that Postgres has that no other database has — such as multiple options for dealing with conflicts in hot standby (freezing, conflict resolution and timeout).

At the end of the talk, Simon spent a few minutes talking about how Postgres is capable of being the best database, not just the best open source database. And how all the people in the room were capable of contributing as he had. He claimed that prioritization and aiming to work on the biggest, most interesting problem you can are all you need. And he claimed that all that made him different was that he was a little more persistent about solving problems.

Rock on, Simon.

What are you waiting for? Get your PgCon talks in now!


Yes, that’s me, with Tom Lane. You, too, might be able to get your picture with Tom!

Like Josh Berkus said yesterday:

As of today, you have 2 weeks left to submit talk proposals to PGCon.

You know you want to. PGCon is the international conference for PostgreSQL hackers, sysadmins, application developers, SQL geeks and other Smart People. Submit your talk! Be a Smart Person too!

PGCon will be happening May 21-22 in Ottawa, Canada, with tutorials on May 19 and 20. Some financial help is often available for speakers, but none is available for non-speakers. So submit, submit!

We particularly could use some talks on the new 8.4 features, really creative PostgreSQL applications, massive Postgres scaling, PostGIS, BioPostgres, and a few case studies. This means you.

I attended PgCon last year for the first time. Not only were the presentations top notch, but Dan Langille’s hospitality set the groundwork for yet another fantastic community-building experience, like the ones PostgreSQL community members had during the 2006 Anniversary Summit in Toronto and again in 2007 at the first PgCon.

We had plenty of outstanding socializing and hacking opportunities. Last year’s conference started with a gathering of committers that was fodder for great pub and hallway track conversation all week. Great talks I saw included Andrew Sullivan’s Idle thoughts on PostgreSQL Project Management, Greg Sabino Mullane’s Bucardo talk about this multi-master replication tool, and Magnus Hagander’s walk through how search.postgresql.org was implemented.

Ottawa was beautiful last year, and I can’t wait to go back this May!

A year of PDXPUG

Last year was the third year that PDXPUG has been operating in Portland, and I decided to look back at our year of meetings. Here goes:

January 11 – 10 things you can use in PostgreSQL 8.3
February 26 – Extreme Database Makeover: RT
March 20 – Managing Internet Services: Using the right tool for the job
April 17 – Rails on PostgreSQL
May 15 – PostgreSQL for Pythoneers
June 19 – The relational model
July 20 – PDXPUG DAY!, and the schedule
August 21 – Tsearch2 and Materialized Views (Guest speaker from Seattle!!)
September 18 – The Visual Planner
October 16 – Point In Time Recovery
November 20 – Reviewed 8.4 features with the help of depesz’s blog
December – Coder’s Social

Thanks everyone who gave talks and attended meetings! User groups are only as good as the people who participate in them, and this list shows just how talented, diverse and fun the Postgres community is in Portland. I love you guys!

Looking forward – once again, we’ve already scheduled talks through the next four months! I feel like the group is running on its own momentum, and that is a fabulous feeling. We have a data visualization talk, another Extreme Database Makeover, and hopefully a presentation about teaching database theory with PostgreSQL.

Our next meeting is on January 15, 7pm with Stephen Jazdzewski traveling all the way from Eugene to present SplendidCRM, a formerly Microsoft SQL-only system that is now compatible with PostgreSQL. I am happy to see more of our Microsoft colleagues joining and presenting to the user group communities, as I’ve always felt they are underrepresented in our groups. Also, I’m happy to host another out-of-town presenter here in Portland! Hope to see you on the 15th.

Mentor Summit Report for PostgreSQL


Update: Fixed the etherboot wiki link.

I attended the Google Summer of Code Mentor Summit this past weekend on behalf of PostgreSQL. We met at the Google campus in Mountain View.

This event was an unconference, so none of the sessions were determined in advance.

Some of the highlights were:

  • Leslie Hawthorn and Chris DiBona went into some detail with the whole group about the selection process for GSOC. This session made me feel as though PostgreSQL had relatively good chances for being accepted again next year. Google, however, does not pre-announce projects/products, so there is no sure thing about our (or any other project’s) involvement.
  • I met the MusicBrainz guys and was pleased to receive many bars of chocolate they requested to be distributed to SFPUG and PDXPUG members as thanks for making a great database.
  • I attended three sessions concerning recruitment and retention of students. This is a topic that many people were interested in, but that few people feel they have a proper strategy for.

I also led a session on recruitment and retention of students to open source projects. Some of the ideas that came out of that and the related sessions were:

  • Determine what makes you personally need to be part of Postgres (joy of learning, scratching a technical itch, making a tool for your job, fame). Find out which of those things your student also needs or wants and try to give that or help your student achieve that thing.
  • Have a clearly defined method for students to keep journals. Several projects simply used MediaWiki and templates.
  • Use git (or other distributed revision control), and have students commit early and often to a branch that mentors have access to.
  • The Etherboot project has a great system: http://etherboot.org/wiki/soc/2008/start
  • Hold weekly meetings over IRC. These can be brief, but help get students accustomed to your project’s culture and way of doing things.
  • Ask the student: “are you on track?” and ask the mentor: “do you think the student is on track?” on a weekly basis.
  • If you want students to stick around, find incremental responsibilities to assign that are driven by their enthusiasm.
  • Interview all your students by phone ahead of time, not just the ones you think might be a problem.
  • Require a phone number on the application for the student.
  • Require a secondary contact so that if the student “disappears” there’s a backup person to contact. (and contact that person BEFORE SoC starts)

I made good connections with members of Git, Parrot, WorldForge, Ruby and many other community leaders. I was particularly impressed by the ideas and stories from the current Debian project leader, Steve McIntyre, and Gentoo council member Donnie Berkholz. Donnie recommended some books about recruitment that I plan to read and review in the next few weeks.

The issue of mailing list moderation and the number of people required to keep mailing lists functioning properly came up frequently. If you know a moderator for a Postgres mailing list, please consider thanking them for doing a very tedious, extremely important and often thankless job.

I also spent some time discussing with Leslie Hawthorn and Cat Allman how to increase the total number of women mentors and students next year. Leslie and I shared some ideas and I offered to help implement them next year. One thing the crowd asked for was explicit training on how to recruit and manage female students. Realistically, this information will apply to all students, and I hope this training helps us recruit more students overall.

I thought the conference went quite well. I hope PostgreSQL is accepted next year, and that one of our mentors is able to attend this conference. And, if you go, be sure to register for the hotel early, and stay at the Wild Palms.

PgDay EU – Best conference ever?

Smiling, PostgreSQL-using conference goers

Anyone who went will likely tell you how much they enjoyed PGDay EU 2008. My biggest regret this year is that I was unable to attend.

Magnus Hagander’s Planet blog is still missing, so I’m blogging the link to his fabulous photos for him. If you see any pictures with missing names or labels, please let him know at magnus -at- hagander (dot) net.

This conference was once again led by Gabriele Bartolini, and a large supporting team – including the other board members of PgEU – Magnus Hagander, Jean-Paul Argudo and Andreas Scherbaum. PgEU (for those that don’t know) is a non-profit dedicated to promoting PostgreSQL in European nations. Next year, this conference will be held somewhere in France! Stay tuned to planet.postgresql.org for details!

Update:

My favorite photo:
lolcats and greg stark