
March 24, 2011 postgres, postgresql foreign data wrappers, plpgsql, postgres, postgresql 1 Comment

Report from first day at PgEast and hoping for another tool to be opened up

I wrote up some quick notes from talks and conversations over at the Emma Tech blog.

The most exciting talk I’ve sat in on so far today was about an Oracle PL/SQL to Postgres PL/PgSQL translation tool that I’m hoping the company that created it will open source. We’ll see. Fortunately, a fellow conference-goer had an inspirational story to share about open sourcing another tool for Postgres, which led to incredible adoption in our community in just a few months.

Not every project will see that kind of immediate benefit and growth from open sourcing, but there is a certain class of project where most people can complete 80% of a useful tool themselves, but don’t bother to put in the additional effort to get the remaining 20% of the features that they’d really like to have.

But when someone does finally release a tool that provides that extra 20% of features, adopting the new tool is a no-brainer, particularly if it is open source. I think this PL/SQL conversion tool falls into this sweet spot.

Now I’m sitting in the Foreign Data Wrappers talk and very excited to see what Andrew is announcing. Great to see people creating things that make the crowd here clap, smile and celebrate.

March 19, 2011 postgres, postgresql gsoc, gsoc2011, postgres, postgresql

GSoC 2011, accepting submissions starting March 28!

The PostgreSQL project has been accepted into the Google Summer of Code 2011.

Students may begin submitting proposals starting March 28, concluding
on April 8.

Development work runs from May 23 through August 15. For students,
suggested projects, ideas and details are at:
http://wiki.postgresql.org/wiki/GSoC_2011
Our GSoC landing page is at:
http://www.google-melange.com/gsoc/org/show/google/gsoc2011/postgresql

We encourage students to contact project admins – me, Josh Berkus and
Robert Treat this year – if they have questions. Once students have a
proposal in mind, we will encourage them to engage with pgsql-hackers
to flesh out their proposals and get feedback the same way that all
contributors do. For those of you who have been around for previous
GSoCs, this should be familiar to you. 🙂

Many thanks to the 15 volunteer mentors and admins this year (in no
particular order):

Dave Page – Past mentor – pgAdmin, Windows, Packaging, Infrastructure
Heikki Linnakangas – Postgres Committer
Magnus Hagander – Postgres Committer, pgAdmin
Guillaume Lelarge – pgAdmin
Jehan-Guillaume de Rorthais – phpPgAdmin
Joe Abbate – Python-related, catalog-related projects
David E. Wheeler – Perl-related, extensions, PGXN
Mark Wong – benchmarking, monitoring, performance
Tatsuo Ishii – Postgres Committer, pgpool-II
Stephen Frost – Postgres contributor
Devrim Gündüz – Administration related software (dashboard)
Josh Berkus – auto-configuration, performance testing
Selena Deckelmann – configuration, testing
Andreas Scherbaum – performance, configuration, testing
Robert Treat – Past mentor 2x, co-admin, Mentor Summit attendee.

We can always accept more mentors! Actual assignment to projects
depends greatly on the proposals from students. Please contact me if
you are interested.

March 1, 2011 postgres class, codelesson, introtopgsql, postgres, postgresql 6 Comments

Intro to PostgreSQL class starts March 7!

Remember that class I announced about a month ago?

Well, it’s happening for real. We’re starting March 7th and going for 6 weeks. Sign up now if you want to join us for this first edition of the class.

I’m planning to do screen casts for a lot of the content, and have just started playing around with Screenflow.

The first couple weeks are primarily about using psql and learning key features of PostgreSQL, with some history sprinkled in. The next two weeks dive into features like: full text search, built-in functions, our many datatypes, indexing and transactional DDL. I’ll be surveying students as we go along to add detail where I can on key features they’re interested in. The last few weeks go into administration, maintenance and configuration. I’ll also be throwing in details about the PostgreSQL community – people, the best places to go for help, and hopefully some cameos from Postgres community members.

So, don’t forget to sign up today! Especially because this pudding says so:

Image courtesy of @thesethings

February 28, 2011 postgres, postgresql pgcon, pgcon2011, pl, plsummit, postgres, postgresql, putaconferenceonit 7 Comments

PL Developer Summit at PgCon, May 21!

UPDATE: We have 18 PLs. Added to the list from comments. 🙂

You’re probably aware that PostgreSQL supports a few procedural languages, PL/PgSQL being the most well-known for compatibility with Oracle’s PL/SQL.

Interest in PostgreSQL Procedural Languages (PLs) has grown significantly in the last few years and so PgCon is hosting a special PL summit on Saturday May 21, 2011.

Did you know that 17 other procedural languages are currently implemented?

And we have at least one proprietary PL from EnterpriseDB:

EDB-SPL

We invite PL developers, PostgreSQL core hackers, those interested in future PL development and PgCon attendees interested in learning more to attend!

Before we decided to create this summit, I put together a survey for PL developers. All survey respondents wanted a summit to happen!

The most popular topics were:

Postgres PL Interface Improvements
Connecting with other PL developers
New features in PLs
Hacking together
State of PLs
Distributions and builds
PG9.1 extensions vs PL languages
Security (pl vs plu)
PGXN

The most popular PLs were:

PL/PgSQL
PL/Perl
PL/Python
PL/R

The summit is open to attendees of PgCon and special guests. Please RSVP and help set the agenda.

The agenda and any results of the summit will be published on the wiki.

February 14, 2011 postgres, postgresql conference, discount, mysqlconf, postgres, postgresql 2 Comments

PostgreSQL at MySQL Users Conference: the sessions!

You’ve probably seen a few posts about this – from the CFP, to Baron’s recent pointer to the release of the schedule. And now Josh Berkus just posted a Meetup for the event, so that spurred me on for this post…

So, just to make things even easier for you, I thought I’d summarize the awesome talks we’re having at the O’Reilly MySQL Users Conference this year related to PostgreSQL.

Building Data Warehouses with PostgreSQL, Josh Berkus (PostgreSQL Experts, Inc.)
Has your database grown to hundreds of gigabytes in size, with no limit in sight? Are you considering moving to an expensive proprietary database system to deal with your huge database? PostgreSQL is an excellent database for small to medium sized data warehouses in the 0.5 to 5 terabyte range.
Bottom-up Database Benchmarking, Greg Smith (2ndQuadrant US)
While databases are increasingly being distributed across multiple nodes, the performance of every node still matters–especially if you’re considering virtualized or cloud deployments that have their own specific trade-offs. Memory performance scaling as core count changes, all aspects of disk performance, and using sysbench to benchmark both MySQL and PostgreSQL are all topics covered here.
An Introduction to PostGIS – the PostgreSQL spatial extension, Ragi Burhum (Burhum LLC – GIS Consulting)
PostGIS is an extension to the PostgreSQL object-relational database system which allows GIS (Geographic Information Systems) objects to be stored in the database. It includes support for spatial indexes, and functions for analysis and processing of GIS objects.
Securing PostgreSQL From External Attack, Bruce Momjian (EnterpriseDB)
This talk explores the ways attackers with no authorized database access can steal Postgres passwords, see database queries and results, and even intercept database sessions and return false data. Postgres supports features to eliminate all of these threats, but administrators must understand the attack vulnerabilities to protect against them.
Introduction to PostgreSQL Configuration, Robert Haas (EnterpriseDB)
PostgreSQL is highly customizable, but which settings are most important and what values are most appropriate for a typical installation? This talk will explain the basics of how to configure PostgreSQL for reliability and good performance.
Mixed MySQL/PostgreSQL environments, Jeff Davis (Aster Data)
Mixed SQL system environments are a reality for most organizations. MySQL and PostgreSQL are a natural combination — both are open source, and they complement each other nicely. See how to improve data consolidation, increase confidence in query results, and analyze data across applications.
Maintaining Terabytes: 10 Things to Watch Out For When PostgreSQL Gets Big, Selena Deckelmann (PostgreSQL)
Size can creep up on you. Some day you may wake up to a multi-terabyte Postgres system handling over 3000 tps staring you down. Learn the best ways to manage these systems as they grow, and find out what new features in 9.0 have made life easier for administrators and application developers working with big data.
Openstreetmap -> (PostGIS|MySQL|SpatiaLite) -> OpenLayers: From Map to Web, Hartmut Holzgraefe (…???…)
OpenStreetMap raw data for any non-trivial area comes as a massive amount of XML data. Processing that XML data directly is possible, but importing it into a spatial database provides much more interesting processing options, especially when it comes to producing on-demand map data for web applications with acceptable performance.
War Stories and Solutions: Operational Fun with PostgreSQL and PostGIS in the Cloud, Andy Parsons (Obikosh.com)
As CTO of Outside.in, and in my new stealth company, I’ve seen my share of challenging scenarios keeping a very busy PostgreSQL-based startup online and responsive during tremendous growth. EC2 + PostgreSQL + PostGIS + no downtime. Others can probably learn from my battle scars!
Replace phpMyAdmin with Something Better, Jakub Vrana (Self-employed)
phpMyAdmin is a well-known PHP application for managing MySQL database. What’s wrong with it? It is big, slow and it misses support for many advanced features like stored procedures or triggers. Its free alternative Adminer provides user-friendly interface, requires no setup, is lightning fast and highly customizable. Adminer is available for MySQL, PostgreSQL, SQLite, MS SQL and Oracle.

We’re also having a Birds of a Feather session, and staffing a booth on the exhibit floor!

If you’re planning to attend, you can use my code & save 25% in addition to early registration savings: mys11fsd: http://oreil.ly/goaqst

Hope to see you there!

February 10, 2011 postgres, postgresql 9.1, commits, features, hot standby, postgres, postgresql, simon riggs 1 Comment

Hot Standby features for 9.1, just committed: Pause and Resume

On February 8th, Simon Riggs committed a couple of new functions that allow Hot Standby to be paused and resumed. You can already *read* from the Hot Standby without pausing, but until now there was no way to pause the application of changes. You might want to do this if you have a very high-write-volume server and some very expensive queries that you want to run on a standby.

Basic Recovery Control functions for use in Hot Standby. Pause, Resume,
Status check functions only. Also, new recovery.conf parameter to
pause_at_recovery_target, default on.

The basic idea is that if you have a read-only standby system, you can give it the command: pg_xlog_replay_pause() and the standby will stop applying changes. Then you can use the database in read-only mode without new changes being applied. When you’re done you can issue the command: pg_xlog_replay_resume() and proceed with applying logs.
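Here is a minimal sketch of what such a session might look like, run on the standby itself. It assumes a 9.1 development build with Hot Standby enabled and a role allowed to call the recovery control functions (most likely a superuser). The status check name pg_is_xlog_replay_paused() is the one I expect from the commit, so verify it against your build.

```sql
-- Run these on the standby, not on the master.

-- Confirm the server is actually in recovery (i.e. acting as a standby).
SELECT pg_is_in_recovery();

-- Stop applying incoming changes; read-only queries keep working.
SELECT pg_xlog_replay_pause();

-- Status check: reports whether replay is currently paused.
SELECT pg_is_xlog_replay_paused();

-- ... run the expensive read-only queries here ...

-- Pick up where replay left off.
SELECT pg_xlog_replay_resume();
```

Keep in mind that while replay is paused the standby keeps falling behind the master, so you probably don't want to leave it paused longer than your queries need.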

There are some related features that I can’t wait to test out around named restore points for replay. But the ability to pause replay and run queries is just awesome.

This is a feature that Simon talked about back in 2009 at FOSDEM, and I am very excited to see it implemented.

February 10, 2011 postgres, postgresql class, codelesson, postgres, postgresql 7 Comments

Offering an Intro to PostgreSQL class

UPDATE: See below for pricing.

I’m working with Code Lesson to offer an Introduction to PostgreSQL class.

Code Lesson is pretty cool – it’s an online course system, and the idea is you get a couple assignments and lessons taught by me each week, and there’s a midterm and final evaluation. I love conferences, but the nice thing about an online course is you don’t have to spend an entire workday taking a tutorial at a conference, or travelling to a particular location, and you can finish assignments when it’s convenient for you.

My current working outline is:

Intro to Postgres

Hello, world!
* History of PostgreSQL project
* Features
* Basic SQL

Usage
* psql
* Drivers: Perl and Python examples
* GUIs
* Documentation

Survey of features
* Full text search
* Built-in functions
* Datatypes
* Indexes
* Transactional DDL

Community
* Mailing lists & IRC
* Asking questions
* Modules, add-ons, tools

Operations
* System and hardware
* Installation and configuration
* Maintenance and operation
* Replication

Our plan is to provide students with login access to a shared database. During the course, I’ll be available to answer questions and I’m considering making short videos to go along with the course material.

We haven’t set the price for it just yet, but should be figuring that out in the next week or so.

Anyway, if you’re interested, sign up and you’ll get an email when we set the price. I’m happy to answer any questions you have about content.

Another thing that was requested in the Hacker News thread was more advanced material. I think the advanced material falls into two categories – PostgreSQL core functionality, and administration/tuning.

Update! Pricing is set at $325/student, with a 10% discount if you register 2 or more students at the same time.

October 5, 2010 postgres, postgresql postgres, postgresql, release, security announcement 3 Comments

PostgreSQL 9.0.1 released, includes security fix & maintenance releases for 6 other versions

The PostgreSQL Global Development group released new maintenance versions today: 9.0.1, 8.4.5, 8.3.12, 8.2.18, 8.1.22, 8.0.26 and 7.4.30. This is the final update for PostgreSQL versions 7.4 and 8.0. There’s a security issue in there involving procedural languages, and a detailed description of the vulnerability is on our wiki. A key thing to remember is that the issue primarily affects people who use SECURITY DEFINER along with a procedural language function. PL/PgSQL is not affected, but any other procedural language with a “trusted” mode is. This includes PL/Perl, PL/tcl, PL/Python (7.4 or earlier) and others. The new versions fix issues in PL/Perl and PL/tcl. A patch for PL/PHP is currently in the works.

Most developers feel that the security issue is relatively obscure. If you aren’t using a procedural language with some mechanism for altering privileges (SET ROLE or SECURITY DEFINER, for example), you aren’t vulnerable to the security issue and can upgrade Postgres during your next regularly scheduled downtime. If you *are* vulnerable, we recommend investigating the use of the functions that may be vulnerable, and taking steps to prevent their exploitation by upgrading as soon as you can.
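One way to start that investigation is to list the SECURITY DEFINER functions written in something other than PL/PgSQL. This is a rough sketch rather than an official check: the catalog columns are standard, but the list of excluded languages is an assumption, and the query won't find functions that depend on SET ROLE.

```sql
-- List SECURITY DEFINER functions written in an external procedural language.
-- Assumption: 'internal', 'c', 'sql' and 'plpgsql' are not of interest here.
SELECT n.nspname      AS schema_name,
       p.proname      AS function_name,
       l.lanname      AS language,
       l.lanpltrusted AS trusted_language
FROM pg_proc p
JOIN pg_language l ON l.oid = p.prolang
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE p.prosecdef
  AND l.lanname NOT IN ('internal', 'c', 'sql', 'plpgsql')
ORDER BY n.nspname, p.proname;
```

An empty result in every database is a good sign, but code that calls SET ROLE before invoking PL functions still needs a manual review.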

From the FAQ:

What is the level of risk associated with this exploit?

Low. It requires all of the following:

An attacker must have an authenticated connection to the database server.

The attacker must be able to execute arbitrary statements over that connection.

The attacker must have a strong knowledge of PostgreSQL.

Your application must include procedures or functions in an external procedural language.

These functions and procedures must be executed by users with greater privileges than the attacker, using SECURITY DEFINER or SET ROLE, and using the same connection as the attacker.

This was also the first release for which I generated release notes! 😀

Here was my list of interesting changes for the announcement:

Prevent show_session_authorization() from crashing within autovacuum processes, backpatched to all supported versions;
Fix connection leak after duplicate connection name errors, fix handling of connection names longer than 62 bytes and improve contrib/dblink’s handling of tables containing dropped columns, backpatched to all supported versions;
Defend against functions returning setof record where not all the returned rows are actually of the same rowtype, backpatched to 8.0;
Fix possible duplicate scans of UNION ALL member relations, backpatched to 8.2;
Reduce PANIC to ERROR on infrequent btree failure cases, backpatched to 8.2;
Add hstore(text, text) function to contrib/hstore, to support migration away from the => operator, which was deprecated in 9.0. Function support backpatched to 8.2;
Treat exit code 128 as non-fatal on Win32, backpatched to 8.2;
Fix failure to mark cached plans as transient, causing indexes built with CREATE INDEX CONCURRENTLY to not be used right away, backpatched to 8.3;
Fix evaluation when the inner side of an outer join is a sub-select with non-strict expressions in its output list, backpatched to 8.4;
Allow full SSL certificate verification to succeed in the case where both host and hostaddr are specified, backpatched to 8.4;
Improve parallel restore’s ability to cope with selective restore (-L option), backpatched to 8.4 with caveats;
Fix failure of “ALTER TABLE t ADD COLUMN c serial” when done by non-owner, 9.0 only.
Several bugfixes for join removal, 9.0 only.
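For the hstore item in the list above, the migration looks roughly like this. It's a small sketch that assumes contrib/hstore is installed in the current database; the key and value are made up for illustration.

```sql
-- Deprecated spelling, using the text => text operator:
--   SELECT 'color' => 'red';

-- Function form, available wherever the new hstore(text, text) was backpatched:
SELECT hstore('color', 'red');
```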

If you have a look at a new tool that Robert Haas and Tom Lane committed to the repo called git_changelog, you can use it to find the commit IDs for the various features (you need the whole source tree to do it :)).

You’ll find that there are a lot of commits in these sets. We haven’t had a minor release since May 2010, so they kind of added up.

Any other changes in there you think we should have mentioned in the announcement? Let me know in the comments.

Download new versions now:

September 30, 2010 postgres, postgresql 9.0, custom aggregates, features, order by, postgres, postgresql 11 Comments

Custom aggregates: a couple tips and ORDER BY in 9.0

A friend asked about a way to report the first three semesters that a group of students were documented as being present, and report those values each in a column.

The tricky thing is that the semesters students attend are rarely the same. I started out with a very naive query (and sorry for the formatting that follows; I need to find some good SQL formatting markup) just to get some initial results:

SELECT student,
    (SELECT semester AS sem1 FROM assoc a2
     WHERE a2.student IN (a1.student)
     ORDER BY sem1 LIMIT 1) AS sem1,
    (SELECT semester AS sem1 FROM assoc a2
     WHERE a2.student IN (a1.student)
     ORDER BY sem1 LIMIT 1 OFFSET 1) AS sem2,
    (SELECT semester AS sem1 FROM assoc a2
     WHERE a2.student IN (a1.student)
     ORDER BY sem1 LIMIT 1 OFFSET 2) AS sem3
FROM assoc a1
WHERE student IN (
    SELECT student FROM assoc GROUP BY student HAVING count(*) > 2
)
GROUP BY student;

That query pretty much sucks, requiring five sequential scans of ‘assoc’:

                                     QUERY PLAN                                     
 HashAggregate  (cost=3913.13..315256.94 rows=78 width=2)
   ->  Hash Semi Join  (cost=1519.18..3718.08 rows=78017 width=2)
         Hash Cond: (a1.student = assoc.student)
         ->  Seq Scan on assoc a1  (cost=0.00..1126.17 rows=78017 width=2)
         ->  Hash  (cost=1518.20..1518.20 rows=78 width=32)
               ->  HashAggregate  (cost=1516.26..1517.42 rows=78 width=2)
                     Filter: (count(*) > 2)
                     ->  Seq Scan on assoc  (cost=0.00..1126.17 rows=78017 width=2)
   SubPlan 1
     ->  Limit  (cost=1326.21..1326.22 rows=1 width=3)
           ->  Sort  (cost=1326.21..1328.71 rows=1000 width=3)
                 Sort Key: a2.semester
                 ->  Seq Scan on assoc a2  (cost=0.00..1321.21 rows=1000 width=3)
                       Filter: (student = a1.student)
   SubPlan 2
     ->  Limit  (cost=1331.22..1331.22 rows=1 width=3)
           ->  Sort  (cost=1331.21..1333.71 rows=1000 width=3)
                 Sort Key: a2.semester
                 ->  Seq Scan on assoc a2  (cost=0.00..1321.21 rows=1000 width=3)
                       Filter: (student = a1.student)
   SubPlan 3
     ->  Limit  (cost=1334.14..1334.14 rows=1 width=3)
           ->  Sort  (cost=1334.14..1336.64 rows=1000 width=3)
                 Sort Key: a2.semester
                 ->  Seq Scan on assoc a2  (cost=0.00..1321.21 rows=1000 width=3)
                       Filter: (student = a1.student)

So, he reminded me about custom aggregates! I did a little searching and found an example function, to which I added an extra CASE branch that stops the aggregate from adding more than three items to the returned array:

CREATE FUNCTION array_append_not_null(anyarray, anyelement)
RETURNS anyarray AS '
    SELECT CASE
        WHEN $2 IS NULL THEN $1
        WHEN array_upper($1, 1) > 2 THEN $1
        ELSE array_append($1, $2)
    END
' LANGUAGE sql IMMUTABLE RETURNS NULL ON NULL INPUT;

And finally, I declared an aggregate:

CREATE AGGREGATE three_semesters_not_null (
    sfunc = array_append_not_null,
    basetype = anyelement,
    stype = anyarray,
    initcond = '{}'
);
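To try the aggregate out, here is a minimal sketch against a tiny stand-in table. The column types and sample values are assumptions made up for illustration (this is not the data set behind the plans in this post), and it assumes the function and aggregate above have already been created.

```sql
-- Hypothetical sample data: student 'a' has four semesters, 'b' has two.
CREATE TABLE assoc (student text, semester text);

INSERT INTO assoc (student, semester) VALUES
    ('a', 'S03'), ('a', 'S01'), ('a', 'S04'), ('a', 'S02'),
    ('b', 'S02'), ('b', 'S01');

-- The aggregate keeps at most three non-null semesters per student,
-- in whatever order the rows happen to reach it.
SELECT student, three_semesters_not_null(semester)
FROM assoc
GROUP BY student;
```

Nothing here controls the order in which rows are fed to the aggregate, so the three semesters kept for student 'a' are not guaranteed to be the earliest ones.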

One problem though – we want the array returned to be only the first three semesters, rather than any three semesters a student has a record for. Meaning, we need to sort the information passed to the aggregate function. We could do this inside the aggregate itself (bubble sort, anyone?) or we can presort the input! I chose presorting, to avoid writing a real ugly case statement.

My query (compatible with 8.3 or higher):

SELECT sorted.student, three_semesters_not_null(sorted.semester)
FROM (SELECT student, semester FROM assoc ORDER BY semester) AS sorted
WHERE sorted.student IN (
    SELECT a.student FROM assoc a GROUP BY a.student HAVING count(*) > 2
)
GROUP BY sorted.student;

Which yields the much nicer query plan, requiring just two sequential scans:

                                      QUERY PLAN                                      
 HashAggregate  (cost=11722.96..11725.46 rows=200 width=64)
   ->  Hash Semi Join  (cost=10052.32..11570.82 rows=30427 width=64)
         Hash Cond: (assoc.student = a.student)
         ->  Sort  (cost=8533.14..8728.18 rows=78017 width=5)
               Sort Key: assoc.semester
               ->  Seq Scan on assoc  (cost=0.00..1126.17 rows=78017 width=5)
         ->  Hash  (cost=1518.20..1518.20 rows=78 width=32)
               ->  HashAggregate  (cost=1516.26..1517.42 rows=78 width=2)
                     Filter: (count(*) > 2)
                     ->  Seq Scan on assoc a  (cost=0.00..1126.17 rows=78017 width=2)

I ran my queries by Magnus, and he reminded me that what I really needed was ORDER BY in my aggregate! Fortunately, 9.0 has exactly this feature:

SELECT student,
    three_semesters_not_null(semester ORDER BY semester ASC) AS first_three_semesters
FROM assoc
WHERE student IN (
    SELECT student FROM assoc GROUP BY student HAVING count(*) > 2
)
GROUP BY student;

Which results in the following plan:

                                        QUERY PLAN                                        
 GroupAggregate  (cost=11125.05..11711.15 rows=78 width=5)
   ->  Sort  (cost=11125.05..11320.09 rows=78017 width=5)
         Sort Key: public.assoc.student
         ->  Hash Semi Join  (cost=1519.18..3718.08 rows=78017 width=5)
               Hash Cond: (public.assoc.student = public.assoc.student)
               ->  Seq Scan on assoc  (cost=0.00..1126.17 rows=78017 width=5)
               ->  Hash  (cost=1518.20..1518.20 rows=78 width=32)
                     ->  HashAggregate  (cost=1516.26..1517.42 rows=78 width=2)
                           Filter: (count(*) > 2)
                           ->  Seq Scan on assoc  (cost=0.00..1126.17 rows=78017 width=2)

A final alternative would be to transform the IN query into a JOIN:

SELECT a.student,
    three_semesters_not_null(a.semester ORDER BY a.semester ASC) AS first_three_semesters
FROM assoc a
JOIN (
    SELECT student FROM assoc GROUP BY student HAVING count(*) > 2
) AS b ON b.student = a.student
GROUP BY a.student;

And the plan isn’t much different:

                                        QUERY PLAN                                        
 GroupAggregate  (cost=11125.05..11711.15 rows=78 width=5)
   ->  Sort  (cost=11125.05..11320.09 rows=78017 width=5)
         Sort Key: a.student
         ->  Hash Join  (cost=1519.18..3718.08 rows=78017 width=5)
               Hash Cond: (a.student = assoc.student)
               ->  Seq Scan on assoc a  (cost=0.00..1126.17 rows=78017 width=5)
               ->  Hash  (cost=1518.20..1518.20 rows=78 width=32)
                     ->  HashAggregate  (cost=1516.26..1517.42 rows=78 width=2)
                           Filter: (count(*) > 2)
                           ->  Seq Scan on assoc  (cost=0.00..1126.17 rows=78017 width=2)

Any other suggestions for this type of query?

I’ve attached the file I was using to test this out.
custom_aggregates.sql

August 29, 2010 postgres, postgresql hotstandby, logger, pgstandby, pg_standby, postgres, postgresql, syslog 3 Comments

Using logger with pg_standby

Piping logs to syslog is pretty useful for automating log rotation and forwarding lots of different logs to a central log server.

To that end, the command-line utility ‘logger’ is nice for piping output from utilities like pg_standby without having to add syslogging code to the utility itself. It also ships by default with modern syslog packages.

Here’s an easy way to implement this:

restore_command = 'pg_standby -d -s 2 -t /pgdata/trigger /shared/wal_archive/ %f %p %r 2>&1 | logger -p local3.info -t pgstandby'


Selena Deckelmann's blog about open source and working at Mozilla.

Tag Archives: postgresql