- listening to carla gomes talk about computational sustainability – aim to apply tech from compsci to sustainability research. #ghc10 #
- Big challenge: establish interdisciplinary relationships and collaborate across fields (computational sustainability) #ghc10 #
- loving carla gomes' use of smilie and frownie faces on slides about "diffusion process as local stochastic activation rules" #ghc10 #
- Referencing this paper: http://bit.ly/aF5BbH in solving problem of land buying to increase population of endangered birds #ghc10 #
- Building poverty maps is similar to species maps, and modeling process of migration – influencing cascades, picking best strategy #ghc10 #
- Leadership: always think about how to generalize what you are doing. -Carla Gomes #ghc10 #
- Imposter syndrome thought to be esp common with women, but found to occur equally with men. #ghc10 #
- "'You have just processed a petabyte of data.' Oops." #ghc10 #mapreduce #
- PSA: biotch-wings http://www.flickr.com/photos/gailcarmichael/5041609463/in/set-72157624932418433/ #ghc10 #
- In a talk on mentoring challenges and they are acting out a skit. Awesome!! #ghc10 #
- OH: you should focus on the lack of spatial awareness as a commonality #ghc10 #
- Enjoying the #ghc10 afterparty #
Thoughts on Grace Hopper

I’ve been at Grace Hopper Celebration of Women in Computing for the past two days – soaking in the presence of over 2000 women in computing at a sprawling conference here in Atlanta.
The interesting thing about this conference is how much it feels like any other large conference I attend, apart from a couple of small ways in which it is very different. I realized while I was here how I have spent the last few years surrounding myself with accomplished, amazing women like Jen Redman, Leslie Hawthorn, Claire McCabe and Sarah Sharp. What’s funny is that we’re connected by Portland (although Claire is down in Oakland… for now…), and we’re all at Grace Hopper this week. They, among many others, made me feel right at home.
I feel the dislocation of being at a conference comprised 95% (or more) of women. There’s an odd politeness that I’m not used to. There are a lot of people who are in academia or industry who wear suits and use words like ‘leverage’ without irony. There were tons of students – over 900 of them, and an incredible job fair. And I was shocked at the number of people who asked me: What exactly is free and open source software?
As congratulatory as those of us who are “in” the free software world are about having essentially won out over proprietary software, there is a huge, mainstream portion of the computing world that is not aware. I’m not saying that a person needs to understand the minutiae of license differences, or have even read one. But wow, there is an incredible missed opportunity when a computer science student can graduate without knowing what open source even *is*.
So, congratulations to the women who put the first ever Open Source Track at Grace Hopper together: Jen Redman, Cat Allman, Sandra Covington, Sara Ford, Jenny Han Donnelly, Leslie Hawthorn, Avni Khatri, Stormy Peters, Hilary Pike, and Natalia Vinnik. I was very happy to participate in the “getting started in open source” panel. And many thanks to the NSA for sponsoring the hackathon with Sahana, a very worthy project, and one that I hope is infused with new excitement and contribution from the 200 people who signed up to participate. I hear that we’ll be having a hackathon again next year in Portland — when Grace Hopper comes to our very own city!
twittering on 2010-10-01
- Experiencing opening remarks – giant video! http://twitpic.com/2tbt7k #ghc10 #
- More than 2000 people attending #ghc10 #
- OH: "All day long, I'm surrounded by men. And I get tired of looking at them." -Duy-Loan T. Le #
- Duy-Loan T. Le at http://twitpic.com/2tc448 #ghc10 #
- "Relationship building requires face-to-face connection." #ghc10 #preachit #
- "what is considered excellence in one culture doesn't necessarily translate into another culture" #ghc10 — so true in FOSS cultures #
- "plausible promise" – learn to release before things are completely done – @saraford – cool term #
- "plausible promise" – learn to release before things are completely done – @saraford – cool term #ghc10 #fosstrack #
- My business card at Grace Hopper #ghc10 http://twitpic.com/2tde6y #
- About to participate in the panel on getting started in free and open source software. #ghc10 #
- "a simple little fortran do loop. you don't know what that is either." -Carol Bartz #ghc10 #
- dude. Carol Bartz just said 'biotch' and 'biotch wings' #ghc10 #
- "You have to manage your own career… Volunteer for things." -Carol Bartz #ghc10 #
- "Don't think of your career as a ladder – ladders are very unstable." -Carol Bartz #ghc10 #
- Thanks so much to @lhawthorn for organizing the starting in FOSS panel with @PINguAR @terriko, Deb Nicholson and Greg Hislop and me! #ghc10 #
- Wow, Carol Bartz's keynote was epic. #ghc10 #
- Some Android apps caught covertly sending GPS data to advertisers arst.ch/mmq via @arstechnica #noyoudint #
- Listening to Gayatri Buragohain, the founder of http://www.fat-net.org/ talk about how she got started. She just won an award at #ghc10 #
- heartfelt speech from Tayana Etienne, who was crucial in developing NGO collaboration in Haiti after the earthquake in January. #ghc10 #
- Laura Haas now accepting an award for technical leadership.. cites collaboration, apprenticeship as the foundation of her success 🙂 #ghc10 #
- "I just think about how to get people to play with me on my next project." -Laura Haas #ghc10 #
- Omg. Dance party started at #ghc10 Headed back to the hackathon. #immanerd #
- Guess who rules? @claire_mccabe with her bringing me a glass of wine 🙂 #
- Fran and others hacking http://flic.kr/p/8FrBKQ #
- Fran, louiqa and pat http://flic.kr/p/8FrJmj #
Custom aggregates: a couple tips and ORDER BY in 9.0
A friend asked about a way to report the first three semesters that a group of students were documented as being present, and report those values each in a column.
The tricky thing is that the semesters students attend are rarely the same. I started out with a very naive query (and sorry for the bad formatting that follows.. I need to find some good SQL formatting markup) just to get some initial results:
select student,
(SELECT semester as sem1 FROM assoc a2 WHERE a2.student IN (a1.student) ORDER BY sem1 LIMIT 1) as sem1,
(SELECT semester as sem1 FROM assoc a2 WHERE a2.student IN (a1.student) ORDER BY sem1 LIMIT 1 offset 1) as sem2,
(SELECT semester as sem1 FROM assoc a2 WHERE a2.student IN (a1.student) ORDER BY sem1 LIMIT 1 offset 2) as sem3
FROM assoc a1
WHERE
student IN ( select student from assoc group by student HAVING count(*) > 2)
GROUP BY student;
That query pretty much sucks, requiring five sequential scans of ‘assoc’:
QUERY PLAN
HashAggregate (cost=3913.13..315256.94 rows=78 width=2)
  -> Hash Semi Join (cost=1519.18..3718.08 rows=78017 width=2)
        Hash Cond: (a1.student = assoc.student)
        -> Seq Scan on assoc a1 (cost=0.00..1126.17 rows=78017 width=2)
        -> Hash (cost=1518.20..1518.20 rows=78 width=32)
              -> HashAggregate (cost=1516.26..1517.42 rows=78 width=2)
                    Filter: (count(*) > 2)
                    -> Seq Scan on assoc (cost=0.00..1126.17 rows=78017 width=2)
  SubPlan 1
    -> Limit (cost=1326.21..1326.22 rows=1 width=3)
          -> Sort (cost=1326.21..1328.71 rows=1000 width=3)
                Sort Key: a2.semester
                -> Seq Scan on assoc a2 (cost=0.00..1321.21 rows=1000 width=3)
                      Filter: (student = a1.student)
  SubPlan 2
    -> Limit (cost=1331.22..1331.22 rows=1 width=3)
          -> Sort (cost=1331.21..1333.71 rows=1000 width=3)
                Sort Key: a2.semester
                -> Seq Scan on assoc a2 (cost=0.00..1321.21 rows=1000 width=3)
                      Filter: (student = a1.student)
  SubPlan 3
    -> Limit (cost=1334.14..1334.14 rows=1 width=3)
          -> Sort (cost=1334.14..1336.64 rows=1000 width=3)
                Sort Key: a2.semester
                -> Seq Scan on assoc a2 (cost=0.00..1321.21 rows=1000 width=3)
                      Filter: (student = a1.student)
So, he reminded me about custom aggregates! I did a little searching and found an example function, to which I added an extra CASE branch that stops the aggregate from adding more than three items to the returned array:
CREATE FUNCTION array_append_not_null(anyarray,anyelement)
RETURNS anyarray
AS '
SELECT CASE WHEN $2 IS NULL THEN $1 WHEN array_upper($1, 1) > 2 THEN $1 ELSE array_append($1,$2) END
'
LANGUAGE sql IMMUTABLE RETURNS NULL ON NULL INPUT;
And finally, I declared an aggregate:
CREATE AGGREGATE three_semesters_not_null (
sfunc = array_append_not_null,
basetype = anyelement,
stype = anyarray,
initcond = '{}'
);
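A quick sanity check of the transition function on its own helps here. This is just a sketch using made-up integer arrays rather than the real semester data, and it assumes the function above has been created as written:

```sql
-- Fewer than three elements so far: the new value is appended.
SELECT array_append_not_null(ARRAY[1, 2], 3);      -- {1,2,3}

-- Already three elements: the array comes back unchanged.
SELECT array_append_not_null(ARRAY[1, 2, 3], 4);   -- {1,2,3}

-- Empty array: array_upper() returns NULL, so the CASE falls through to ELSE.
SELECT array_append_not_null('{}'::int[], 1);      -- {1}
```

Since the function is declared RETURNS NULL ON NULL INPUT, the aggregate skips rows with a NULL input by itself, so the explicit `$2 IS NULL` branch is mostly a safety net.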
One problem though – we want the array returned to be only the first three semesters, rather than any three semesters a student has a record for. Meaning, we need to sort the information passed to the aggregate function. We could do this inside the aggregate itself (bubble sort, anyone?) or we can presort the input! I chose presorting, to avoid writing a real ugly case statement.
My query (compatible with 8.3 or higher):
SELECT sorted.student, three_semesters_not_null(sorted.semester)
FROM (SELECT student, semester from assoc order by semester ) as sorted
WHERE
sorted.student IN (select a.student from assoc a group by a.student HAVING count(*) > 2)
GROUP BY sorted.student;
Which yields the much nicer query plan, requiring just two sequential scans:
QUERY PLAN
HashAggregate (cost=11722.96..11725.46 rows=200 width=64)
  -> Hash Semi Join (cost=10052.32..11570.82 rows=30427 width=64)
        Hash Cond: (assoc.student = a.student)
        -> Sort (cost=8533.14..8728.18 rows=78017 width=5)
              Sort Key: assoc.semester
              -> Seq Scan on assoc (cost=0.00..1126.17 rows=78017 width=5)
        -> Hash (cost=1518.20..1518.20 rows=78 width=32)
              -> HashAggregate (cost=1516.26..1517.42 rows=78 width=2)
                    Filter: (count(*) > 2)
                    -> Seq Scan on assoc a (cost=0.00..1126.17 rows=78017 width=2)
I ran my queries by Magnus, and he reminded me that what I really needed was ORDER BY in my aggregate! Fortunately, 9.0 has exactly this feature:
SELECT student,
three_semesters_not_null(semester order by semester asc ) as first_three_semesters
FROM assoc
WHERE student IN (select student from assoc group by student HAVING count(*) > 2)
GROUP BY student;
Which results in the following plan:
QUERY PLAN
GroupAggregate (cost=11125.05..11711.15 rows=78 width=5)
  -> Sort (cost=11125.05..11320.09 rows=78017 width=5)
        Sort Key: public.assoc.student
        -> Hash Semi Join (cost=1519.18..3718.08 rows=78017 width=5)
              Hash Cond: (public.assoc.student = public.assoc.student)
              -> Seq Scan on assoc (cost=0.00..1126.17 rows=78017 width=5)
              -> Hash (cost=1518.20..1518.20 rows=78 width=32)
                    -> HashAggregate (cost=1516.26..1517.42 rows=78 width=2)
                          Filter: (count(*) > 2)
                          -> Seq Scan on assoc (cost=0.00..1126.17 rows=78017 width=2)
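Since the original request was for each semester in its own column, the array can be unpacked with subscripts. This is a sketch I have not run against the test data: the `sems`, `sem1`, `sem2` and `sem3` names are made up for illustration, and it assumes 9.0 or later for ORDER BY inside the aggregate. It also moves the "more than two records" filter into a HAVING clause on the same GROUP BY instead of the IN subquery:

```sql
SELECT student,
       sems[1] AS sem1,   -- PostgreSQL arrays are 1-indexed by default
       sems[2] AS sem2,
       sems[3] AS sem3
FROM (
    SELECT student,
           three_semesters_not_null(semester ORDER BY semester ASC) AS sems
    FROM assoc
    GROUP BY student
    HAVING count(*) > 2   -- same filter as the IN (...) subquery
) AS s;
```

Note that count(*) counts rows with a NULL semester while the aggregate skips them, so a student with NULL semesters could still end up with fewer than three values.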
A final alternative would be to transform the IN query into a JOIN:
SELECT a.student,
three_semesters_not_null(a.semester order by a.semester asc ) as first_three_semesters
FROM assoc a
JOIN (select student from assoc group by student HAVING count(*) > 2) as b ON b.student = a.student
GROUP BY a.student;
And the plan isn’t much different:
QUERY PLAN
GroupAggregate (cost=11125.05..11711.15 rows=78 width=5)
  -> Sort (cost=11125.05..11320.09 rows=78017 width=5)
        Sort Key: a.student
        -> Hash Join (cost=1519.18..3718.08 rows=78017 width=5)
              Hash Cond: (a.student = assoc.student)
              -> Seq Scan on assoc a (cost=0.00..1126.17 rows=78017 width=5)
              -> Hash (cost=1518.20..1518.20 rows=78 width=32)
                    -> HashAggregate (cost=1516.26..1517.42 rows=78 width=2)
                          Filter: (count(*) > 2)
                          -> Seq Scan on assoc (cost=0.00..1126.17 rows=78017 width=2)
Any other suggestions for this type of query?
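One candidate that skips the custom aggregate entirely is window functions, available since 8.4. The following is only a sketch, not something I have benchmarked or compared plans for; the `ranked`, `rn`, `n` and `sem1` through `sem3` names are made up for illustration:

```sql
SELECT student,
       max(CASE WHEN rn = 1 THEN semester END) AS sem1,
       max(CASE WHEN rn = 2 THEN semester END) AS sem2,
       max(CASE WHEN rn = 3 THEN semester END) AS sem3
FROM (
    SELECT student,
           semester,
           -- position of each semester within its student, earliest first
           row_number() OVER (PARTITION BY student ORDER BY semester) AS rn,
           -- total records per student, for the "more than two" filter
           count(*) OVER (PARTITION BY student) AS n
    FROM assoc
) AS ranked
WHERE rn <= 3
  AND n > 2
GROUP BY student;
```

Unlike the aggregate, this does not skip NULL semesters (they sort last by default), so the two approaches can differ if the data contains NULLs.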
I’ve attached the file I was using to test this out.
custom_aggregates.sql
twittering on 2010-09-30
- On the train to EWR. Had my last bike ride in brooklyn and tasty doubles from A&A for a while. #
twittering on 2010-09-29
- Moon cake http://flic.kr/p/8EP3az #
twittering on 2010-09-28
- Doubles from A & A http://flic.kr/p/8EsmJa #
- Butter (new chicken) http://flic.kr/p/8EsQdV #
- Placement http://flic.kr/p/8EyEKM #
Last week: to Maine for a wedding
I was in Maine last week for the first time, attending the wedding of Kevan and Scott’s youngest uncle Dwight, the last member of his generation of Deckelmanns to get married.
I took a few photos, made a couple of the family cakes (Viennese Speckled Sponge Cake), helped out with all sorts of last minute preparations and had a great time with everyone.
Twenty of us travelled to Maine for the wedding, and we all stayed in a farmhouse sitting near a pond and overlooking an inlet leading to the ocean. Walls were paper thin, most of us slept dormitory style, and we shared a single shower between us all. It was beautiful, and the weather was perfect for the wedding – 72F and a slight breeze.
Now I’m in NYC for a few days before heading down to Atlanta for Grace Hopper.