4.1 Looks Sweet

Just downloaded the 4.1 Release Notes and looking them over. There are a ton of things in there we could make immediate use of:

    Enhanced Workload Management: Statement-Level Memory

    Tight Hadoop Integration

    Simpler, Faster Installation: Parallel Installation Utility

    Better In Database Analytics

    Enhanced Support of Joins for UPDATE and DELETE

    Ordered Aggregates and Ordered Reduce Specifications in
    Greenplum MapReduce

    Transaction Performance Enhancement

    Enhanced gpload Performance to Avoid Catalog Bloat

    Online Segment Rebalancing Following a Recovery

    Performance Monitor Enhancements

    PostgreSQL 9.0 Client Compatibility

    Oracle Unbreakable Linux 5.5 Support

Posted in Version Release | Tagged , | Leave a comment

GP is out

Greenplum rolled out this last week and fixes many network communication and Append Only table issues that we’ve been running into. The caveat here is that GP has had some additional things crop up so I would recommend waiting until is released before making any updates.

GPDB Release Notes

Posted in Uncategorized | Leave a comment

standby masters are good

We had a meeting last week where we were talking about Greenplum and we got on the topic of the standby masters. Which are really just sitting out there pulling transaction logs from the active master and hanging around. The consensus was that the need for the standby masters would be very few and far between and we haven’t needed them for the entire time we’ve been running GP thus far. That was last week. This week we’ve had two cases where we have needed the standby masters. One was a hardware issue with the 10G nic in the master of one cluster and the other was a table issue during recovery of another cluster. So yeah, the standby masters are needed.

Posted in Disaster Recovery | Tagged | Leave a comment

Greenplum plugin accepted

My Greenplum plugin was accepted, it can be found here. It’s not anything all the special yet. Written in perl it relies on DBD::Pg to do most the heavy lifting. Currently it has four basic functions

1) DB login check. This is a very simple check just sees if it can make an actual connection to the GPDB. There are instances where using a tcp port check you could see the that it’s up but the login for your user is prevented. A good a example of this would be removing remote access from pg_hba.conf in order to do maintenance and then forgetting to enable remote users and/or access from specific network locations.

2) SELECT test. This test goes in and does a SELECT COUNT(1), gp_segment_id FROM schema.table GROUP BY gp_segment_id. The idea behind this check is to make sure a table is responding on all segments. This could also be used as a SLA check to make sure you aren’t surpassing certain time constraints to pull results from tables. Currently I do this again a small 1000 row table I generated in our systems.

3) WRITE test. Here the plugin logs into GPDB and attempts to create a temp table. We’ve had instances where GPDB has been up and you are able to log in and do select queries yet any query that requires a write just hangs. This check it to make sure that issue does not come up. I have yet to test on a 3.x system if this check will fail should the system go into “read-only” mode.

4) A very simple segment status check. This is the base for a more extensive check that I will build up. Currently it goes into gp_configuration or gp_segment_configuration depending upon your choice of 3.x or 4.x and looks to make sure all the segments are online. It sends back a crit status if any segments show offline. I plan to do a lot of tweaking to this test in the near future. Being able to specific a number of segments online warn and crit threshold as well as checking to see if any two segments containing the same content are down.

The timeout on all the test are configurable and default to 300 seconds. I wouldn’t suggest setting up any of these test to repeat more often than 5 minutes, except for possibly the login test. Doing something like forcing you GP cluster to do a select test against a multi-TB table every minute would probably be a bad idea.

On our system our current check setup look like this:

This being the first plugin I’ve submitted feedback on code cleanup or different additions that should be incorporated into a Greenplum plugin would be appreciated.


A nagios check to go in and check various Greenplum availability pieces


-t, --timeout Plugin timeout in seconds [default=300]
-U, --username Username to connect with (mandatory)
-P, --password Password to connect with (mandatory)
-H, --dbhost Database Hostname to connect to (mandatory)
-D, --db Database to connect to (mandatory)

Check to see if Greenplum accepts a connection
This is the default check


Check to see if getting data from a table works
The check executes a SELECT count by segment_id query
for the specified table. This could also be used to
setup SLA checks for getting data from the db

--select-schema Schema for Select check (mandatory)
--select-table Table for Select check (mandatory)

Check to see if table creation works
The check creates a temp table with and id(int) and
vlas(char) columns and set the distributed by to id.
This helps to monitor if the catalog queries and table
creation are happening in a reasonable amount of table

--create-table Table for Create check (mandatory)

Check to see if GP considers any segments offline
Query the gp_configuration (3.x) and gp_segment_configuration (4.x)
tables to see if any segments are marked down at the master level.
Currently this will crit if any show down.


Posted in Monitoring | Tagged | Leave a comment

plugin action

Just uploaded a Greenplum plugin to the Nagios plugin exchange. It’s my first go at an official Nagios plugin. I’ve created checks for a simple login, reading from a table, writing to a temp table and checking for segment status in 3.x and 4.x installs. Once it’s accepted I’ll publish links here.

Posted in Uncategorized | 2 Comments

move data from db to db

We had two Greenplum instances running and we needed to copy one big table to the other. There’s a variety of ways to dump and import but we were moving a large amount of data, so the thought of dropping it to a local file was not a very good option.

One of the ways to move a table from one instance to the other using COPY commands. If you are on the destination database you would use:

psql -h remotegreenplumtopulldatafrom -U me -c "COPY myschema.mytable to STDOUT" | psql -c "COPY myschema.mytable FROM STDIN"

We were going to move a lot of data though. So I came up with a quick script to do it a partition at a time

# database_dump
# Simple program do go out and copy dated partitions of
# a greenplum db to another db where the table exists
# it starts at start_time and goes incrementally backwards
# a day at a time until it gets to end_time
# 2010-03-16 SHK

use strict;
use warnings;
use POSIX;
use Time::Local;

# Fields are S,M,H,D,M,Y - Note month is 0-11
my $start_time = timelocal(0,0,0,1,2,2011);
my $end_time = timelocal(0,0,0,1,2,2010);
my $cmd;

my $running_date = $start_time;
until ( $running_date < $end_time ) { my $part_date = strftime("%Y%m%d",localtime($running_date)); $running_date -= 86400 $cmd = qq~psql -h remotegreenplumtopullfrom -U skahler -c "COPY myschema.mytable_1_prt_$partdate to STDOUT" | psql -c "COPY myschema.mytable FROM STDIN"~; print "$cmd\n"; system($cmd); } exit;

In our case the new db didn't have much action going on while I was moving the data in. I think if it was an active table that I was moving things into I'd create create a table on the destination target, load the data into it and then exchange that partition into the my target table.

Posted in Uncategorized | Leave a comment

gp_segment_id helps find skew

Finding table skew in Greenplum is very important. If you end up with a bad distribution of records across segments you end up with one node doing much more work than the others. Unfortunately Greenplum firmly supports No Child Left Behind, it will only be as fast as it’s slowest member. Thus it is extremely important to have an even distribution across all the segments. A good way to check this is to use the hidden column gp_segment_id. A simple count query with this parameter will let you know how well your data is spread across nodes.

So I’m pulling a sample of 500 tuples from an example advertising impression data set.

First I think I’ll distribute it on the business unit. The results are:

db=# select gp_segment_id, count(1) from skahler.demo group by gp_segment_id;
gp_segment_id | count
5 | 500
(1 row)

Ouch, that was not the right choice. Obviously the set of data I got is all from one unit. So let’s try something different. How about distributing by the campaign they are associate with.

db=# select gp_segment_id, count(1) from skahler.demo group by gp_segment_id;
gp_segment_id | count
1 | 43
7 | 33
0 | 58
5 | 54
4 | 39
6 | 46
3 | 50
2 | 177
(8 rows)

A little bit better, but that doesn’t work so well either. The data is spread across the segments but segment 2 is holding much more data than everybody else. This will make for some hot spotting when I query the data. Next up maybe I can try by ip.

db=# select gp_segment_id, count(1) from skahler.demo group by gp_segment_id;
gp_segment_id | count
7 | 62
4 | 68
1 | 56
6 | 63
3 | 64
0 | 66
5 | 58
2 | 63
(8 rows)

Ah, that looks much better. Of course that this could be skewed in this dataset if we were heavily from a certain country or data came much more often from specific network segment. So I would need to continue to watch it and see if skew develops over time. It looks like this will work for now.

Posted in Uncategorized | 1 Comment

Game over try again

Tonights Greenplum error was

ERROR: could not find segment file to use for inserting into relation table (64749). (appendonlywriter.c:569) SQL state: XX000

Which essentially means game over. Dump your table and recreate it, because you won’t be able to put any more data into it. Luckily you can still pull it out. This seems to only happen to append only compressed tables in both 3.x and 4.x. It’s supposed to be fixed in an upcoming patch release. It’s still enough to make George Bush sad.

Posted in Errors | Tagged , , | Leave a comment

Getting Distinct counts on keys

It seems that someone always wants to know how much of X do I have in the db

Good Idea:

SELECT COUNT(1) from ( SELECT distribkeyvalue FROM db.really_big_table GROUP BY distribkeyvalue) a

Bad Idea:

SELECT COUNT(DISTINCT distribkeyvalue ) FROM db.really_big_table

In the first case the Greenplum optimizer will realize that it can do all of the work on the nodes and just forward the final counts to be aggregated where in the second case it’s going to try to bring all the data back to a central location in order to attempt establish uniqueness in the dataset, ouch.

Posted in Uncategorized | 1 Comment

Disk performance and disk fragmentation

My last post had some statistics for a C2100 cluster we were running. Last night I did maintenance on a cluster that is running on R710 attached via PERC6/E controllers to a MD1120 array filled with 24 300GB disks (10k 2.5″). These are split into 4 arrays with 6 disks in each setup RAID5. The gpcheckperf at the start of my recent maintenance

gpadmin@mdw:~> gpcheckperf -f hosts.seg -d /data/vol1 -d /data/vol2 -d /data/vol3 -d /data/vol4 -r d -D

disk write min bandwidth (MB/s): 888.01 [sdw14-1]
disk write max bandwidth (MB/s): 968.73 [ sdw4-1]

disk read min bandwidth (MB/s): 1592.66 [ sdw7-1]
disk read max bandwidth (MB/s): 1941.55 [sdw13-1]

one of the next things I do is take a look at disk defragmentation using “xfs_db -c frag -r /dev/X” where X is one of my four arrays. In this case I came up with about 35% fragmentation across all of our arrays.

to clean this up I do a run of xfs_fsr across the disks which got them all down to less than 1% fragmentation.

the next disk test produced similar write speeds but increased read speed

disk write min bandwidth (MB/s): 872.72 [ sdw8-1]
disk write max bandwidth (MB/s): 960.32 [sdw15-1]

disk read min bandwidth (MB/s): 1975.79 [ sdw8-1]
disk read max bandwidth (MB/s): 2052.40 [ sdw2-1]

Up until the last couple of months it was not uncommon for us to hit 80%+ fragmentation on all of our nodes in the Greenplum cluster. Our recent switch from Suse to Redhat should help fix this, there was apparently a bug fix that RHEL implements in a recent kernel release to clean this up. I’ve noticed that in this cluster fragmentation can have a significant impact on our reported speeds. Oddly on clusters with a single controller running 12 600GB disks ( 15k 3.5″ ) split into two arrays that I see very little change in these io reports, even when stepping down from 95% fragmentation to 1%.

Posted in Benchmarking | Tagged , | Leave a comment