Yearly Archives

19 Articles

Figuring out what’s running on nodes

Posted by skahler on

I use this to figure out what’s going happening on the nodes and if I might have hanging queries on one of them

gpssh -f hosts.seg
=> ps -ef | grep postgres | grep con | awk '{print $12}' | sort | uniq -c; echo "===="
[node02]       8 con5397
[node02]      12 con5769
[node02]       9 con5782
[node02]       4 con5989
[node02] ====
[node04]       8 con5397
[node04]      12 con5769
[node04]       8 con5782
[node04]       4 con5989
[node04] ====
[node03]       8 con5397
[node03]      12 con5769
[node03]       8 con5782
[node03]       4 con5989
[node03] ====
[node01]       8 con5397
[node01]      12 con5769
[node01]       8 con5782
[node01]       4 con5989
[node01] ====

GP out

Posted by skahler on

GP 4.05.4 just rolled out and it looks like it addresses some bugs we’ve hit. We had upgraded from to and had to revert back because we started seeing segments popping offline. The README notes for the new release says it addresses this.


Greenplum HD Announced

Posted by skahler on

Reading the news on the Greenplum HD announcement. I find it especially interesting because one of the main reasons I had an initial flurry of posts here and then trailed off was that I got heavily involved in our Hadoop installation and restructuring it. We’re currently using Cloudera‘s Hadoop packages and the way they handle distribution their software is about as good as you can get. I’m interested to see how Greenplum’s version of the software works. I’d heard talk of a couple of Map Reduce implementations at startups that were seeing impressive performance improvements. In a large enterprise there is definitely a place for both Hadoop and an MPP database and the trick is getting them to share data easily, which is why I was very impressed to see the 4.1 version of Greenplum with the ability to read from HDFS.

The big question is how well is Greenplum going to be able to support the release going forward. Greenplum is based off an older version of Postgres and I get a monthly question from someone about some feature that is in a later version of Postgres that doesn’t seem to be in Greenplum. Is their Hadoop implementation going to get the same treatment? Will Greenplum be able to keep up with the frequent changes to the Hadoop codebase and keep their internal product up to date, will it even really matter?

One extremely interesting thing we should see over the next year is a push on how to integrate EMC SAN architecture into both Greenplum and Hadoop. The old pre-EMC Greenplum sounded much like what I’ve heard from my Cloudera interactions, “Begrudgingly we see a use case for SAN storage, but might I suggest instead you cut off your left arm and beat yourself to death with it first.” I realize we’re working with really big data here so looking at SAN storage seems insane at first. Once you get into managing site to site interactions, non-interruptive backups and attempting to keep consistent IO through put across dozens if not hundreds of not only servers but different generations of servers, you can see the play.

I’m looking really long look at Flume right now and that’s a key feature that Cloudera implementation will have over what I’ve seen from Greenplum. The fault tolerant Name Node and Job Tracker look interesting but I don’t see these as very high risks in the current Hadoop system and as I understand they are already in the process of being addressed in core Hadoop. Performance promises are “meh”, I don’t take any bullet point that says X times speedup seriously. It could be true, but you really need a good whitepaper to backup a speed improvement boast.

So for me the jury is still out. It looks cool and I can’t wait to actually use it, but I said the same thing about Chorus a year ago and we still haven’t been approached with a production ready version of it.

Controller Setting for Greenplum

Posted by skahler on

I brought another node into one of our clusters yesterday and it made me things of the controller setting I put on the system. In various systems we’ve used the PERC6/E, H700 and LSI-9260-8i controllers and I’ve found I on all of them I see best disk performance and reliability if I set

  • Read Policy: No Read Ahead – Having the controller do read ahead dramatically increases the io done on my servers and I’ve seen no benefit
  • Write Policy: Force Write Back – This is playing with a little bit of fire because I’m telling the server that even if the battery isn’t full or it’s going through a charging cycle to go ahead use the battery backed write cache. The fact that Greenplum data is duplicated on another server gets me past the small amount of edge cases where the server will be without power long enough that the lack of juice in the battery is going to come into play. The issue is that when the controller goes to charge the batter it will stop using the cache and force everything to write to disk. This has a huge impact on io speed and cause the whole cluster to grind to a halt while the one server struggles with io.

What had started the need for me to bring this other node into our cluster is that every outage I do a io check on the clusters using gpcheckperf and I see one array is under performing all the others

disk write bandwidth (MB/s): 620.81 [sdw11]
disk write bandwidth (MB/s): 365.38 [sdw09]
disk write bandwidth (MB/s): 621.01 [sdw08]

It’s an issue we’ve had before where one disk in the array starts to under perform but doesn’t fail out. At this point I’ll need to go in and break the RAID5 array into direct access for each disk individually and run benchmarks against them to see if I can figure out who the bad boy is and eject him from class.

4.1 Looks Sweet

Posted by skahler on

Just downloaded the 4.1 Release Notes and looking them over. There are a ton of things in there we could make immediate use of:

    Enhanced Workload Management: Statement-Level Memory

    Tight Hadoop Integration

    Simpler, Faster Installation: Parallel Installation Utility

    Better In Database Analytics

    Enhanced Support of Joins for UPDATE and DELETE

    Ordered Aggregates and Ordered Reduce Specifications in
    Greenplum MapReduce

    Transaction Performance Enhancement

    Enhanced gpload Performance to Avoid Catalog Bloat

    Online Segment Rebalancing Following a Recovery

    Performance Monitor Enhancements

    PostgreSQL 9.0 Client Compatibility

    Oracle Unbreakable Linux 5.5 Support

GP is out

Posted by skahler on

Greenplum rolled out this last week and fixes many network communication and Append Only table issues that we’ve been running into. The caveat here is that GP has had some additional things crop up so I would recommend waiting until is released before making any updates.

GPDB Release Notes

standby masters are good

Posted by skahler on

We had a meeting last week where we were talking about Greenplum and we got on the topic of the standby masters. Which are really just sitting out there pulling transaction logs from the active master and hanging around. The consensus was that the need for the standby masters would be very few and far between and we haven’t needed them for the entire time we’ve been running GP thus far. That was last week. This week we’ve had two cases where we have needed the standby masters. One was a hardware issue with the 10G nic in the master of one cluster and the other was a table issue during recovery of another cluster. So yeah, the standby masters are needed.

Greenplum plugin accepted

Posted by skahler on

My Greenplum plugin was accepted, it can be found here. It’s not anything all the special yet. Written in perl it relies on DBD::Pg to do most the heavy lifting. Currently it has four basic functions

1) DB login check. This is a very simple check just sees if it can make an actual connection to the GPDB. There are instances where using a tcp port check you could see the that it’s up but the login for your user is prevented. A good a example of this would be removing remote access from pg_hba.conf in order to do maintenance and then forgetting to enable remote users and/or access from specific network locations.

2) SELECT test. This test goes in and does a SELECT COUNT(1), gp_segment_id FROM schema.table GROUP BY gp_segment_id. The idea behind this check is to make sure a table is responding on all segments. This could also be used as a SLA check to make sure you aren’t surpassing certain time constraints to pull results from tables. Currently I do this again a small 1000 row table I generated in our systems.

3) WRITE test. Here the plugin logs into GPDB and attempts to create a temp table. We’ve had instances where GPDB has been up and you are able to log in and do select queries yet any query that requires a write just hangs. This check it to make sure that issue does not come up. I have yet to test on a 3.x system if this check will fail should the system go into “read-only” mode.

4) A very simple segment status check. This is the base for a more extensive check that I will build up. Currently it goes into gp_configuration or gp_segment_configuration depending upon your choice of 3.x or 4.x and looks to make sure all the segments are online. It sends back a crit status if any segments show offline. I plan to do a lot of tweaking to this test in the near future. Being able to specific a number of segments online warn and crit threshold as well as checking to see if any two segments containing the same content are down.

The timeout on all the test are configurable and default to 300 seconds. I wouldn’t suggest setting up any of these test to repeat more often than 5 minutes, except for possibly the login test. Doing something like forcing you GP cluster to do a select test against a multi-TB table every minute would probably be a bad idea.

On our system our current check setup look like this:

This being the first plugin I’ve submitted feedback on code cleanup or different additions that should be incorporated into a Greenplum plugin would be appreciated.


A nagios check to go in and check various Greenplum availability pieces


-t, --timeout Plugin timeout in seconds [default=300]
-U, --username Username to connect with (mandatory)
-P, --password Password to connect with (mandatory)
-H, --dbhost Database Hostname to connect to (mandatory)
-D, --db Database to connect to (mandatory)

Check to see if Greenplum accepts a connection
This is the default check


Check to see if getting data from a table works
The check executes a SELECT count by segment_id query
for the specified table. This could also be used to
setup SLA checks for getting data from the db

--select-schema Schema for Select check (mandatory)
--select-table Table for Select check (mandatory)

Check to see if table creation works
The check creates a temp table with and id(int) and
vlas(char) columns and set the distributed by to id.
This helps to monitor if the catalog queries and table
creation are happening in a reasonable amount of table

--create-table Table for Create check (mandatory)

Check to see if GP considers any segments offline
Query the gp_configuration (3.x) and gp_segment_configuration (4.x)
tables to see if any segments are marked down at the master level.
Currently this will crit if any show down.


plugin action

Posted by skahler on

Just uploaded a Greenplum plugin to the Nagios plugin exchange. It’s my first go at an official Nagios plugin. I’ve created checks for a simple login, reading from a table, writing to a temp table and checking for segment status in 3.x and 4.x installs. Once it’s accepted I’ll publish links here.

move data from db to db

Posted by skahler on

We had two Greenplum instances running and we needed to copy one big table to the other. There’s a variety of ways to dump and import but we were moving a large amount of data, so the thought of dropping it to a local file was not a very good option.

One of the ways to move a table from one instance to the other using COPY commands. If you are on the destination database you would use:

psql -h remotegreenplumtopulldatafrom -U me -c "COPY myschema.mytable to STDOUT" | psql -c "COPY myschema.mytable FROM STDIN"

We were going to move a lot of data though. So I came up with a quick script to do it a partition at a time

# database_dump
# Simple program do go out and copy dated partitions of
# a greenplum db to another db where the table exists
# it starts at start_time and goes incrementally backwards
# a day at a time until it gets to end_time
# 2010-03-16 SHK

use strict;
use warnings;
use POSIX;
use Time::Local;

# Fields are S,M,H,D,M,Y - Note month is 0-11
my $start_time = timelocal(0,0,0,1,2,2011);
my $end_time = timelocal(0,0,0,1,2,2010);
my $cmd;

my $running_date = $start_time;
until ( $running_date < $end_time ) { my $part_date = strftime("%Y%m%d",localtime($running_date)); $running_date -= 86400 $cmd = qq~psql -h remotegreenplumtopullfrom -U skahler -c "COPY myschema.mytable_1_prt_$partdate to STDOUT" | psql -c "COPY myschema.mytable FROM STDIN"~; print "$cmd\n"; system($cmd); } exit;

In our case the new db didn't have much action going on while I was moving the data in. I think if it was an active table that I was moving things into I'd create create a table on the destination target, load the data into it and then exchange that partition into the my target table.