Disk performance and disk fragmentation

My last post had some statistics for a C2100 cluster we were running. Last night I did maintenance on a cluster that is running on R710 servers attached via PERC6/E controllers to an MD1120 array filled with 24 300GB disks (10k 2.5″). These are split into 4 arrays of 6 disks each, set up as RAID5. Here is the gpcheckperf run from the start of that maintenance:

gpadmin@mdw:~> gpcheckperf -f hosts.seg -d /data/vol1 -d /data/vol2 -d /data/vol3 -d /data/vol4 -r d -D

disk write min bandwidth (MB/s): 888.01 [sdw14-1]
disk write max bandwidth (MB/s): 968.73 [ sdw4-1]

disk read min bandwidth (MB/s): 1592.66 [ sdw7-1]
disk read max bandwidth (MB/s): 1941.55 [sdw13-1]

One of the next things I do is take a look at disk fragmentation using “xfs_db -c frag -r /dev/X”, where X is one of my four arrays. In this case I came up with about 35% fragmentation across all of our arrays.

To clean this up I ran xfs_fsr across the disks, which got them all down to less than 1% fragmentation.
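The check-then-defragment routine described above can be sketched as a few shell functions. This is only a sketch under assumptions: the function names and the threshold are mine, not part of any Greenplum or XFS tooling, and the parser assumes xfs_db prints its usual summary line ending in "fragmentation factor NN.NN%".

```shell
#!/bin/sh
# Pull the percentage out of an xfs_db "frag" report read from stdin.
# Assumes the summary line ends with "fragmentation factor NN.NN%".
frag_pct() {
    sed -n 's/.*fragmentation factor \([0-9.][0-9.]*\)%.*/\1/p'
}

# Succeed when percentage $1 is strictly greater than threshold $2.
over_threshold() {
    awk -v p="$1" -v t="$2" 'BEGIN { exit !((p + 0) > (t + 0)) }'
}

# Report fragmentation for each device and run xfs_fsr only where it is
# above the threshold. Usage: check_and_defrag THRESHOLD DEVICE...
check_and_defrag() {
    threshold=$1
    shift
    for dev in "$@"; do
        pct=$(xfs_db -c frag -r "$dev" | frag_pct)
        echo "$dev: ${pct}% fragmented"
        if over_threshold "$pct" "$threshold"; then
            xfs_fsr -v "$dev"
        fi
    done
}
```

Run as root, something like check_and_defrag 10 /dev/sdb /dev/sdc (hypothetical device names and a made-up 10% threshold) would print the fragmentation for each array and defragment only the ones above the threshold.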

The next disk test produced similar write speeds but increased read speeds:

disk write min bandwidth (MB/s): 872.72 [ sdw8-1]
disk write max bandwidth (MB/s): 960.32 [sdw15-1]

disk read min bandwidth (MB/s): 1975.79 [ sdw8-1]
disk read max bandwidth (MB/s): 2052.40 [ sdw2-1]
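To put a number on the change, the before and after figures can be pulled out of saved gpcheckperf output and compared. The helper names below are my own, and the sketch assumes the summary lines look exactly like the ones quoted in this post.

```shell
#!/bin/sh
# Print the bandwidth figure from the gpcheckperf summary line that
# matches a label such as "disk read min". Reads the report on stdin.
bandwidth() {
    awk -v label="$1" 'index($0, label " bandwidth") {
        sub(/^[^:]*: */, "")   # drop everything up to the colon
        print $1               # first remaining field is the MB/s value
        exit
    }'
}

# Percentage change from $1 to $2, rounded to one decimal place.
pct_change() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (after - before) / before * 100 }'
}
```

With the numbers above, pct_change 1592.66 1975.79 prints 24.1, so the minimum read bandwidth rose by roughly 24% after defragmenting, while the maximum (1941.55 to 2052.40) rose by about 5.7%.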

Up until the last couple of months it was not uncommon for us to hit 80%+ fragmentation on all of the nodes in our Greenplum cluster. Our recent switch from SUSE to Red Hat should help with this; apparently a recent RHEL kernel release includes a bug fix that addresses it. I’ve noticed that in this cluster fragmentation can have a significant impact on our reported speeds. Oddly, on clusters with a single controller running 12 600GB disks (15k 3.5″) split into two arrays, I see very little change in these I/O reports, even when stepping down from 95% fragmentation to 1%.


What kind of disk performance does your GP see?

During our regular maintenance windows I run gpcheckperf to see where the disk speeds in our Greenplum cluster are coming in. This is a result from a C2100 with a single LSI 9260-8i controller. There are two virtual disks of 6 disks each, arranged as RAID5. For the file system I’m using xfs with the mount options logbufs=8, logbsize=256k, noatime, attr2, nobarrier, and I’m seeing these results:

/usr/local/greenplum-db/./bin/gpcheckperf -f /data/gpadmin/hosts.seg -d /data/gpdb_p1 -d /data/gpdb_p2 -r d -D

disk write min bandwidth (MB/s): 945.25 [sdw15]
disk write max bandwidth (MB/s): 1007.74 [sdw13]

disk read min bandwidth (MB/s): 1239.10 [sdw15]
disk read max bandwidth (MB/s): 1691.65 [sdw12]
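For reference, here is roughly how the xfs mount options listed above would be written out for one data volume. The device name in the usage note is a placeholder and the helper names are mine; only the option list and the /data/gpdb_p1 mount point come from this post. Some newer kernels no longer accept nobarrier for XFS, so check what your distribution supports.

```shell
#!/bin/sh
# Mount options quoted in the post, joined the way mount(8) expects them.
XFS_OPTS="logbufs=8,logbsize=256k,noatime,attr2,nobarrier"

# Print an /etc/fstab entry for one XFS data volume.
# Usage: fstab_entry DEVICE MOUNTPOINT
fstab_entry() {
    printf '%s %s xfs %s 0 0\n' "$1" "$2" "$XFS_OPTS"
}

# Print the equivalent one-off mount command (printed, not executed).
mount_command() {
    printf 'mount -t xfs -o %s %s %s\n' "$XFS_OPTS" "$1" "$2"
}
```

For example, fstab_entry /dev/sdb1 /data/gpdb_p1 (with /dev/sdb1 standing in for whatever device backs the volume) prints a line that could be reviewed and then added to /etc/fstab.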

Are these similar to the numbers you are getting in your clusters?


Adding plperlu Language to Greenplum on RHEL5

In order to get plperlu added as a language on our Greenplum 4.0 RHEL5 cluster I had to take a couple of additional steps. My first unhappy message was:

db=# CREATE LANGUAGE plperlu;
ERROR:  could not load library “/usr/local/greenplum-db-”: cannot open shared object file: No such file or directory

So greenplum can’t find

[gpadmin@mdw ~]$ locate


Looks like it’s on the system so I just need to make it available

[root@mdw ~]# echo "/usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/CORE/" > /etc/

[root@mdw ~]# ldconfig

Next I went out and did the same thing on all the nodes using gpssh. Then it was back to the master to try creating the language again:

db=# CREATE LANGUAGE plperlu;


Time: 892.420 ms
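The gpssh step mentioned above could look something like the sketch below. The file name under /etc/ is hypothetical (the exact file is not shown in this post), the function names are mine, and the quoting may need adjusting depending on how your gpssh version passes commands through to the remote shell.

```shell
#!/bin/sh
# Directory from the post that has to be on the dynamic linker's search path.
PERL_CORE="/usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/CORE/"
# Hypothetical file name: the post does not show which file under /etc/ was used.
LD_CONF="/etc/ld.so.conf.d/perl-core.conf"

# Build the command each host should run: write the path, then refresh
# the linker cache with ldconfig.
remote_cmd() {
    printf "echo '%s' > %s && ldconfig" "$1" "$2"
}

# Run that command on every host listed in a gpssh host file.
# Needs a user allowed to write under /etc/. Usage: push_ldconfig HOSTFILE
push_ldconfig() {
    gpssh -f "$1" -e "$(remote_cmd "$PERL_CORE" "$LD_CONF")"
}
```

Calling push_ldconfig hosts.seg would then repeat the two manual steps from the master on each segment host in one pass.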

I should note that I tried to install plperl on the SLES11 servers we had and ran into an issue: the version of perl on SLES11 is 5.10, and unfortunately Greenplum’s distro is looking for 5.8. I didn’t look for a fix because I knew we were going to jump to Red Hat in the near future and it would work there.

Alpine Miner

Alpine Miner First Look

Downloaded the new Alpine Miner from the Greenplum Community site. Unfortunately for me there are only Mac and Windows versions, so I had to fire up a VM to try it out.


Adding Documentation

Doing the most exciting of all tasks, I’m adding documentation to the site.

Greenplum Command references at: GP COMMANDS

Greenplum Documentation at: DOCS

I find myself constantly sharing my latest GP Admin guide with other people, so I figured this would be a great spot to drop the PDF documentation that you’ll find in the release packages. It would also be handy to keep a running log of the documentation shipped with each release.

Additionally I’m using <command> -? constantly to get the help from the program I want to run. So I put these up on the site too in order to make it easier to get at those outputs.
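Collecting those help outputs can be scripted. The sketch below is one way to do it, with a function name of my own choosing; it assumes each command accepts -? and may print its help to either stdout or stderr.

```shell
#!/bin/sh
# Save the "-?" help text of each named command to OUTDIR/<command>.txt.
# Usage: dump_help OUTDIR COMMAND...
dump_help() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    for cmd in "$@"; do
        # Quote -? so the shell cannot expand it as a filename pattern.
        # Help text may go to stderr, so capture both streams, and keep
        # going even if a command exits non-zero after printing help.
        "$cmd" '-?' > "$outdir/$cmd.txt" 2>&1 || true
    done
}
```

For example, dump_help ./help gpcheckperf gpssh would leave one text file per utility in ./help, ready to be posted.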


More Info Soon

I’m gearing up and planning to put this site into action. The goal is for it to be a repository for Greenplum ideas and tips, submitted and maintained by people using the system.