My last post had some statistics for a C2100 cluster we were running. Last night I did maintenance on a cluster running on R710s attached via PERC6/E controllers to an MD1120 array filled with 24 300GB disks (10k 2.5″). These are split into four arrays of six disks each, set up as RAID5. The gpcheckperf results at the start of my recent maintenance:
gpadmin@mdw:~> gpcheckperf -f hosts.seg -d /data/vol1 -d /data/vol2 -d /data/vol3 -d /data/vol4 -r d -D
disk write min bandwidth (MB/s): 888.01 [sdw14-1]
disk write max bandwidth (MB/s): 968.73 [ sdw4-1]
disk read min bandwidth (MB/s): 1592.66 [ sdw7-1]
disk read max bandwidth (MB/s): 1941.55 [sdw13-1]
One of the next things I do is take a look at disk fragmentation using “xfs_db -c frag -r /dev/X”, where X is one of my four arrays. In this case I came up with about 35% fragmentation across all of our arrays.
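For reference, this is roughly how I loop that check over the four arrays. The device names below are placeholders rather than the actual devices from this cluster; substitute whatever block devices back /data/vol1 through /data/vol4 on your segment hosts.

# hypothetical device names -- use the devices backing your data volumes
for dev in /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1; do
    echo -n "$dev: "
    xfs_db -c frag -r "$dev"
done

Each run prints an actual versus ideal extent count along with a fragmentation factor, which is the percentage I'm quoting above.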
To clean this up I ran xfs_fsr across the filesystems, which got them all down to less than 1% fragmentation.
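As a rough sketch (not the exact invocation from this maintenance run), the defrag pass looks something like the following, using the same mount points passed to gpcheckperf above:

# xfs_fsr reorganizes files in place on a mounted XFS filesystem; -v prints progress
for mnt in /data/vol1 /data/vol2 /data/vol3 /data/vol4; do
    xfs_fsr -v "$mnt"
done

If the maintenance window is tight, xfs_fsr also accepts a -t time limit in seconds and will pick up where it left off on the next run.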
The next disk test produced similar write speeds but increased read speeds:
disk write min bandwidth (MB/s): 872.72 [ sdw8-1]
disk write max bandwidth (MB/s): 960.32 [sdw15-1]
disk read min bandwidth (MB/s): 1975.79 [ sdw8-1]
disk read max bandwidth (MB/s): 2052.40 [ sdw2-1]
Up until the last couple of months it was not uncommon for us to hit 80%+ fragmentation on all of our nodes in the Greenplum cluster. Our recent switch from SUSE to Red Hat should help fix this; there is apparently a fix in a recent RHEL kernel release that cleans this up. I’ve noticed that on this cluster fragmentation can have a significant impact on our reported speeds. Oddly, on clusters with a single controller running 12 600GB disks (15k 3.5″) split into two arrays, I see very little change in these I/O reports, even when stepping down from 95% fragmentation to 1%.