I brought another node into one of our clusters yesterday and it made me think of the controller settings I put on the system. Across various systems we’ve used the PERC6/E, H700, and LSI-9260-8i controllers, and on all of them I see the best disk performance and reliability if I set
- Read Policy: No Read Ahead – Having the controller do read ahead dramatically increases the I/O done on my servers and I’ve seen no benefit from it.
- Write Policy: Force Write Back – This is playing with fire a little bit, because I’m telling the controller to go ahead and use the battery-backed write cache even if the battery isn’t fully charged or is going through a charge cycle. The fact that Greenplum data is mirrored on another server gets me past the small set of edge cases where the server would be without power long enough for the lack of juice in the battery to come into play. Without this setting, when the controller goes to charge the battery it stops using the cache and forces everything to write straight to disk. That has a huge impact on I/O speed and causes the whole cluster to grind to a halt while the one server struggles with I/O. (A sketch of setting both policies from the command line follows this list.)
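On the LSI-based controllers both policies can be changed from the OS with MegaCli; this is only a sketch, assuming MegaCli64 is installed in its usual location and that you want the change applied to every logical drive on every adapter. The option names differ if you go through Dell OpenManage instead.

```
# Disable read ahead on all logical drives, all adapters
/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp NORA -LAll -aAll

# Use write-back caching
/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp WB -LAll -aAll

# Keep using the write cache even when the BBU is bad or charging
# (the "Force Write Back" behavior described above)
/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp CachedBadBBU -LAll -aAll

# Check what the cache policy currently is
/opt/MegaRAID/MegaCli/MegaCli64 -LDGetProp Cache -LAll -aAll
```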
What started the need to bring this other node into our cluster is that during every outage I do an I/O check on the clusters using gpcheckperf, and this time I see one array underperforming all the others:
```
disk write bandwidth (MB/s): 620.81 [sdw11]
disk write bandwidth (MB/s): 365.38 [sdw09]
disk write bandwidth (MB/s): 621.01 [sdw08]
```
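Those numbers come from gpcheckperf’s disk test. A minimal sketch of the invocation, where the hostfile of segment hosts and the data directories are placeholders for whatever your cluster actually uses:

```
# Disk I/O test (-r d) on every host in hostfile_segs, reporting per-host
# results (-D); -d can be repeated for each data directory to test
gpcheckperf -f hostfile_segs -r d -D -d /data1/primary -d /data2/primary
```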
It’s an issue we’ve had before where one disk in the array starts to underperform but doesn’t fail out. At this point I’ll need to go in, break the RAID5 array into direct access to each disk individually, and run benchmarks against them to see if I can figure out who the bad boy is and eject him from class.
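The per-disk check doesn’t need anything fancy once the array is broken up. A rough sketch of the kind of loop I’d run, assuming the disks show up as /dev/sdb through /dev/sdg (device names are placeholders, and the write test is destructive, so only on drives already pulled out of service):

```
# Sequential write then read per disk; the drive whose numbers lag the rest is the suspect
for disk in /dev/sd{b..g}; do
    echo "== $disk =="
    # Write 4 GB, bypassing the page cache so the disk does the real work
    dd if=/dev/zero of=$disk bs=1M count=4096 oflag=direct 2>&1 | tail -1
    # Read the same region back
    dd if=$disk of=/dev/null bs=1M count=4096 iflag=direct 2>&1 | tail -1
done
```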