If you are a Greenplum admin, you have probably written more than a few scripts to help keep your tables analyzed. That, or you frequently find yourself harassing users to keep their table stats fresh, since accurate stats are one of the best ways to keep your cluster running in tip-top shape. I don’t know about you, but this was never one of my favorite things to spend my time on, so I love one of the tools Pivotal rolled out recently: analyzedb.
analyzedb is a command-line tool that analyzes tables concurrently and captures current metadata for append-optimized (AO) tables. On subsequent runs it refers to that metadata and analyzes only the tables that have been modified since the last analysis. It can be run at the database, schema, or table level, and when targeting a table you can even include or exclude specific columns in columnar tables. Fantastic. The parallelism level can be cranked from 1 to 10, depending on how crazy you want to get with your concurrent analyze sessions. I wouldn’t recommend turning it up past the default of 5 unless you are confident your system can handle it.
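To make the targeting options a bit more concrete, here is a rough Python sketch of a wrapper that assembles an analyzedb command line. The function name build_analyzedb_cmd and the database, schema, table, and column names are made up for illustration. The flags used (-d, -s, -t, -i, -x, -p) are the ones I understand analyzedb to accept, but check `analyzedb --help` on your own install before relying on them. The sketch only builds the argument list; it does not run anything.

```python
def build_analyzedb_cmd(database, schema=None, table=None,
                        include_cols=None, exclude_cols=None, parallel=5):
    """Build an analyzedb argument list (hypothetical helper, not part of Greenplum)."""
    # The post describes a parallelism range of 1 to 10 with a default of 5.
    if not 1 <= parallel <= 10:
        raise ValueError("parallel level must be between 1 and 10")
    if schema and table:
        raise ValueError("target either a schema or a table, not both")
    if include_cols and exclude_cols:
        raise ValueError("use include_cols or exclude_cols, not both")
    if (include_cols or exclude_cols) and not table:
        raise ValueError("column filters only apply when targeting a table")

    cmd = ["analyzedb", "-d", database, "-p", str(parallel)]
    if schema:
        cmd += ["-s", schema]                   # analyze every table in one schema
    if table:
        cmd += ["-t", table]                    # analyze one table, given as schema.table
    if include_cols:
        cmd += ["-i", ",".join(include_cols)]   # only these columns
    if exclude_cols:
        cmd += ["-x", ",".join(exclude_cols)]   # every column except these
    return cmd


if __name__ == "__main__":
    # Example names are assumptions; substitute your own database and table.
    print(" ".join(build_analyzedb_cmd("sales_db", table="public.orders",
                                       exclude_cols=["notes"])))
```

From there, the returned list could be handed to subprocess.run(cmd, check=True) from a cron-driven script on the master host, which is roughly the job my old batch scripts used to do.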
This is a major time-saving tool, and it replaces a whole list of batch scripts I used previously.