Statistics are the most basic element of DB2 performance. One of the first questions any consultant or support person will (should!) ask you when you come to them with a query performance problem is when the last runstats was run for the tables involved.
What are Runstats?
The RUNSTATS utility collects statistical information about the data in your tables and indexes (and statistical views). This information is used by the DB2 optimizer to select the optimal access plan for every query that is run against a database. Correct statistics nearly always means a better access plan.
One of the most common performance problems is when the DB2 optimizer has chosen an access plan expecting a certain amount of data, and then gets significantly more (or less) data, overflowing memory areas or making the access plan chosen less optimal than an alternate plan.
Missing or incorrect statistics can cause overflows of various memory areas, slow query performance, excessive consumption of temporary tablespaces, and major concurrency problems with lock waits, lock timeouts, and even deadlocks.
How Statistics Should be Collected
There are a lot of options on the RUNSTATS command. The optimal RUNSTATS syntax is:
$ db2 runstats on table db2inst1.sales with distribution and detailed indexes all DB20000I The RUNSTATS command completed successfully.
Where “db2inst1.sales” is replaced with the fully qualified table name.
Distribution statistics are included by using the ‘WITH DISTRIBUTION’ clause. They are very useful for query performance, particularly if your data is not evenly distributed. Even distribution of data sometimes feels rare.
Distribution statistics can only be used where parameter markers are not used and STMT_CONC is off. Even if parameter markers are appropriately used, there are often queries that can make use of distribution statistics in one or more places.
And Detailed Indexes All
In addition to collecting data about the data in a table, RUNSTATS can collect data about the indexes. More detail is better, and except for certain types of partitioning, all indexes must be specified. Usually you cannot specify only one or a subset of indexes.
When to Collect Statistics
RUNSTATS should be done for a table any time after the data in that table has changed significantly. This includes after operations that add, update, or delete data in the table en mass. Data also changes over time, and for most active tables, runstats needs to be done either daily or weekly. Some tables that never or almost never change can get by with less frequent statistics – this is more common in data warehouses than in e-commerce or OLTP databases.
Generally, I prefer to collect full runstats on all tables either weekly or daily. This ensures that important changes in data are not missed.
Use of Automatic Statistics
DB2 offers automated facilities for collecting statistics. DB2 will then decide when to do statistics based on the rate of change for data in the table. My experience with this is mixed. Some clients have had success with it, but I’ve also run into some specific situations where DB2 did not collect statistics on a table even a week after a significant change in the data and a performance problem was quickly resolved by doing runstats on the table.
If I do have a database using automatic statistics, I also run a script once a week to gather statistics on everything, and ensure nothing is missed
Enabling Automatic Statistics
There are two portions to enabling automatic statistics. First, you have to set the appropriate database configuration parameters, then you must set define your online window in which statistics can occur.
To enable automatic statistics, the following parameters should be set to ON:
The syntax to enable these looks like this:
db2 update db cfg for SAMPLE using AUTO_MAINT ON AUTO_TBL_MAINT ON AUTO_RUNSTATS ON
Starting with DB2 9.1, all of these are on by default, so you may not have to explicitly enable them.
The other portion to using automatic statistics collection is to define an online window in which automatic statistics can be collected. By default, there is a 24/7 online window defined, so runstats can occur at any time. If you would like to limit the time when statistics may be collected, perhaps to non-peak times, you can set a specific window. This is done using the AUTOMAINT_SET_POLICYFILE and AUTOMAINT_GET_POLICYFILE stored procedures. You pass in an XML file. A sample file can be found on UNIX/Linux systems in $INSTHOME/sqllib/samples/automaintcfg
In BLU databases, enabling automatic use of real-time statistics is critical for performance, in addition to a standard runstats strategy. Real-Time statistics can be enabled by setting the DB CFG parameter AUTO_STMT_STATS to ON in addition to the other automatic statistics maintenance parameters.