I wanted to share some lessons learned with AIX kernel parameters and DB2 database restore performance.
We have a client with a 21 GB database on AIX version 6.1 and DB2 9.7 fixpack 6. The server has 8 quad-core CPU and 32 GB of memory. This is a very powerful system for a small database so you would think database backups and restores would be very quick. Well, the database backup took about 5 minutes but the database restore was taking about 5 HOURS!!!
After some troubleshooting with system administrators it was obvious that there was disk contention and that jfs logging was thrashing. The first change that was made was to jfs2 logging.
The system admin pointed all jfs2 logging of the db2 filesystems (/db2data, /db2backups, /db2logs) to point to /dev/null. I then created a new folder called /db2data/tmp and made these changes to the instance owners profile.
TMP=/db2data/tmp TMPDIR=/db2data/tmp TEMP=/db2data/tmp export TMP TEMP TMPDIR
These two changes to the jfs2 logging improved the database restore from 5 hours to 2.5 hours. A big improvement but still not good enough.
The next change was to two of the Asynchrounous I/O tunable kernel parameters in AIX. MAXREQS and MAXSERVERS. They are not DB2 specific but IBM does make recommendations for DB2 server workload. Here are the definitions according to the pSeries Information Center:
Purpose: Specifies the maximum number of asynchronous I/O requests that can be outstanding at any one time.
Purpose: Specifies the maximum number of AIO kprocs per processor.
We changed MAXSERVERS from 30 to 40. This may have helped a little but it wasn’t a huge change. We also changed MAXREQS from the default of 4096 to 7936. MAXREQS should be at 256 times the number of IO_CLEANERS set in DB2. NUM_IOCLEANERS was set to AUTOMATIC and a value of 31 (remember 8×4 CPU’s). So, 256 x 31 is 7936, replacing the default of 4096.
This improved restore time to 4 minutes!!. And general database I/O is much improved. For example a large table load operation that took several hours now takes a few minutes.
I believe that tuning the AIX Asynch IO parameters and the previous changes to JFS disk logging were what improved performance. I’m very glad to have learned what a difference these settings can make with a DB2 restore and general I/O.
(Please see my follow up entry called PART 2)