I’m working on a data purging project for one of our clients, and I’ll blog about it as it progresses. The first step was to find the oldest data in the database and let the application team look it over to decide what can be purged.
I came up with the handy SQL below, which finds every column with a DATE or TIMESTAMP data type and then builds a script that queries for the oldest date in each of those columns.
This is the query:
db2 "select 'select min('||colname||') from '||tabschema||'.'||tabname||';' from syscat.columns where (typename = 'DATE' or typename = 'TIMESTAMP') and tabschema not like 'SYS%'" > coldates.ddl
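Once the script is generated, it can be run in one shot with the command line processor. This is just a sketch; SAMPLE is a placeholder database alias, so substitute your own:

```shell
# Connect, run the generated script, and save the results.
# -t tells the CLP that statements end with a semicolon;
# -f reads statements from the named file.
db2 connect to SAMPLE
db2 -t -f coldates.ddl > coldates.out
db2 connect reset
```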
Here is an example of the output (table names changed to protect the innocent):
select min(SYSTEM_TRANSFER) from DB2ADMIN .ITEM_PAY;
select min(DEPARTURE) from DB2ADMIN .ITEM_PAY;
select min(DATE_ENTERED) from DB2ADMIN .AREA;
select min(CHECK_DATE) from DB2ADMIN .AREA;
select min(INV_REG_DATE) from DB2ADMIN .CUSTOM;
select min(AR_REG_DATE) from DB2ADMIN .CUSTOM;
As you can imagine, this can be a very lengthy script, so I ran it against a test database that had recently been refreshed from production. It took several hours to run because of all the table scans on non-indexed columns. We found some ancient data from the first century (obvious typos turning 2010 into 0210), as well as some legitimate timestamps that were ten years old.
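Those first-century dates are easy to flag mechanically once the minimum dates have been saved to a file. A minimal sketch, assuming one ISO-format date per line in a file like the hypothetical mindates.txt below, and an arbitrary cutoff year of 1950 for "suspicious":

```shell
# Build a small sample file mimicking saved min-date results
# (mindates.txt and the values are illustrative, not real output).
cat > mindates.txt <<'EOF'
0210-06-15
2010-06-15
1999-01-02
0198-03-04
EOF

# Split on '-' so $1 is the year; force numeric comparison with $1+0
# and print any date whose year falls before the cutoff.
awk -F- '$1+0 < 1950 {print $0, "<-- suspicious"}' mindates.txt
```

The same awk filter could be pointed at the real query output once the schema/table context is carried along on each line.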
So, now I’m waiting for the app team to decide what can be safely purged. I’ll keep you posted.