Every database holds different business data and has its own recovery needs. Unfortunately, there is not a single solution for all databases. There are many possibilities and solutions. For each database you are responsible for supporting you should be asking yourself and the business the following questions.
Throughout this article, we will be using examples from 3 different companies.
Company A has a database that stores different versions of files generated from a CAD application. As the user saves the updated version of the file to their local PC, the application has a background process that transfers the data to the database. The application is designed to keep working even if the connection to the production database is lost. It allows the user to “check out” a drawing and does not allow other users to make modifications to the drawing until the first user has checked it back in. The application has also been configured to keep a copy of all versions created by the user on their local PC for a week. There is a second background process that runs every Sunday to remove the file history from the user’s system. The application is configure to alert the user if the files on their machines cannot be copied to the primary database, but will continue to allow the user to work on the drawings. The database is about 500 GB in size and growing at a fast rate due to the size of the files being stored in the database.
Company B has a database that stores information about park space reservations. The database is about 100 GB in size. There are transactions being executed against the majority of the tables in the database on a continual basis during working hours. Only the park and rec personnel have access to update the data but there is an external website that displays the parks available booking time so citizens can see if the park and resource are available before calling and reserving time. The data is replicated out to the external database server on with transaction replication that is delayed by 2 minutes. The park employees work Monday through Friday 7am to 6pm. A nightly report is generated and stored in SSRS for 6 months of the daily activity of reservations. It includes the park resource reserved and the person’s name and number who made the reservation.
Company C has an online sales database. It is the main money maker for the company. If the system is not available the company takes large revenue losses for every minute it is not available. There are about 10 tables that are updated continually with about 100 support tables that change once a quarter. The database is about 20GBs in size and the data is archived out of the database on a monthly basis based upon a formula of 12 months of customer history to be available in the production system. The company has implemented SQL clustering to insure high availability.
What is your time to recover the database, “Recovery Time Objective” or “RTO”?
For some systems, the longer they are down, the more money and productivity is lost by a business. Yet others have little impact to the bottom line but are still business critical. Understanding how long a database can be down while it is restoring will help you plan for what type of backups you want to put into place. It can also lead to implementing other database technologies like mirroring, clustering, and log shipping to minimize the impact to the business. But those are for a different discussion. And even if you use these database technologies, best practice is to still have database backups of the system in case of catastrophic loss.
The time needed to recover is very much a business decision that has to take into account the limitations of the technology. Most companies will say we want the database back up and running as fast as possible. But there needs to be set expectations on what can truly be achieved.
If you take a look at company A, the employees can continue to work even if the main database system goes down. The application is designed to allow remote work until it becomes available again. Only new designs and designs that need multiple employees making changes will be affected while the system is unavailable. For this company, a day or two to recover the system would have little impact on the employee’s productivity.
For Company B the impact of the system being down would not allow the employees to enter the reservations from the public for the park resources, and how the system is corrupted might impact if the public can see if a resource is available or not, but waiting a day or two to enter the reservations into the system would not be catastrophic to the organization. The worst thing that could happen is there might be conflict on reserving a resource which could be easily handled by the employees keeping track of what time the reservation request came in and it being a first come, first to reserve basis. But you definitely would want the system back up and running within a couple of hours to minimize the data entry your employees would have to complete to get the data accurate again.
For Company C every minute the system is down is costing money. This would truly be a case of having the system back up and running as fast as possible. One reason SQL clustering has been installed. Something to note is the SQL clustering will help protect you from local hardware failures, but it does not protect you from accidental and malicious data updates that corrupt your data.
What is your loss of data capability, “Recovery Point Objective” or “RPO”?
The initial answer is almost always none. But there are cases where if you lose an hour, a day or even a week’s worth of data, the business will continue to prosper. The frequency of the data updates will directly affect how you will impenitent your backup strategies. In some cases, there is a paper trail or other software that can be used to recreate the data that is lost. A company should always weigh the cost of the recreation of information to the storage and time constraints on restoring the system.
For Company A, if the system crashed on a Friday, a full database backup from Monday night could be restored, and the application would insure the data from Monday through Sunday was added to the database during the Sunday night job from the user’s machine. So, technically, this system could afford to “lose” a week’s worth of data from the production database, and the company would not experience any data loss.
Company B can afford a great loss of data in the production system. Because they have a report that is generated nightly and stored as a snapshot report in SSRS, the company would be able to recreate the reservations that were made in the last 6 months. The business need to have the system back up and running in a couple of hours overrides the 6 months loss of data is an ok requirement, but you still need to document that the data could be created if possible going back 6 months.
Company C cannot afford to lose any data in the 10 tables that keep track of the customer’s order. Any data loss could cost them not only the current revenue of the order, but possible future orders from the same customer and any of their friends they share the experience with. The backup strategy must take this business need into account.
What is the estimated size of the full backup file and how long do you need to keep the backup?
After you have some of the business questions answered on recovery time and data loss, then next step is to determine how much space will be required to meet the business requirements. The first step is to know how large your initial full backup data file will be. All databases need at least one full backup no matter if you decide to go with Transactional or Differential backups after the initial backup is taken. This process is made easier if you have an existing database but can be estimated without an existing system. Questions like how much data are you going to have to backup, and how long you are keeping the backup needs to be discussed and decided. For an existing database, using the stored procedure sp_spaceused is a good way to estimate the size of your backups. If you use sp_helpdb ‘Name of DB’ the db_size column included unused space. Backup files contain the “used” space in a database, which means if you have allocated projected growth space in the databases; the backup data files will be smaller than the actual database size.
Let’s look at an example.
|Execute sp_helpdb BackupSize
|db_size – 64300.00 MB
|database_size unallocated space
64300.00 MB 50981.12 MB
|OS file size of a full backup
The db_size from the sp_helpdb store procedure would lead a person to believe they will need a total of 64,300 MB for each full backup. But this is the total allocated space for the database taking into account a 3 year projected growth of the data. It also takes into account the size of the log file.
The sp_spaceused stored procedure comes closer to the actual file size but will normally still be larger than the actual OS file created when the backup command is executed. From the example above, one could expect the largest the OS backup file would be is 13,318 MB. But the actual data stored in the database for a full backup (data, indexes, etc as well as data in the transaction log to recover to the point in time of the backup) is only 72 MBs.
Once you have an idea of the current size of the full backup file, don’t forget to take into account the growth projection of the database. You will need to consider this amount to insure you have enough storage space available to the system until you are ready to purchase more storage. On average many companies plan to replace hardware on a 3 to 5 year schedule. So if you can plan for a 5 year growth you should be covered. Growth planning is something that should be revisited at least on a yearly basis.
Company A decided to go with a full back up every Sunday scheduled after the data load from the user’s PCs. Differently backups would be taken nightly Monday through Saturday. The business decided to have an SLA of 2 weeks of backup data retention knowing the software gives the ability to recover the data throughout the week if there is a system lost and the possibility of a corrupted backup file. The first full backup file was estimated to be 400 GB and the second was estimated at 407 GB. The differential backups were 1 GB in size.
Taking these numbers into account, the company extrapolated that in 5 years, the database full backup file would be 2220GB (7 GB x (52weeksx5year)). It was decided having this much “empty” storage space was not acceptable so instead planned for 1 year growth with purchasing hardware which could be expanded when more storage was needed. The initial allocated space for backups was set to 778 GB. This would cover 2 full backups and a total of 14 differential backup files by the end of the year.
What data is changing in the database?
Not every table in a database is expected to change on a regular basis. If you remember company C, only 10 of the 110 table are subject to data updates throughout the day. The other 100 tables are only updated on a quarterly basis. You could organize the 10 tables and indexes into a single file group. Your backup’s strategy would be to back up the file group on a regular basis of the 10 tables, and only run backups of the look up tables (located in a different file group) on a quarterly base after the new data has been loaded.
When are the Off Peak hours?
Running the backup process does not put a heavy load onto the system, but there is always some overhead. Scheduling backups to run during off peak hours is highly suggested, but is not always an option. If you take company C again, a business will have some understanding of the purchasing habits of their customers. Yet, there is always “surprise” time of heavy system load when customers are purchasing when not expected. It is just something to keep in mind. Company A and B can schedule their backups to run at night when the employees are not using the system.
The first two questions are directed at the business rules which need to be considered when planning your backup strategies. While a DBA can help facilitate the discussion with management and the data owners, it is really a business decision which needs to be made based off of recovery time and data loss. The results should be documented in a service level agreement between the departments and the IT staff to insure everyone is in agreement. The SLA should be reviewed on a regular basis to insure the business needs of the data have not changed since the implementation of the backup strategy.
The technical questions can offer many different solutions and the DBA must take the time to analyze what will work within the budget constraints (like in Company A’s desire to minimize empty storage cost) and still meet the business requirements for the backups to support the recovery plan.
These questions are just the first steps in planning your backup strategy for the databases. Further discovery, and well as examining disaster recovery situations will help you complete the process for designing a backup strategy which will meet all the business requirements.