Backups Are a Matter of Trust

Posted by Jonathan Coupal on December 22, 2015

So what's the point of a backup system, if not as an insurance policy, just in case? Well, a properly designed and managed backup and recovery system does a lot more than that: it provides a high level of trust. This trust in the integrity of their systems gives stakeholders an experience of freedom that they might not otherwise have. It is freedom in the knowledge that the worst possible case has been taken care of and will not impact their ability to serve their customers or their employees, and will not impact their own livelihoods.

Of course, an intact backup of any system is an important way to handle the possible "worst case" outcome: the failure of one or more components within the system. It satisfies those who require a backup system for compliance and comforts those who might be concerned about possible system failures. The real question is how to go from setup to trust in any backup system.

First, consider the consequences of a failed backup in the case of an actual disaster, in which a business's primary system has failed. While there are a number of articles stating a high attrition rate for businesses that have a disaster without good backups in place, many, if not all, of these articles fail to provide strong primary sources. It would be reasonable to assume, however, that a business's operations and cash flow would be severely affected by any interruption if no data were available to support critical business operations, including but not limited to sales, production, and accounts receivable. Imagine the cash consequences of simply not knowing who owes you money.

Discussing the value of business continuity with the stakeholders is a useful exercise, not only to convince them of the investment in business continuity systems, but also to set expectations. Most non-IT professionals will simply assume that if there is a backup system in place, then all data is therefore backed up. And if no one has corrected this assumption, it is reasonable for them to keep making it. Correcting this assumption requires a conversation about two key metrics: RPO (Recovery Point Objective) and RTO (Recovery Time Objective).

The Recovery Point Objective revolves around a basic question: how much data can the business afford to lose before it is unable to gracefully recover? For instance, most backups run at night, which implies an effective RPO of up to 24 hours. In other words, if we performed a backup last night at 11:00pm, then our worst-case scenario is a system failure at 10:30pm tonight, just before the next backup, resulting in nearly 24 hours of data loss. Many businesses might not be able to tolerate that much data loss across all of their systems, and would be willing to finance more aggressive scheduling of backups to reduce the 24 hours to 8 hours, or even less.
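
The arithmetic in this example can be sketched in a few lines of Python. The function name, the calendar dates, and the 8-hour objective are illustrative assumptions rather than part of any backup product; the sketch only measures the gap between the last successful backup and the moment of failure.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Return how much data would be lost if the system failed at `failure`."""
    if failure < last_backup:
        raise ValueError("failure time precedes the last backup")
    return failure - last_backup


# Backup at 11:00pm last night, failure at 10:30pm tonight.
# The calendar dates themselves are arbitrary.
last_backup = datetime(2015, 12, 21, 23, 0)
failure = datetime(2015, 12, 22, 22, 30)

loss = data_loss_window(last_backup, failure)
print(loss)  # 23:30:00

# Compare the loss against a negotiated RPO (hypothetical value of 8 hours).
rpo = timedelta(hours=8)
print(loss <= rpo)  # False
```

With a nightly schedule the window can approach the full 24 hours, so a business that negotiates an 8-hour RPO would need backups at least every 8 hours.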

More important to discuss, however, is the Recovery Time Objective, which is concerned with how much time it would take to bring business systems back online. If a business is only doing data backups, for instance, it might take days or even weeks to bring a critical system online, as it would require standing up new servers, installing and configuring their operating systems, installing and configuring the business software, then performing a successful restore of the data so that users can interact with it.
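
A rough recovery time estimate is simply the sum of those steps. The sketch below assumes the steps run strictly one after another, and every duration in it is a made-up placeholder; real figures would have to come from measured restore tests.

```python
from datetime import timedelta

# Hypothetical durations for each recovery step; replace with measured values.
recovery_steps = {
    "stand up new servers": timedelta(days=3),
    "install and configure operating systems": timedelta(hours=8),
    "install and configure business software": timedelta(hours=16),
    "restore and verify data": timedelta(hours=12),
}


def estimated_recovery_time(steps: dict) -> timedelta:
    """Sum the step durations, assuming the steps cannot overlap."""
    return sum(steps.values(), timedelta())


total = estimated_recovery_time(recovery_steps)
print(total)  # 4 days, 12:00:00
```

Laying the steps out this way makes it easier to show stakeholders which step dominates the total and where an investment, such as standby hardware, would shorten it most.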

Many stakeholders are not clear on this process, and so may be surprised if presented with the possibility that last night's data might not be available for another week, because there are no servers available to install the software onto. Once the stakeholders have clear expectations, they can decide on how to invest in secondary systems or contracts with vendors to have emergency services made available.

Each of the steps involved in this process gives the system administrator the opportunity to create the space where backups become an important part of a business continuity plan. This planning, however, needs to be supported after implementation with testing and reporting. Completing the path from setup to trust requires that the stakeholders understand that their systems are, in fact, capable of meeting both the RPO and the RTO that were negotiated with the IT team. Follow this simple plan and the stakeholders will be happy that their business is safe:

  • Perform regular backups
  • Perform regular test data restores
  • Perform regular system restores
  • Report the results of these tests to stakeholders
  • Repeat.
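
The reporting step in this plan can be as simple as comparing each restore test against the negotiated objectives. The following is a minimal sketch; the `RestoreTest` record, the `report` function, and all of the sample values are hypothetical and only illustrate the comparison.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RestoreTest:
    system: str
    backup_age: timedelta    # age of the restored backup when the test began
    restore_time: timedelta  # time taken to bring the system back online
    succeeded: bool


def report(test: RestoreTest, rpo: timedelta, rto: timedelta) -> str:
    """Summarize one restore test against the negotiated objectives."""
    if not test.succeeded:
        return f"{test.system}: FAIL (restore did not complete)"
    problems = []
    if test.backup_age > rpo:
        problems.append("RPO missed")
    if test.restore_time > rto:
        problems.append("RTO missed")
    status = "FAIL (" + ", ".join(problems) + ")" if problems else "PASS"
    return f"{test.system}: {status}"


# Sample values are invented for illustration.
test = RestoreTest("accounts receivable", timedelta(hours=6), timedelta(hours=30), True)
print(report(test, rpo=timedelta(hours=8), rto=timedelta(hours=24)))
# accounts receivable: FAIL (RTO missed)
```

A one-line result per system is usually enough for a stakeholder report, as long as a failure names which objective was missed.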

Turning the recovery of critical business systems from a painful chore into a routine creates a level of trust in the underlying systems that gives stakeholders, along with everyone else, the ability to use these systems in comfort. They will know that they can weather any storm that comes at the business, whether it be an actual disaster in the data center or a developer corrupting all of the business data with an accidental semicolon. Knowing that the systems have this level of comprehensive support not only gives the stakeholders the freedom to grow and manage their business, but also establishes the team responsible for managing the systems as worthy of trust and capable of supporting that growth.