Disaster Recovery Planning
As a subset of business continuity planning, disaster recovery planning begins with a business impact analysis. The purpose of this analysis is to establish two key metrics: the recovery time objective (RTO) and the recovery point objective (RPO).
Design for end-to-end recovery
It isn't enough to simply have a plan for backing up or archiving your data. Make sure your disaster recovery plan addresses the full recovery process, from backups to restores to cleanup.
Make your tasks specific
When it's time to run your disaster recovery plan, you don't want to be stuck guessing what each step means. Each task in your disaster recovery plan should consist of one or more concrete, unambiguous commands or actions.
Implement control measures
Implement measures to minimize the probability of a disaster. Add controls that prevent disaster events and that detect issues before they escalate. For example, you could add a monitor that sends an alert when a data-destructive flow, such as a deletion pipeline, exhibits unexpected spikes or other unusual activity. This monitor could also kill the pipeline processes if a certain deletion threshold is reached, preventing a catastrophic situation.
Integrate your standard security mechanisms
Ensure that the security controls that apply to your production environment are factored into your recovery plan as well.
Keep your software licenses current
To avoid unpleasant surprises when executing a recovery, ensure that you are properly licensed for any software that you will be deploying as part of your disaster recovery plan. Check with the supplier of the software for guidance.
Recovery Time Objective (RTO)
The recovery time objective (RTO) is the maximum acceptable length of time that your application can be offline. This value is usually defined as part of a larger service level agreement (SLA).
Recovery Point Objective (RPO)
The recovery point objective (RPO) is the maximum acceptable length of time during which data might be lost from your application due to a major incident. This metric varies with how the data is used; for example, frequently modified user data could have an RPO of just a few minutes, whereas less critical, infrequently modified data could have an RPO of several hours. Note that this metric describes a length of time only; it does not address the amount or quality of the data lost.
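Because both RTO and RPO are lengths of time, checking them comes down to simple date arithmetic. The following is a minimal sketch, not a definitive implementation: the function names and the 15-minute and 4-hour objectives are hypothetical values chosen for illustration, and real objectives would come out of your own business impact analysis.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical objectives, for illustration only.
RPO = timedelta(minutes=15)
RTO = timedelta(hours=4)


def rpo_exposure(last_backup_at: datetime, now: datetime) -> timedelta:
    """Return the window of data that would be lost if an incident hit at `now`."""
    return now - last_backup_at


def meets_rpo(last_backup_at: datetime, now: datetime, rpo: timedelta = RPO) -> bool:
    """True if the newest backup is recent enough to satisfy the RPO."""
    return rpo_exposure(last_backup_at, now) <= rpo


def meets_rto(outage_start: datetime, restored_at: datetime, rto: timedelta = RTO) -> bool:
    """True if the application was back online within the RTO."""
    return restored_at - outage_start <= rto


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(meets_rpo(now - timedelta(minutes=10), now))
    print(meets_rpo(now - timedelta(hours=2), now))
    print(meets_rto(now, now + timedelta(hours=3)))
```

Note that the check measures elapsed time only, which matches the definition of RPO: it says nothing about how much data changed inside that window.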
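Returning to the control measure described under "Implement control measures", the deletion-pipeline monitor could be sketched as a sliding-window counter with two thresholds. This is an illustrative sketch under stated assumptions: the class name `DeletionMonitor`, its parameters, and the callback hooks are all hypothetical, and in a real system the kill callback would stop the actual pipeline processes.

```python
from collections import deque
from typing import Callable, Deque, Tuple


class DeletionMonitor:
    """Counts deletions in a sliding time window and reacts to spikes.

    All names and thresholds are hypothetical, for illustration only.
    """

    def __init__(
        self,
        alert_threshold: int,
        kill_threshold: int,
        window_seconds: float,
        on_alert: Callable[[int], None],
        on_kill: Callable[[int], None],
    ) -> None:
        if kill_threshold < alert_threshold:
            raise ValueError("kill_threshold must be >= alert_threshold")
        self.alert_threshold = alert_threshold
        self.kill_threshold = kill_threshold
        self.window_seconds = window_seconds
        self.on_alert = on_alert
        self.on_kill = on_kill
        self._events: Deque[Tuple[float, int]] = deque()
        self._total = 0
        self.killed = False

    def record(self, timestamp: float, deleted_count: int) -> None:
        """Record a batch of deletions; timestamps must be non-decreasing."""
        if self.killed:
            return  # the pipeline has already been stopped
        self._events.append((timestamp, deleted_count))
        self._total += deleted_count
        # Evict events at or before the start of the window.
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_count = self._events.popleft()
            self._total -= old_count
        if self._total >= self.kill_threshold:
            self.killed = True
            self.on_kill(self._total)
        elif self._total >= self.alert_threshold:
            # Fires on every batch while the window stays above the threshold.
            self.on_alert(self._total)


if __name__ == "__main__":
    monitor = DeletionMonitor(
        alert_threshold=100,
        kill_threshold=500,
        window_seconds=60,
        on_alert=lambda n: print(f"ALERT: {n} deletions in the last minute"),
        on_kill=lambda n: print(f"KILL: stopping pipeline after {n} deletions"),
    )
    for ts, count in [(0, 50), (10, 60), (20, 400)]:
        monitor.record(ts, count)
```

Keeping the alert threshold well below the kill threshold gives operators a chance to investigate before the pipeline is stopped automatically.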