What should be in my IT disaster recovery plan?

What every business needs in a disaster recovery plan: essential components, testing requirements, and common mistakes that make plans fail.

centrexIT Team • February 16, 2025 • 8 min read

Key Takeaways

A disaster recovery plan defines how you restore IT systems after an incident - it's your playbook for the worst day
Two critical metrics: RTO (how fast you need systems back) and RPO (how much data you can afford to lose)
Essential components: backup strategy, communication plan, roles and responsibilities, vendor contacts, and step-by-step procedures
Plans that are never tested are plans that will fail - test at least annually with quarterly checks for critical systems
Business continuity keeps operations running during disruption; disaster recovery restores systems after - you need both

Many businesses don’t have a disaster recovery plan. Of those that do, many have never tested it. And of those that have tested it, many discovered it didn’t actually work. Here’s how to build one that does.

What Is a Disaster Recovery Plan?

A disaster recovery plan (DRP) is a documented, step-by-step playbook for restoring your IT systems after a disruptive event. It answers a simple question: When something goes terribly wrong, what exactly do we do?

“Something terribly wrong” can mean many things:

Ransomware encrypts all your files and servers
A fire or flood damages your office and equipment
A server fails and takes your core applications offline
A cloud provider has a major outage
A disgruntled employee deletes critical data
A power surge fries your network equipment
A cyberattack compromises your systems

A disaster recovery plan doesn’t prevent these events (that’s what your cybersecurity and maintenance programs do). It ensures that when they happen — and eventually, something will happen — you can recover quickly and with minimal data loss.

Disaster Recovery vs. Business Continuity

These terms are often used interchangeably, but they’re different:

Business Continuity Plan (BCP): How your business keeps operating during a disruption. This covers operations, communications, personnel, and temporary workarounds. Example: if your office is flooded, where do employees work? How do customers reach you?

Disaster Recovery Plan (DRP): How you restore your IT systems and data after a disruption. This is the technical playbook for getting servers, applications, data, and communications back online.

You need both. Business continuity keeps the business running (even in a degraded state) while disaster recovery gets your systems restored. A DRP is typically a subset of a broader BCP.

Aspect	Business Continuity	Disaster Recovery
Focus	The entire business	IT systems and data
Goal	Keep operating	Restore systems
Timeframe	During the event	After the event
Covers	People, processes, facilities, technology	Servers, applications, data, network
Example	Employees work from home during a flood	Servers are restored from backup after a ransomware attack

The Two Most Important Metrics: RTO and RPO

Before you build your plan, you need to define two numbers for every critical system in your business:

RTO (Recovery Time Objective)

How fast do you need this system back?

RTO is the maximum amount of time you can tolerate a system being down before the impact becomes unacceptable.

Examples:

Email: RTO of 4 hours (you can survive a few hours without it, but not a day)
Point-of-sale system: RTO of 1 hour (every minute without it costs you sales)
Accounting system: RTO of 24 hours (not ideal, but you can manage for a day)
Development server: RTO of 48 hours (annoying but not business-critical)

RPO (Recovery Point Objective)

How much data can you afford to lose?

RPO is the maximum amount of data loss you can tolerate, measured in time. If your RPO is 4 hours, you’re saying “we can accept losing the last 4 hours of data.”

Examples:

Financial transactions: RPO of 15 minutes (you can’t lose sales data)
Email: RPO of 1 hour (losing an hour of email is manageable)
File shares: RPO of 24 hours (if you lose a day of file changes, you can recover)
Archive data: RPO of 1 week (changes infrequently)

Why RTO and RPO Matter

Your RTO and RPO directly determine your backup strategy, your infrastructure requirements, and your costs.

RPO	Backup Strategy Needed
Near zero	Real-time replication (most expensive)
15 minutes	Continuous data protection or frequent snapshots
1 hour	Hourly backups
4 hours	Backups every 4 hours
24 hours	Daily backups (most common for SMBs)

RTO	Infrastructure Needed
Minutes	Hot standby (duplicate systems running, ready to take over)
1-4 hours	Warm standby (systems configured, just need to be started and data loaded)
8-24 hours	Cold standby (hardware available, needs full restoration from backup)
24-72 hours	Basic backup restoration (restore to new or repaired hardware)

The faster and tighter your targets, the more it costs. That’s why it’s important to set different RTO/RPO values for different systems based on their actual business impact, rather than making everything “as fast as possible.”
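As a rough illustration, the two tables above can be turned into a lookup. This is a minimal sketch, not a sizing tool: the function names are hypothetical, and the way it handles values the tables don't list is an assumption. Each row is read as "targets at or above this value", so a 6-hour RTO falls back to the faster warm standby tier, and any RPO under 15 minutes is treated as near zero.

```python
from datetime import timedelta

# (minimum target, option), cheapest option first. Values come from the
# two tables above; how gaps and overlaps are resolved is an assumption.
BACKUP_BY_RPO = [
    (timedelta(hours=24), "Daily backups"),
    (timedelta(hours=4), "Backups every 4 hours"),
    (timedelta(hours=1), "Hourly backups"),
    (timedelta(minutes=15), "Continuous data protection or frequent snapshots"),
]
INFRA_BY_RTO = [
    (timedelta(hours=24), "Basic backup restoration"),
    (timedelta(hours=8), "Cold standby"),
    (timedelta(hours=1), "Warm standby"),
]


def pick(target, table, fallback):
    """Return the first (cheapest) option whose minimum target is met."""
    for minimum, option in table:
        if target >= minimum:
            return option
    # Tighter than every row: only the most expensive option is left.
    return fallback


def backup_strategy(rpo):
    return pick(rpo, BACKUP_BY_RPO, "Real-time replication")


def infrastructure(rto):
    return pick(rto, INFRA_BY_RTO, "Hot standby")


print(backup_strategy(timedelta(hours=2)))  # Hourly backups
print(infrastructure(timedelta(hours=6)))   # Warm standby
```

Running every system through a lookup like this makes the cost trade-off visible: only the systems with genuinely tight targets land on the expensive rows.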

Essential Components of a Disaster Recovery Plan

1. Business Impact Analysis

Before you plan how to recover, you need to understand what matters most. A business impact analysis (BIA) identifies:

Critical systems — which applications and services must be running for your business to operate?
Dependencies — what does each system depend on? (network, internet, power, other applications)
Impact of loss — what happens if each system is down for 1 hour? 4 hours? 24 hours? 1 week?
Priority order — in what sequence should systems be restored?

A practical approach for small businesses:

List every system your business uses. For each one, answer: “If this system were down right now, what would happen?” Rate the impact as Critical (business stops), High (major disruption), Medium (significant inconvenience), or Low (minor impact).

Critical and High systems get the shortest (most aggressive) RTO/RPO targets and the most robust backup strategies. Low-impact systems can tolerate longer recovery times.
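The inventory exercise above fits in a spreadsheet, but a short script shows the idea. This is a sketch under assumptions: the `SystemEntry` structure is hypothetical, and the impact ratings assigned to the example systems are illustrative only, not recommendations.

```python
from dataclasses import dataclass

# Impact ratings from the text, most severe first.
IMPACT_ORDER = ["Critical", "High", "Medium", "Low"]


@dataclass
class SystemEntry:
    name: str
    impact: str              # one of IMPACT_ORDER
    depends_on: tuple = ()   # what the system needs in order to run


def by_priority(systems):
    """Sort systems so the highest-impact ones come first."""
    return sorted(systems, key=lambda s: IMPACT_ORDER.index(s.impact))


# Hypothetical inventory; replace with your own systems and ratings.
inventory = [
    SystemEntry("Development server", "Low"),
    SystemEntry("Point-of-sale system", "Critical", ("Network", "Internet")),
    SystemEntry("Accounting system", "Medium"),
    SystemEntry("Email", "High", ("Internet",)),
]

for system in by_priority(inventory):
    print(f"{system.impact:<8} {system.name}")
```

The sorted list is a starting point for the priority order; dependencies still decide what physically has to come back first.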

2. Backup Strategy

Your backup strategy is the foundation of disaster recovery. Without reliable backups, your plan is just a document.

The 3-2-1 Rule

At minimum, follow the 3-2-1 backup rule:

3 copies of your data (original plus two backups)
2 different types of storage media (local plus cloud, or disk plus tape)
1 copy off-site (in case your office is destroyed)

Modern Best Practice: 3-2-1-1-0

Many organizations now follow an enhanced version:

3 copies of your data
2 different storage types
1 off-site copy
1 immutable (unchangeable) copy (protects against ransomware)
0 errors (verified through regular testing)
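The 3-2-1-1-0 rule is easy to check mechanically once you list where each copy lives. The sketch below assumes a hypothetical `DataCopy` record that you would fill in by hand or from your backup tool's reports; it does not talk to any real backup product.

```python
from dataclasses import dataclass


@dataclass
class DataCopy:
    media: str          # e.g. "disk", "cloud", "tape"
    offsite: bool
    immutable: bool
    verify_errors: int  # errors found in the most recent test restore


def check_32110(copies):
    """Return rule -> pass/fail for a list of copies (original included)."""
    return {
        "3 copies": len(copies) >= 3,
        "2 storage types": len({c.media for c in copies}) >= 2,
        "1 off-site": any(c.offsite for c in copies),
        "1 immutable": any(c.immutable for c in copies),
        "0 errors": all(c.verify_errors == 0 for c in copies),
    }


# Hypothetical setup: production disk, local backup disk, cloud copy.
copies = [
    DataCopy("disk", offsite=False, immutable=False, verify_errors=0),
    DataCopy("disk", offsite=False, immutable=False, verify_errors=0),
    DataCopy("cloud", offsite=True, immutable=False, verify_errors=0),
]

result = check_32110(copies)
for rule, ok in result.items():
    print(f"{'PASS' if ok else 'FAIL'}  {rule}")
```

In this example the setup satisfies classic 3-2-1 but fails the immutable-copy rule, which is the gap that matters most against ransomware.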

What to Back Up

Servers (full system images, not just data)
Cloud data (Microsoft 365, Google Workspace — yes, cloud data needs backup too)
Databases
Application configurations
Network device configurations
User data and workstations (if applicable)

Backup Frequency

Data Type	Recommended Frequency
Critical databases	Every 15-60 minutes
File servers	Daily (minimum)
Email / cloud apps	Daily
Full server images	Weekly
System configurations	After every change
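Backup frequency and RPO are two views of the same number: the age of your newest good backup is the data you would lose if the system failed right now. A minimal sketch of that arithmetic follows; the function names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def rpo_exposure(last_backup, now):
    """Data that would be lost if the system failed at `now`, as a time span."""
    return now - last_backup


def meets_rpo(last_backup, now, rpo):
    """True if the newest backup is recent enough to satisfy the RPO."""
    return rpo_exposure(last_backup, now) <= rpo


# Hypothetical timestamps for illustration.
now = datetime(2025, 2, 16, 12, 0)
last_backup = datetime(2025, 2, 16, 9, 30)

print(rpo_exposure(last_backup, now))                      # 2:30:00
print(meets_rpo(last_backup, now, timedelta(hours=1)))     # False
print(meets_rpo(last_backup, now, timedelta(hours=24)))    # True
```

A check like this, run on a schedule against the last successful backup time, catches the case where backups silently stopped running.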

3. Roles and Responsibilities

When disaster strikes, everyone needs to know their role. Ambiguity during a crisis leads to wasted time, duplicated effort, and critical tasks falling through the cracks.

Key roles to define:

Disaster Recovery Coordinator — the person in charge of the overall recovery effort. This should be a specific person, not “whoever is available.” Name a primary and a backup.
IT Recovery Team — the technical people (internal or MSP) who will actually restore systems. Define who handles what: servers, network, cloud, workstations, phones.
Communications Lead — who communicates with employees, customers, vendors, and (if needed) media? What do they say? Through what channels?
Business Operations Lead — who manages the business side during the disruption? Workarounds, manual processes, customer handling.
Executive Decision Maker — who has authority to approve emergency spending, activate contracts, or make strategic decisions during the crisis?

4. Communication Plan

Communication during a disaster is critical and frequently mishandled. Your plan should address:

Internal Communication

How will you notify employees? (If email is down, you need alternatives: phone tree, text messaging, personal cell phones)
What will you tell them? (Status, expected duration, what to do, where to go)
How often will you update them? (Set a schedule: hourly during active incidents)

Customer Communication

Who contacts customers? (Sales? Customer service? Executive team?)
What do you tell them? (Honest, concise, with expected resolution time)
Through what channels? (Phone, email, social media, website banner)
When do you escalate? (At what point does a major customer get a personal call from leadership?)

Vendor Communication

Who contacts your IT provider/MSP? (Don’t assume they know — have a direct escalation path)
Who contacts your cloud vendors? (Microsoft, Google, hosting providers)
Who contacts your ISP? (If the outage is network-related)
Who contacts your insurance carrier? (Cyber insurance notification is often time-sensitive)

Pre-Build These Templates

Write communication templates before you need them. During a crisis, you don’t have time to draft carefully worded messages. Have templates ready for:

Employee notification (initial and updates)
Customer notification (initial and updates)
Social media posts
Vendor escalation requests
Insurance carrier notification
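A pre-built template can be as simple as a text file with blanks to fill in. For teams that keep templates alongside their runbooks, here is a minimal sketch using Python's standard `string.Template`; the wording and the placeholder names are hypothetical.

```python
from string import Template

# Hypothetical initial employee notification, covering the points from
# the internal communication list: status, duration, what to do, next update.
EMPLOYEE_INITIAL = Template(
    "Status: $system is currently unavailable.\n"
    "Expected duration: $duration.\n"
    "What to do: $instructions\n"
    "Next update: $next_update."
)

message = EMPLOYEE_INITIAL.substitute(
    system="Email",
    duration="about 4 hours",
    instructions="Use phone or text for urgent matters.",
    next_update="1:00 PM",
)
print(message)
```

`substitute` raises a `KeyError` when a placeholder is left unfilled, so an incomplete message fails loudly instead of going out with a blank in it.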

5. Vendor and Contact Information

Compile a contact list that includes:

Internal team members (personal cell phones, not just work numbers)
IT provider / MSP (escalation paths, after-hours contacts)
Internet service provider
Cloud service providers
Hardware vendors (for emergency replacement)
Cyber insurance carrier (policy number and claims contact)
Legal counsel
Key customers (who need personal notification)
Building management / facilities

Store this list in multiple locations. If it’s only on the server that just crashed, it’s useless. Keep printed copies, store a copy in a personal cloud account, and save one on your phone.

6. Step-by-Step Recovery Procedures

For each critical system, document the specific steps to restore it. These procedures should be detailed enough that someone unfamiliar with the system could follow them in an emergency.

For each system, document:

What the system is and what it does
Where the backup data is stored and how to access it
The exact steps to restore (not “restore from backup” but specific, numbered steps)
How to verify that the restoration was successful
Who to contact if the procedure doesn’t work
Any dependencies (what must be restored first?)

Recovery order matters. You can’t restore applications before you restore the server they run on. You can’t restore the server before you restore the network. Document the correct sequence:

1. Network infrastructure (firewalls, switches, routers)
2. Core servers (Active Directory, DNS, DHCP)
3. Critical applications (line-of-business software, databases)
4. Communication systems (email, phones)
5. Secondary systems (file shares, printers, non-critical apps)
6. User workstations
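If you record dependencies for each system, the restore sequence can be derived rather than remembered. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+). The dependency map is a hypothetical simplification of the sequence above, not a statement about your environment.

```python
from graphlib import TopologicalSorter

# system -> the systems that must be restored before it (hypothetical map)
dependencies = {
    "Network infrastructure": set(),
    "Core servers": {"Network infrastructure"},
    "Critical applications": {"Core servers"},
    "Communication systems": {"Core servers"},
    "Secondary systems": {"Core servers"},
    "User workstations": {"Critical applications", "Communication systems"},
}

# static_order() yields every system after all of its dependencies.
restore_order = list(TopologicalSorter(dependencies).static_order())

for step, system in enumerate(restore_order, start=1):
    print(f"{step}. {system}")
```

Systems that do not depend on each other can come out in any relative order, so use business priority to break ties. If two systems depend on each other, `static_order()` raises `graphlib.CycleError`, which is a useful signal that the documented dependencies need another look.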

7. Alternative Operations Procedures

How does the business operate while systems are being restored? Document workarounds:

If email is down — use personal email or phone for urgent communication
If phones are down — forward to cell phones, use a temporary answering service
If the office is inaccessible — where do people work? Do they have VPN and laptop access?
If the CRM is down — how do you track customer interactions temporarily?
If the payment system is down — can you take manual payments and process later?

These workarounds don’t need to be pretty. They need to keep the business running until systems are restored.

Testing Your Disaster Recovery Plan

An untested plan is not a plan. It’s a guess.

The most common disaster recovery failure is the plan that looked great on paper but fell apart in practice because nobody ever tested it. Backups were corrupt. Recovery steps were outdated. Contact information was wrong. The estimated recovery time was wildly optimistic.

Types of DR Testing

Tabletop Exercise (Quarterly)

Gather your recovery team around a conference table and walk through a scenario verbally. “The server is down. What do we do first? Who do we call? How do we restore?” This identifies gaps in the plan without actually touching any systems.

Time required: 1-2 hours
Disruption: None

Checklist Walkthrough (Quarterly)

Go through the plan step by step and verify each component: Are the contact numbers still correct? Are the backup locations still accessible? Are the procedures still accurate? Is the documentation still current?

Time required: 2-4 hours
Disruption: None

Backup Verification (Monthly)

Actually restore data from your backups and verify it’s complete and usable. Pick a random server or database and perform a test restore to a non-production environment.

Time required: 2-4 hours
Disruption: Minimal (test environment only)
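One way to check that a test restore is "complete and usable" is to compare checksums file by file. The sketch below is an assumption-laden illustration: `verify_restore` is a hypothetical helper, and it compares against a reference copy on disk, which only works if that reference has not changed since the backup ran. In practice you would more likely compare against a checksum manifest saved at backup time.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir, restored_dir):
    """Return relative paths that are missing or differ in the restored copy."""
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file() or sha256_of(src) != sha256_of(restored):
            problems.append(str(rel))
    return problems


# Small self-contained demo: one file restored correctly, one missing.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "source"), Path(tmp, "restored")
    src.mkdir()
    dst.mkdir()
    (src / "ledger.csv").write_text("id,amount\n1,100\n")
    (dst / "ledger.csv").write_text("id,amount\n1,100\n")
    (src / "notes.txt").write_text("hello")
    demo_problems = verify_restore(src, dst)

print(demo_problems)  # ['notes.txt']
```

A matching checksum shows the bytes came back intact. It does not show that an application can open the data, so for databases and line-of-business systems, also start the application against the restored copy.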

Partial Recovery Test (Semi-Annually)

Restore one or two critical systems from backup to verify the full recovery procedure works. Time the process to validate your RTO estimates.

Time required: 4-8 hours
Disruption: Moderate (may need maintenance window)

Full Recovery Test (Annually)

Simulate a complete disaster scenario and restore all critical systems from backup. This is the only way to truly validate your plan and your RTO/RPO targets.

Time required: 1-2 days
Disruption: Significant (requires planning and coordination)

What to Document After Every Test

What worked
What didn’t work
Actual recovery times vs. planned RTO
Data integrity verification results
Contact list accuracy
Updated procedures based on findings
Action items for improvement
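The "actual recovery times vs. planned RTO" item is the one most worth recording in a structured way, because it shows whether your targets are realistic. Here is a minimal sketch of that comparison. The planned values reuse the RTO examples from earlier in this article; the measured times and the `rto_report` helper are hypothetical.

```python
from datetime import timedelta


def rto_report(planned, actual):
    """Compare measured recovery times with planned RTOs.

    Both arguments are dicts of system name -> timedelta.
    """
    rows = []
    for system, target in planned.items():
        measured = actual.get(system)
        if measured is None:
            status = "NOT TESTED"
        elif measured <= target:
            status = "MET"
        else:
            status = "MISSED"
        rows.append((system, target, measured, status))
    return rows


planned = {
    "Email": timedelta(hours=4),
    "Point-of-sale system": timedelta(hours=1),
    "Accounting system": timedelta(hours=24),
}
# Hypothetical results from a partial recovery test.
actual = {
    "Email": timedelta(hours=3, minutes=10),
    "Point-of-sale system": timedelta(hours=1, minutes=45),
}

report = rto_report(planned, actual)
for system, target, measured, status in report:
    print(f"{system}: planned {target}, actual {measured}, {status}")
```

A "MISSED" row means either the recovery procedure needs work or the target needs to be revised. A "NOT TESTED" row is a reminder that the number in the plan is still a guess.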

Common Disaster Recovery Mistakes

1. No Plan at All

The biggest mistake is not having a documented plan. “We know what to do” is not a plan. Knowledge that exists only in someone’s head disappears when that person is on vacation, leaves the company, or is unavailable during the emergency.

2. Backup Without Testing

“We have backups” is meaningless if you’ve never restored from them. We’ve seen businesses discover during an actual disaster that their backups were incomplete, corrupted, or hadn’t been running for months. Test your backups regularly.

3. Single Point of Failure in the Plan

If only one person knows the recovery procedures, or the plan is stored only in one location, or you rely on a single vendor for everything, your plan has a single point of failure. Build redundancy into the plan itself.

4. Unrealistic RTO/RPO Targets

Setting every system to a 15-minute RTO without the budget to support it means your plan is fiction. Be honest about what you can actually achieve with your current infrastructure and budget.

5. Forgetting Cloud Services

Many businesses assume cloud services don’t need disaster recovery. They do. Microsoft 365 doesn’t guarantee your data against accidental deletion, ransomware, or malicious insiders. Back up your cloud data too.

6. Outdated Contact Information

Contact lists that haven’t been updated in two years are useless in an emergency. Review and update quarterly.

7. No Communication Plan

The technical recovery might go perfectly, but if nobody communicated with employees and customers during the outage, you’ve still failed. Communication is as important as restoration.

8. Ignoring Cyber Incidents

Many DRPs are designed around hardware failures and natural disasters but don’t address ransomware or cyberattacks. Ransomware is now a leading cause of extended downtime for SMBs, so your plan must include cyber incident procedures. Those procedures differ from hardware recovery: for example, you have to confirm that a backup predates the infection before you restore from it.

9. Never Updating the Plan

Your IT environment changes constantly: new applications, new servers, new cloud services, new employees. If your plan doesn’t change with it, it becomes increasingly irrelevant. Review and update at least annually, and after any major IT change.

Disaster Recovery and Your IT Provider

Your MSP or IT provider should be deeply involved in your disaster recovery planning. Here’s what to expect from them:

What They Should Provide

Backup management — configuring, monitoring, and testing your backups
Recovery procedures — documented, tested steps for restoring your systems
Monitoring — 24/7 monitoring that detects problems early and triggers the DR process when needed
Infrastructure — redundant systems, failover capabilities, and cloud recovery options
Testing support — helping you conduct regular DR tests and drills
Documentation — keeping technical recovery procedures current

Questions to Ask

What is our current backup strategy, and does it meet our RTO/RPO requirements?
When was the last time you tested a restore from our backups?
What happens if our primary server fails at 2 AM on a Saturday?
Do you have documented recovery procedures for our specific systems?
How do you handle a ransomware scenario versus a hardware failure?
What’s our actual recovery time based on your experience with similar environments?

Red Flags

They can’t tell you when they last tested your backups
Recovery procedures aren’t documented
There’s no after-hours support plan
They don’t differentiate between hardware failure and cyber incident recovery
Your data isn’t stored off-site or in an immutable format

The Bottom Line

A disaster recovery plan is your insurance policy for your technology. You hope you’ll never need it, but when you do, it’s the difference between a bad day and a business-ending event.

The essential elements are straightforward:

Know what matters most (business impact analysis)
Set realistic recovery targets (RTO and RPO)
Build a robust backup strategy (3-2-1 minimum)
Define who does what (roles and responsibilities)
Plan your communication (internal, customer, vendor)
Document step-by-step procedures (specific enough for anyone to follow)
Test regularly (tabletop quarterly, full test annually)
Keep it updated (review after every major IT change and at least annually)

The businesses that recover quickly from disasters aren’t lucky. They’re prepared. Build and test your plan before you need it.

Need help building or testing your disaster recovery plan? centrexIT helps businesses design backup strategies, document recovery procedures, and conduct DR testing. Contact us to review your disaster readiness.
