Test Your DR Plan Before It Becomes a DR Situation
We simulate full disaster scenarios — region failover, database corruption, ransomware recovery — and measure your real RTO and RPO against the figures in your compliance documents.
You might be experiencing...
Disaster recovery validation answers the question every CTO dreads: “Does our DR plan actually work?” Most DR plans are written during architecture design, updated when someone remembers, and tested annually with a checkbox exercise that bears no resemblance to an actual recovery. The gap between a documented RTO and a measured RTO is almost always significant, and it surfaces at the worst possible moment.
Our validation methodology simulates complete disaster scenarios with your actual on-call team executing real recovery procedures. We introduce realistic complications that tabletop exercises miss: a backup that is 6 hours older than expected, a runbook step that requires a database credential stored in the service that just failed, a region failover that takes 20 minutes longer because of an undocumented manual step. These are the gaps that matter.
The output of a DR validation engagement is a compliance-ready test record, a measured RTO/RPO report, and a prioritised gap register with remediation ownership. Teams that complete this engagement typically discover that their real RTO is 2–6x their stated figure — and that the gap can be closed in 2–3 sprints of focused remediation work.
Engagement Phases
DR Plan Review & Scenario Design
We review your existing DR documentation, RPO/RTO commitments, backup configurations, and runbooks. We design 3–5 disaster scenarios covering your highest-risk failure modes: region loss, database corruption, backup failure, and dependency outage.
Simulation Execution
We execute each DR scenario in isolation with your on-call team running the recovery. We measure time-to-detection, time-to-declare-DR, and time-to-recovery at each step. We introduce realistic complications: a backup that is older than expected, a runbook step that requires a permission nobody has.
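As a rough illustration of how the three timings above could be derived from timestamps recorded during a simulation, here is a minimal Python sketch. The class and field names (`ScenarioTimeline`, `failure_injected`, and so on) are hypothetical, and measuring every interval from the moment of failure injection is an assumption; a team may prefer to measure time-to-declare-DR from the moment of detection instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ScenarioTimeline:
    """Timestamps captured during one simulated disaster (hypothetical names)."""
    failure_injected: datetime    # when the simulated failure started
    alert_acknowledged: datetime  # when on-call first acknowledged the problem
    dr_declared: datetime         # when the team formally declared a DR event
    service_restored: datetime    # when the service passed its health checks

    def time_to_detection(self) -> timedelta:
        return self.alert_acknowledged - self.failure_injected

    def time_to_declare_dr(self) -> timedelta:
        # Assumption: measured from failure injection, not from detection.
        return self.dr_declared - self.failure_injected

    def time_to_recovery(self) -> timedelta:
        # This interval is the measured RTO for the scenario.
        return self.service_restored - self.failure_injected

    def summary_minutes(self) -> dict[str, float]:
        """Return all three intervals in minutes for the test record."""
        return {
            "time_to_detection": self.time_to_detection().total_seconds() / 60,
            "time_to_declare_dr": self.time_to_declare_dr().total_seconds() / 60,
            "time_to_recovery": self.time_to_recovery().total_seconds() / 60,
        }
```

Recording raw timestamps rather than durations keeps the test record auditable: each interval can be recomputed later from the same four events.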
Gap Analysis & Remediation Planning
We produce a measured RTO/RPO report comparing stated vs actual for each scenario. We identify every gap — missing runbook steps, permission gaps, untested backup paths — and produce a remediation roadmap with effort estimates.
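The stated-versus-measured comparison can be sketched in a few lines of Python. This is an illustrative sketch only, not the report format used in the engagement: `ScenarioResult`, `overrun_ratio`, and `gap_report` are hypothetical names, and the row layout is an assumption.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class ScenarioResult:
    """Stated and measured objectives for one scenario (hypothetical names)."""
    name: str
    stated_rto: timedelta
    measured_rto: timedelta
    stated_rpo: timedelta
    measured_rpo: timedelta


def overrun_ratio(stated: timedelta, measured: timedelta) -> float:
    """Return measured / stated; values above 1.0 mean the objective was missed."""
    if stated <= timedelta(0):
        raise ValueError("stated objective must be a positive duration")
    return measured / stated  # timedelta / timedelta yields a float


def gap_report(results: list[ScenarioResult]) -> list[dict]:
    """Build one comparison row per scenario."""
    rows = []
    for r in results:
        rows.append({
            "scenario": r.name,
            "rto_ratio": round(overrun_ratio(r.stated_rto, r.measured_rto), 2),
            "rpo_ratio": round(overrun_ratio(r.stated_rpo, r.measured_rpo), 2),
            "meets_rto": r.measured_rto <= r.stated_rto,
            "meets_rpo": r.measured_rpo <= r.stated_rpo,
        })
    return rows
```

Expressing each gap as a ratio rather than an absolute difference makes scenarios with very different objectives comparable in a single table.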
Deliverables
Before & After
| Metric | Before | After |
|---|---|---|
| RTO | 4 hrs (stated) | 47 min (measured) |
| RPO | 1 hr (stated) | 12 min (measured) |
| DR plan gaps identified | 0 known | 8 documented |
Tools We Use
Frequently Asked Questions
Will this cause downtime in production?
DR simulations are run in isolated non-production environments by default. For organisations that want to validate production failover (required for some compliance frameworks), we design a time-windowed test during a low-traffic maintenance window with a clear abort path and rollback procedure.
Our RTO is 4 hours — is that realistic for our architecture?
That is one of the things we measure. RTO commitments are frequently set by contract negotiation, not by engineering measurement. Our simulations typically find that the measured RTO is 2–6x the stated figure, because the stated figure assumes clean execution of runbooks that have never been timed under pressure. Knowing your real RTO is the starting point for either improving your recovery or renegotiating your SLA.
Can this serve as our annual DR test for compliance purposes?
Yes. We produce an audit-ready test record with timestamps, scenario descriptions, measured outcomes, and identified gaps. This satisfies DR testing requirements for SOC 2, ISO 27001, and most financial services regulatory frameworks. We can tailor the evidence package to your specific compliance requirements.
Know Your Blast Radius
Book a free 30-minute resilience scope call with our chaos engineers. We review your architecture, identify your highest-risk failure modes, and recommend the experiments that will give you the most signal.