Data Restore & Recovery: The Most Common Tripwires

This is a collaborative article created by Steve Dance of RiskCentric and Adam Continuity.

Systems and data recovery strategies may harbour several dark corners and tripwires, which can compromise the timeliness and effectiveness of IT recovery activities.

Many organisations have built-in assumptions that are often proven wrong when the recovery process is tested or, even worse, when a live restore is attempted to recover from a major IT incident. Some of the most common tripwires are:

Lack of homogeneity in back-up technology and media

A heterogeneous back-up environment can significantly impair the effectiveness of the data restoration process. Different media, locations and restore conventions can make restores protracted and error-prone. Maintaining a standardised approach to back-up media and solutions will optimise the overall effectiveness of recovery. A further tripwire often found in a heterogeneous back-up environment is that esoteric restore techniques, known only to a few individuals, can creep in. If those people are unavailable during a major incident, recovery can be significantly compromised.

Speed and capacity of recovery infrastructure

The infrastructure used to deliver the restore process can have a significant effect on recovery speed. Network bandwidth, server speeds and back-up media transfer rates will have a significant influence on recovery times. It is important that assumptions in this area are proven by testing.
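The point above can be illustrated with a simple back-of-the-envelope model: restore time is governed by the slowest component in the chain. This is a minimal sketch with illustrative figures (the throughputs below are assumptions, not measurements from any particular environment), which is exactly why the article recommends proving such estimates by testing.

```python
# Hypothetical sketch: estimate restore time from the slowest link in the
# recovery chain. All throughput figures used here are illustrative
# assumptions and should be replaced with measured values from testing.

def restore_hours(dataset_gb, network_mbps, tape_mbps, disk_write_mbps):
    """Restore time is governed by the slowest component (all rates in MB/s)."""
    bottleneck_mbps = min(network_mbps, tape_mbps, disk_write_mbps)
    seconds = (dataset_gb * 1024) / bottleneck_mbps
    return seconds / 3600

# Example: 5 TB restored over a 1 Gbit/s link (~110 MB/s effective), from
# tape reading at 300 MB/s, to disk writing at 500 MB/s. The network is the
# bottleneck, giving roughly half a day of transfer time.
hours = restore_hours(5 * 1024, network_mbps=110, tape_mbps=300, disk_write_mbps=500)
```

A calculation like this only sets a lower bound; cataloguing, verification and staging add further time, so measured test results remain the only reliable figure.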

Understanding differences in recovery datasets

The underlying structure of the data to be restored can significantly impact recovery timings. For instance, 10 TB of data in a single file will be restored significantly faster than that same 10 TB in 100,000 discrete files. The structure of the data to be restored needs to be understood and factored into data recovery estimates.
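The single-file versus many-files difference can be sketched with a simple overhead model: each discrete file adds a fixed cost (catalogue lookup, open/close, metadata and permissions) on top of the raw transfer. The throughput and per-file overhead figures below are illustrative assumptions; real per-file costs vary widely by back-up product and media.

```python
# Hypothetical sketch: the same volume of data restores more slowly when it
# is split across many files, because each file carries a fixed metadata
# cost. Both the throughput and the per-file overhead are assumed values.

def restore_seconds(total_mb, file_count, throughput_mbps=200, per_file_overhead_s=0.01):
    transfer = total_mb / throughput_mbps      # raw data movement
    overhead = file_count * per_file_overhead_s  # per-file fixed cost
    return transfer + overhead

ten_tb_mb = 10 * 1024 * 1024
one_file = restore_seconds(ten_tb_mb, 1)
many_files = restore_seconds(ten_tb_mb, 100_000)
# many_files exceeds one_file by the accumulated per-file overhead alone;
# in practice small files also reduce effective throughput, widening the gap.
```

Even this deliberately optimistic model shows the gap; in real restores, small files also fragment I/O and depress effective throughput, so the penalty is usually larger than the fixed-overhead term alone.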

Configuration recovery

It’s not just data – as any ransomware victim will tell you. If your data has been compromised by malicious code, that code will still be present in the IT infrastructure configurations. It may be necessary to reimage servers, networks and endpoints to provide a trusted environment, before data can be restored to it. Maintaining configuration images that can be quickly restored to a target environment is a significant enabler for recovering from ransomware attacks.

Data growth

What looked achievable last year may not be as achievable this year. The growth of data and the addition of new systems need to be considered when assessing the ongoing viability of back-up and recovery capabilities.
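One way to make this concrete is to project when current data growth will outgrow the available back-up window. The sketch below uses an assumed growth rate, window and throughput; all four figures are hypothetical and would need to come from the organisation's own capacity data.

```python
# Hypothetical sketch: project how many years until compound data growth
# exceeds what the back-up window can accommodate. All inputs are assumed
# values for illustration only.

def years_until_window_exceeded(current_tb, annual_growth, window_hours, throughput_tb_per_hour):
    """Return the number of years until the data set no longer fits the window."""
    capacity_tb = window_hours * throughput_tb_per_hour
    years = 0
    size = current_tb
    while size <= capacity_tb:
        size *= (1 + annual_growth)
        years += 1
    return years

# Example: 8 TB today, growing 25% a year, against an 8-hour window at
# 2 TB/hour (16 TB of capacity) -> the window is outgrown in year 4.
years = years_until_window_exceeded(8, 0.25, 8, 2)
```

Running a projection like this annually turns "data growth" from a vague concern into a dated milestone that can trigger infrastructure investment before the window is breached.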

Knowledge and capability

The knowledge and capability of IT staff will always be a major factor in achieving an effective recovery. Training, testing exercises, up-to-date scripts and run-books will contribute to the knowledge and capability of the team(s) involved in recovery activities.

Ensuring that the organisation’s back-up and recovery capabilities are effective and aligned to the risk management needs of the organisation requires ongoing oversight and monitoring. The main activities to be conducted here should include:

1. Audit and inventory: Know what data you have in terms of structure, back-up media and the underlying infrastructure for delivering data and system restoration.

2. Test: Regular tests of data restoration are the best way to find unidentified dependencies and confirm assumptions.

3. Optimise and rationalise: As far as practical, work towards a homogeneous back-up and restore environment. It simplifies and facilitates familiarity with the recovery process.

4. Train: Ensure that those responsible are completely familiar with the recovery process and that there are no single points of reliance on particular individuals.
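Part of step 2 above, testing restores, can be automated. A minimal sketch of one such check: comparing checksums of restored files against values recorded at back-up time. The manifest format (a mapping of relative path to SHA-256 digest) is a hypothetical convention for illustration; real back-up products record integrity data in their own catalogues.

```python
# Hypothetical sketch of an automated restore-test check: verify restored
# files against checksums captured at back-up time. The manifest format
# used here is an illustrative assumption, not any product's actual format.

import hashlib
from pathlib import Path

def sha256_of(path):
    """Compute a file's SHA-256 incrementally, 1 MiB at a time."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_restore(manifest, restore_root):
    """manifest: {relative_path: expected_sha256}. Returns paths that failed."""
    failures = []
    for rel, expected in manifest.items():
        target = Path(restore_root) / rel
        if not target.exists() or sha256_of(target) != expected:
            failures.append(rel)
    return failures
```

A check like this catches the silent failure mode, a restore that completes without errors but returns corrupt or incomplete data, which manual spot-checks routinely miss.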

If you would like to discuss your options for optimising and de-risking your organisation’s back-up and recovery capabilities, please contact us or book a free call.
