The fire that hit some OVH servers has revealed how many companies are still unprepared for disaster recovery. Let's see what disaster recovery is and why this often underestimated strategy is crucial for maintaining business continuity and keeping companies on their feet.
In this article, Disaster recovery: what it is and why it's so important, we’ll talk about risk factors, strategies used to deal with emergencies and mistakes to avoid. But, first of all, let's see what disaster recovery means.
Disaster recovery: what is it?
A simple and exhaustive definition of disaster recovery is the one we find on Wikipedia:
"With disaster recovery, in computer science, and in particular in the field of information security, we mean the set of technological and logistical/organizational measures designed to restore systems, data and infrastructure necessary for the provision of services [...] in the event of serious emergencies that affect the regular business." (Wikipedia)
The term disaster recovery refers to the set of procedures to be put in place when an unforeseeable event occurs. This strategy includes several interconnected aspects. It starts with identifying the risk factors, so that preventive actions can be taken to reduce them.
Since risks can never be reduced to zero, however, recovery actions must also be planned to restore business continuity.
It is no coincidence that disaster recovery is part of a broader design known as business continuity. To ensure the continuity of a company's activities, it is necessary to be prepared to deal with disruptions and to restore the affected services in the shortest possible time.
But let's take things in order and first understand what kind of risks we are talking about.
Disaster recovery risk factors
A company's disaster recovery strategy only becomes concrete once it is implemented through a plan, known as a Disaster Recovery Plan or DRP.
Whereas the strategy decides in advance which measures to put in place in response to a critical situation, the plan defines the precise steps to be followed. It's important to have a plan in place to minimize errors and reduce recovery times. There is no room for improvisation, especially in a sensitive situation like this.
In a disaster recovery strategy, the term disaster refers to unexpected events of a highly diverse nature.
On the one hand, there are attacks that undermine computer security (viruses, phishing or DDoS attacks), as well as outright theft. On the other hand, there are natural events such as floods, earthquakes or fires.
Other types of malfunctions can be associated with hardware failures or blackouts. The possibility of human error must also be considered.
What is the purpose of a disaster recovery plan?
Once the risks have been defined, procedures are established to respond adequately to the critical event. This is the purpose of the disaster recovery plan: to establish how to proceed in order to minimize damage.
Reducing damage means keeping the disruption of business activities as short as possible, and therefore limiting the economic losses associated with the disruption itself. To do this, everyone involved must cooperate to restore the service as soon as possible.
This is why everyone in a company should be informed about the measures to be taken in such cases, so that they can do their part.
If operations cannot be restored to full capacity in a short period of time, a number of alternative procedures must be put in place promptly so that operations are not interrupted completely.
If a firm does not establish a disaster recovery plan, the risk is that business continuity will not be restored. Maintaining data properly so that it can be restored in the event of a disaster is essential to avoid the risk of losing it permanently.
A loss of data can put the entire operation in crisis, bringing the company to a standstill. And it's not certain that a business can recover from such an event.
Disaster recovery strategies
When defining a disaster recovery plan, the company must set two main objectives: the Recovery Point Objective (RPO) and the Recovery Time Objective (RTO).
The RPO indicates the maximum amount of data that the company can afford to lose in the event of a disaster, usually expressed as a period of time. Whether it can be met depends on how often backups are performed: the data lost will be the data that was modified or created between the time of the last backup and the time of the disaster.
The RTO is the maximum amount of time allowed to restore the data and infrastructure so that they are operational again. In other words, it is the amount of downtime that can be tolerated. Whether it can be met depends on how quickly the recovery takes place, which is usually performed by a systems engineer.
To limit the impact on your business as much as possible, it goes without saying that these two parameters should be kept to a minimum. This is equivalent to losing less data and being able to restore systems in the shortest possible time.
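To make the two parameters concrete, here is a minimal Python sketch. The function names and the figures (a nightly backup, a two-hour restore, four-hour objectives) are illustrative assumptions, not values taken from a real plan.

```python
from datetime import datetime, timedelta


def data_lost(last_backup: datetime, disaster: datetime) -> timedelta:
    """Window of changes lost: everything written after the last backup."""
    return disaster - last_backup


def meets_objectives(backup_interval: timedelta,
                     restore_time: timedelta,
                     rpo: timedelta,
                     rto: timedelta) -> bool:
    """In the worst case the disaster strikes just before the next backup,
    so the backup interval must not exceed the RPO, and the estimated
    restore time must not exceed the RTO."""
    return backup_interval <= rpo and restore_time <= rto


# Hypothetical scenario: backup at midnight, disaster at 18:30 the same day.
lost = data_lost(datetime(2021, 3, 10, 0, 0), datetime(2021, 3, 10, 18, 30))
print(lost)  # 18:30:00

# A daily backup cannot satisfy a 4-hour RPO, even with a fast restore.
print(meets_objectives(timedelta(hours=24), timedelta(hours=2),
                       rpo=timedelta(hours=4), rto=timedelta(hours=4)))  # False
```

The check shows why the two objectives drive different decisions: a tighter RPO calls for more frequent backups, while a tighter RTO calls for a faster, well-rehearsed restore procedure.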
The crucial role of backups
Data is protected through redundancy, i.e., by creating multiple copies of the same data so that it is not lost.
A data retention strategy based on backups is therefore inseparable from a disaster recovery strategy. At the same time, it is important not to confuse the two concepts.
The goal of backups is to ensure that the data can be recovered as fully as possible. Disaster recovery, on the other hand, refers to a broader strategy that covers the recovery not only of the data, but of the entire infrastructure.
The OVH case
The fire at OVHcloud's Strasbourg servers prompts several observations about data storage methods.
Let's imagine that the backup is stored on the same server as the main site: in this case, damage to the server jeopardizes both the original data and the copy, making the backup useless.
For this reason, the best strategy is to store the backup copies on different servers. If, however, those servers are all within a single datacenter, there is another problem.
So let's take a look at the OVH case: a fire broke out in one of the data centers and spread to neighboring data centers. An event of this magnitude, involving multiple facilities in the same vicinity, can result in irreparable data loss.
What is the solution to such an event?
Use datacenters located in different places. This is an option that many providers make available through separate disaster recovery plans. OVH offered such options too, but many users undervalued the importance of these procedures and had not activated them.
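The reasoning above can be summed up as a scale of isolation between the primary copy and its backup. The sketch below is only an illustration: the `Placement` class, its fields and the sample names are hypothetical, and treating "same city" as "nearby" is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    server: str
    datacenter: str
    city: str


def isolation(primary: Placement, backup: Placement) -> str:
    """Classify how well a backup is separated from the primary copy."""
    if primary.city != backup.city:
        return "different location"   # survives a site-wide event
    if primary.datacenter != backup.datacenter:
        return "nearby datacenter"    # a spreading fire can hit both
    if primary.server != backup.server:
        return "same datacenter"      # lost if the whole building is lost
    return "same server"              # lost together with the original


# Hypothetical placements, for illustration only.
site = Placement(server="web-1", datacenter="dc-1", city="city-a")
print(isolation(site, Placement("web-1", "dc-1", "city-a")))   # same server
print(isolation(site, Placement("bak-1", "dc-2", "city-a")))   # nearby datacenter
print(isolation(site, Placement("bak-1", "dc-9", "city-b")))   # different location
```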
How we protect your data
SupportHost's practice is to store backups on external servers in datacenters located in a different country from the one where the sites are hosted. Our datacenter is in Germany, while backups are stored in separate datacenters in the Netherlands: not just a different datacenter, but a different country.
In this way, relocation keeps our users' data safe. If the servers where your site is hosted fail, you don't risk losing your backups as well, and the loss of data is limited to the changes made since the last backup.
To minimize the inconvenience, our hosting services include a daily backup (every 24 hours) in the price of our WordPress hosting, shared hosting, dedicated servers, VPS cloud hosting and reseller hosting packages. The backups are kept for 30 days and you are free to restore them yourself: it only takes a few clicks from the control panel (cPanel).
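As a rough illustration of what a daily backup with 30-day retention implies, the following Python sketch selects which backups would still be available on a given day. It is not SupportHost's actual implementation: the function name and the rule "keep anything newer than 30 days" are assumptions made for the example.

```python
from datetime import date, timedelta

RETENTION_DAYS = 30  # retention period mentioned above


def backups_to_keep(backup_dates, today, retention_days=RETENTION_DAYS):
    """Return the backups still inside the retention window, oldest first."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in backup_dates if d > cutoff)


# Hypothetical history: one backup per day for the last 45 days.
today = date(2021, 3, 10)
history = [today - timedelta(days=i) for i in range(45)]

kept = backups_to_keep(history, today)
print(len(kept))   # 30
print(kept[0])     # 2021-02-09 (oldest restore point still available)
```

With one backup per day, this gives 30 restore points to choose from, and the most recent one is never more than 24 hours old.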
Geolocated DNS cluster
In addition to this we use a geolocated DNS cluster and use 4 different domains. Let me explain.
We have 4 nodes in 4 separate datacenters, which are different from the datacenters where we keep the servers and the backup servers. Each of these nodes acts as one of the 4 nameservers.
Typically hosting companies use:
And these are made to point directly to the server where the customer's site is hosted.
With our solution we have 4 different nameservers pointing to 4 different nodes, using 4 different domains:
- ns.supporhost.com
- ns.supporhost.net
- ns.supporhost.eu
- ns.supporhost.us
This solution allows us to eliminate the risk due to a problem with a domain. For example, if the .com registry suspends the domain, all of our customers' sites remain online.
If the datacenter hosting one of these 4 nodes has problems, or if one of the nodes itself has problems, the other 3 continue to keep all of our customers' sites reachable (1 is enough to keep the customers' sites online).
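The redundancy described here can be modeled in a few lines of Python. This is a simplified simulation, not a description of how the cluster is actually implemented: the function names are hypothetical, and the nameserver names are simply those listed above.

```python
NAMESERVERS = [
    "ns.supporhost.com",
    "ns.supporhost.net",
    "ns.supporhost.eu",
    "ns.supporhost.us",
]


def parent_domain(nameserver: str) -> str:
    """'ns.example.com' -> 'example.com'."""
    return nameserver.split(".", 1)[1]


def working_nameservers(failed_nodes=(), suspended_domains=()):
    """Nameservers that still answer: node is up and its domain is active."""
    return [
        ns for ns in NAMESERVERS
        if ns not in failed_nodes and parent_domain(ns) not in suspended_domains
    ]


def sites_reachable(failed_nodes=(), suspended_domains=()) -> bool:
    """One working nameserver is enough to keep sites resolvable."""
    return len(working_nameservers(failed_nodes, suspended_domains)) >= 1


# The .com domain is suspended: three nameservers remain.
print(working_nameservers(suspended_domains=["supporhost.com"]))
# Three nodes fail at once: sites are still reachable through the fourth.
print(sites_reachable(failed_nodes=NAMESERVERS[:3]))  # True
```

With a single domain for all nameservers, the first scenario would leave no working nameserver at all, which is the risk the four-domain setup is meant to remove.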
If the server where the customer's site is located has a problem (e.g. the datacenter catches fire or simply a disk on the server breaks) we can fix it in a few minutes:
- We restore the backup on a new server, taking it from the automatic backups that reside in a different datacenter.
- We change the IP of that domain in the cluster. In this case there are no DNS propagation times: the pointing is updated instantly.
Unfortunately, problems are always lurking, and it's always better to be prepared. After all, prevention is better than cure.
In this article, Disaster recovery: what it is and why it's so important, we’ve seen what disaster recovery is and how being prepared for emergencies is critical for a business today.
Knowing how to keep your data safe is essential as part of a broader strategy to restore it quickly. This is the only way to minimize downtime and, consequently, the financial losses that could be suffered.
Have you ever had to deal with an unplanned event? Did you have a plan to follow and were you able to restore data and business quickly? Share your experience in the comments below.