The Difference Between Fault Tolerance, High Availability, & Disaster Recovery

Three terms that I often hear misused by IT professionals new to the industry are “fault tolerance”, “high availability”, and “disaster recovery”. Here are some pictures that can help you visualize the differences between these terms.

Fault Tolerance

Fault tolerant solutions keep operating even if a component, or multiple components, should fail. In the picture above the aircraft has four engines. The loss of a single engine would not cause the aircraft to fail, as the three remaining engines would continue to operate. Take note that the engines are circled, and not the wings. The aircraft has two wings, but lose one and it will not fly. Having multiple components is not enough to achieve fault tolerance. The remaining components must be capable of sustaining the system on their own when the others are lost.
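The four-engine example can be sketched in a few lines of Python. This is only an illustration: the `Engine` class, the `can_keep_flying` function, and the threshold of three healthy engines are assumptions made for the sketch, not a real aircraft rule or any particular product's API.

```python
from dataclasses import dataclass


@dataclass
class Engine:
    name: str
    healthy: bool = True


def can_keep_flying(engines: list[Engine], min_required: int) -> bool:
    """Return True while enough healthy engines remain to carry the load."""
    healthy_count = sum(1 for engine in engines if engine.healthy)
    return healthy_count >= min_required


# Four redundant engines; the threshold of three is an assumed value.
engines = [Engine(f"engine-{i}") for i in range(1, 5)]
engines[0].healthy = False  # a single engine fails

# The system keeps operating with no interruption at all.
print(can_keep_flying(engines, min_required=3))
```

The point of the sketch is the threshold: redundancy only counts as fault tolerance when the surviving components can carry the full load by themselves.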

High Availability

The spare tire for a vehicle is an example of a high availability design. Losing one of the tires will require the vehicle to stop, but a spare tire is ready to replace the failed component. A high availability design does not prevent outages; that is what fault tolerance does. A high availability design means that the outage will be brief because it will not take long to redeploy the required component. Unlike replacing a tire, redeployment in a high availability design is typically automated following a component failure.
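A minimal sketch of the spare-tire idea, assuming a hypothetical active/standby pair with a periodic health check. The names `Node`, `FailoverPair`, `serve`, and `health_check` are invented for illustration and do not come from any specific clustering product.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool = True


class FailoverPair:
    """One active node plus a spare that is promoted automatically."""

    def __init__(self, active: Node, standby: Node) -> None:
        self.active = active
        self.standby = standby

    def serve(self) -> str:
        if not self.active.healthy:
            # Requests fail until the next health check: this is the outage.
            raise RuntimeError(f"{self.active.name} is down")
        return f"served by {self.active.name}"

    def health_check(self) -> bool:
        """Promote the standby if the active node failed; return True on failover."""
        if self.active.healthy or not self.standby.healthy:
            return False
        self.active, self.standby = self.standby, self.active
        return True


pair = FailoverPair(Node("primary"), Node("spare"))
pair.active.healthy = False  # the "flat tire"

try:
    pair.serve()
except RuntimeError as outage:
    print(f"brief outage: {outage}")

pair.health_check()  # automated failover, no human involved
print(pair.serve())
```

Note that the failure is visible to callers for a short window. That window is what separates this design from fault tolerance, where no request would have failed.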

Disaster Recovery

Imagine that the jet in the picture above is your IT infrastructure, and the pilot is your business. The pilot is ejecting from an aircraft that has failed in some way and is now too dangerous to remain in. The pilot has activated the ejection system and will soon have a parachute deployed so that he will survive while the jet is destroyed. That is the whole point of disaster recovery: you are saving your business by ditching your compromised infrastructure. A good disaster recovery solution is all about getting your business’s valuable data and operations out of the failed infrastructure and running again on new infrastructure. Disaster recovery is not about saving the infrastructure. Disaster recovery is about saving the business.
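The same idea as a rough sketch, assuming the business's data can be captured in a backup and restored elsewhere. `Site`, `take_backup`, and `recover` are hypothetical names used only for this illustration; a real recovery plan also covers applications, networking, and people.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    data: dict = field(default_factory=dict)
    usable: bool = True


def take_backup(site: Site) -> dict:
    """Copy the business data out of the site (the parachute)."""
    return copy.deepcopy(site.data)


def recover(backup: dict, new_site_name: str) -> Site:
    """Stand the business up on brand-new infrastructure from the backup."""
    return Site(name=new_site_name, data=copy.deepcopy(backup))


primary = Site("primary-dc", {"orders": [1, 2, 3]})
backup = take_backup(primary)

# Disaster: the original infrastructure is lost and is not repaired.
primary.usable = False
primary.data.clear()

recovered = recover(backup, "recovery-dc")
print(recovered.name, recovered.data)
```

Nothing in the sketch tries to fix `primary`. The old site is abandoned, and only the data that matters to the business is carried over.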

Keep these images in mind when you are designing your infrastructure. Think through what it is that you need to deliver: multiple engines to keep the business flying, a spare tire ready to go following an acceptable outage, or a parachute to save what matters most following a disaster. Being able to visualize an objective often helps with the design process, and recalling these images will help you avoid using the wrong term when talking about fault tolerance, high availability, and disaster recovery!
