When deciding on a system architecture for an enterprise, there are many factors to consider, such as performance, scalability, availability, reliability, cost, and operations. In particular, high availability (avoiding interruption or inactivity) must always be a focus in order to ensure business continuity. This priority shows in enterprise architectures, because every minute of system downtime can mean a significant loss of revenue and reputation.
The glitch of the “giants.”
One of the most discussed issues of the past month was an AWS outage caused by errors in Kinesis Data Streams. It lasted for hours and affected a wide range of services: in mild cases applications slowed down, and in others they were completely paralyzed. Big names affected included 1Password, Adobe Spark, Autodesk, Coinbase, DataCamp, Flickr, Roku, The Washington Post …
Most recently, on the afternoon of December 14, 2020 (Vietnam time), Google services stopped working worldwide, with errors recorded in Gmail, Google Calendar, YouTube, and part of Google Search. The outage affected a large number of Internet users globally. It was also Google’s biggest outage of 2020.
It is easy to see that even big names like Google or AWS cannot entirely avoid system problems; although such failures are rare, the damage they cause is not small.
So, what is High Availability (HA)?
High availability refers to the ability to avoid unplanned outages by eliminating single points of failure (SPOF). A highly available system can continue to function even when critical components fail, without interrupting service or losing data, and can recover seamlessly from failure. It is a measure of the reliability of the hardware, operating system, middleware, and database management software.
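To illustrate why removing a single point of failure matters, here is a minimal sketch of the standard formula for redundant components. It assumes that replicas fail independently and that one healthy replica is enough to keep the service running; real systems often share failure causes, so treat the result as an optimistic upper bound. The function name and the 99% figure are hypothetical and chosen only for illustration.

```python
def combined_availability(component_availability: float, replicas: int) -> float:
    """Availability of `replicas` redundant copies of one component.

    Assumes independent failures and that a single healthy copy can serve
    all traffic (an idealized model, not a guarantee).
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("component_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The service is down only when every replica is down at the same time.
    probability_all_down = (1.0 - component_availability) ** replicas
    return 1.0 - probability_all_down


# Hypothetical example: one server that is up 99% of the time, then two of them.
single = combined_availability(0.99, 1)
redundant = combined_availability(0.99, 2)
print(f"single server: {single:.4%}")
print(f"two servers:   {redundant:.4%}")
```

Under these assumptions, adding a second replica moves the hypothetical component from roughly "two nines" to roughly "four nines", which is the intuition behind eliminating SPOFs.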
High availability is usually measured as a percentage of uptime, and the number of “nines” is often used to express it. For example, “four nines” indicates a system that is up 99.99% of the time, meaning it can be down for at most about 52.6 minutes over a whole year.
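The arithmetic behind the "nines" can be checked with a short sketch. It assumes a year of 365.25 days (525,960 minutes), which is the convention that reproduces the 52.6-minute figure for 99.99%; the function name is hypothetical.

```python
# Assumption: an average year of 365.25 days, i.e. 525,960 minutes.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Maximum downtime per year, in minutes, for a given uptime percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (1.0 - availability_percent / 100.0)


for label, percent in [
    ("two nines", 99.0),
    ("three nines", 99.9),
    ("four nines", 99.99),
    ("five nines", 99.999),
]:
    minutes = allowed_downtime_minutes(percent)
    print(f"{label} ({percent}%): about {minutes:.1f} minutes of downtime per year")
```

Each extra nine cuts the yearly downtime budget by a factor of ten, which is why moving from three nines to four nines is usually far more expensive than the small percentage difference suggests.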
Most of the challenges in implementing high availability in an enterprise environment involve product vendors or application design, especially when introducing new products. Most small businesses don’t pay much attention to the stringent requirements of 24/7 application support. Poor decisions can lead to a product architecture that is incompatible with the enterprise’s high availability solutions.
Source: VTI Cloud