AWS power failure killed hardware and instances • The Register


A small group of system administrators has disaster recovery work on its hands, on top of Log4J, thanks to a power outage in Amazon Web Services' USE1-AZ4 Availability Zone in the US-EAST-1 region.

The lack of fun began at 4:35 a.m. Pacific Time (PST, aka 12:35 UTC) on December 22, when AWS noticed launch failures and networking issues for some instances in its Elastic Compute Cloud IaaS service.

Twenty-six minutes later, the cloud colossus confessed to a power outage and recommended moving workloads to other parts of its cloud that still had power.

Power was restored at 5:39 a.m. PST, and AWS reported a slow recovery of services, but a 6:51 a.m. update admitted that ongoing network issues were hampering full recovery efforts.

At the time of writing, AWS has still not fully restored networking.

And recovery may not be possible for some customers: the most recent update on the AWS status page offers the following grim news:

It’s the digital equivalent of waking up with a lump of coal on Christmas Day.

This is the second AWS outage in a fortnight: on December 15, the operation's US-WEST-1 region went down for about 30 minutes. The US-EAST-1 region also went down for eight hours in September 2021.

AWS advises customers not to rely on a single Availability Zone (AZ). Its architecture places two or more AZs within a single region, and each AZ is physically distant from the others, so that a single physical infrastructure incident cannot take down the entire region. Using multiple regions therefore improves resilience, and raises costs.

Not all users follow AWS's guidance on using multiple AZs, so when incidents like this occur, their servers and data can become unavailable.

US-EAST-1 is the largest and oldest region in AWS. Cloud economist Corey Quinn rates its importance as follows:

AWS offers a service level agreement of 99.95 percent uptime for compute instances, which allows just under 22 minutes of downtime per month. If AWS misses this mark, it offers a ten percent service credit, an amount that rises to thirty percent if uptime drops below 99 percent. If uptime drops below 95 percent, customers receive 100 percent of their fees as credits.
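For readers who want to check the sums, here is a minimal Python sketch of that arithmetic. It assumes a 30-day month and uses the credit tiers exactly as described above, not the wording of AWS's actual SLA; the function names are ours and purely illustrative.

```python
# Assumption: a 30-day month, i.e. 43,200 minutes.
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60


def allowed_downtime_minutes(uptime_percent, minutes_in_month=MINUTES_PER_30_DAY_MONTH):
    """Minutes of downtime per month that a given uptime percentage permits."""
    return minutes_in_month * (100.0 - uptime_percent) / 100.0


def service_credit_percent(uptime_percent):
    """Service credit tier as described in this article (illustrative only)."""
    if uptime_percent >= 99.95:
        return 0    # SLA met, no credit
    if uptime_percent >= 99.0:
        return 10   # below 99.95 percent but not below 99 percent
    if uptime_percent >= 95.0:
        return 30   # below 99 percent but not below 95 percent
    return 100      # below 95 percent


if __name__ == "__main__":
    print(round(allowed_downtime_minutes(99.95), 1))  # 21.6
    print(service_credit_percent(99.5))               # 10
```

At 99.95 percent, the downtime allowance works out to roughly 21.6 minutes in a 30-day month, which is where "just under 22 minutes" comes from.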

AWS also automatically waives the charge for any EC2 instance that is unavailable for more than six minutes in a single hour.

Good luck if you are one of the AWS customers facing the need for a sudden rebuild. ®

