Site News: December 1st Outage

by Ryan Smith on December 1, 2022 6:00 PM EST

Posted in Site Updates


As many of you noticed, AnandTech has spent several hours offline today. We are still in recovery mode at the moment (as I write this, the site has been restored to a copy from November 25th), but now that our major restoration efforts are completed, I wanted to offer you guys a brief update on the status of AnandTech.

At around 13:00 UTC (5am PT) today, the on-site cloud storage for AnandTech’s hosting provider became corrupted. As a result, AnandTech (and some other sites) were brought offline. Due to the nature of the corruption and the need to begin restoration efforts ASAP, we opted to restore the site from an off-site cold storage backup, rather than trusting the questionable on-site storage.

This is the first time we’ve ever had to execute our off-site data recovery plan. And while it meant AT took a bit longer to restore than would be ideal, ultimately everything worked out and proved the necessity of off-site backups.

We’re still working to restore content from the last few days. Articles will be back, but we’ve likely lost any comments and user account registrations/updates made since midday Friday. Sorry about that! And thank you for bearing with us during today's outage.

31 Comments


iq100 - Friday, December 2, 2022 - link
GeoffreyA wrote:
"A design that has no single point of failure.
Or state there is no such design."

I think it's fair to say the latter wins the prize.
---
Tandem Computers, long ago, had such a design. Not even expensive with today's inexpensive servers.

Tandem's NonStop systems use a number of independent identical processors and redundant storage devices and controllers to provide automatic high-speed "failover" in the case of a hardware or software failure. To contain the scope of failures and of corrupted data, these multi-computer systems have no shared central components, not even main memory. Conventional multi-computer systems all use shared memories and work directly on shared data objects. Instead, NonStop processors cooperate by exchanging messages across a reliable fabric, and software takes periodic snapshots for possible rollback of program memory state.
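
A rough illustration of the idea described above, and not Tandem's actual implementation: two nodes keep private state, exchange only serialized messages, and the backup takes over from the last checkpoint it received. The names `Node`, `ProcessPair`, and the `balance` field are hypothetical, and checkpointing after every request is a simplification of the periodic snapshots mentioned in the quote.

```python
import json


class Node:
    """One processor with private state; peers only ever see serialized messages."""

    def __init__(self, name):
        self.name = name
        self.state = {"balance": 0}  # private memory, never shared by reference

    def apply(self, amount):
        self.state["balance"] += amount

    def checkpoint(self):
        # Serialize so the receiver gets a copy of the state, not a reference.
        return json.dumps({"type": "checkpoint", "state": self.state})

    def receive(self, raw):
        msg = json.loads(raw)
        if msg["type"] == "checkpoint":
            self.state = msg["state"]


class ProcessPair:
    """Primary does the work and ships a checkpoint message to the backup."""

    def __init__(self):
        self.primary = Node("cpu0")
        self.backup = Node("cpu1")

    def handle(self, amount):
        self.primary.apply(amount)
        self.backup.receive(self.primary.checkpoint())
        return self.primary.state["balance"]

    def fail_primary(self):
        # The backup resumes from its last checkpoint; a fresh node (assumed
        # to be available) becomes the new backup and is brought up to date.
        self.primary, self.backup = self.backup, Node("cpu2")
        self.backup.receive(self.primary.checkpoint())


pair = ProcessPair()
pair.handle(10)
pair.handle(5)
pair.fail_primary()           # simulate losing cpu0
result = pair.handle(1)       # service continues on cpu1 with state intact
```

Because state crosses between nodes only as a JSON string, a corrupted or crashed primary cannot scribble on the backup's memory, which is the containment property the quote emphasizes.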

reference: https://en.wikipedia.org/wiki/Tandem_Computers

What do AnandTech and everyone else think? Is it possible? Write up the design here.
GeoffreyA - Saturday, December 3, 2022 - link
I think it's a brilliant design, this extreme redundancy in the spirit of distribution.

(I get the feeling that even the universe's "data structures" keep track of things in a distributed fashion. When reading current theories, one gets the impression that nothing is global, but the consistent state is built up piece by piece. Perhaps the key is message transfer, rather than storage in some "big table!")
The Von Matrices - Friday, December 2, 2022 - link
If the only data loss is a few of the most recent comments, I would call that a success.

There is always a tradeoff of what you are willing to lose vs. how much you are willing to pay to avoid loss. You certainly can design a system that cannot suffer a data loss event, especially on a news site where there isn't much data being generated, but whether the company can afford such a system is another issue. News, especially online news, is an extremely low-profit business.
Dug - Tuesday, December 6, 2022 - link
Why? Are you paying their salaries? It's just a website that went down and came back up; it's not a big deal. The "hardware/software design that cannot suffer a data loss. A design that has no single point of failure" does not exist.
The Von Matrices - Friday, December 2, 2022 - link
When I saw the number of comments on the "Best CPUs" article decrease, I thought it was due to the release of the long-awaited comment editor. Alas, it was only data corruption. We can only dream...
Threska - Saturday, December 3, 2022 - link
Well that "sponsored post" article lost a lot of comments.
ballsystemlord - Sunday, December 4, 2022 - link
Hopefully only the shill posts (ha ha).
fervloka - Saturday, December 3, 2022 - link
All I want to know is why Anandtech is apparently unaware that AMD launched Genoa.
DigitalFreak - Saturday, December 3, 2022 - link
With the frequency articles are posted, was anything really lost?
supdawgwtfd - Sunday, December 4, 2022 - link
+500
