Hey Facebook, where’d ya go?


As we all know, on the 4th of October 2021, worker productivity soared as Facebook, Instagram, and WhatsApp all went down for most of the day.

 

Well, much to everyone’s surprise, life went on, mass hysteria did not break out, and pretty much everyone carried on without missing a beat.  

 

Turns out some errant network configuration caused some of the busiest sites on the Internet to just disappear, as if Thanos had snapped his fingers.  How does something like this happen?  More importantly, how do you prevent something like this from happening to you?

 

So on the day of the event, FB, Insta, and WhatsApp all disappeared without notice.  Not even a "site not found" sort of thing; they were just gone.  Basically, the entirety of their existence became unreachable, like one of those dramatic movie scenes where someone cuts the internet cable with an ax.

 

This wasn’t nearly as exciting.  This boiled down to a bad configuration change.  Someone modified something internally at Facebook and it had a downstream effect that took out a big chunk of their network backbone. 

While the root cause sounded simple enough, this was an all-hands outage that still took the bulk of the workday to get resolved.

 

Without getting too much into the technical weeds of this, there’s this protocol called BGP.  BGP stands for Border Gateway Protocol.  Much as the name might imply, it’s how the different networks that make up the Internet tell one another how to reach each other.

 

Without BGP, there’s no Internet.

 

So in layman’s terms, someone at Facebook borked the BGP.
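
If you want a feel for why a routing mistake makes a site vanish instead of throwing an error page, here's a tiny Python sketch.  It's a toy, not real BGP: the ToyRoutingTable class, the AS numbers, and the address ranges are all made up for illustration (they come from ranges reserved for documentation), and it assumes the bad change caused the routes to be withdrawn.  Networks "announce" the address ranges they own, and once an announcement is pulled, nobody knows where to send the traffic.

```python
import ipaddress


class ToyRoutingTable:
    """A toy stand-in for the routes a network learns from its neighbors."""

    def __init__(self):
        # Maps an address range (prefix) to whoever announced it.
        self.routes = {}

    def announce(self, prefix, origin):
        """Record that `origin` says it can deliver traffic for `prefix`."""
        self.routes[ipaddress.ip_network(prefix)] = origin

    def withdraw(self, prefix):
        """Forget a previously announced prefix."""
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return who to send traffic to, or None if no route covers it."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        # Prefer the most specific (longest) matching prefix.
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


table = ToyRoutingTable()
# Hypothetical networks using documentation-only AS numbers and prefixes.
table.announce("203.0.113.0/24", "AS64500 (hypothetical social network)")
table.announce("198.51.100.0/24", "AS64501 (hypothetical other site)")

before = table.lookup("203.0.113.10")
table.withdraw("203.0.113.0/24")  # the bad change pulls the announcement
after = table.lookup("203.0.113.10")

print(before)  # AS64500 (hypothetical social network)
print(after)   # None
```

The servers in this toy never went anywhere.  The rest of the Internet simply lost the directions to them, which is why there was nothing to show, not even an error page.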

 

What’s even worse is the tidal wave of Internet traffic that hits when sites of this magnitude go down.  All your mobile devices and web browsers keep trying to reload the affected services.  While those services were completely unreachable, other servers that have nothing to do with Facebook, like name servers, started seeing up to 30x their normal traffic as a result.
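
Here's a rough Python sketch of why that pile-up happens.  Every number in it is invented for illustration, and the resolver_queries function is hypothetical.  The assumption is that when a name lookup works, your device remembers the answer for a while, but when it fails there's nothing to remember, so every single retry lands on the name server.

```python
def resolver_queries(duration_s, attempt_every_s, cache_ttl_s, lookup_succeeds):
    """Count how many lookups from one device actually reach a name server."""
    queries = 0
    cached_until = -1  # no cached answer yet
    for now in range(0, duration_s, attempt_every_s):
        if now < cached_until:
            continue  # answer is still cached, the name server is not contacted
        queries += 1
        if lookup_succeeds:
            # Only a successful answer gets remembered.
            cached_until = now + cache_ttl_s
    return queries


# Made-up scenario: an app checks in every 10 seconds for 10 minutes,
# and a good answer is cached for 5 minutes.
normal = resolver_queries(600, 10, 300, lookup_succeeds=True)
outage = resolver_queries(600, 10, 300, lookup_succeeds=False)

print(normal, outage, outage // normal)  # 2 60 30
```

The values here were picked so the ratio echoes the 30x figure above, so treat it as a picture of the mechanism, not a measurement of what happened that day.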

 

After a roughly six-hour outage, the affected services gradually began to come back online.  Crisis averted, your targeted ads can safely start trying to sell you those delicious-looking desserts you clicked on once in a moment of curiosity.

 

So what can we take from this?  Well, regardless of how large an infrastructure you have, a simple misconfiguration can cause quite an unforeseen outcome.  Redundancy is always a good idea, but a lack of redundancy wasn’t the issue here.

 

For your small business, there are plenty of solutions with redundant Internet access that can fit any budget.

 

I’m sure Facebook has tons of servers and data centers, but the Internet requires everything to be configured in specific ways to maintain harmony between the millions upon millions of devices it connects.  I’d be curious to know what the actual cause was, as well as the resolution, but that’s up to The Zuck and company and whether they want to make that information public.

 

On a positive note, at least this wasn’t another major hack.

 

So that’s all I’ve got for you today.  I’d love to hear what a day without Facebook, Insta, and WhatsApp was like for you in the comments below.  Please like and subscribe, and if you’d like a consultation on how your business can avoid costly downtime like this, book an appointment with me via the link in the notes, and I’ll catch you on the next one!