<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body dir="auto">
Two things I gleaned from this article, reading between the lines…
<div><br>
</div>
<div>First, whoever wrote this audit tool wasn’t expecting someone to make such a flagrant error. This is a great example of hubris at work:</div>
<div>
<div><br>
</div>
<div>
<div>
<blockquote type="cite"><span style="caret-color: rgb(103, 120, 138); color: rgb(103, 120, 138); font-family: &quot;Facebook Reader&quot;, sans-serif; font-size: 18px; -webkit-tap-highlight-color: rgba(0, 0, 0, 0); -webkit-text-size-adjust: 100%; background-color: rgb(255, 255, 255);">This
was the source of yesterday’s outage. During one of these routine maintenance jobs, a command was issued with the intention to assess the availability of global backbone capacity, which unintentionally took down all the connections in our backbone network,
effectively disconnecting Facebook data centers globally. Our systems are designed to audit commands like these to prevent mistakes like this, but a bug in that audit tool prevented it from properly stopping the command. </span></blockquote>
</div>
<div><br>
</div>
<div>Labeling a process oversight as a “bug” is disingenuous and misleading.</div>
<div><br>
</div>
<div>Second, the rumor about using an angle grinder to open server cages was probably true, at least in one of the datacenters. Note the careful wording of “the hardware and routers are designed to be difficult to modify.” I wonder why that would be? In
a dire emergency, a simple text message of “We have access to the routers” would have satisfied the incident management team. They would not care how it happened. </div>
<div><br>
</div>
<div>
<blockquote type="cite"><span style="caret-color: rgb(103, 120, 138); color: rgb(103, 120, 138); font-family: &quot;Facebook Reader&quot;, sans-serif; font-size: 18px; -webkit-tap-highlight-color: rgba(0, 0, 0, 0); -webkit-text-size-adjust: 100%; background-color: rgb(255, 255, 255);">Our
primary and out-of-band network access was down, so we sent engineers onsite to the data centers to have them debug the issue and restart the systems. But this took time, because these facilities are designed with high levels of physical and system security
in mind. They’re hard to get into, and once you’re inside, the hardware and routers are designed to be difficult to modify even when you have physical access to them. So it took extra time to activate the secure access protocols needed to get people onsite
and able to work on the servers. </span></blockquote>
</div>
<div><br>
<div dir="ltr">—Sent from my iPhone</div>
<div dir="ltr"><br>
<blockquote type="cite">On Oct 5, 2021, at 12:44 PM, Anthony Hoppe &lt;anthony@vofr.net&gt; wrote:<br>
<br>
</blockquote>
</div>
<blockquote type="cite">
<div dir="ltr"><span>If that's the case, it's a bit of an oversight. If you're depending on DNS for OOB access, you'd want OOB DNS servers available too, heh...</span><br>
<span></span><br>
<span>Maybe they can go back to the golden days of sticking a modem & POTS line on the console port of their routers/switches. Or get super fancy and deploy terminal servers at each datacenter to conserve on phone lines :-D.</span><br>
<span></span><br>
<span></span><br>
<span></span><br>
<span>----- Original Message -----</span><br>
<span>From: "George Metz" &lt;george.metz@gmail.com&gt;</span><br>
<span>To: "Ross Tajvar" &lt;ross@tajvar.io&gt;</span><br>
<span>Cc: "Outages List" &lt;outages-discussion@outages.org&gt;</span><br>
<span>Sent: Tuesday, October 5, 2021 12:12:23 PM</span><br>
<span>Subject: Re: [Outages-discussion] FB Outage AAR I - Engineering Posts Pabulum</span><br>
<span></span><br>
<span>If I had to guess (because I was wondering about that too), it was</span><br>
<span>because they didn't have the IPs of their out-of-band stuff available</span><br>
<span>and were expecting their DNS to be able to answer... and the DNS</span><br>
<span>servers functionally shut themselves off.</span><br>
<span></span><br>
<span>On Tue, Oct 5, 2021 at 2:14 PM Ross Tajvar &lt;ross@tajvar.io&gt; wrote:</span><br>
<blockquote type="cite"><span></span><br>
</blockquote>
<blockquote type="cite"><span>"Our primary and out-of-band network access was down"</span><br>
</blockquote>
<blockquote type="cite"><span></span><br>
</blockquote>
<blockquote type="cite"><span>Sounds like someone doesn't know what "out-of-band" means.</span><br>
</blockquote>
<blockquote type="cite"><span></span><br>
</blockquote>
<blockquote type="cite"><span>On Tue, Oct 5, 2021, 2:04 PM Jay R. Ashworth &lt;jra@baylink.com&gt; wrote:</span><br>
</blockquote>
<blockquote type="cite">
<blockquote type="cite"><span></span><br>
</blockquote>
</blockquote>
<blockquote type="cite">
<blockquote type="cite"><span>This doesn't say anything we don't already know, except where it conflicts</span><br>
</blockquote>
</blockquote>
<blockquote type="cite">
<blockquote type="cite"><span>with things we already know. But it's fun to watch, ain't it? ;-)</span><br>
</blockquote>
</blockquote>
<blockquote type="cite">
<blockquote type="cite"><span></span><br>
</blockquote>
</blockquote>
<blockquote type="cite">
<blockquote type="cite"><span> https://engineering.fb.com/2021/10/05/networking-traffic/outage-details/</span><br>
</blockquote>
</blockquote>
<blockquote type="cite">
<blockquote type="cite"><span></span><br>
</blockquote>
</blockquote>
<blockquote type="cite">
<blockquote type="cite"><span>Cheers,</span><br>
</blockquote>
</blockquote>
<blockquote type="cite">
<blockquote type="cite"><span>-- jra</span><br>
</blockquote>
</blockquote>
<blockquote type="cite">
<blockquote type="cite"><span></span><br>
</blockquote>
</blockquote>
<blockquote type="cite">
<blockquote type="cite"><span>--</span><br>
</blockquote>
</blockquote>
<blockquote type="cite">
<blockquote type="cite"><span>Jay R. Ashworth Baylink jra@baylink.com</span><br>
</blockquote>
</blockquote>
<blockquote type="cite">
<blockquote type="cite"><span>Designer The Things I Think RFC 2100</span><br>
</blockquote>
</blockquote>
<blockquote type="cite">
<blockquote type="cite"><span>Ashworth &amp; Associates http://www.bcp38.info 2000 Land Rover DII</span><br>
</blockquote>
</blockquote>
<blockquote type="cite">
<blockquote type="cite"><span>St Petersburg FL USA BCP38: Ask For It By Name! +1 727 647 1274</span><br>
</blockquote>
</blockquote>
<blockquote type="cite">
<blockquote type="cite"><span>_______________________________________________</span><br>
</blockquote>
</blockquote>
<blockquote type="cite">
<blockquote type="cite"><span>Outages-discussion mailing list</span><br>
</blockquote>
</blockquote>
<blockquote type="cite">
<blockquote type="cite"><span>Outages-discussion@outages.org</span><br>
</blockquote>
</blockquote>
<blockquote type="cite">
<blockquote type="cite"><span>https://puck.nether.net/mailman/listinfo/outages-discussion</span><br>
</blockquote>
</blockquote>
</div>
</blockquote>
</div>
</div>
</div>
</body>
</html>