[Outages-discussion] CenturyLink Outages this morning

valdis.kletnieks at vt.edu
Fri Dec 28 06:58:19 EST 2018


On Fri, 28 Dec 2018 00:39:14 -0700, "Keith Medcalf" said:
> 
> On Thursday, 27 December, 2018 23:08, valdis.kletnieks at vt.edu wrote:
> >On Thu, 27 Dec 2018 20:58:27 -0700, "Keith Medcalf" said:
>
> >> One wonders why they do not simply "undo" whatever change they made
> >> between when things were "working" and when they "broke" ...

> Are you suggesting that CenturyLink is operating such an ill-conceived network and

tl;dr: <insert "One does not simply..." meme here>

No, just pointing out that "simply undo whatever change" may not be as "simply"
as you are trying to make it sound.

I spent a good chunk of 2 days earlier this week wondering why, after my Fedora
box got an update for a package called glib2, the login manager started
crashing inside the glib2 shared library.  Slam dunk, right?  Nope. Even after
rolling back that change, and even rebooting, it was still crashing.  Turned
out to be something else entirely. (The morbidly interested can look at bug
numbers 1661952, 1662168, and 1662080 in Red Hat's Bugzilla.)

And I have a good bar story where me, several other co-workers, IBM, SGI,
Brocade, and DDN spent 18 months tracking down an intermittent data corruption
issue in a data storage system. It started showing up after a system software
upgrade on a system that had been in production for a year. Symptoms were
the system reading data from the wrong LUN in the storage array.  Finally
tracked it down to buggy firmware on a 10G Ethernet card in several servers - the
same vendor, card, EC, and firmware level that was working like a champ in
several other servers in the same rack.

So forgive me if I look askance at people who say "Simply undo what you changed".



