[Outages-discussion] [outages] Linode Fremont outage

Jim Popovitch jimpop at gmail.com
Mon Jun 1 14:23:46 EDT 2015


On Mon, Jun 1, 2015 at 2:09 PM, Warren Kumari <warren at kumari.net> wrote:
> On Mon, Jun 1, 2015 at 2:04 PM, Gert Doering <gert at greenie.muc.de> wrote:
>> Hi,
>>
>> On Mon, Jun 01, 2015 at 01:59:31PM -0400, Bill Wichers wrote:
>>> It is unfortunately not possible to achieve 100% uptime. All you can do is
>>> minimize the chances of an outage.
>>
>> ... and it should be pointed out every now and then that adding lots and
>> lots of stuff for "added redundancy" will eventually lead to "oh, the
>> trainee changed something he should not have touched" and the whole house
>> of cards collapses...
>
>
> Yup -- fairly much every network engineer who has deployed HSRP  /
> VRRP / <insert favorite vendor acronym for same> has the story of both
> devices becoming active, and hilarity ensuing. Or STP having a bad day
> and the "redundant" switch link becoming active, etc etc etc.
>
> Often adding redundancy causes more issues than it solves...

Very true.  I've seen that overdone at the netmgmt level so much that
it boggled my mind.  I've seen nth-level redundancy of non-operational
systems lead to many imbalanced financial sheets.

On the other hand, the risk from SPoFs compounds multiplicatively
rather than additively.  The more SPoFs you have, the greater the odds
that you will have problems.
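
A rough sketch of that compounding, assuming the SPoFs fail
independently and each one has the same availability (the 0.99 figure
and the function names are made up for illustration, not measured from
any real system):

```python
# Hypothetical illustration: a service that depends on `count` single
# points of failure in series is up only when every one of them is up.
# Assumes independent failures and identical per-component availability.

def serial_availability(per_component: float, count: int) -> float:
    """Availability of a chain where every component must be up."""
    return per_component ** count


def outage_probability(per_component: float, count: int) -> float:
    """Chance that at least one component in the chain is down."""
    return 1.0 - serial_availability(per_component, count)


if __name__ == "__main__":
    # 0.99 is an assumed per-component availability, chosen for illustration.
    for n in (1, 5, 10, 20):
        print(f"{n:2d} SPoFs -> outage probability "
              f"{outage_probability(0.99, n):.4f}")
```

Under those assumptions the availabilities multiply, so each extra SPoF
shrinks the overall figure by the same factor instead of subtracting a
fixed amount.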

-Jim P.

