[Outages-discussion] [outages] Azure Services Storage Outage...

Jeremy Chadwick jdc at koitsu.org
Fri Feb 22 22:04:55 EST 2013


(Removed outages@ since this is obviously -discussion material)

Highly, *highly* unlikely, especially given that it's Microsoft.  They very
rarely terminate FTEs (full-time employees), while contractors are a
completely different matter.  I used to joke that as an FTE you'd
practically have to murder someone before they'd fire you.  All that
will happen is that they'll be marked down slightly during their
quarterly performance review.  That can be a good and a bad thing, and
I'm not going to go into that here.

I cannot tell you how many times during my professional career I've seen
SSL cert expirations cause outages.  How the outage manifests itself
depends on "where" in the topology the certificate is used (e.g. a
back-end server reliant on a cert may fail, while on the front-end a
customer/user may see this as a vague/ambiguous error).

At a past job, an engineer ended up coding the equivalent of a Nagios
check that examined all HTTPS services periodically *solely because* of
cert expirations recurring so many times over the years.  The check
would notify something like 6 weeks in advance (yes, in big companies it
can take time to get proper certs generated, especially when CA signing
comes into play).  There are many advantages to checking in real-time
against an active host (by "active" I mean, say, "Apache is up and
running") vs. scanning some repository of SSL certs and looking for
expired ones to add to a CRL.  For example, I remember a situation where
some engineers had rolled back a software/system release and thus rolled
back to old certs; real-time monitors caught this immediately,
**before** the system could be put back into service on a load balancer.
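
To make the idea concrete, here is a rough sketch of what such a check
could look like.  It is not the code from that job; the function names,
the 14-day critical threshold and the command-line handling are all
assumptions made up for illustration.  Only the 6-week (42-day) warning
window comes from the description above.  It connects to the live
service, reads the certificate's notAfter field, and maps the remaining
days onto the usual Nagios exit codes.

```python
import socket
import ssl
import sys
from datetime import datetime, timezone

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def fetch_not_after(host, port=443, timeout=10.0):
    """Connect to the live service and return its cert's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Days between `now` (UTC) and a notAfter string such as
    'Jun  1 12:00:00 2030 GMT'.  Negative means already expired."""
    now = now or datetime.now(timezone.utc)
    return (ssl.cert_time_to_seconds(not_after) - now.timestamp()) / 86400


def nagios_status(days_left, warn_days=42, crit_days=14):
    """Map remaining days to (exit_code, message).
    warn_days=42 is the 6-week lead time; crit_days=14 is an assumption."""
    if days_left < 0:
        return CRITICAL, "CRITICAL: cert expired %.0f days ago" % -days_left
    if days_left <= crit_days:
        return CRITICAL, "CRITICAL: cert expires in %.0f days" % days_left
    if days_left <= warn_days:
        return WARNING, "WARNING: cert expires in %.0f days" % days_left
    return OK, "OK: cert expires in %.0f days" % days_left


# Only touches the network when a hostname is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    try:
        code, msg = nagios_status(days_until_expiry(fetch_not_after(sys.argv[1])))
    except ssl.SSLCertVerificationError as exc:
        # An already-expired (or otherwise invalid) cert fails the handshake.
        code, msg = CRITICAL, "CRITICAL: cert verification failed: %s" % exc
    except OSError as exc:
        code, msg = UNKNOWN, "UNKNOWN: could not connect: %s" % exc
    print(msg)
    sys.exit(code)
```

Because it talks to the running service rather than a cert repository,
a check like this would also flag the rollback case: an old cert put
back by a rolled-back release shows up as soon as the host is probed.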

And I know there's at least one reader out there thinking this is the
solution, so let me put an end to that:

A longer cert lifetime (e.g. going from 2 to 4 years) does nothing -- it
just means more time will pass before the next outage.  It also makes it
more likely that nobody will remember the cert is due to expire (given
how common it is for good engineers to change jobs, or for whoever is
familiar with the systems to leave without passing on that tribal
knowledge).  Solve the problem the right way, not the lazy way.

-- 
| Jeremy Chadwick                                   jdc at koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

On Sat, Feb 23, 2013 at 02:15:44AM +0000, David Thompson wrote:
> I'd be laughing out loud but some poor guy just lost his job.
> 
> 
> David Thompson
> Network Services Support Technician
> (O) 858.357.8794
> (F) 858-225-1882
> (E) david.thompson at vintalk.com
> (W) www.vintalk.com<http://www.vintalk.com/>
> 
> 
> 
> ________________________________
> From: outages-bounces at outages.org [mailto:outages-bounces at outages.org] On Behalf Of Network IPdog
> Sent: Friday, February 22, 2013 16:34
> To: OUTAGES Discussion List ; OUTAGES Mailing List
> Subject: [outages] Azure Services Storage Outage...
> 
> 
> Et al,
> 
> You know how we all use cloud services because they're managed by a highly skilled team who have skills that we don't?
> 
> Yeah, not that:
> 
> http://social.msdn.microsoft.com/Forums/en-US/windowsazuredata/thread/751c85c5-b3b5-43ba-9d5b-770472ad79e1
> 
> Ruff, Ruff...!
> 
> Network IPdog
> 
> Ephesians 4:32  &  Cheers!!!
> 
> A password is like a... toothbrush  ;^)
> 
> Choose a good one, change it regularly and don't share it.

> _______________________________________________
> Outages mailing list
> Outages at outages.org
> https://puck.nether.net/mailman/listinfo/outages


