[Outages-discussion] [outages] So, when Twitter goes down

Larry Sheldon LarrySheldon at cox.net
Mon Aug 24 16:53:19 EDT 2009


Jay R. Ashworth wrote:
> where do you announce it?  :-)


One of the great conundra (tm) of our times.

When I was active in the field I lost more arguments with management 
over this kind of issue:

Notify critical people via pagers when:
   The power fails.
   The supervising server fails.
   The campus telephone system fails.
   Any of a number of other things fail, where the system might actually
    work.

The solution to this one?  A second system, not subject to the failures 
of the object system, that reports the loss of heart-beat data from the 
object system.  That leaves only the question of a failure big enough to 
take out both systems.[1]
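
For anyone who wants the heart-beat idea spelled out, here is a minimal 
sketch in Python of the watching side.  It is only an illustration: the 
class name, the timeout, and the notify callback are my own assumptions, 
and it says nothing about how the heart-beats travel or how the page 
actually gets sent.  It is meant to run on the independent system, not 
on the one being watched.

```python
import time


class HeartbeatWatchdog:
    """Hypothetical watcher; runs on the independent system."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock            # injectable so it can be tested
        self.last_seen = clock()      # treat startup as a first heart-beat
        self.alerted = False

    def record_heartbeat(self):
        """Call whenever a heart-beat arrives from the object system."""
        self.last_seen = self.clock()
        self.alerted = False          # outage over; re-arm the alert

    def check(self, notify):
        """Call periodically; notifies once per outage when beats stop."""
        silent_for = self.clock() - self.last_seen
        if silent_for > self.timeout_seconds and not self.alerted:
            notify(f"no heartbeat for {silent_for:.0f}s")
            self.alerted = True
            return True
        return False
```

The point of the design is that silence is the alarm: the object system 
never has to be healthy enough to report its own failure.  Whatever 
notify does still has to avoid depending on the thing that just died.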

Eliminate paper records by maintaining the indices to the back-up media 
on the machine being backed up.[2]



There are others.

[1] Carry the second system around with you, you say?  Think about that 
for a while.
[2] I know of answers to this, but they involve spending money that 
generates no income, so of course they are not interesting.  (Preventing 
the loss of money is interesting to me--probably why I never became a 
big manager.)
-- 
Requiescas in pace o email              Two identifying characteristics
                                              of System Administrators:
Ex turpi causa non oritur actio        Infallibility, and the ability to
                                              learn from their mistakes.
Eppure si rinfresca

ICBM Targeting Information:
	http://tinyurl.com/4sqczs
	http://tinyurl.com/7tp8ml
	

