[outages] Lessons Learned: RRTB outage

Kevin Kelley xirin6 at yahoo.com
Fri Sep 23 20:44:55 EDT 2011


WTF is the point you are trying to make?

Sent from my iPhone

On Sep 23, 2011, at 7:17 PM, Jay Ashworth <jra at baylink.com> wrote:

> So I had to renumber some servers this afternoon, 'cause I was expanding to 
> a larger netblock (a /28 instead of a /29).
> 
> I renumbered my servers and my DNS (which I'd set the TTL on to 300 like a
> good boy on Wednesday), and then pulled the trigger with Road Runner.  He
> "rescripted" his SMC router (the likely cause of some standard deviation noted
> by a couple of reporters -- the router, not the rescripting), and I pinged
> it and it was ok, and I mtr'd it and it was ok, so I hit the webserver,
> and that came up fine, too.
> 
> So then my boss calls me 15 minutes later: it's not working.
> 
> "I wonder what that could be", sez I; I'd even traced and hit the webserver
> from my Android phone (Sprint; Opera Mobile 11), and it had worked fine.
> 
> That was Red Herring #1.
> 
> So my boss uses a Mac.  So does my best friend, and while he was on the way
> out the door to a second-anniversary-wake for a guy we went to school with,
> he took a moment to try to hit it as well.  No luck.
> 
> That was Red Herring #2 (both of them use Macs).
> 
> Those of you who've been paying close, careful attention here may have
> noticed by now the thing I did *not* say I'd done: 
> 
> Changing the default gateway on the server.
> 
> My office lan could hit it *because its uplink was in the same network*;
> *it* had a route for that network.  Everyone else... couldn't.
> 
> Apparently, Sprint operates a caching server, even if you're using the 
> version of Opera (Mobile, not Mini) that does *not*, which explains Red
> Herring #1.
> 
> As for Red Herring #2, well... Macs don't, apparently, hard-cache IPs the
> way WinXP does (I'm looking at *you*, "ipconfig /flushdns"), but I already
> knew that, because boss had the right address.
> 
> Lesson Learned: Make sure you know what your diagnostic tests are telling 
> you, before you use them to rule out possible problems.  Better yet: don't
> rule those potential problems out at all: work your whole diagnostic tree
> every time.
> 
> Oh: I forgot Red Herring #3: the traces that broke *didn't hit that carrier
> edge router* for some reason.  No clue why.
> 
> Thanks to the dozen or so people who responded; a couple of whom have
> way too {much time,many servers} on their hands.  :-)
> 
> Followups to -discuss
> 
> Cheers,
> -- jra
> -- 
> Jay R. Ashworth                  Baylink                       jra at baylink.com
> Designer                     The Things I Think                       RFC 2100
> Ashworth & Associates     http://baylink.pitas.com         2000 Land Rover DII
> St Petersburg FL USA      http://photo.imageinc.us             +1 727 647 1274
> _______________________________________________
> Outages mailing list
> Outages at outages.org
> https://puck.nether.net/mailman/listinfo/outages
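
The missed step in the quoted message (changing the default gateway on the
server after renumbering) can be illustrated with a small sketch. This is
only an illustration under stated assumptions: the addresses are RFC 5737
documentation ranges, not the real netblocks, and reply_path is a
hypothetical helper that does not appear in the original post. It models,
in simplified form, how the renumbered server would route a reply: clients
inside its own /28 are reached directly, everyone else needs a default
gateway that sits inside that /28.

```python
import ipaddress


def reply_path(server_if: str, default_gw: str, client: str) -> str:
    """Classify how a server would send a reply to `client`.

    server_if  -- the server's address with prefix, e.g. "192.0.2.18/28"
    default_gw -- the default gateway configured on the server
    client     -- the address the reply has to reach

    Simplified model: a gateway that is not inside the server's own
    network is treated as unusable.
    """
    iface = ipaddress.ip_interface(server_if)
    client_ip = ipaddress.ip_address(client)
    gw_ip = ipaddress.ip_address(default_gw)

    if client_ip in iface.network:
        return "on-link"        # same network: no gateway needed
    if gw_ip in iface.network:
        return "via-gateway"    # off-net, but the gateway is reachable
    return "unreachable"        # off-net and the gateway is stale


# All values below are assumptions chosen for illustration.
NEW_SERVER = "192.0.2.18/28"     # renumbered server in 192.0.2.16/28
STALE_GW = "192.0.2.1"           # gateway left over from the old /29
NEW_GW = "192.0.2.17"            # gateway inside the new /28
OFFICE_UPLINK = "192.0.2.20"     # office LAN uplink, inside the new /28
OUTSIDE_CLIENT = "198.51.100.7"  # anyone else on the Internet

if __name__ == "__main__":
    for gw in (STALE_GW, NEW_GW):
        for client in (OFFICE_UPLINK, OUTSIDE_CLIENT):
            print(gw, client, reply_path(NEW_SERVER, gw, client))
```

With the stale gateway, the office uplink still gets an answer because it is
on-link, while an outside client does not. That matches the symptom in the
post: a test run from inside the same network says nothing about whether the
default route works.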



