[outages] Lessons Learned: RRTB outage
Kevin Kelley
xirin6 at yahoo.com
Fri Sep 23 20:44:55 EDT 2011
Wtf is the point you are trying to make?
Sent from my iPhone
On Sep 23, 2011, at 7:17 PM, Jay Ashworth <jra at baylink.com> wrote:
> So I had to renumber some servers this afternoon, 'cause I was expanding to
> a larger netblock (a /28 instead of a /29).
>
> I renumbered my servers and my DNS (which I'd set the TTL on to 300 like a
> good boy on Wednesday), and then pulled the trigger with Road Runner. He
> "rescripted" his SMC router (the likely cause of some standard deviation noted
> by a couple of reporters -- the router, not the rescripting), and I pinged
> it and it was ok, and I mtr'd it and it was ok, so I hit the webserver,
> and that came up fine, too.
>
> So then my boss calls me 15 minutes later: it's not working.
>
> "I wonder what that could be", sez I; I'd even traced and hit the webserver
> from my Android phone (Sprint; Opera Mobile 11), and it had worked fine.
>
> That was Red Herring #1.
>
> So my boss uses a Mac. So does my best friend, and while he was on the way
> out the door to a second-anniversary-wake for a guy we went to school with,
> he took a moment to try to hit it as well. No luck.
>
> That was Red Herring #2 (both of them use Macs).
>
> Those of you who've been paying close, careful attention here may have
> noticed by now the thing I did *not* say I'd done:
>
> Changing the default gateway on the server.
>
> My office lan could hit it *because its uplink was in the same network*;
> *it* had a route for that network. Everyone else... couldn't.
>
> Apparently, Sprint operates a caching server, even if you're using the
> version of Opera (Mobile, not Mini) that does *not*, which explains Red
> Herring #1.
>
> As for Red Herring #2, well... Macs don't, apparently, hard-cache IPs the
> way WinXP does (I'm looking at *you*, "ipconfig /flushdns"), but I already
> knew that, because boss had the right address.
>
> Lesson Learned: Make sure you know what your diagnostic tests are telling
> you, before you use them to rule out possible problems. Better yet: don't
> rule those potential problems out at all: work your whole diagnostic tree
> every time.
>
> Oh: I forgot Red Herring #3: the traces that broke *didn't hit that carrier
> edge router* for some reason. No clue why.
>
> Thanks to the dozen or so people who responded; a couple of whom have
> way too {much time,many servers} on their hands. :-)
>
> Followups to -discuss
>
> Cheers,
> -- jra
> --
> Jay R. Ashworth Baylink jra at baylink.com
> Designer The Things I Think RFC 2100
> Ashworth & Associates http://baylink.pitas.com 2000 Land Rover DII
> St Petersburg FL USA http://photo.imageinc.us +1 727 647 1274
> _______________________________________________
> Outages mailing list
> Outages at outages.org
> https://puck.nether.net/mailman/listinfo/outages
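
For anyone who wants the default-gateway failure mode from the quoted message spelled out, here is a rough sketch in Python. Everything in it is an assumption made for illustration: the addresses are RFC 5737 documentation ranges, not the real netblocks, and reply_path is a made-up helper, not a real tool. It only models the routing decision a server makes when it sends a reply: on-link peers are reached directly, everyone else needs a gateway that is itself inside the server's current subnet.

```python
import ipaddress

def reply_path(server_net, gateway, client):
    """Classify how a server on `server_net` would send a reply to `client`.

    Hypothetical helper: returns "on-link", "via gateway", or "unreachable".
    """
    net = ipaddress.ip_network(server_net)
    client_ip = ipaddress.ip_address(client)
    # Peers in the same subnet are reached directly, no gateway needed.
    if client_ip in net:
        return "on-link"
    # Off-link peers need a gateway, and the gateway must itself be on-link.
    if gateway is not None and ipaddress.ip_address(gateway) in net:
        return "via gateway"
    return "unreachable"

# Assumed example values (documentation address space).
NEW_NET = "192.0.2.0/28"        # the larger block after renumbering
STALE_GATEWAY = "198.51.100.1"  # gateway left over from the old block
NEW_GATEWAY = "192.0.2.1"       # gateway inside the new block
OFFICE_HOST = "192.0.2.5"       # a host whose uplink is in the same network
OUTSIDE_HOST = "203.0.113.9"    # anyone else on the Internet

print(reply_path(NEW_NET, STALE_GATEWAY, OFFICE_HOST))   # same network: works anyway
print(reply_path(NEW_NET, STALE_GATEWAY, OUTSIDE_HOST))  # stale gateway: replies go nowhere
print(reply_path(NEW_NET, NEW_GATEWAY, OUTSIDE_HOST))    # gateway fixed: works
```

The point of the sketch is that a test run from inside the same network never exercises the gateway at all, so it cannot rule a gateway problem in or out.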