[outages] Lessons Learned: RRTB outage

Josh Luthman josh at imaginenetworksllc.com
Fri Sep 23 21:38:36 EDT 2011


We are done with this.  Please drop this topic from the list.

Josh Luthman
Office: 937-552-2340
Direct: 937-552-2343
1100 Wayne St
Suite 1337
Troy, OH 45373
On Sep 23, 2011 9:36 PM, "kevin kelley" <xirin6 at yahoo.com> wrote:
> Sorry, I was interested in outages, not Jay's mistakes. I really just need a
> site that reports outages as they affect business, not the details of how
> Jay fixed his issues.
>
>
> From: Jay Ashworth <jra at baylink.com>
> To: outages at outages.org
> Sent: Friday, September 23, 2011 7:17 PM
> Subject: [outages] Lessons Learned: RRTB outage
>
> So I had to renumber some servers this afternoon, 'cause I was expanding to
> a larger netblock (a /28 instead of a /29).
>
> I renumbered my servers and my DNS (which I'd set the TTL on to 300 like a
> good boy on Wednesday), and then pulled the trigger with Road Runner.  He
> "rescripted" his SMC router (the likely cause of some standard deviation
> noted by a couple of reporters -- the router, not the rescripting), and I
> pinged
> it and it was ok, and I mtr'd it and it was ok, so I hit the webserver,
> and that came up fine, too.
>
> So then my boss calls me 15 minutes later: it's not working.
>
> "I wonder what that could be", sez I; I'd even traced and hit the
> webserver from my Android phone (Sprint; Opera Mobile 11), and it had
> worked fine.
>
> That was Red Herring #1.
>
> So my boss uses a Mac.  So does my best friend, and while he was on the
> way out the door to a second-anniversary-wake for a guy we went to school
> with, he took a moment to try to hit it as well.  No luck.
>
> That was Red Herring #2 (both of them use Macs).
>
> Those of you who've been paying close, careful attention here may have
> noticed by now the thing I did *not* say I'd done:
>
> Changing the default gateway on the server.
>
> My office lan could hit it *because its uplink was in the same network*;
> *it* had a route for that network.  Everyone else... couldn't.
>
> Apparently, Sprint operates a caching server, even if you're using the
> version of Opera (Mobile, not Mini) that does *not*, which explains Red
> Herring #1.
>
> As for Red Herring #2, well... Macs don't, apparently, hard-cache IPs the
> way WinXP does (I'm looking at *you*, "ipconfig /flushdns"), but I already
> knew that, because boss had the right address.
>
> Lesson Learned: Make sure you know what your diagnostic tests are telling
> you, before you use them to rule out possible problems.  Better yet: don't
> rule those potential problems out at all: work your whole diagnostic tree
> every time.
>
> Oh: I forgot Red Herring #3: the traces that broke *didn't hit that
> carrier edge router* for some reason.  No clue why.
>
> Thanks to the dozen or so people who responded; a couple of whom have
> way too {much time,many servers} on their hands.  :-)
>
> Followups to -discuss
>
> Cheers,
> -- jra
> --
> Jay R. Ashworth                  Baylink                       jra at baylink.com
> Designer                     The Things I Think                       RFC 2100
> Ashworth & Associates     http://baylink.pitas.com         2000 Land Rover DII
> St Petersburg FL USA      http://photo.imageinc.us             +1 727 647 1274
> _______________________________________________
> Outages mailing list
> Outages at outages.org
> https://puck.nether.net/mailman/listinfo/outages
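
For anyone skimming the thread: the failure in the quoted message (office LAN works, everyone else fails) is what a missing or stale default gateway looks like. Below is a minimal sketch of that reasoning, not Jay's actual configuration. All addresses are documentation-range placeholders, the function names are made up for illustration, and the "stale gateway left in the old block" detail is an assumption. Real hosts consult a full routing table; this only models one interface plus one default route.

```python
import ipaddress


def usable_hosts(cidr):
    """Usable host addresses in an IPv4 block (minus network and broadcast)."""
    return ipaddress.ip_network(cidr).num_addresses - 2


def can_reply(server_if, default_gw, client):
    """Simplified model: can the server send a reply back to `client`?

    server_if  -- server address with prefix, e.g. "198.51.100.5/28"
    default_gw -- default gateway address, or None if unset
    client     -- address of the host that sent the request
    """
    iface = ipaddress.ip_interface(server_if)
    # On-link clients are reached directly; no gateway is needed.
    if ipaddress.ip_address(client) in iface.network:
        return True
    # Off-link clients need a default gateway that is itself on-link.
    if default_gw is None:
        return False
    return ipaddress.ip_address(default_gw) in iface.network


# Hypothetical values, for illustration only.
SERVER = "198.51.100.5/28"      # server after renumbering into the new /28
STALE_GW = "192.0.2.1"          # assumed leftover gateway from the old block
NEW_GW = "198.51.100.1"         # gateway inside the new /28
OFFICE_UPLINK = "198.51.100.2"  # office LAN uplink, same /28 as the server
REMOTE = "203.0.113.50"         # any client elsewhere on the Internet

if __name__ == "__main__":
    print("usable in a /29:", usable_hosts("192.0.2.0/29"))
    print("usable in a /28:", usable_hosts("198.51.100.0/28"))
    print("office, stale gw:", can_reply(SERVER, STALE_GW, OFFICE_UPLINK))
    print("remote, stale gw:", can_reply(SERVER, STALE_GW, REMOTE))
    print("remote, new gw:  ", can_reply(SERVER, NEW_GW, REMOTE))
```

In this model the office test passes with or without a working gateway, which is why a ping, mtr, or page load from the same network says nothing about reachability from outside it.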