[Outages-discussion] Internet "backbone"

Jeremy Chadwick outages at jdc.parodius.com
Sat Nov 19 09:58:23 EST 2011


On Sun, Nov 20, 2011 at 12:06:34AM +1000, Robert Brockway wrote:
> On Sat, 19 Nov 2011, Joseph Jackson wrote:
> 
> >TL;DR - The community needs to educate each other and our end
> >users so that people better understand how this highly complex
> >network works and is designed.
> 
> Some of us are trying :)  I'm currently writing a series of
> presentations and essays that will all be released under CC-BY-SA.
> I'm currently working on "What is Free and Open Source Software?"
> and the next one is "How the Internet works".  These two are aimed
> primarily at a non-technical audience.  I'm also working on more
> technical presentations[1].
> 
> I think a big part of the problem is that many people who work in
> the technical side of the IT industry are quite shaky on network
> theory and principles.  Most of us have learnt 'on the job' in a
> rapidly changing environment but some of us may not have spent
> enough time filling in the holes in our knowledge.
> 
> A second problem (IMHO) is a lack of community involvement.  The
> Internet and our understanding of it are evolving rapidly[2].
> Involvement in technical communities allows us to see what
> strategies others are following, what has and has not worked, etc.
> Operating in isolation seems like a great way to reinvent the wheel
> to me.
> 
> I've spent a lot of time encouraging education and community
> involvement. Some people get it, some don't.
> 
> [1] I've done presentations in the past too and these are all being
> updated and will be released under CC-BY-SA as well.  Yes this is a
> lot of work which may explain why I'm doing this at midnight on a
> Saturday night :)
> 
> [2] I expect this to continue for several more decades.  The
> Internet is in its infancy.

These are more or less peanut-gallery comments in passing, but here is
some advice to take into consideration when writing your docs and essays:

1. Make your attendees/readers aware that the Internet is broken
24x7x365.  People may laugh at this statement but it's downright true
the majority of the time.  It's not a joke.  There is always something
anomalous going on within the Internet between two or more peering
providers, or something on a larger scale ("oops, some random ISP in a
third-world country let a customer's BGP announcement through which
shouldn't have been").  It's amazing it works as well as it does.

2. I cannot stress this point enough especially if you have CTO/CIO
attendees: the Internet should not be used as a replacement for
dedicated circuits.  I can tell you factually that there are very large,
very important Fortune 500 companies who rely on the Internet for
transit of mission-critical applications/packets.  I cannot name names
(nor will I off-list -- the repercussions would be drastic), but these
companies "cannot justify the cost of dedicated circuits"[1].  Many seem
to think that if you throw a VPN in front of something it suddenly
becomes reliable, only to find their engineers, 3 months later, on
3-hour-long bridge calls trying to explain what happened, the concept of
asymmetric routing, and how, when you use the Internet for transit, you
really are at the mercy of, well, everyone/everything.

3. What doesn't work: lack of monitoring.  If you have dedicated
circuits, monitor them.  Have devices that accept SNMP traps for links
going down, BFD sessions or BGP/OSPF sessions going down, that sort of
thing.  Take the time to invest in the necessary monitoring bits for all
of your network connectivity, and if possible, necessary equipment for
taps (e.g. Riverbed products).  Many of the Big Boys(tm) do
absolutely no monitoring because "their networks are too large" and
"they have too many customers", and it never ceases to amaze me how
many companies do not have the capability to capture packets that go out
across a link (whether that be Internet-bound or end-to-end).

4. Please please *PLEASE* make sure you mention the asymmetrical nature
of IP routing.  This is still a topic that isn't commonly discussed or
taught/passed down from generation to generation.  Lots of folks really
do think a packet will take the same logical (and physical!) path going
out as the response packet does coming back.  A good related resource
you can point people to is Richard Steenbergen's "A Practical Guide to
(Correctly) Troubleshooting with Traceroute":
http://www.nanog.org/meetings/nanog45/presentations/Sunday/RAS_traceroute_N45.pdf
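To make item 4 concrete for readers, here is a minimal sketch of one way
to check for asymmetry.  It assumes you have run traceroute from *both*
ends (a single traceroute only shows the forward path) and have already
mapped each hop to an origin AS; that mapping step is not shown.  It
compares AS paths rather than raw hop IPs, because a router usually
answers from whichever interface the probe arrived on, so the same box
shows different addresses in each direction.  The hop data is
hypothetical and uses the documentation ASN range (RFC 5398).

```python
from typing import List


def collapse(asns: List[int]) -> List[int]:
    """Collapse consecutive hops in the same AS into a single entry."""
    out: List[int] = []
    for asn in asns:
        if not out or out[-1] != asn:
            out.append(asn)
    return out


def is_symmetric(forward: List[int], reverse: List[int]) -> bool:
    """True if the reverse AS path is the forward AS path read backwards."""
    return collapse(forward) == collapse(reverse)[::-1]


# Hypothetical per-hop origin ASNs (documentation range, RFC 5398).
# Forward: traceroute run from site A towards site B.
forward = [64496, 64496, 64497, 64497, 64498, 64499]
# Reverse: traceroute run from site B back towards site A.
reverse = [64499, 64500, 64500, 64496]

print(collapse(forward))              # [64496, 64497, 64498, 64499]
print(collapse(reverse)[::-1])        # [64496, 64500, 64499]
print(is_symmetric(forward, reverse)) # False: the return trip avoids 64497/64498
```

Comparing at the AS level is deliberately coarse; even when the AS paths
match, the physical paths inside each provider can still differ.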

<rant>
People really do think IP networks consist of "magic smoke" and "just
magically work".  Basic and mid-level TCP/IP concepts are really not
that hard to grasp; for example, you can understand how IP and TCP
actually work/behave (understand the handshake, understand ACK,
understand FIN, understand RST) without having to understand, say,
RFC1323 or necessarily the intricacies of BGP.  I wish more people
would get their hands dirty with tcpdump/snoop/Wireshark too.
</rant>
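As a small illustration of how approachable the basics are, the sketch
below decodes the flag bits of a raw TCP header (the flags live in byte
13; FIN through URG are from RFC 793, ECE and CWR from RFC 3168) and
walks a three-way handshake plus a reset.  The headers are synthetic and
the port and sequence numbers are arbitrary; in practice
tcpdump/snoop/Wireshark do this decoding for you.

```python
import struct
from typing import List

# TCP flag bits in byte 13 of the TCP header.
FLAGS = [(0x01, "FIN"), (0x02, "SYN"), (0x04, "RST"), (0x08, "PSH"),
         (0x10, "ACK"), (0x20, "URG"), (0x40, "ECE"), (0x80, "CWR")]

# Fixed 20-byte TCP header: ports, seq, ack, data offset, flags,
# window, checksum, urgent pointer (network byte order).
TCP_HDR = "!HHIIBBHHH"


def tcp_flags(header: bytes) -> List[str]:
    """Return the names of the flags set in a raw TCP header."""
    (_sport, _dport, _seq, _ack, _off, flags,
     _win, _csum, _urg) = struct.unpack(TCP_HDR, header[:20])
    return [name for bit, name in FLAGS if flags & bit]


def make_header(flags: int, seq: int = 0, ack: int = 0) -> bytes:
    """Build a synthetic TCP header (port 49152 -> 80, data offset 5 words)."""
    return struct.pack(TCP_HDR, 49152, 80, seq, ack, 5 << 4, flags, 65535, 0, 0)


# Client SYN, server SYN+ACK, client ACK: the three-way handshake.
handshake = [make_header(0x02, seq=1000),
             make_header(0x12, seq=5000, ack=1001),
             make_header(0x10, seq=1001, ack=5001)]
for h in handshake:
    print(tcp_flags(h))          # ['SYN'], then ['SYN', 'ACK'], then ['ACK']

print(tcp_flags(make_header(0x04)))  # ['RST']
```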


[1]: I realise this quickly gets into a discussion of just how
"dedicated" an end-to-end circuit is.  For example, many providers will
happily sell you such a circuit but won't bother to disclose that it's
just a virtual circuit tunnelled across the same physical pipe they use
for their Internet traffic, meaning aside from an SLA (which in my
experience amounts to very little when dealing with The Big Boys(tm)),
you might as well use the Internet.  This is becoming the norm these
days; everyone seems to resell competitors' junk.  "So you're telling me
our OC48 with SevenMileSnakeNet went down because of Level 3?  Last week
it was because of Qwest!  Why shouldn't we just buy a circuit from
them directly?!"  You get the idea.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
