[outages] SF South Bay: chronic latency/packet loss between Abovenet/Comcast at Great Oaks
Jeremy Chadwick
outages at jdc.parodius.com
Wed Apr 11 00:33:04 EDT 2012
I choose not to do DNS resolution in mtr because otherwise the terminal
width required to see FQDNs has to be >76 characters, which often upsets
mailing list folks. (This, for example, is one of the few lists on which
I top-post.)
I'm not so concerned with the packet loss -- for example in the 2nd mtr
set I showed, the loss only seems to happen at routers, which is almost
certainly the result of ICMP prioritisation.
But the latency is a definite problem and is easily noticeable across
SSH, Remote Desktop, and any other TCP service (i.e. the latency shown
is not a result of ICMP prioritisation). src/dst IPs on both sides are
actual servers/boxes, not routers.
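
As a rough illustration of how the mtr reports quoted further down can be read, here is a minimal Python sketch. It assumes the report layout used in this thread (Loss% Snt Rcv Last Avg Best Wrst); the helper names and the 50 ms threshold are hypothetical choices made for this example, not part of mtr itself. It looks for the first hop whose average latency jumps and stays high through to the destination, and checks whether loss is still present at the final hop (loss seen only at intermediate routers usually points to ICMP prioritisation rather than real loss).

```python
import re
from typing import List, NamedTuple, Optional


class Hop(NamedTuple):
    index: int
    host: str
    loss_pct: float
    avg_ms: float


# One row of an mtr report with the columns
# Loss% Snt Rcv Last Avg Best Wrst (the layout quoted in this thread).
ROW = re.compile(
    r"^\s*(\d+)\.\|--\s+(\S+)\s+([\d.]+)%\s+\d+\s+\d+"
    r"\s+[\d.]+\s+([\d.]+)\s+[\d.]+\s+[\d.]+\s*$"
)

# The report taken from 72.20.98.124 towards 67.180.84.87, copied from the thread.
SAMPLE = """\
HOST: isis.parodius.com Loss% Snt Rcv Last Avg Best Wrst
1.|-- 72.20.98.65 0.0% 41 41 0.4 0.4 0.3 0.6
2.|-- 69.163.64.44 0.0% 40 40 0.4 0.4 0.3 0.5
3.|-- 69.163.65.49 0.0% 40 40 0.6 10.6 0.4 76.8
4.|-- 64.124.65.93 0.0% 40 40 65.6 3.6 0.4 65.6
5.|-- 64.125.28.54 0.0% 40 40 2.8 4.2 0.7 51.7
6.|-- 64.125.30.126 0.0% 40 40 0.8 1.4 0.7 16.7
7.|-- 64.125.30.178 0.0% 40 40 1.1 5.8 1.1 65.5
8.|-- 75.149.228.133 0.0% 40 40 148.7 136.9 117.4 150.2
9.|-- 68.86.85.65 5.0% 40 38 139.3 135.5 119.9 148.3
10.|-- 68.86.90.158 2.5% 40 39 141.6 138.1 120.0 149.8
11.|-- 68.86.143.93 2.5% 40 39 140.3 136.8 120.5 149.8
12.|-- 68.85.191.250 0.0% 40 40 150.8 144.8 128.7 159.5
13.|-- 67.180.84.87 0.0% 40 40 146.7 149.6 132.2 173.9
"""


def parse_report(text: str) -> List[Hop]:
    """Extract hop number, host, loss and average RTT from each report row."""
    hops = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # the HOST: header and any other lines are skipped
            hops.append(Hop(int(m.group(1)), m.group(2),
                            float(m.group(3)), float(m.group(4))))
    return hops


def first_sustained_jump(hops: List[Hop],
                         threshold_ms: float = 50.0) -> Optional[Hop]:
    """First hop whose average RTT exceeds the previous hop's by at least
    threshold_ms and stays at or above that level on every later hop."""
    for i in range(1, len(hops)):
        floor = hops[i - 1].avg_ms + threshold_ms
        if all(h.avg_ms >= floor for h in hops[i:]):
            return hops[i]
    return None


def loss_reaches_destination(hops: List[Hop]) -> bool:
    """True if the final hop (the real server) shows any loss."""
    return bool(hops) and hops[-1].loss_pct > 0.0


if __name__ == "__main__":
    hops = parse_report(SAMPLE)
    print("first sustained jump:", first_sustained_jump(hops))
    print("loss at destination:", loss_reaches_destination(hops))
```

On the sample above this picks out hop 8 (75.149.228.133) as the point where latency rises and stays high, and reports no loss at the destination. With asymmetric routing the jump only shows where the round trip first gets slow, not which direction is at fault.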
But you're absolutely right -- asymmetric routing is in place here,
which means that everyone has to work together, and simultaneously, to
really figure out where the problem is. I can only do so much when I
have little to no visibility into things (e.g. if I had access to BAIS
and Abovenet and Level 3 and Comcast routers I could figure out where
the problem is... ;-) )
I'm currently engaged in a conversation with Comcast engineers about
this issue. (Seems my DSLR post got proper attention)
So far the statement is that they've looked at the interface for the
Abovenet/Comcast peering point in question, and although it's being
used/busy, it's not oversaturated. They also pointed out that the only
announcements they see for 72.20.96.0/19 are via Level 3 and Cogent,
thus the issue is likely to be on my co-lo provider's side (e.g. the
Level 3 <-> BAIS link). route-views also confirms the same thing, as
does my place of work (which peers with Abovenet natively).
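
For anyone wanting to confirm which of the addresses in this thread fall under that announcement, a small sketch using Python's standard ipaddress module follows. The function name is hypothetical, and the check only covers prefix membership; it says nothing about which upstream actually carries the route.

```python
import ipaddress

# The prefix Comcast reports seeing via Level 3 and Cogent.
ANNOUNCED = ipaddress.ip_network("72.20.96.0/19")


def is_covered(addr: str) -> bool:
    """True if addr lies inside the announced /19."""
    return ipaddress.ip_address(addr) in ANNOUNCED


if __name__ == "__main__":
    # Addresses taken from the thread: the co-lo server, the address Ren
    # mentioned, and the Comcast-side host.
    for addr in ("72.20.98.124", "72.20.98.67", "67.180.84.87"):
        print(addr, is_covered(addr))
    print("range:", ANNOUNCED[0], "-", ANNOUNCED[-1])
```

The /19 spans 72.20.96.0 through 72.20.127.255, so both 72.20.98.x addresses are covered, while 67.180.84.87 is not.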
I have a ticket open with my co-lo provider to investigate this ordeal.
If this does turn out to be a problem with their Level 3 link being
saturated chronically, then I owe Comcast/Abovenet an apology (welcome
to one of the complexities with asymmetric routing!), and I'm going to
have to make some decisions with regard to co-location and so on,
because the chronic nature of this problem is unacceptable for me as
well as for my customers.
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |
On Tue, Apr 10, 2012 at 08:50:00PM -0700, Kevin Blackham wrote:
> A little PTR would help see more clearly, but I believe I see asymmetry here. Also interesting is the ultimate hop at 0% loss in one of those samples.
>
> Feel free to put me in my place, but please do so on -discuss.
>
> On Apr 10, 2012, at 19:12, Jeremy Chadwick <outages at jdc.parodius.com> wrote:
>
> > Hi Ren,
> >
> > The issue with my co-lo pertaining to route announcements has actually
> > been "dealt with", meaning "this is just how it is". I'm wondering if I
> > can go into details without violating contractual obligations, hmm.
> > Yes, I imagine I can, because it becomes quite obvious if I provide
> > traceroutes from both directions, and that's public knowledge.
> >
> > It appears that my co-lo (BAIS) doesn't actually adjust route
> > announcements on a per-IP basis, but they internally have a hashing
> > algorithm in place where on a per-IP basis different addresses utilise
> > different network paths. I still have an open ticket with their senior
> > networking engineer about this, who has been somewhat "careful" in what
> > he tells me, but so far I've basically gotten confirmation that this is
> > indeed how they do their load-balancing for customers to balance out
> > network traffic between all of their peering providers (Level 3,
> > Abovenet, Cogent, and 2-3 others).
> >
> > I can provide those examples (to/from different IPs) if you want to see
> > them, but that is a separate matter. There still seems to be a problem
> > between Abovenet/Comcast. Alternate links/paths through my co-lo (e.g.
> > BAIS/Cogent) show no problems on the ingress or egress path -- the
> > common path seems to be Abovenet/Comcast when there are problems.
> >
> > This is what's presently happening right now:
> >
> > Source IP: 67.180.84.87
> > Dest IP: 72.20.98.124
> >
> > === Tue Apr 10 19:09:00 PDT 2012 (1334110140)
> > HOST: icarus.home.lan   Loss%  Snt  Rcv   Last    Avg   Best   Wrst
> >  1.|-- 192.168.1.1       0.0%   40   40    0.3    0.6    0.2    1.5
> >  2.|-- 67.180.84.1       0.0%   40   40   24.5   22.7   10.4   54.0
> >  3.|-- 68.85.191.253     0.0%   40   40   10.2   11.1    8.4   25.5
> >  4.|-- 68.86.143.98      0.0%   40   40   15.6   16.4   11.1   34.7
> >  5.|-- 68.86.91.5        0.0%   40   40   14.3   18.7   12.4   49.7
> >  6.|-- 68.86.87.182      0.0%   40   40   17.1   19.4   14.4   51.6
> >  7.|-- 4.71.118.45       0.0%   40   40   14.3   23.7   13.0   77.9
> >  8.|-- 4.69.152.148      0.0%   40   40   67.5   27.1   13.3  128.0
> >  9.|-- 4.53.16.18        5.0%   40   38  151.6  153.3  133.3  184.1
> > 10.|-- 69.163.65.39      2.5%   40   39  176.3  155.8  135.3  198.1
> > 11.|-- 72.20.98.124      5.0%   40   38  205.3  152.0  129.8  205.3
> > === END
> >
> >
> > Source IP: 72.20.98.124
> > Dest IP: 67.180.84.87
> >
> > === Tue Apr 10 19:09:00 PDT 2012 (1334110140)
> > HOST: isis.parodius.com Loss%  Snt  Rcv   Last    Avg   Best   Wrst
> >  1.|-- 72.20.98.65       0.0%   41   41    0.4    0.4    0.3    0.6
> >  2.|-- 69.163.64.44      0.0%   40   40    0.4    0.4    0.3    0.5
> >  3.|-- 69.163.65.49      0.0%   40   40    0.6   10.6    0.4   76.8
> >  4.|-- 64.124.65.93      0.0%   40   40   65.6    3.6    0.4   65.6
> >  5.|-- 64.125.28.54      0.0%   40   40    2.8    4.2    0.7   51.7
> >  6.|-- 64.125.30.126     0.0%   40   40    0.8    1.4    0.7   16.7
> >  7.|-- 64.125.30.178     0.0%   40   40    1.1    5.8    1.1   65.5
> >  8.|-- 75.149.228.133    0.0%   40   40  148.7  136.9  117.4  150.2
> >  9.|-- 68.86.85.65       5.0%   40   38  139.3  135.5  119.9  148.3
> > 10.|-- 68.86.90.158      2.5%   40   39  141.6  138.1  120.0  149.8
> > 11.|-- 68.86.143.93      2.5%   40   39  140.3  136.8  120.5  149.8
> > 12.|-- 68.85.191.250     0.0%   40   40  150.8  144.8  128.7  159.5
> > 13.|-- 67.180.84.87      0.0%   40   40  146.7  149.6  132.2  173.9
> > === END
> >
> > $ host 64.125.30.178
> > 178.30.125.64.in-addr.arpa domain name pointer xe-0-1-0.mpr4.sjc7.us.above.net.
> >
> > $ host 75.149.228.133
> > 133.228.149.75.in-addr.arpa domain name pointer be-10-403-pe01.11greatoaks.ca.ibone.comcast.net.
> >
> > So Ren, if you can investigate this, I would be appreciative of it.
> >
> > --
> > | Jeremy Chadwick jdc at parodius.com |
> > | Parodius Networking http://www.parodius.com/ |
> > | UNIX Systems Administrator Mountain View, CA, US |
> > | Making life hard for others since 1977. PGP 4BD6C0CB |
> >
> > On Tue, Apr 10, 2012 at 09:55:33PM -0400, Ren Provo wrote:
> >> Hi Jeremy,
> >>
> >> When the issue was raised a week or two ago there seemed to be a route
> >> announcement issue for 72.20.98.67. When your colo provider changed
> >> their policy did they update filters with their upstream?
> >>
> >> Cheers, -ren, who will confirm there is no congestion with Abovenet on
> >> the port in SJC to Comcast.
> >>
> >> On Tue, Apr 10, 2012 at 9:44 PM, Jeremy Chadwick
> >> <outages at jdc.parodius.com> wrote:
> >>> There's an issue I've been tracking for a few months now pertaining to a
> >>> network link between Abovenet and Comcast which appears to become
> >>> saturated (or impacted negatively in some way) at nearly the same time
> >>> every night, and lasts for numerous hours, then ceases -- on a
> >>> near-daily basis (no exaggeration).
> >>>
> >>> Latency and packet loss occur during this time, with latency hitting
> >>> 150ms (sometimes higher), with packet loss ranging from 0.5% to 2.0%.
> >>> I've been storing periodic traceroutes/mtrs for over a month showing
> >>> this problem, and been tracking start/end times as well.
> >>>
> > >>> Thankfully I own devices/have connectivity on both ends (src and dst),
> > >>> and thus can provide mtrs/traceroutes from both directions.  Analysis so
> >>> far, done by myself as well as senior network techs at my co-lo
> >>> provider, confirms this issue is with a link between Abovenet/Comcast,
> >>> likely within the San Jose Great Oaks POP (which I'm familiar with as
> >>> part of my job).
> >>>
> >>> I opened up a ticket on DSLR/BBR's Comcast Direct forum (which only
> >>> Comcast employees can respond to/view tickets for) over a month ago.
> >>> Someone has been viewing it, but nobody has replied except me.
> >>>
> >>> I've since made the issue public, where (of course) the general Internet
> >>> community does not quite understand how peering arrangements/contracts
> >>> work (people think that any company who has a contract with Abovenet can
> >>> report issues, but that is simply not the case; you must be a POC for
> >>> the transport to report issues with it), nor do they understand how a
> >>> co-lo provider changing route preferencing can impact the provider
> > >>> financially (based on billing metrics, etc.).  My co-lo provider is very
> >>> strict with their routing policies, and it has to do with financial
> >>> reasons that are their own business, not mine.
> >>>
> >>> The public thread is here, which also includes start/end times,
> > >>> traceroutes (both directions), and so on.  I update it every day when
> >>> the issue happens, and ~90% of the time edit my posts when the issue
> >>> ends.
> >>>
> >>> http://www.dslreports.com/forum/r27085601-SF-South-Bay-Chronic-IP-network-latency-nightly
> >>>
> >>> Anyway, all the technical details aside:
> >>>
> >>> Is there anyone on this list who works for Comcast who can contact me
> >>> off-list who is willing to investigate this and drive it to completion?
> >>>
> >>> An alternative would be for someone to contact me off-list with the name
> >>> or Email address of someone (or division) who handles issues like this
> > >>> at Comcast.  I'd love for Abovenet to get involved, but I have no
> > >>> contractual obligation to them.  (If there is an Abovenet individual who
> >>> is willing to investigate this "pro bono" per se, that would be
> >>> awesome, but I imagine such is often above one's pay grade).
> >>>
> >>> --
> > >>> | Jeremy Chadwick jdc at parodius.com |
> > >>> | Parodius Networking http://www.parodius.com/ |
> > >>> | UNIX Systems Administrator Mountain View, CA, US |
> > >>> | Making life hard for others since 1977. PGP 4BD6C0CB |
> >>>
> >>> _______________________________________________
> >>> Outages mailing list
> >>> Outages at outages.org
> >>> https://puck.nether.net/mailman/listinfo/outages