[outages] SF South Bay: chronic latency/packet loss between Abovenet/Comcast at Great Oaks

Jeremy Chadwick outages at jdc.parodius.com
Wed Apr 11 00:33:04 EDT 2012


I choose not to do DNS resolution in mtr because otherwise the terminal
width required to see FQDNs has to be >76 characters, which often upsets
mailing list folks.  (This, for example, is one of the few lists on which
I top-post.)

I'm not so concerned with the packet loss -- for example in the 2nd mtr
set I showed, the loss only seems to happen at routers, which is almost
certainly the result of ICMP prioritisation.
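
A minimal sketch of that check, assuming the plain-text mtr report layout
quoted further down this thread.  The function names (parse_loss,
router_only_loss) are hypothetical, and the sample lines are hops 8-13 of
the second mtr set.  It only tests whether loss seen at intermediate hops
also shows up at the final hop; it does not prove why a router drops
replies.

```python
import re

# Matches mtr report lines such as:
#  9.|-- 68.86.85.65   5.0%   40   38 139.3 135.5 119.9 148.3
HOP_RE = re.compile(r"^\s*(\d+)\.\|--\s+(\S+)\s+([\d.]+)%")


def parse_loss(report):
    """Return [(hop_number, host, loss_percent), ...] from mtr report text."""
    hops = []
    for line in report.splitlines():
        m = HOP_RE.match(line)
        if m:
            hops.append((int(m.group(1)), m.group(2), float(m.group(3))))
    return hops


def router_only_loss(hops):
    """True if some intermediate hop shows loss but the final hop shows none."""
    if not hops:
        return False
    *routers, dest = hops
    return dest[2] == 0.0 and any(loss > 0.0 for _, _, loss in routers)


# Hops 8-13 of the second mtr set in this thread (isis -> icarus).
SAMPLE = """\
 8.|-- 75.149.228.133             0.0%    40    40 148.7 136.9 117.4 150.2
 9.|-- 68.86.85.65                5.0%    40    38 139.3 135.5 119.9 148.3
10.|-- 68.86.90.158               2.5%    40    39 141.6 138.1 120.0 149.8
11.|-- 68.86.143.93               2.5%    40    39 140.3 136.8 120.5 149.8
12.|-- 68.85.191.250              0.0%    40    40 150.8 144.8 128.7 159.5
13.|-- 67.180.84.87               0.0%    40    40 146.7 149.6 132.2 173.9
"""

if __name__ == "__main__":
    print(router_only_loss(parse_loss(SAMPLE)))
```

Loss that appears at a few routers and then vanishes at later hops and at
the destination is consistent with those routers rate-limiting their own
ICMP replies rather than dropping forwarded traffic.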

But the latency is a definite problem and is easily noticeable across
SSH, Remote Desktop, and any other TCP service (i.e. the latency shown
is not a result of ICMP prioritisation).  src/dst IPs on both sides are
actual servers/boxes, not routers.
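
To make the latency point concrete, here is a rough sketch that finds the
pair of consecutive hops with the largest rise in average RTT.  It again
assumes the mtr report layout quoted below; parse_avg and largest_jump are
hypothetical names, and the sample lines are hops 5-9 of the second mtr
set.

```python
import re

# Columns after the host: Loss%  Snt  Rcv  Last  Avg  Best  Wrst
# Only the hop number, host and Avg column are captured.
HOP_RE = re.compile(
    r"^\s*(\d+)\.\|--\s+(\S+)\s+[\d.]+%\s+\d+\s+\d+\s+[\d.]+\s+([\d.]+)"
)


def parse_avg(report):
    """Return [(hop_number, host, avg_ms), ...] from mtr report text."""
    hops = []
    for line in report.splitlines():
        m = HOP_RE.match(line)
        if m:
            hops.append((int(m.group(1)), m.group(2), float(m.group(3))))
    return hops


def largest_jump(hops):
    """Return (prev_host, host, delta_ms) for the biggest rise in avg RTT."""
    best = None
    for prev, cur in zip(hops, hops[1:]):
        delta = cur[2] - prev[2]
        if best is None or delta > best[2]:
            best = (prev[1], cur[1], delta)
    return best


# Hops 5-9 of the second mtr set in this thread (isis -> icarus).
SAMPLE = """\
 5.|-- 64.125.28.54               0.0%    40    40   2.8   4.2   0.7  51.7
 6.|-- 64.125.30.126              0.0%    40    40   0.8   1.4   0.7  16.7
 7.|-- 64.125.30.178              0.0%    40    40   1.1   5.8   1.1  65.5
 8.|-- 75.149.228.133             0.0%    40    40 148.7 136.9 117.4 150.2
 9.|-- 68.86.85.65                5.0%    40    38 139.3 135.5 119.9 148.3
"""

if __name__ == "__main__":
    print(largest_jump(parse_avg(SAMPLE)))
```

On this sample the largest rise is between 64.125.30.178 and
75.149.228.133.  That only shows where the round-trip time first climbs;
with asymmetric routing the reply from that hop may come back over a
different path, so the jump alone does not say which direction or which
link is at fault.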

But you're absolutely right -- asymmetric routing is in place here,
which means that everyone has to work together, and simultaneously, to
really figure out where the problem is.  I can only do so much when I
have little to no visibility into things (e.g. if I had access to BAIS
and Abovenet and Level 3 and Comcast routers I could figure out where
the problem is... ;-) )

I'm currently engaged in a conversation with Comcast engineers about
this issue.  (Seems my DSLR post got proper attention)

So far the statement is that they've looked at the interface for the
Abovenet/Comcast peering point in question, and although it's being
used/busy, it's not oversaturated.  They also pointed out that the only
announcements they see for 72.20.96.0/19 are via Level 3 and Cogent,
thus the issue is likely to be on my co-lo provider's side (e.g. the
Level 3 <-> BAIS link).  route-views also confirms the same thing, as
does my place of work (which peers with Abovenet natively).

I have a ticket open with my co-lo provider to investigate this ordeal.

If this does turn out to be a problem with their Level 3 link being
saturated chronically, then I owe Comcast/Abovenet an apology (welcome
to one of the complexities with asymmetric routing!), and I'm going to
have to make some decisions with regard to co-location and so on,
because the chronic nature of this problem is unacceptable for me as
well as my customers.

-- 
| Jeremy Chadwick                              jdc at parodius.com |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, US |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

On Tue, Apr 10, 2012 at 08:50:00PM -0700, Kevin Blackham wrote:
> A little PTR would help see more clearly, but I believe I see asymmetry here. Also interesting is the ultimate hop at 0% loss in one of those samples. 
> 
> Feel free to put me in my place, but please do so on -discuss. 
> 
> On Apr 10, 2012, at 19:12, Jeremy Chadwick <outages at jdc.parodius.com> wrote:
> 
> > Hi Ren,
> > 
> > The issue with my co-lo pertaining to route announcements has actually
> > been "dealt with", meaning "this is just how it is".  I'm wondering if I
> > can go into details without violating contractual obligations, hmm.
> > Yes, I imagine I can, because it becomes quite obvious if I provide
> > traceroutes from both directions, and that's public knowledge.
> > 
> > It appears that my co-lo (BAIS) doesn't actually adjust route
> > announcements on a per-IP basis, but they internally have a hashing
> > algorithm in place where on a per-IP basis different addresses utilise
> > different network paths.  I still have an open ticket with their senior
> > networking engineer about this, who has been somewhat "careful" in what
> > he tells me, but so far I've basically gotten confirmation that this is
> > indeed how they do their load-balancing for customers to balance out
> > network traffic between all of their peering providers (Level 3,
> > Abovenet, Cogent, and 2-3 others).
> > 
> > I can provide those examples (to/from different IPs) if you want to see
> > them, but that is a separate matter.  There still seems to be a problem
> > between Abovenet/Comcast.  Alternate links/paths through my co-lo (e.g.
> > BAIS/Cogent) show no problems on the ingress or egress path -- the
> > common path seems to be Abovenet/Comcast when there are problems.
> > 
> > This is what's happening right now:
> > 
> > Source IP: 67.180.84.87
> > Dest IP:   72.20.98.124
> > 
> > === Tue Apr 10 19:09:00 PDT 2012  (1334110140)
> > HOST: icarus.home.lan             Loss%   Snt   Rcv  Last   Avg  Best  Wrst
> >  1.|-- 192.168.1.1                0.0%    40    40   0.3   0.6   0.2   1.5
> >  2.|-- 67.180.84.1                0.0%    40    40  24.5  22.7  10.4  54.0
> >  3.|-- 68.85.191.253              0.0%    40    40  10.2  11.1   8.4  25.5
> >  4.|-- 68.86.143.98               0.0%    40    40  15.6  16.4  11.1  34.7
> >  5.|-- 68.86.91.5                 0.0%    40    40  14.3  18.7  12.4  49.7
> >  6.|-- 68.86.87.182               0.0%    40    40  17.1  19.4  14.4  51.6
> >  7.|-- 4.71.118.45                0.0%    40    40  14.3  23.7  13.0  77.9
> >  8.|-- 4.69.152.148               0.0%    40    40  67.5  27.1  13.3 128.0
> >  9.|-- 4.53.16.18                 5.0%    40    38 151.6 153.3 133.3 184.1
> > 10.|-- 69.163.65.39               2.5%    40    39 176.3 155.8 135.3 198.1
> > 11.|-- 72.20.98.124               5.0%    40    38 205.3 152.0 129.8 205.3
> > === END
> > 
> > 
> > Source IP: 72.20.98.124
> > Dest IP:   67.180.84.87
> > 
> > === Tue Apr 10 19:09:00 PDT 2012  (1334110140)
> > HOST: isis.parodius.com           Loss%   Snt   Rcv  Last   Avg  Best  Wrst
> >  1.|-- 72.20.98.65                0.0%    41    41   0.4   0.4   0.3   0.6
> >  2.|-- 69.163.64.44               0.0%    40    40   0.4   0.4   0.3   0.5
> >  3.|-- 69.163.65.49               0.0%    40    40   0.6  10.6   0.4  76.8
> >  4.|-- 64.124.65.93               0.0%    40    40  65.6   3.6   0.4  65.6
> >  5.|-- 64.125.28.54               0.0%    40    40   2.8   4.2   0.7  51.7
> >  6.|-- 64.125.30.126              0.0%    40    40   0.8   1.4   0.7  16.7
> >  7.|-- 64.125.30.178              0.0%    40    40   1.1   5.8   1.1  65.5
> >  8.|-- 75.149.228.133             0.0%    40    40 148.7 136.9 117.4 150.2
> >  9.|-- 68.86.85.65                5.0%    40    38 139.3 135.5 119.9 148.3
> > 10.|-- 68.86.90.158               2.5%    40    39 141.6 138.1 120.0 149.8
> > 11.|-- 68.86.143.93               2.5%    40    39 140.3 136.8 120.5 149.8
> > 12.|-- 68.85.191.250              0.0%    40    40 150.8 144.8 128.7 159.5
> > 13.|-- 67.180.84.87               0.0%    40    40 146.7 149.6 132.2 173.9
> > === END
> > 
> > $ host 64.125.30.178
> > 178.30.125.64.in-addr.arpa domain name pointer xe-0-1-0.mpr4.sjc7.us.above.net.
> > 
> > $ host 75.149.228.133
> > 133.228.149.75.in-addr.arpa domain name pointer be-10-403-pe01.11greatoaks.ca.ibone.comcast.net.
> > 
> > So Ren, if you can investigate this, I would be appreciative of it.
> > 
> > -- 
> > | Jeremy Chadwick                              jdc at parodius.com |
> > | Parodius Networking                     http://www.parodius.com/ |
> > | UNIX Systems Administrator                 Mountain View, CA, US |
> > | Making life hard for others since 1977.             PGP 4BD6C0CB |
> > 
> > On Tue, Apr 10, 2012 at 09:55:33PM -0400, Ren Provo wrote:
> >> Hi Jeremy,
> >> 
> >> When the issue was raised a week or two ago there seemed to be a route
> >> announcement issue for 72.20.98.67.  When your colo provider changed
> >> their policy did they update filters with their upstream?
> >> 
> >> Cheers, -ren, who will confirm there is no congestion with Abovenet on
> >> the port in SJC to Comcast.
> >> 
> >> On Tue, Apr 10, 2012 at 9:44 PM, Jeremy Chadwick
> >> <outages at jdc.parodius.com> wrote:
> >>> There's an issue I've been tracking for a few months now pertaining to a
> >>> network link between Abovenet and Comcast which appears to become
> >>> saturated (or impacted negatively in some way) at nearly the same time
> >>> every night, and lasts for numerous hours, then ceases -- on a
> >>> near-daily basis (no exaggeration).
> >>> 
> >>> Latency and packet loss occur during this time, with latency hitting
> >>> 150ms (sometimes higher), with packet loss ranging from 0.5% to 2.0%.
> >>> I've been storing periodic traceroutes/mtrs for over a month showing
> >>> this problem, and been tracking start/end times as well.
> >>> 
> >>> Thankfully I own devices/have connectivity on both ends (src and dst),
> >>> thus can provide mtrs/traceroutes from both directions.  Analysis so
> >>> far, done by myself as well as senior network techs at my co-lo
> >>> provider, confirms this issue is with a link between Abovenet/Comcast,
> >>> likely within the San Jose Great Oaks POP (which I'm familiar with as
> >>> part of my job).
> >>> 
> >>> I opened up a ticket on DSLR/BBR's Comcast Direct forum (which only
> >>> Comcast employees can respond to/view tickets for) over a month ago.
> >>> Someone has been viewing it, but nobody has replied except me.
> >>> 
> >>> I've since made the issue public, where (of course) the general Internet
> >>> community does not quite understand how peering arrangements/contracts
> >>> work (people think that any company who has a contract with Abovenet can
> >>> report issues, but that is simply not the case; you must be a POC for
> >>> the transport to report issues with it), nor do they understand how a
> >>> co-lo provider changing route preferencing can impact the provider
> >>> financially (based on billing metrics, etc.).  My co-lo provider is very
> >>> strict with their routing policies, and it has to do with financial
> >>> reasons that are their own business, not mine.
> >>> 
> >>> The public thread is here, which also includes start/end times,
> >>> traceroutes (both directions), and so on.  I update it every day when
> >>> the issue happens, and ~90% of the time edit my posts when the issue
> >>> ends.
> >>> 
> >>> http://www.dslreports.com/forum/r27085601-SF-South-Bay-Chronic-IP-network-latency-nightly
> >>> 
> >>> Anyway, all the technical details aside:
> >>> 
> >>> Is there anyone on this list who works for Comcast who can contact me
> >>> off-list who is willing to investigate this and drive it to completion?
> >>> 
> >>> An alternative would be for someone to contact me off-list with the name
> >>> or Email address of someone (or division) who handles issues like this
> >>> at Comcast.  I'd love for Abovenet to get involved, but I have no
> >>> contractual obligation to them.  (If there is an Abovenet individual who
> >>> is willing to investigate this "pro bono" per se, that would be
> >>> awesome, but I imagine such is often above one's pay grade).
> >>> 
> >>> --
> >>> | Jeremy Chadwick                              jdc at parodius.com |
> >>> | Parodius Networking                     http://www.parodius.com/ |
> >>> | UNIX Systems Administrator                 Mountain View, CA, US |
> >>> | Making life hard for others since 1977.             PGP 4BD6C0CB |
> >>> 
> >>> _______________________________________________
> >>> Outages mailing list
> >>> Outages at outages.org
> >>> https://puck.nether.net/mailman/listinfo/outages