[outages] SF South Bay: chronic latency/packet loss between Abovenet/Comcast at Great Oaks

Jeremy Chadwick outages at jdc.parodius.com
Wed Apr 11 03:12:27 EDT 2012


I guess there's no need for anyone to do this.  I completely forgot that
Abovenet has a looking glass.

They absolutely see a route announcement for 72.20.96.0/19 from AS7151,
including from mpr4.sjc7.us.above.net (keep reading):

Per http://lg.above.net/lg.cgi --

Router: mpr4.sjc7.us.above.net
Command: show route protocol bgp table inet.0 72.20.96.0/19 terse exact

inet.0: 404034 destinations, 2133715 routes (403943 active, 108 holddown, 1638 hidden)
Restart Complete
+ = Active Route, - = Last Active, * = Both

A Destination        P Prf   Metric 1   Metric 2  Next hop        AS path
* 72.20.96.0/19      B 170        200          0 >64.125.27.94    7151 I
                                                  64.125.27.85
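
As a sanity check on the output above, here is a minimal Python sketch that
confirms the test host sits inside the announced prefix and pulls the AS path
out of the terse route line. The helper names (covers, as_path) are
hypothetical, and the parsing assumes the AS path follows the ">"-marked next
hop and ends in a one-letter origin code; other route-line layouts would need
different handling.

```python
import ipaddress

# Values copied from the looking-glass output above.
PREFIX = ipaddress.ip_network("72.20.96.0/19")
ROUTE_LINE = "* 72.20.96.0/19      B 170        200          0 >64.125.27.94    7151 I"


def covers(prefix, host):
    """Return True if the host address falls inside the prefix."""
    return ipaddress.ip_address(host) in prefix


def as_path(route_line):
    """Extract the AS path from a terse route line (hypothetical helper).

    Assumption: the path is everything after the '>'-marked next hop,
    minus the trailing origin code (I, E or ?).
    """
    tokens = route_line.split(">", 1)[1].split()
    # tokens[0] is the next-hop address; tokens[-1] is the origin code.
    return [int(tok) for tok in tokens[1:-1]]


if __name__ == "__main__":
    print(covers(PREFIX, "72.20.98.124"))   # server behind BAIS
    print(covers(PREFIX, "67.180.84.87"))   # Comcast residential side
    print(as_path(ROUTE_LINE))
```

A single-element path containing only 7151 means this router learns the /19
directly from AS7151, with no intermediate AS in between.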

Peering point confirmation (egress traceroute run from 72.20.98.124
destined to 67.180.84.87):

traceroute to 67.180.84.87 (67.180.84.87), 64 hops max, 52 byte packets
 1  72.20.98.65 (72.20.98.65)  0.354 ms  0.232 ms  0.362 ms
 2  er1sc2.bayarea.net (69.163.64.44)  0.363 ms  0.258 ms  0.243 ms
 3  er2sc2.bayarea.net (69.163.65.49)  0.489 ms  0.438 ms *
 4  xe-7-1-0.er1.sjc2.above.net (64.124.65.93)  0.527 ms  0.476 ms  0.488 ms
 5  xe-4-0-0.cr1.sjc2.us.above.net (64.125.28.54)  1.650 ms  0.711 ms  1.087 ms
 6  xe-0-0-0.cr2.sjc2.us.above.net (64.125.30.126)  0.879 ms  0.876 ms  0.735 ms
 7  xe-0-1-0.mpr4.sjc7.us.above.net (64.125.30.178)  1.156 ms  1.104 ms  1.121 ms
 8  be-10-403-pe01.11greatoaks.ca.ibone.comcast.net (75.149.228.133)  7.601 ms  11.717 ms  11.968 ms
 9  pos-2-1-0-0-cr01.sanjose.ca.ibone.comcast.net (68.86.85.65)  6.686 ms  3.310 ms  3.851 ms
10  pos-0-14-0-0-ar01.sfsutro.ca.sfba.comcast.net (68.86.90.158)  8.716 ms  7.414 ms  8.343 ms
11  te-9-8-ur03.santaclara.ca.sfba.comcast.net (68.86.143.93)  5.722 ms  5.975 ms  5.698 ms
12  68.85.191.250 (68.85.191.250)  10.868 ms  13.714 ms  7.968 ms
13  c-67-180-84-87.hsd1.ca.comcast.net (67.180.84.87)  16.969 ms  48.121 ms  15.588 ms

So what Comcast's "backbone team" told me appears to be incorrect (we're
all human), or there are route filters being applied, or they don't get
a full routing table from Abovenet -- unknown which.  I'm still talking
to them about that, but probably won't get an answer until later
tomorrow.

I still have a ticket open with my co-lo provider to investigate the
Level 3 link they have.  That link is just as likely to be a saturation
point as the Abovenet/Comcast link is.

Abovenet's LG also offers ping capability, so I should be able to use
that as a way to narrow down/confirm if the problem is there or with the
Level 3<->BAIS link.  Will find out tomorrow...
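
For narrowing things down from the mtr side, a rough Python sketch like the
following can pick out the pair of adjacent hops where average latency jumps
the most. The sample lines are copied from the 72.20.98.124 -> 67.180.84.87
mtr report quoted further down; the regular expression and function names are
my own assumptions about that report layout, not anything mtr itself provides.
Keep in mind that a jump only matters if it persists to the final hop, and
that with asymmetric routing the delay may really sit on the return path.

```python
import re

# Lines copied from the mtr report quoted later in this thread.
REPORT = """\
 6.|-- 64.125.30.126              0.0%    40    40   0.8   1.4   0.7  16.7
 7.|-- 64.125.30.178              0.0%    40    40   1.1   5.8   1.1  65.5
 8.|-- 75.149.228.133             0.0%    40    40 148.7 136.9 117.4 150.2
 9.|-- 68.86.85.65                5.0%    40    38 139.3 135.5 119.9 148.3
"""

# Columns: hop number, host, Loss%, Snt, Rcv, Last, Avg (Best/Wrst ignored).
LINE = re.compile(
    r"^\s*(\d+)\.\|--\s+(\S+)\s+([\d.]+)%\s+\d+\s+\d+\s+[\d.]+\s+([\d.]+)"
)


def parse(report):
    """Return a list of (hop, host, avg_ms) tuples from report text."""
    hops = []
    for line in report.splitlines():
        m = LINE.match(line)
        if m:
            hops.append((int(m.group(1)), m.group(2), float(m.group(4))))
    return hops


def biggest_jump(hops):
    """Return (from_host, to_host, delta_ms) for the largest rise in Avg."""
    pairs = zip(hops, hops[1:])
    prev, cur = max(pairs, key=lambda p: p[1][2] - p[0][2])
    return prev[1], cur[1], round(cur[2] - prev[2], 1)


if __name__ == "__main__":
    print(biggest_jump(parse(REPORT)))
```

On the sample lines this points at the step from 64.125.30.178
(mpr4.sjc7.us.above.net) to 75.149.228.133 (the Comcast Great Oaks router),
which is the same boundary the traceroute above crosses at hops 7 and 8.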

-- 
| Jeremy Chadwick                              jdc at parodius.com |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, US |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

On Tue, Apr 10, 2012 at 10:35:31PM -0700, Jeremy Chadwick wrote:
> Following up to my own post (a bad habit):
> 
> Can some folks here who have peering with Abovenet (preferably with a
> full routing table) verify that you see an announcement for
> 72.20.96.0/19 (AS7151) coming via AS6461 (Abovenet)?
> 
> I've confirmed this is the case at my workplace, but I want extra
> eyes/verification.
> 
> Thanks.
> 
> -- 
> | Jeremy Chadwick                              jdc at parodius.com |
> | Parodius Networking                     http://www.parodius.com/ |
> | UNIX Systems Administrator                 Mountain View, CA, US |
> | Making life hard for others since 1977.             PGP 4BD6C0CB |
> 
> On Tue, Apr 10, 2012 at 09:33:04PM -0700, Jeremy Chadwick wrote:
> > I choose not to do DNS resolution in mtr because otherwise the terminal
> > width required to see FQDNs has to be >76 characters, which often upsets
> > mailing list folks.  (This for example is one of the few lists on which
> > I top-post)
> > 
> > I'm not so concerned with the packet loss -- for example in the 2nd mtr
> > set I showed, the loss only seems to happen at routers, which is almost
> > certainly the result of ICMP prioritisation.
> > 
> > But the latency is a definite problem and is easily noticeable across
> > SSH, Remote Desktop, and any other TCP service (i.e. the latency shown
> > is not a result of ICMP prioritisation).  src/dst IPs on both sides are
> > actual servers/boxes, not routers.
> > 
> > But you're absolutely right -- asymmetric routing is in place here,
> > which means that everyone has to work together, and simultaneously, to
> > really figure out where the problem is.  I can only do so much when I
> > have little to no visibility into things (e.g. if I had access to BAIS
> > and Abovenet and Level 3 and Comcast routers I could figure out where
> > the problem is... ;-) )
> > 
> > I'm currently engaged in a conversation with Comcast engineers about
> > this issue.  (Seems my DSLR post got proper attention)
> > 
> > So far the statement is that they've looked at the interface for the
> > Abovenet/Comcast peering point in question, and although it's being
> > used/busy, it's not oversaturated.  They also pointed out that the only
> > announcements they see for 72.20.96.0/19 are via Level 3 and Cogent,
> > thus the issue is likely to be on my co-lo provider's side (e.g. the
> > Level 3 <-> BAIS link).  route-views also confirms the same thing, as
> > does my place of work (which has peering with Abovenet natively).
> > 
> > I have a ticket open with my co-lo provider to investigate this ordeal.
> > 
> > If this does turn out to be a problem with their Level 3 link being
> > saturated chronically, then I owe Comcast/Abovenet an apology (welcome
> > to one of the complexities with asymmetric routing!), and I'm going to
> > have to make some decisions with regards to co-location and so on,
> > because the chronic nature of this problem is unacceptable for me as
> > well as my customers.
> > 
> > -- 
> > | Jeremy Chadwick                              jdc at parodius.com |
> > | Parodius Networking                     http://www.parodius.com/ |
> > | UNIX Systems Administrator                 Mountain View, CA, US |
> > | Making life hard for others since 1977.             PGP 4BD6C0CB |
> > 
> > On Tue, Apr 10, 2012 at 08:50:00PM -0700, Kevin Blackham wrote:
> > > A little PTR would help see more clearly, but I believe I see asymmetry here. Also interesting is the ultimate hop at 0% loss in one of those samples. 
> > > 
> > > Feel free to put me in my place, but please do so on -discuss. 
> > > 
> > > On Apr 10, 2012, at 19:12, Jeremy Chadwick <outages at jdc.parodius.com> wrote:
> > > 
> > > > Hi Ren,
> > > > 
> > > > The issue with my co-lo pertaining to route announcements has actually
> > > > been "dealt with", meaning "this is just how it is".  I'm wondering if I
> > > > can go into details without violating contractual obligations, hmm.
> > > > Yes, I imagine I can, because it becomes quite obvious if I provide
> > > > traceroutes from both directions, and that's public knowledge.
> > > > 
> > > > It appears that my co-lo (BAIS) doesn't actually adjust route
> > > > announcements on a per-IP basis, but they internally have a hashing
> > > > algorithm in place where on a per-IP basis different addresses utilise
> > > > different network paths.  I still have an open ticket with their senior
> > > > networking engineer about this, who has been somewhat "careful" in what
> > > > he tells me, but so far I've basically gotten confirmation that this is
> > > > indeed how they do their load-balancing for customers to balance out
> > > > network traffic between all of their peering providers (Level 3,
> > > > Abovenet, Cogent, and 2-3 others).
> > > > 
> > > > I can provide those examples (to/from different IPs) if you want to see
> > > > them, but that is a separate matter.  There still seems to be a problem
> > > > between Abovenet/Comcast.  Alternate links/paths through my co-lo (e.g.
> > > > BAIS/Cogent) show no problems on the ingress or egress path -- the
> > > > common path seems to be Abovenet/Comcast when there are problems.
> > > > 
> > > > > This is what's happening right now:
> > > > 
> > > > Source IP: 67.180.84.87
> > > > Dest IP:   72.20.98.124
> > > > 
> > > > === Tue Apr 10 19:09:00 PDT 2012  (1334110140)
> > > > HOST: icarus.home.lan             Loss%   Snt   Rcv  Last   Avg  Best  Wrst
> > > >  1.|-- 192.168.1.1                0.0%    40    40   0.3   0.6   0.2   1.5
> > > >  2.|-- 67.180.84.1                0.0%    40    40  24.5  22.7  10.4  54.0
> > > >  3.|-- 68.85.191.253              0.0%    40    40  10.2  11.1   8.4  25.5
> > > >  4.|-- 68.86.143.98               0.0%    40    40  15.6  16.4  11.1  34.7
> > > >  5.|-- 68.86.91.5                 0.0%    40    40  14.3  18.7  12.4  49.7
> > > >  6.|-- 68.86.87.182               0.0%    40    40  17.1  19.4  14.4  51.6
> > > >  7.|-- 4.71.118.45                0.0%    40    40  14.3  23.7  13.0  77.9
> > > >  8.|-- 4.69.152.148               0.0%    40    40  67.5  27.1  13.3 128.0
> > > >  9.|-- 4.53.16.18                 5.0%    40    38 151.6 153.3 133.3 184.1
> > > > 10.|-- 69.163.65.39               2.5%    40    39 176.3 155.8 135.3 198.1
> > > > 11.|-- 72.20.98.124               5.0%    40    38 205.3 152.0 129.8 205.3
> > > > === END
> > > > 
> > > > 
> > > > Source IP: 72.20.98.124
> > > > Dest IP:   67.180.84.87
> > > > 
> > > > === Tue Apr 10 19:09:00 PDT 2012  (1334110140)
> > > > HOST: isis.parodius.com           Loss%   Snt   Rcv  Last   Avg  Best  Wrst
> > > >  1.|-- 72.20.98.65                0.0%    41    41   0.4   0.4   0.3   0.6
> > > >  2.|-- 69.163.64.44               0.0%    40    40   0.4   0.4   0.3   0.5
> > > >  3.|-- 69.163.65.49               0.0%    40    40   0.6  10.6   0.4  76.8
> > > >  4.|-- 64.124.65.93               0.0%    40    40  65.6   3.6   0.4  65.6
> > > >  5.|-- 64.125.28.54               0.0%    40    40   2.8   4.2   0.7  51.7
> > > >  6.|-- 64.125.30.126              0.0%    40    40   0.8   1.4   0.7  16.7
> > > >  7.|-- 64.125.30.178              0.0%    40    40   1.1   5.8   1.1  65.5
> > > >  8.|-- 75.149.228.133             0.0%    40    40 148.7 136.9 117.4 150.2
> > > >  9.|-- 68.86.85.65                5.0%    40    38 139.3 135.5 119.9 148.3
> > > > 10.|-- 68.86.90.158               2.5%    40    39 141.6 138.1 120.0 149.8
> > > > 11.|-- 68.86.143.93               2.5%    40    39 140.3 136.8 120.5 149.8
> > > > 12.|-- 68.85.191.250              0.0%    40    40 150.8 144.8 128.7 159.5
> > > > 13.|-- 67.180.84.87               0.0%    40    40 146.7 149.6 132.2 173.9
> > > > === END
> > > > 
> > > > $ host 64.125.30.178
> > > > 178.30.125.64.in-addr.arpa domain name pointer xe-0-1-0.mpr4.sjc7.us.above.net.
> > > > 
> > > > $ host 75.149.228.133
> > > > 133.228.149.75.in-addr.arpa domain name pointer be-10-403-pe01.11greatoaks.ca.ibone.comcast.net.
> > > > 
> > > > So Ren, if you can investigate this, I would be appreciative of it.
> > > > 
> > > > -- 
> > > > | Jeremy Chadwick                              jdc at parodius.com |
> > > > | Parodius Networking                     http://www.parodius.com/ |
> > > > | UNIX Systems Administrator                 Mountain View, CA, US |
> > > > | Making life hard for others since 1977.             PGP 4BD6C0CB |
> > > > 
> > > > On Tue, Apr 10, 2012 at 09:55:33PM -0400, Ren Provo wrote:
> > > >> Hi Jeremy,
> > > >> 
> > > >> When the issue was raised a week or two ago there seemed to be a route
> > > >> announcement issue for 72.20.98.67.  When your colo provider changed
> > > >> their policy did they update filters with their upstream?
> > > >> 
> > > >> Cheers, -ren, who will confirm there is no congestion with Abovenet on
> > > >> the port in SJC to Comcast.
> > > >> 
> > > >> On Tue, Apr 10, 2012 at 9:44 PM, Jeremy Chadwick
> > > >> <outages at jdc.parodius.com> wrote:
> > > >>> There's an issue I've been tracking for a few months now pertaining to a
> > > >>> network link between Abovenet and Comcast which appears to become
> > > >>> saturated (or impacted negatively in some way) at nearly the same time
> > > >>> every night, and lasts for numerous hours, then ceases -- on a
> > > >>> near-daily basis (no exaggeration).
> > > >>> 
> > > >>> Latency and packet loss occur during this time, with latency hitting
> > > >>> 150ms (sometimes higher), with packet loss ranging from 0.5% to 2.0%.
> > > >>> I've been storing periodic traceroutes/mtrs for over a month showing
> > > >>> this problem, and been tracking start/end times as well.
> > > >>> 
> > > > >>> Thankfully I own devices/have connectivity on both ends (src and dst),
> > > > >>> and thus can provide mtrs/traceroutes from both directions.  Analysis so
> > > >>> far, done by myself as well as senior network techs at my co-lo
> > > >>> provider, confirms this issue is with a link between Abovenet/Comcast,
> > > >>> likely within the San Jose Great Oaks POP (which I'm familiar with as
> > > >>> part of my job).
> > > >>> 
> > > >>> I opened up a ticket on DSLR/BBR's Comcast Direct forum (which only
> > > >>> Comcast employees can respond to/view tickets for) over a month ago.
> > > >>> Someone has been viewing it, but nobody has replied except me.
> > > >>> 
> > > >>> I've since made the issue public, where (of course) the general Internet
> > > >>> community does not quite understand how peering arrangements/contracts
> > > >>> work (people think that any company who has a contract with Abovenet can
> > > >>> report issues, but that is simply not the case; you must be a POC for
> > > >>> the transport to report issues with it), nor do they understand how a
> > > >>> co-lo provider changing route preferencing can impact the provider
> > > > >>> financially (based on billing metrics, etc.).  My co-lo provider is very
> > > >>> strict with their routing policies, and it has to do with financial
> > > >>> reasons that are their own business, not mine.
> > > >>> 
> > > >>> The public thread is here, which also includes start/end times,
> > > > >>> traceroutes (both directions), and so on.  I update it every day when
> > > >>> the issue happens, and ~90% of the time edit my posts when the issue
> > > >>> ends.
> > > >>> 
> > > >>> http://www.dslreports.com/forum/r27085601-SF-South-Bay-Chronic-IP-network-latency-nightly
> > > >>> 
> > > >>> Anyway, all the technical details aside:
> > > >>> 
> > > >>> Is there anyone on this list who works for Comcast who can contact me
> > > >>> off-list who is willing to investigate this and drive it to completion?
> > > >>> 
> > > >>> An alternative would be for someone to contact me off-list with the name
> > > >>> or Email address of someone (or division) who handles issues like this
> > > > >>> at Comcast.  I'd love for Abovenet to get involved, but I have no
> > > > >>> contractual obligation to them.  (If there is an Abovenet individual who
> > > >>> is willing to investigate this "pro bono" per se, that would be
> > > >>> awesome, but I imagine such is often above one's pay grade).
> > > >>> 
> > > >>> --
> > > > >>> | Jeremy Chadwick                              jdc at parodius.com |
> > > > >>> | Parodius Networking                     http://www.parodius.com/ |
> > > > >>> | UNIX Systems Administrator                 Mountain View, CA, US |
> > > > >>> | Making life hard for others since 1977.             PGP 4BD6C0CB |
> > > >>> 
> > > >>> _______________________________________________
> > > >>> Outages mailing list
> > > >>> Outages at outages.org
> > > >>> https://puck.nether.net/mailman/listinfo/outages


