[outages] SF South Bay: chronic latency/packet loss between Abovenet/Comcast at Great Oaks

Jeremy Chadwick outages at jdc.parodius.com
Wed Apr 11 14:12:23 EDT 2012


On Wed, Apr 11, 2012 at 01:38:49PM -0400, Adam Rothschild wrote:
> That your collocation provider is a) load-balancing by hashing traffic
> between two disparate transit ASes (with very different interprovider
> connectivity, performance characteristics, ...), on a per-IP basis b)
> running its transit ports hot, by its own admission... does not
> inspire confidence in its ability to deliver reliable service.
> 
> At this juncture, it seems like we've sufficiently beaten this horse
> and established that there are no known issues between the backbone
> providers you've called out in this thread, from the information
> provided to date.
> 
> The good news about the Bay Area is that it's a competitive
> marketplace, with no shortage of competent providers selling
> collocation and IP.  Probably off-topic for discussion on this list,
> though I'd be happy to recommend some offline if you're coming up
> short.

Adam,

Sorry, but it doesn't help.  The recurring latency I've reported is
quite real.  As I said, I have months of mtrs/traceroutes from both
directions showing this problem.  Asymmetric routing does not and cannot
explain the chronic high latency seen at roughly the same times nearly
every day.  I wish it was ICMP prioritisation (I really do).

I have done as much work as I can on documenting the recurring nature of
the problem: where it's seen (either between Level 3 and AS7151, or
between Abovenet and Comcast), when it starts, and when it ends.  As I
am the customer on *both ends* (src and dst), the fact that I'm getting
nowhere is preposterous.  Given that I do not have access to Abovenet,
Comcast, Level 3, or my co-lo provider's routers, I'm forced to rely on
the competency of others.
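
One way to pin down where the latency first appears is to scan an mtr
report for the first hop whose average RTT steps up sharply.  The
following is a minimal Python sketch, not a tool used in this thread:
the function names and the 50 ms threshold are assumptions, and the
sample data is the egress mtr report (72.20.98.124 to 67.180.84.87)
quoted further down.  With asymmetric routing, a step at a given hop on
the forward path can still be caused by the return path, so this only
narrows things down.

```python
# Egress mtr report, Tue Apr 10 19:09:00 PDT 2012, as quoted in this thread.
# Columns after the host: Loss% Snt Rcv Last Avg Best Wrst
MTR_REPORT = """\
 1.|-- 72.20.98.65      0.0%  41  41   0.4   0.4   0.3   0.6
 2.|-- 69.163.64.44     0.0%  40  40   0.4   0.4   0.3   0.5
 3.|-- 69.163.65.49     0.0%  40  40   0.6  10.6   0.4  76.8
 4.|-- 64.124.65.93     0.0%  40  40  65.6   3.6   0.4  65.6
 5.|-- 64.125.28.54     0.0%  40  40   2.8   4.2   0.7  51.7
 6.|-- 64.125.30.126    0.0%  40  40   0.8   1.4   0.7  16.7
 7.|-- 64.125.30.178    0.0%  40  40   1.1   5.8   1.1  65.5
 8.|-- 75.149.228.133   0.0%  40  40 148.7 136.9 117.4 150.2
 9.|-- 68.86.85.65      5.0%  40  38 139.3 135.5 119.9 148.3
10.|-- 68.86.90.158     2.5%  40  39 141.6 138.1 120.0 149.8
11.|-- 68.86.143.93     2.5%  40  39 140.3 136.8 120.5 149.8
12.|-- 68.85.191.250    0.0%  40  40 150.8 144.8 128.7 159.5
13.|-- 67.180.84.87     0.0%  40  40 146.7 149.6 132.2 173.9
"""


def parse_hops(report):
    """Return a list of (hop_number, host, avg_ms) from mtr report text."""
    hops = []
    for line in report.splitlines():
        if "|--" not in line:
            continue  # skip headers and blank lines
        hop_no, rest = line.split("|--", 1)
        fields = rest.split()  # host, loss, snt, rcv, last, avg, best, wrst
        hops.append((int(hop_no.strip().rstrip(".")), fields[0], float(fields[5])))
    return hops


def first_latency_jump(hops, threshold_ms=50.0):
    """Return the first (previous_hop, current_hop) pair whose average RTT
    rises by at least threshold_ms, or None if there is no such step."""
    for prev, cur in zip(hops, hops[1:]):
        if cur[2] - prev[2] >= threshold_ms:
            return prev, cur
    return None


if __name__ == "__main__":
    print(first_latency_jump(parse_hops(MTR_REPORT)))
```

On the sample above, the step falls between 64.125.30.178
(xe-0-1-0.mpr4.sjc7.us.above.net) and 75.149.228.133
(be-10-403-pe01.11greatoaks.ca.ibone.comcast.net).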

Nobody has "beaten this horse" -- the horse is still there, blocking the
road, its corpse rotting and festering, affecting network traffic.  Who
hauls it into the road every day from roughly 17:00 to 21:00 PDT is
unknown.  Instead, all that's happened is folks focusing on the
asymmetric aspect of routing, and on my co-lo provider's choice to
siphon certain IPs through certain routes.
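
The per-IP path selection described further down in this thread can be
illustrated with a minimal sketch.  BAIS has not disclosed its actual
algorithm; the CRC32 hash, the function name, and the three-entry
upstream list below are assumptions for illustration only.

```python
import ipaddress
import zlib

# Hypothetical subset of upstreams; the thread names Level 3, Abovenet,
# Cogent, and "2-3 others".
UPSTREAMS = ["Level 3", "Abovenet", "Cogent"]


def pick_upstream(ip, upstreams=UPSTREAMS):
    """Map an IP address to one upstream by hashing its packed bytes.

    The same address always maps to the same upstream; different
    addresses may map to different ones.
    """
    key = ipaddress.ip_address(ip).packed
    return upstreams[zlib.crc32(key) % len(upstreams)]


if __name__ == "__main__":
    for addr in ("72.20.98.67", "72.20.98.124"):
        print(addr, "->", pick_upstream(addr))
```

Because the choice depends only on the address, two neighbouring IPs in
the same /19 can end up on different transit providers, so traceroutes
to or from different addresses in the block need not match.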

As for the Bay Area having "no shortage of competent providers selling
co-location": let me know when that happens.  All I've seen so far is
complete and total incompetence on the parts of co-lo providers (not
only our current but our previous as well), transit and peering
providers, and many other divisions.  Honest: do not get me started on
this.  Please do not.

-- 
| Jeremy Chadwick                              jdc at parodius.com |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, US |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
> 
> On Wed, Apr 11, 2012 at 3:12 AM, Jeremy Chadwick
> <outages at jdc.parodius.com> wrote:
> > I guess there's no need for anyone to do this.  I completely forgot that
> > Abovenet has a looking glass.
> >
> > They absolutely see a route announcement for 72.20.96.0/19 from AS7151,
> > including from mpr4.sjc7.us.above.net (keep reading):
> >
> > Per http://lg.above.net/lg.cgi --
> >
> > Router: mpr4.sjc7.us.above.net
> > Command: show route protocol bgp table inet.0 72.20.96.0/19 terse exact
> >
> > inet.0: 404034 destinations, 2133715 routes (403943 active, 108 holddown, 1638 hidden)
> > Restart Complete
> > + = Active Route, - = Last Active, * = Both
> >
> > A Destination        P Prf   Metric 1   Metric 2  Next hop        AS path
> > * 72.20.96.0/19      B 170        200          0 >64.125.27.94    7151 I
> >                                                   64.125.27.85
> >
> > Peering point confirmation (egress traceroute run from 72.20.98.124
> > destined to 67.180.84.87):
> >
> > traceroute to 67.180.84.87 (67.180.84.87), 64 hops max, 52 byte packets
> >  1  72.20.98.65 (72.20.98.65)  0.354 ms  0.232 ms  0.362 ms
> >  2  er1sc2.bayarea.net (69.163.64.44)  0.363 ms  0.258 ms  0.243 ms
> >  3  er2sc2.bayarea.net (69.163.65.49)  0.489 ms  0.438 ms *
> >  4  xe-7-1-0.er1.sjc2.above.net (64.124.65.93)  0.527 ms  0.476 ms  0.488 ms
> >  5  xe-4-0-0.cr1.sjc2.us.above.net (64.125.28.54)  1.650 ms  0.711 ms  1.087 ms
> >  6  xe-0-0-0.cr2.sjc2.us.above.net (64.125.30.126)  0.879 ms  0.876 ms  0.735 ms
> >  7  xe-0-1-0.mpr4.sjc7.us.above.net (64.125.30.178)  1.156 ms  1.104 ms  1.121 ms
> >  8  be-10-403-pe01.11greatoaks.ca.ibone.comcast.net (75.149.228.133)  7.601 ms  11.717 ms  11.968 ms
> >  9  pos-2-1-0-0-cr01.sanjose.ca.ibone.comcast.net (68.86.85.65)  6.686 ms  3.310 ms  3.851 ms
> > 10  pos-0-14-0-0-ar01.sfsutro.ca.sfba.comcast.net (68.86.90.158)  8.716 ms  7.414 ms  8.343 ms
> > 11  te-9-8-ur03.santaclara.ca.sfba.comcast.net (68.86.143.93)  5.722 ms  5.975 ms  5.698 ms
> > 12  68.85.191.250 (68.85.191.250)  10.868 ms  13.714 ms  7.968 ms
> > 13  c-67-180-84-87.hsd1.ca.comcast.net (67.180.84.87)  16.969 ms  48.121 ms  15.588 ms
> >
> > So what Comcast's "backbone team" told me appears to be incorrect (we're
> > all human), or there are route filters being applied, or they don't get
> > a full routing table from Abovenet -- unknown which.  I'm still talking
> > to them about that, but probably won't get an answer until later
> > tomorrow.
> >
> > I still have a ticket open with my co-lo provider to investigate the
> > Level 3 link they have.  That's just as likely to be a saturation
> > point as the Abovenet/Comcast link is.
> >
> > Abovenet's LG also offers ping capability, so I should be able to use
> > that as a way to narrow down/confirm if the problem is there or with the
> > Level 3<->BAIS link.  Will find out tomorrow...
> >
> > --
> > | Jeremy Chadwick                              jdc at parodius.com |
> > | Parodius Networking                     http://www.parodius.com/ |
> > | UNIX Systems Administrator                 Mountain View, CA, US |
> > | Making life hard for others since 1977.             PGP 4BD6C0CB |
> >
> > On Tue, Apr 10, 2012 at 10:35:31PM -0700, Jeremy Chadwick wrote:
> >> Following up to my own post (in bad habit):
> >>
> >> Can some folks here who have peering with Abovenet (preferably with a
> >> full routing table) verify that you see an announcement for
> >> 72.20.96.0/19 (AS7151) coming via AS6461 (Abovenet)?
> >>
> >> I've confirmed this is the case at my workplace, but I want extra
> >> eyes/verification.
> >>
> >> Thanks.
> >>
> >> --
> >> | Jeremy Chadwick                              jdc at parodius.com |
> >> | Parodius Networking                     http://www.parodius.com/ |
> >> | UNIX Systems Administrator                 Mountain View, CA, US |
> >> | Making life hard for others since 1977.             PGP 4BD6C0CB |
> >>
> >> On Tue, Apr 10, 2012 at 09:33:04PM -0700, Jeremy Chadwick wrote:
> >> > I choose not to do DNS resolution in mtr because otherwise the terminal
> >> > width required to see FQDNs has to be >76 characters, which often upsets
> >> > mailing list folks.  (This for example is one of the few lists on which
> >> > I top-post)
> >> >
> >> > I'm not so concerned with the packet loss -- for example in the 2nd mtr
> >> > set I showed, the loss only seems to happen at routers, which is almost
> >> > certainly the result of ICMP prioritisation.
> >> >
> >> > But the latency is a definite problem and is easily noticeable across
> >> > SSH, Remote Desktop, and any other TCP service (i.e. the latency shown
> >> > is not a result of ICMP prioritisation).  src/dst IPs on both sides are
> >> > actual servers/boxes, not routers.
> >> >
> >> > But you're absolutely right -- asymmetric routing is in place here,
> >> > which means that everyone has to work together, and simultaneously, to
> >> > really figure out where the problem is.  I can only do so much when I
> >> > have little to no visibility into things (e.g. if I had access to BAIS
> >> > and Abovenet and Level 3 and Comcast routers I could figure out where
> >> > the problem is... ;-) )
> >> >
> >> > I'm currently engaged in a conversation with Comcast engineers about
> >> > this issue.  (Seems my DSLR post got proper attention)
> >> >
> >> > So far the statement is that they've looked at the interface for the
> >> > Abovenet/Comcast peering point in question, and although it's being
> >> > used/busy, it's not oversaturated.  They also pointed out that the only
> >> > announcements they see for 72.20.96.0/19 are via Level 3 and Cogent,
> >> > thus the issue is likely to be on my co-lo providers' side (e.g. the
> >> > Level 3 <-> BAIS link).  route-views also confirms the same thing, as
> >> > does my place of work (who has peering with Abovenet natively).
> >> >
> >> > I have a ticket open with my co-lo provider to investigate this ordeal.
> >> >
> >> > If this does turn out to be a problem with their Level 3 link being
> >> > saturated chronically, then I owe Comcast/Abovenet an apology (welcome
> >> > to one of the complexities with asymmetric routing!), and I'm going to
> >> > have to make some decisions with regards to co-location and so on,
> >> > because the chronic nature of this problem is unacceptable for myself as
> >> > well as my customers.
> >> >
> >> > --
> >> > | Jeremy Chadwick                              jdc at parodius.com |
> >> > | Parodius Networking                     http://www.parodius.com/ |
> >> > | UNIX Systems Administrator                 Mountain View, CA, US |
> >> > | Making life hard for others since 1977.             PGP 4BD6C0CB |
> >> >
> >> > On Tue, Apr 10, 2012 at 08:50:00PM -0700, Kevin Blackham wrote:
> >> > > A little PTR would help see more clearly, but I believe I see asymmetry here. Also interesting is the ultimate hop at 0% loss in one of those samples.
> >> > >
> >> > > Feel free to put me in my place, but please do so on -discuss.
> >> > >
> >> > > On Apr 10, 2012, at 19:12, Jeremy Chadwick <outages at jdc.parodius.com> wrote:
> >> > >
> >> > > > Hi Ren,
> >> > > >
> >> > > > The issue with my co-lo pertaining to route announcements has actually
> >> > > > been "dealt with", meaning "this is just how it is".  I'm wondering if I
> >> > > > can go into details without violating contractual obligations, hmm.
> >> > > > Yes, I imagine I can, because it becomes quite obvious if I provide
> >> > > > traceroutes from both directions, and that's public knowledge.
> >> > > >
> >> > > > It appears that my co-lo (BAIS) doesn't actually adjust route
> >> > > > announcements on a per-IP basis, but they internally have a hashing
> >> > > > algorithm in place where on a per-IP basis different addresses utilise
> >> > > > different network paths.  I still have an open ticket with their senior
> >> > > > networking engineer about this, who has been somewhat "careful" in what
> >> > > > he tells me, but so far I've basically gotten confirmation that this is
> >> > > > indeed how they do their load-balancing for customers to balance out
> >> > > > network traffic between all of their peering providers (Level 3,
> >> > > > Abovenet, Cogent, and 2-3 others).
> >> > > >
> >> > > > I can provide those examples (to/from different IPs) if you want to see
> >> > > > them, but that is a separate matter.  There still seems to be a problem
> >> > > > between Abovenet/Comcast. ?Alternate links/paths through my co-lo (e.g.
> >> > > > BAIS/Cogent) show no problems on the ingress or egress path -- the
> >> > > > common path seems to be Abovenet/Comcast when there are problems.
> >> > > >
> >> > > > This is what's presently happening right now:
> >> > > >
> >> > > > Source IP: 67.180.84.87
> >> > > > Dest IP:   72.20.98.124
> >> > > >
> >> > > > === Tue Apr 10 19:09:00 PDT 2012  (1334110140)
> >> > > > HOST: icarus.home.lan             Loss%   Snt   Rcv  Last   Avg  Best  Wrst
> >> > > >  1.|-- 192.168.1.1                0.0%    40    40   0.3   0.6   0.2   1.5
> >> > > >  2.|-- 67.180.84.1                0.0%    40    40  24.5  22.7  10.4  54.0
> >> > > >  3.|-- 68.85.191.253              0.0%    40    40  10.2  11.1   8.4  25.5
> >> > > >  4.|-- 68.86.143.98               0.0%    40    40  15.6  16.4  11.1  34.7
> >> > > >  5.|-- 68.86.91.5                 0.0%    40    40  14.3  18.7  12.4  49.7
> >> > > >  6.|-- 68.86.87.182               0.0%    40    40  17.1  19.4  14.4  51.6
> >> > > >  7.|-- 4.71.118.45                0.0%    40    40  14.3  23.7  13.0  77.9
> >> > > >  8.|-- 4.69.152.148               0.0%    40    40  67.5  27.1  13.3 128.0
> >> > > >  9.|-- 4.53.16.18                 5.0%    40    38 151.6 153.3 133.3 184.1
> >> > > > 10.|-- 69.163.65.39               2.5%    40    39 176.3 155.8 135.3 198.1
> >> > > > 11.|-- 72.20.98.124               5.0%    40    38 205.3 152.0 129.8 205.3
> >> > > > === END
> >> > > >
> >> > > >
> >> > > > Source IP: 72.20.98.124
> >> > > > Dest IP:   67.180.84.87
> >> > > >
> >> > > > === Tue Apr 10 19:09:00 PDT 2012  (1334110140)
> >> > > > HOST: isis.parodius.com           Loss%   Snt   Rcv  Last   Avg  Best  Wrst
> >> > > >  1.|-- 72.20.98.65                0.0%    41    41   0.4   0.4   0.3   0.6
> >> > > >  2.|-- 69.163.64.44               0.0%    40    40   0.4   0.4   0.3   0.5
> >> > > >  3.|-- 69.163.65.49               0.0%    40    40   0.6  10.6   0.4  76.8
> >> > > >  4.|-- 64.124.65.93               0.0%    40    40  65.6   3.6   0.4  65.6
> >> > > >  5.|-- 64.125.28.54               0.0%    40    40   2.8   4.2   0.7  51.7
> >> > > >  6.|-- 64.125.30.126              0.0%    40    40   0.8   1.4   0.7  16.7
> >> > > >  7.|-- 64.125.30.178              0.0%    40    40   1.1   5.8   1.1  65.5
> >> > > >  8.|-- 75.149.228.133             0.0%    40    40 148.7 136.9 117.4 150.2
> >> > > >  9.|-- 68.86.85.65                5.0%    40    38 139.3 135.5 119.9 148.3
> >> > > > 10.|-- 68.86.90.158               2.5%    40    39 141.6 138.1 120.0 149.8
> >> > > > 11.|-- 68.86.143.93               2.5%    40    39 140.3 136.8 120.5 149.8
> >> > > > 12.|-- 68.85.191.250              0.0%    40    40 150.8 144.8 128.7 159.5
> >> > > > 13.|-- 67.180.84.87               0.0%    40    40 146.7 149.6 132.2 173.9
> >> > > > === END
> >> > > >
> >> > > > $ host 64.125.30.178
> >> > > > 178.30.125.64.in-addr.arpa domain name pointer xe-0-1-0.mpr4.sjc7.us.above.net.
> >> > > >
> >> > > > $ host 75.149.228.133
> >> > > > 133.228.149.75.in-addr.arpa domain name pointer be-10-403-pe01.11greatoaks.ca.ibone.comcast.net.
> >> > > >
> >> > > > So Ren, if you can investigate this, I would be appreciative of it.
> >> > > >
> >> > > > --
> >> > > > | Jeremy Chadwick                              jdc at parodius.com |
> >> > > > | Parodius Networking                     http://www.parodius.com/ |
> >> > > > | UNIX Systems Administrator                 Mountain View, CA, US |
> >> > > > | Making life hard for others since 1977.             PGP 4BD6C0CB |
> >> > > >
> >> > > > On Tue, Apr 10, 2012 at 09:55:33PM -0400, Ren Provo wrote:
> >> > > >> Hi Jeremy,
> >> > > >>
> >> > > >> When the issue was raised a week or two ago there seemed to be a route
> >> > > >> announcement issue for 72.20.98.67.  When your colo provider changed
> >> > > >> their policy did they update filters with their upstream?
> >> > > >>
> >> > > >> Cheers, -ren, who will confirm there is no congestion with Abovenet on
> >> > > >> the port in SJC to Comcast.
> >> > > >>
> >> > > >> On Tue, Apr 10, 2012 at 9:44 PM, Jeremy Chadwick
> >> > > >> <outages at jdc.parodius.com> wrote:
> >> > > >>> There's an issue I've been tracking for a few months now pertaining to a
> >> > > >>> network link between Abovenet and Comcast which appears to become
> >> > > >>> saturated (or impacted negatively in some way) at nearly the same time
> >> > > >>> every night, and lasts for numerous hours, then ceases -- on a
> >> > > >>> near-daily basis (no exaggeration).
> >> > > >>>
> >> > > >>> Latency and packet loss occur during this time, with latency hitting
> >> > > >>> 150ms (sometimes higher), with packet loss ranging from 0.5% to 2.0%.
> >> > > >>> I've been storing periodic traceroutes/mtrs for over a month showing
> >> > > >>> this problem, and been tracking start/end times as well.
> >> > > >>>
> >> > > >>> Thankfully I own devices/have connectivity on both ends (src and dst),
> >> > > >>> thus can provide mtrs/traceroutes from both directions.  Analysis so
> >> > > >>> far, done by myself as well as senior network techs at my co-lo
> >> > > >>> provider, confirms this issue is with a link between Abovenet/Comcast,
> >> > > >>> likely within the San Jose Great Oaks POP (which I'm familiar with as
> >> > > >>> part of my job).
> >> > > >>>
> >> > > >>> I opened up a ticket on DSLR/BBR's Comcast Direct forum (which only
> >> > > >>> Comcast employees can respond to/view tickets for) over a month ago.
> >> > > >>> Someone has been viewing it, but nobody has replied except me.
> >> > > >>>
> >> > > >>> I've since made the issue public, where (of course) the general Internet
> >> > > >>> community does not quite understand how peering arrangements/contracts
> >> > > >>> work (people think that any company who has a contract with Abovenet can
> >> > > >>> report issues, but that is simply not the case; you must be a POC for
> >> > > >>> the transport to report issues with it), nor do they understand how a
> >> > > >>> co-lo provider changing route preferencing can impact the provider
> >> > > >>> financially (based on billing metrics, etc.).  My co-lo provider is very
> >> > > >>> strict with their routing policies, and it has to do with financial
> >> > > >>> reasons that are their own business, not mine.
> >> > > >>>
> >> > > >>> The public thread is here, which also includes start/end times,
> >> > > >>> traceroutes (both directions), and so on.  I update it every day when
> >> > > >>> the issue happens, and ~90% of the time edit my posts when the issue
> >> > > >>> ends.
> >> > > >>>
> >> > > >>> http://www.dslreports.com/forum/r27085601-SF-South-Bay-Chronic-IP-network-latency-nightly
> >> > > >>>
> >> > > >>> Anyway, all the technical details aside:
> >> > > >>>
> >> > > >>> Is there anyone on this list who works for Comcast who can contact me
> >> > > >>> off-list who is willing to investigate this and drive it to completion?
> >> > > >>>
> >> > > >>> An alternative would be for someone to contact me off-list with the name
> >> > > >>> or Email address of someone (or division) who handles issues like this
> >> > > >>> at Comcast.  I'd love for Abovenet to get involved, but I have no
> >> > > >>> contractual obligation to them.  (If there is an Abovenet individual who
> >> > > >>> is willing to investigate this "pro bono" per se, that would be
> >> > > >>> awesome, but I imagine such is often above one's pay grade).
> >> > > >>>
> >> > > >>> --
> >> > > >>> | Jeremy Chadwick                              jdc at parodius.com |
> >> > > >>> | Parodius Networking                     http://www.parodius.com/ |
> >> > > >>> | UNIX Systems Administrator                 Mountain View, CA, US |
> >> > > >>> | Making life hard for others since 1977.             PGP 4BD6C0CB |
> >> > > >>>
> >> > > >>> _______________________________________________
> >> > > >>> Outages mailing list
> >> > > >>> Outages at outages.org
> >> > > >>> https://puck.nether.net/mailman/listinfo/outages