[outages] SF South Bay: chronic latency/packet loss between Abovenet/Comcast at Great Oaks

Jeremy Chadwick outages at jdc.parodius.com
Wed Apr 11 01:35:31 EDT 2012


Following up on my own post (a bad habit):

Can some folks here who have peering with Abovenet (preferably with a
full routing table) verify that you see an announcement for
72.20.96.0/19 (AS7151) coming via AS6461 (Abovenet)?

I've confirmed this is the case at my workplace, but I want extra
eyes/verification.

Thanks.

-- 
| Jeremy Chadwick                              jdc at parodius.com |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, US |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

On Tue, Apr 10, 2012 at 09:33:04PM -0700, Jeremy Chadwick wrote:
> I choose not to do DNS resolution in mtr because otherwise the terminal
> width required to see FQDNs has to be >76 characters, which often upsets
> mailing list folks.  (This for example is one of the few lists on which
> I top-post)
> 
> I'm not so concerned with the packet loss -- for example in the 2nd mtr
> set I showed, the loss only seems to happen at routers, which is almost
> certainly the result of ICMP prioritisation.
> 
> But the latency is a definite problem and is easily noticeable across
> SSH, Remote Desktop, and any other TCP service (i.e. the latency shown
> is not a result of ICMP prioritisation).  src/dst IPs on both sides are
> actual servers/boxes, not routers.
> 
> But you're absolutely right -- asymmetric routing is in place here,
> which means that everyone has to work together, and simultaneously, to
> really figure out where the problem is.  I can only do so much when I
> have little to no visibility into things (e.g. if I had access to BAIS
> and Abovenet and Level 3 and Comcast routers I could figure out where
> the problem is... ;-) )
> 
> I'm currently engaged in a conversation with Comcast engineers about
> this issue.  (Seems my DSLR post got proper attention)
> 
> So far the statement is that they've looked at the interface for the
> Abovenet/Comcast peering point in question, and although it's being
> used/busy, it's not oversaturated.  They also pointed out that the only
> announcements they see for 72.20.96.0/19 are via Level 3 and Cogent,
> thus the issue is likely to be on my co-lo provider's side (e.g. the
> Level 3 <-> BAIS link).  route-views also confirms the same thing, as
> does my place of work (who has peering with Abovenet natively).
> 
> I have a ticket open with my co-lo provider to investigate this ordeal.
> 
> If this does turn out to be a problem with their Level 3 link being
> saturated chronically, then I owe Comcast/Abovenet an apology (welcome
> to one of the complexities with asymmetric routing!), and I'm going to
> have to make some decisions with regards to co-location and so on,
> because the chronic nature of this problem is unacceptable for myself as
> well as my customers.
> 
> -- 
> | Jeremy Chadwick                              jdc at parodius.com |
> | Parodius Networking                     http://www.parodius.com/ |
> | UNIX Systems Administrator                 Mountain View, CA, US |
> | Making life hard for others since 1977.             PGP 4BD6C0CB |
> 
> On Tue, Apr 10, 2012 at 08:50:00PM -0700, Kevin Blackham wrote:
> > A little PTR would help see more clearly, but I believe I see asymmetry here. Also interesting is the ultimate hop at 0% loss in one of those samples. 
> > 
> > Feel free to put me in my place, but please do so on -discuss. 
> > 
> > On Apr 10, 2012, at 19:12, Jeremy Chadwick <outages at jdc.parodius.com> wrote:
> > 
> > > Hi Ren,
> > > 
> > > The issue with my co-lo pertaining to route announcements has actually
> > > been "dealt with", meaning "this is just how it is".  I'm wondering if I
> > > can go into details without violating contractual obligations, hmm.
> > > Yes, I imagine I can, because it becomes quite obvious if I provide
> > > traceroutes from both directions, and that's public knowledge.
> > > 
> > > It appears that my co-lo (BAIS) doesn't actually adjust route
> > > announcements on a per-IP basis, but they internally have a hashing
> > > algorithm in place where on a per-IP basis different addresses utilise
> > > different network paths.  I still have an open ticket with their senior
> > > networking engineer about this, who has been somewhat "careful" in what
> > > he tells me, but so far I've basically gotten confirmation that this is
> > > indeed how they do their load-balancing for customers to balance out
> > > network traffic between all of their peering providers (Level 3,
> > > Abovenet, Cogent, and 2-3 others).
> > > 
> > > I can provide those examples (to/from different IPs) if you want to see
> > > them, but that is a separate matter.  There still seems to be a problem
> > > between Abovenet/Comcast.  Alternate links/paths through my co-lo (e.g.
> > > BAIS/Cogent) show no problems on the ingress or egress path -- the
> > > common path seems to be Abovenet/Comcast when there are problems.
> > > 
> > > This is what's happening right now:
> > > 
> > > Source IP: 67.180.84.87
> > > Dest IP:   72.20.98.124
> > > 
> > > === Tue Apr 10 19:09:00 PDT 2012  (1334110140)
> > > HOST: icarus.home.lan             Loss%   Snt   Rcv  Last   Avg  Best  Wrst
> > >  1.|-- 192.168.1.1                0.0%    40    40   0.3   0.6   0.2   1.5
> > >  2.|-- 67.180.84.1                0.0%    40    40  24.5  22.7  10.4  54.0
> > >  3.|-- 68.85.191.253              0.0%    40    40  10.2  11.1   8.4  25.5
> > >  4.|-- 68.86.143.98               0.0%    40    40  15.6  16.4  11.1  34.7
> > >  5.|-- 68.86.91.5                 0.0%    40    40  14.3  18.7  12.4  49.7
> > >  6.|-- 68.86.87.182               0.0%    40    40  17.1  19.4  14.4  51.6
> > >  7.|-- 4.71.118.45                0.0%    40    40  14.3  23.7  13.0  77.9
> > >  8.|-- 4.69.152.148               0.0%    40    40  67.5  27.1  13.3 128.0
> > >  9.|-- 4.53.16.18                 5.0%    40    38 151.6 153.3 133.3 184.1
> > > 10.|-- 69.163.65.39               2.5%    40    39 176.3 155.8 135.3 198.1
> > > 11.|-- 72.20.98.124               5.0%    40    38 205.3 152.0 129.8 205.3
> > > === END
> > > 
> > > 
> > > Source IP: 72.20.98.124
> > > Dest IP:   67.180.84.87
> > > 
> > > === Tue Apr 10 19:09:00 PDT 2012  (1334110140)
> > > HOST: isis.parodius.com           Loss%   Snt   Rcv  Last   Avg  Best  Wrst
> > >  1.|-- 72.20.98.65                0.0%    41    41   0.4   0.4   0.3   0.6
> > >  2.|-- 69.163.64.44               0.0%    40    40   0.4   0.4   0.3   0.5
> > >  3.|-- 69.163.65.49               0.0%    40    40   0.6  10.6   0.4  76.8
> > >  4.|-- 64.124.65.93               0.0%    40    40  65.6   3.6   0.4  65.6
> > >  5.|-- 64.125.28.54               0.0%    40    40   2.8   4.2   0.7  51.7
> > >  6.|-- 64.125.30.126              0.0%    40    40   0.8   1.4   0.7  16.7
> > >  7.|-- 64.125.30.178              0.0%    40    40   1.1   5.8   1.1  65.5
> > >  8.|-- 75.149.228.133             0.0%    40    40 148.7 136.9 117.4 150.2
> > >  9.|-- 68.86.85.65                5.0%    40    38 139.3 135.5 119.9 148.3
> > > 10.|-- 68.86.90.158               2.5%    40    39 141.6 138.1 120.0 149.8
> > > 11.|-- 68.86.143.93               2.5%    40    39 140.3 136.8 120.5 149.8
> > > 12.|-- 68.85.191.250              0.0%    40    40 150.8 144.8 128.7 159.5
> > > 13.|-- 67.180.84.87               0.0%    40    40 146.7 149.6 132.2 173.9
> > > === END
> > > 
> > > $ host 64.125.30.178
> > > 178.30.125.64.in-addr.arpa domain name pointer xe-0-1-0.mpr4.sjc7.us.above.net.
> > > 
> > > $ host 75.149.228.133
> > > 133.228.149.75.in-addr.arpa domain name pointer be-10-403-pe01.11greatoaks.ca.ibone.comcast.net.
> > > 
> > > So Ren, if you can investigate this, I would be appreciative of it.
> > > 
> > > -- 
> > > | Jeremy Chadwick                              jdc at parodius.com |
> > > | Parodius Networking                     http://www.parodius.com/ |
> > > | UNIX Systems Administrator                 Mountain View, CA, US |
> > > | Making life hard for others since 1977.             PGP 4BD6C0CB |
> > > 
> > > On Tue, Apr 10, 2012 at 09:55:33PM -0400, Ren Provo wrote:
> > >> Hi Jeremy,
> > >> 
> > >> When the issue was raised a week or two ago there seemed to be a route
> > >> announcement issue for 72.20.98.67.  When your colo provider changed
> > >> their policy did they update filters with their upstream?
> > >> 
> > >> Cheers, -ren, who will confirm there is no congestion with Abovenet on
> > >> the port in SJC to Comcast.
> > >> 
> > >> On Tue, Apr 10, 2012 at 9:44 PM, Jeremy Chadwick
> > >> <outages at jdc.parodius.com> wrote:
> > >>> There's an issue I've been tracking for a few months now pertaining to a
> > >>> network link between Abovenet and Comcast which appears to become
> > >>> saturated (or impacted negatively in some way) at nearly the same time
> > >>> every night, and lasts for numerous hours, then ceases -- on a
> > >>> near-daily basis (no exaggeration).
> > >>> 
> > >>> Latency and packet loss occur during this time, with latency hitting
> > >>> 150ms (sometimes higher), with packet loss ranging from 0.5% to 2.0%.
> > >>> I've been storing periodic traceroutes/mtrs for over a month showing
> > >>> this problem, and been tracking start/end times as well.
> > >>> 
> > >>> Thankfully I own devices/have connectivity on both ends (src and dst),
> > >>> thus can provide mtrs/traceroutes from both directions.  Analysis so
> > >>> far, done by myself as well as senior network techs at my co-lo
> > >>> provider, confirms this issue is with a link between Abovenet/Comcast,
> > >>> likely within the San Jose Great Oaks POP (which I'm familiar with as
> > >>> part of my job).
> > >>> 
> > >>> I opened up a ticket on DSLR/BBR's Comcast Direct forum (which only
> > >>> Comcast employees can respond to/view tickets for) over a month ago.
> > >>> Someone has been viewing it, but nobody has replied except me.
> > >>> 
> > >>> I've since made the issue public, where (of course) the general Internet
> > >>> community does not quite understand how peering arrangements/contracts
> > >>> work (people think that any company who has a contract with Abovenet can
> > >>> report issues, but that is simply not the case; you must be a POC for
> > >>> the transport to report issues with it), nor do they understand how a
> > >>> co-lo provider changing route preferencing can impact the provider
> > >>> financially (based on billing metrics, etc.).  My co-lo provider is very
> > >>> strict with their routing policies, and it has to do with financial
> > >>> reasons that are their own business, not mine.
> > >>> 
> > >>> The public thread is here, which also includes start/end times,
> > >>> traceroutes (both directions), and so on.  I update it every day when
> > >>> the issue happens, and ~90% of the time edit my posts when the issue
> > >>> ends.
> > >>> 
> > >>> http://www.dslreports.com/forum/r27085601-SF-South-Bay-Chronic-IP-network-latency-nightly
> > >>> 
> > >>> Anyway, all the technical details aside:
> > >>> 
> > >>> Is there anyone on this list who works for Comcast who can contact me
> > >>> off-list who is willing to investigate this and drive it to completion?
> > >>> 
> > >>> An alternative would be for someone to contact me off-list with the name
> > >>> or Email address of someone (or division) who handles issues like this
> > >>> at Comcast.  I'd love for Abovenet to get involved, but I have no
> > >>> contractual obligation to them.  (If there is an Abovenet individual who
> > >>> is willing to investigate this "pro bono" per se, that would be
> > >>> awesome, but I imagine such is often above one's pay grade).
> > >>> 
> > >>> --
> > >>> | Jeremy Chadwick                              jdc at parodius.com |
> > >>> | Parodius Networking                     http://www.parodius.com/ |
> > >>> | UNIX Systems Administrator                 Mountain View, CA, US |
> > >>> | Making life hard for others since 1977.             PGP 4BD6C0CB |
> > >>> 
> > >>> _______________________________________________
> > >>> Outages mailing list
> > >>> Outages at outages.org
> > >>> https://puck.nether.net/mailman/listinfo/outages

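The mtr reports quoted in this thread can be compared mechanically rather than by eye. Below is a minimal sketch, assuming the report layout shown above (Loss%, Snt, Rcv, Last, Avg, Best, Wrst); the names `parse_mtr_report` and `largest_latency_jump` are hypothetical helpers, not part of mtr. It finds the pair of adjacent hops with the largest increase in average latency. As noted in the thread, a jump only means something if it persists through to the final hop (a jump at a single router can be ICMP prioritisation), and with asymmetric routing the added delay may sit on the return path, so both directions need checking.

```python
import re

# One hop row of an mtr report as pasted in this thread, e.g.
# " 8.|-- 75.149.228.133   0.0%  40  40 148.7 136.9 117.4 150.2"
# Column order assumed: Loss% Snt Rcv Last Avg Best Wrst.
HOP_RE = re.compile(
    r"^\s*(\d+)\.\|--\s+(\S+)\s+([\d.]+)%\s+(\d+)\s+(\d+)"
    r"\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$"
)


def parse_mtr_report(text):
    """Return a list of hop dicts parsed from mtr report text.

    Leading mail quote markers ("> > > ") are stripped first, so quoted
    reports can be pasted in unchanged. Non-hop lines are ignored.
    """
    hops = []
    for line in text.splitlines():
        match = HOP_RE.match(line.lstrip("> "))
        if match:
            hops.append({
                "hop": int(match.group(1)),
                "host": match.group(2),
                "loss": float(match.group(3)),
                "avg": float(match.group(7)),
            })
    return hops


def largest_latency_jump(hops):
    """Return (prev_hop, cur_hop, delta_ms) for the biggest rise in Avg.

    Returns None when fewer than two hops are given.
    """
    best = None
    for prev, cur in zip(hops, hops[1:]):
        delta = cur["avg"] - prev["avg"]
        if best is None or delta > best[2]:
            best = (prev, cur, delta)
    return best


# Hops 6-9 of the 72.20.98.124 -> 67.180.84.87 report quoted in this thread.
SAMPLE = """\
> > > HOST: isis.parodius.com           Loss%   Snt   Rcv  Last   Avg  Best  Wrst
> > >  6.|-- 64.125.30.126              0.0%    40    40   0.8   1.4   0.7  16.7
> > >  7.|-- 64.125.30.178              0.0%    40    40   1.1   5.8   1.1  65.5
> > >  8.|-- 75.149.228.133             0.0%    40    40 148.7 136.9 117.4 150.2
> > >  9.|-- 68.86.85.65                5.0%    40    38 139.3 135.5 119.9 148.3
"""

if __name__ == "__main__":
    result = largest_latency_jump(parse_mtr_report(SAMPLE))
    if result is not None:
        prev, cur, delta = result
        print(f"{prev['host']} -> {cur['host']}: +{delta:.1f} ms avg")
```

On the rows quoted above, this picks the step from 64.125.30.178 (Abovenet) to 75.149.228.133 (Comcast). That only shows where the round-trip time first rises as seen from that source; it does not by itself say which direction of the path is congested.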

