[outages] SF South Bay: chronic latency/packet loss between Abovenet/Comcast at Great Oaks
Jeremy Chadwick
outages at jdc.parodius.com
Wed Apr 11 01:35:31 EDT 2012
Following up to my own post (in bad habit):
Can some folks here who have peering with Abovenet (preferably with a
full routing table) verify that you see an announcement for
72.20.96.0/19 (AS7151) coming via AS6461 (Abovenet)?
I've confirmed this is the case at my workplace, but I want extra
eyes/verification.
Thanks.
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |
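
The verification requested above can be sketched roughly as follows. This is only an illustration under stated assumptions: the prefix 72.20.96.0/19, origin AS7151, Abovenet's AS6461, and the address 72.20.98.124 come from this thread, but the AS paths in sample_paths are hypothetical placeholders, not output from any real router or looking glass.

```python
import ipaddress

# Values taken from the thread.
PREFIX = ipaddress.ip_network("72.20.96.0/19")
ORIGIN_AS = 7151    # origin AS named in the post
TRANSIT_AS = 6461   # Abovenet


def covers(prefix, address):
    """Return True if the address falls inside the announced prefix."""
    return ipaddress.ip_address(address) in prefix


def seen_via(as_path, transit_as, origin_as):
    """Return True if the path originates at origin_as and transits transit_as."""
    return (
        bool(as_path)
        and as_path[-1] == origin_as
        and transit_as in as_path[:-1]
    )


# Hypothetical AS paths for illustration only (3356 = Level 3, 6461 = Abovenet).
sample_paths = [
    [3356, 7151],
    [6461, 7151],
]

if __name__ == "__main__":
    print("72.20.98.124 inside prefix:", covers(PREFIX, "72.20.98.124"))
    for path in sample_paths:
        print(path, "via AS6461:", seen_via(path, TRANSIT_AS, ORIGIN_AS))
```

In practice the paths would be copied by hand from whatever "show route" style output your own router provides for the prefix.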
On Tue, Apr 10, 2012 at 09:33:04PM -0700, Jeremy Chadwick wrote:
> I choose not to do DNS resolution in mtr because otherwise the terminal
> width required to see FQDNs has to be >76 characters, which often upsets
> mailing list folks. (This for example is one of the few lists on which
> I top-post)
>
> I'm not so concerned with the packet loss -- for example in the 2nd mtr
> set I showed, the loss only seems to happen at routers, which is almost
> certainly the result of ICMP prioritisation.
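
The reasoning in the paragraph above (loss that shows up at intermediate routers but not at the final hop usually points at ICMP rate limiting rather than real forwarding loss) can be expressed as a small check. This is a sketch, assuming the "2nd mtr set" is the isis.parodius.com report quoted further down; the Loss% values are copied from the two reports in this thread, and the function names are made up for illustration.

```python
# Loss% per hop, copied from the two mtr reports quoted in this thread.
FORWARD_LOSS = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 2.5, 5.0]
REVERSE_LOSS = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 2.5, 2.5, 0.0, 0.0]


def loss_reaches_destination(loss_by_hop):
    """True if the final hop (the actual destination) reports any loss."""
    return loss_by_hop[-1] > 0.0


def router_only_loss_hops(loss_by_hop):
    """1-based hop numbers that show loss while the destination shows none.

    Returns an empty list when the destination itself reports loss, because
    in that case the loss cannot be written off as ICMP rate limiting.
    """
    if loss_reaches_destination(loss_by_hop):
        return []
    return [n for n, loss in enumerate(loss_by_hop, start=1) if loss > 0.0]


if __name__ == "__main__":
    print("reverse, router-only loss at hops:", router_only_loss_hops(REVERSE_LOSS))
    print("forward, loss reaches destination:", loss_reaches_destination(FORWARD_LOSS))
```

With only 40 probes per hop, a single lost packet is 2.5%, so these figures are a rough indicator at best.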
>
> But the latency is a definite problem and is easily noticeable across
> SSH, Remote Desktop, and any other TCP service (i.e. the latency shown
> is not a result of ICMP prioritisation). src/dst IPs on both sides are
> actual servers/boxes, not routers.
>
> But you're absolutely right -- asymmetric routing is in place here,
> which means that everyone has to work together, and simultaneously, to
> really figure out where the problem is. I can only do so much when I
> have little to no visibility into things (e.g. if I had access to BAIS
> and Abovenet and Level 3 and Comcast routers I could figure out where
> the problem is... ;-) )
>
> I'm currently engaged in a conversation with Comcast engineers about
> this issue. (Seems my DSLR post got proper attention)
>
> So far the statement is that they've looked at the interface for the
> Abovenet/Comcast peering point in question, and although it's being
> used/busy, it's not oversaturated. They also pointed out that the only
> announcements they see for 72.20.96.0/19 are via Level 3 and Cogent,
> thus the issue is likely to be on my co-lo providers' side (e.g. the
> Level 3 <-> BAIS link). route-views also confirms the same thing, as
> does my place of work (who has peering with Abovenet natively).
>
> I have a ticket open with my co-lo provider to investigate this ordeal.
>
> If this does turn out to be a problem with their Level 3 link being
> saturated chronically, then I owe Comcast/Abovenet an apology (welcome
> to one of the complexities with asymmetric routing!), and I'm going to
> have to make some decisions with regards to co-location and so on,
> because the chronic nature of this problem is unacceptable for myself as
> well as my customers.
>
> --
> | Jeremy Chadwick jdc at parodius.com |
> | Parodius Networking http://www.parodius.com/ |
> | UNIX Systems Administrator Mountain View, CA, US |
> | Making life hard for others since 1977. PGP 4BD6C0CB |
>
> On Tue, Apr 10, 2012 at 08:50:00PM -0700, Kevin Blackham wrote:
> > A little PTR would help see more clearly, but I believe I see asymmetry here. Also interesting is the ultimate hop at 0% loss in one of those samples.
> >
> > Feel free to put me in my place, but please do so on -discuss.
> >
> > On Apr 10, 2012, at 19:12, Jeremy Chadwick <outages at jdc.parodius.com> wrote:
> >
> > > Hi Ren,
> > >
> > > The issue with my co-lo pertaining to route announcements has actually
> > > been "dealt with", meaning "this is just how it is". I'm wondering if I
> > > can go into details without violating contractual obligations, hmm.
> > > Yes, I imagine I can, because it becomes quite obvious if I provide
> > > traceroutes from both directions, and that's public knowledge.
> > >
> > > It appears that my co-lo (BAIS) doesn't actually adjust route
> > > announcements on a per-IP basis, but they internally have a hashing
> > > algorithm in place where on a per-IP basis different addresses utilise
> > > different network paths. I still have an open ticket with their senior
> > > networking engineer about this, who has been somewhat "careful" in what
> > > he tells me, but so far I've basically gotten confirmation that this is
> > > indeed how they do their load-balancing for customers to balance out
> > > network traffic between all of their peering providers (Level 3,
> > > Abovenet, Cogent, and 2-3 others).
> > >
> > > I can provide those examples (to/from different IPs) if you want to see
> > > them, but that is a separate matter. There still seems to be a problem
> > > between Abovenet/Comcast. Alternate links/paths through my co-lo (e.g.
> > > BAIS/Cogent) show no problems on the ingress or egress path -- the
> > > common path seems to be Abovenet/Comcast when there are problems.
> > >
> > > This is what's presently happening right now:
> > >
> > > Source IP: 67.180.84.87
> > > Dest IP: 72.20.98.124
> > >
> > > === Tue Apr 10 19:09:00 PDT 2012 (1334110140)
> > > HOST: icarus.home.lan Loss% Snt Rcv Last Avg Best Wrst
> > > 1.|-- 192.168.1.1 0.0% 40 40 0.3 0.6 0.2 1.5
> > > 2.|-- 67.180.84.1 0.0% 40 40 24.5 22.7 10.4 54.0
> > > 3.|-- 68.85.191.253 0.0% 40 40 10.2 11.1 8.4 25.5
> > > 4.|-- 68.86.143.98 0.0% 40 40 15.6 16.4 11.1 34.7
> > > 5.|-- 68.86.91.5 0.0% 40 40 14.3 18.7 12.4 49.7
> > > 6.|-- 68.86.87.182 0.0% 40 40 17.1 19.4 14.4 51.6
> > > 7.|-- 4.71.118.45 0.0% 40 40 14.3 23.7 13.0 77.9
> > > 8.|-- 4.69.152.148 0.0% 40 40 67.5 27.1 13.3 128.0
> > > 9.|-- 4.53.16.18 5.0% 40 38 151.6 153.3 133.3 184.1
> > > 10.|-- 69.163.65.39 2.5% 40 39 176.3 155.8 135.3 198.1
> > > 11.|-- 72.20.98.124 5.0% 40 38 205.3 152.0 129.8 205.3
> > > === END
> > >
> > >
> > > Source IP: 72.20.98.124
> > > Dest IP: 67.180.84.87
> > >
> > > === Tue Apr 10 19:09:00 PDT 2012 (1334110140)
> > > HOST: isis.parodius.com Loss% Snt Rcv Last Avg Best Wrst
> > > 1.|-- 72.20.98.65 0.0% 41 41 0.4 0.4 0.3 0.6
> > > 2.|-- 69.163.64.44 0.0% 40 40 0.4 0.4 0.3 0.5
> > > 3.|-- 69.163.65.49 0.0% 40 40 0.6 10.6 0.4 76.8
> > > 4.|-- 64.124.65.93 0.0% 40 40 65.6 3.6 0.4 65.6
> > > 5.|-- 64.125.28.54 0.0% 40 40 2.8 4.2 0.7 51.7
> > > 6.|-- 64.125.30.126 0.0% 40 40 0.8 1.4 0.7 16.7
> > > 7.|-- 64.125.30.178 0.0% 40 40 1.1 5.8 1.1 65.5
> > > 8.|-- 75.149.228.133 0.0% 40 40 148.7 136.9 117.4 150.2
> > > 9.|-- 68.86.85.65 5.0% 40 38 139.3 135.5 119.9 148.3
> > > 10.|-- 68.86.90.158 2.5% 40 39 141.6 138.1 120.0 149.8
> > > 11.|-- 68.86.143.93 2.5% 40 39 140.3 136.8 120.5 149.8
> > > 12.|-- 68.85.191.250 0.0% 40 40 150.8 144.8 128.7 159.5
> > > 13.|-- 67.180.84.87 0.0% 40 40 146.7 149.6 132.2 173.9
> > > === END
> > >
> > > $ host 64.125.30.178
> > > 178.30.125.64.in-addr.arpa domain name pointer xe-0-1-0.mpr4.sjc7.us.above.net.
> > >
> > > $ host 75.149.228.133
> > > 133.228.149.75.in-addr.arpa domain name pointer be-10-403-pe01.11greatoaks.ca.ibone.comcast.net.
> > >
> > > So Ren, if you can investigate this, I would be appreciative of it.
> > >
> > > --
> > > | Jeremy Chadwick jdc at parodius.com |
> > > | Parodius Networking http://www.parodius.com/ |
> > > | UNIX Systems Administrator Mountain View, CA, US |
> > > | Making life hard for others since 1977. PGP 4BD6C0CB |
> > >
> > > On Tue, Apr 10, 2012 at 09:55:33PM -0400, Ren Provo wrote:
> > >> Hi Jeremy,
> > >>
> > >> When the issue was raised a week or two ago there seemed to be a route
> > >> announcement issue for 72.20.98.67. When your colo provider changed
> > >> their policy did they update filters with their upstream?
> > >>
> > >> Cheers, -ren, who will confirm there is no congestion with Abovenet on
> > >> the port in SJC to Comcast.
> > >>
> > >> On Tue, Apr 10, 2012 at 9:44 PM, Jeremy Chadwick
> > >> <outages at jdc.parodius.com> wrote:
> > >>> There's an issue I've been tracking for a few months now pertaining to a
> > >>> network link between Abovenet and Comcast which appears to become
> > >>> saturated (or impacted negatively in some way) at nearly the same time
> > >>> every night, and lasts for numerous hours, then ceases -- on a
> > >>> near-daily basis (no exaggeration).
> > >>>
> > >>> Latency and packet loss occur during this time, with latency hitting
> > >>> 150ms (sometimes higher), with packet loss ranging from 0.5% to 2.0%.
> > >>> I've been storing periodic traceroutes/mtrs for over a month showing
> > >>> this problem, and been tracking start/end times as well.
> > >>>
> > >>> Thankfully I own devices/have connectivity on both ends (src and dst),
> > >>> thus can provide mtrs/traceroutes from both directions. Analysis so
> > >>> far, done by myself as well as senior network techs at my co-lo
> > >>> provider, confirms this issue is with a link between Abovenet/Comcast,
> > >>> likely within the San Jose Great Oaks POP (which I'm familiar with as
> > >>> part of my job).
> > >>>
> > >>> I opened up a ticket on DSLR/BBR's Comcast Direct forum (which only
> > >>> Comcast employees can respond to/view tickets for) over a month ago.
> > >>> Someone has been viewing it, but nobody has replied except me.
> > >>>
> > >>> I've since made the issue public, where (of course) the general Internet
> > >>> community does not quite understand how peering arrangements/contracts
> > >>> work (people think that any company who has a contract with Abovenet can
> > >>> report issues, but that is simply not the case; you must be a POC for
> > >>> the transport to report issues with it), nor do they understand how a
> > >>> co-lo provider changing route preferencing can impact the provider
> > >>> financially (based on billing metrics, etc.). My co-lo provider is very
> > >>> strict with their routing policies, and it has to do with financial
> > >>> reasons that are their own business, not mine.
> > >>>
> > >>> The public thread is here, which also includes start/end times,
> > >>> traceroutes (both directions), and so on. I update it every day when
> > >>> the issue happens, and ~90% of the time edit my posts when the issue
> > >>> ends.
> > >>>
> > >>> http://www.dslreports.com/forum/r27085601-SF-South-Bay-Chronic-IP-network-latency-nightly
> > >>>
> > >>> Anyway, all the technical details aside:
> > >>>
> > >>> Is there anyone on this list who works for Comcast who can contact me
> > >>> off-list who is willing to investigate this and drive it to completion?
> > >>>
> > >>> An alternative would be for someone to contact me off-list with the name
> > >>> or Email address of someone (or division) who handles issues like this
> > >>> at Comcast. I'd love for Abovenet to get involved, but I have no
> > >>> contractual obligation to them. (If there is an Abovenet individual who
> > >>> is willing to investigate this "pro bono" per se, that would be
> > >>> awesome, but I imagine such is often above one's pay grade).
> > >>>
> > >>> --
> > >>> | Jeremy Chadwick jdc at parodius.com |
> > >>> | Parodius Networking http://www.parodius.com/ |
> > >>> | UNIX Systems Administrator Mountain View, CA, US |
> > >>> | Making life hard for others since 1977. PGP 4BD6C0CB |
> > >>>
> > >>> _______________________________________________
> > >>> Outages mailing list
> > >>> Outages at outages.org
> > >>> https://puck.nether.net/mailman/listinfo/outages