[outages] Comcast<->AT&T packet loss (possibly within CA)
Adam Rothschild
asr at latency.net
Sun Mar 9 00:12:10 EST 2014
Jeremy,
For the record, we at Internap do take connectivity issues seriously.
I'd suggest having your provider reach out to our NOC, so that we may
investigate comprehensively. You (and others on this list) are, of
course, welcome to mail me privately in addition.
(FWIW: a quick look into 76.96.0.0/11 in Dallas shows we've not been
routing to it over any of Comcast's congested paths.)
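
For anyone following along, here is a minimal Python sketch that checks
which addresses 76.96.0.0/11 covers. It uses only the standard-library
ipaddress module; the variable names are illustrative, and the two
addresses are the endpoints from the mtrs quoted below.

```python
import ipaddress

# Prefix mentioned above, plus the two endpoints from the quoted mtr reports.
prefix = ipaddress.ip_network("76.96.0.0/11")
comcast_end = ipaddress.ip_address("76.102.14.35")    # Mountain View, CA
rootbsd_end = ipaddress.ip_address("204.109.61.174")  # Dallas, TX

# First and last address covered by the /11.
print(prefix[0], "-", prefix[-1])   # 76.96.0.0 - 76.127.255.255

# Membership tests: only the Comcast-side endpoint falls inside the prefix.
print(comcast_end in prefix)        # True
print(rootbsd_end in prefix)        # False
```

So the prefix does include the Comcast end of the path in question.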
Regards,
-a
On Sat, Mar 8, 2014 at 10:52 PM, Jeremy Chadwick <jdc at koitsu.org> wrote:
> 1. Thanks -- the problem is that in my experience Company X will blame
> Company Y for the device, but the device is owned/maintained by Company
> X, and this nonsense goes on for about a week before someone finally
> owns up to something (by which time the problem is usually gone). It's
> a depressing and sad modus operandi; sometimes I think it's done
> intentionally (stall tactic).
>
> 2. No, because I don't think it's necessary -- when I can clearly "feel"
> the slowdown via SSH (which is TCP-based) the issue isn't related to
> ICMP prio. Plus, showing a network provider hping results doesn't
> necessarily convince them of anything if they're unfamiliar with the
> tool. That's been my experience anyway. It'd be akin to giving them
> packet captures and doing a 10-page write-up showing how a TCP packet with
> PSH+ACK seq no 123456789 wasn't seen by the remote end until 2-3
> retries.
>
> 3. No I haven't, because the process would be significantly more
> convoluted than that. This is what would have to happen, starting with
> the forward path:
>
> - I'd have to open a ticket with Comcast through standard 800-COMCAST
>   means, i.e. complaining to someone in the Philippines about packet
>   loss (read: likelihood of someone screwing this up: 99%)
> - Comcast would have to hand it off to the Comcast NOC
> - Comcast NOC would have to care enough to open a ticket with AT&T
>
> For the reverse path:
>
> - I'd have to open a ticket with my VPS provider, RootBSD
> - RootBSD would have to open a ticket with InterNAP (assuming they have
>   a relationship with them directly; it may be more convoluted, for
>   example it may be they have to open a ticket with their co-lo
>   provider who then opens a ticket with InterNAP)
> - InterNAP would have to care enough to open a ticket with
>   Qwest/Centurylink
> - Qwest/Centurylink would have to care enough to open a ticket with
>   Comcast
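
On the TCP-versus-ICMP question in point 2 above: if hping output is a
hard sell, a plain TCP handshake probe is another way to measure loss on
the protocol that actually hurts. What follows is a minimal Python
sketch, not a replacement for hping. The function name tcp_probe, its
parameters, and the 0.9 s default timeout are assumptions made for
illustration. The timeout sits just below the 1 s initial retransmission
timer recommended by RFC 6298, so that a dropped SYN is counted as a
loss instead of being hidden by a retransmit.

```python
import socket
import time

def tcp_probe(host, port, count=30, timeout=0.9,
              connect=socket.create_connection):
    """Attempt `count` TCP handshakes to host:port.

    Returns (loss_percent, rtts_ms). Any OSError (timeout, refusal,
    unreachable) counts as a failed probe, so point this at a port that
    is known to be listening. `connect` is injectable for testing.
    """
    rtts_ms = []
    for _ in range(count):
        start = time.monotonic()
        try:
            conn = connect((host, port), timeout)
        except OSError:
            continue  # handshake did not complete within the timeout
        conn.close()
        rtts_ms.append((time.monotonic() - start) * 1000.0)
    loss_percent = 100.0 * (count - len(rtts_ms)) / count
    return loss_percent, rtts_ms
```

A call such as tcp_probe("mambo.koitsu.org", 22) would exercise the same
SSH path described above (port 22 is an assumption). Thirty probes give
the same 3.3% granularity as the mtr reports, so a longer run is needed
to pin down small loss rates.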
>
> Historically I've mailed things of this nature to outages at outages.org
> because there are lurkers on the list who quietly go behind the scenes
> and start trying to fix/rectify things. Other times it's purely about
> bringing to light something that's happening on the Internet in hopes
> that one or more of the involved peers are, in a roundabout way,
> publicly shamed for not having better monitoring.
>
> P.S. -- Issue is still ongoing and appears worse than before (at least
> now there aren't sporadic times of 0% loss at intermediary hops).
>
> --
> | Jeremy Chadwick jdc at koitsu.org |
> | UNIX Systems Administrator http://jdc.koitsu.org/ |
> | Making life hard for others since 1977. PGP 4BD6C0CB |
>
> On Sat, Mar 08, 2014 at 06:27:15PM -0800, Michael Smith wrote:
>> A couple of things
>>
>> - Hop 8's IP is an AT&T address, so it's likely an interface on an AT&T router, since you're headed towards it in your traceroute (next_hop).
>> - Have you tried something like hping that will allow you to use TCP for your test?
>> - Have you contacted InterNAP and told them to open a ticket with AT&T using the data you have?
>>
>> Mike
>>
>>
>> On Mar 8, 2014, at 4:53 PM, Jeremy Chadwick <jdc at koitsu.org> wrote:
>>
>> > Since roughly Friday, I've been seeing what appears to be packet loss
>> > somewhere within the Comcast/AT&T network mesh. Source and destination IPs
>> > are provided below as well, ditto with some mtrs from src->dst and
>> > dst->src. I keep periodic mtrs (both directions) going all the way back
>> > to 03/04. I can make all of those logs available if asked.
>> >
>> > The issue started on 03/07 @ 21:33 PST suddenly -- not a "gradual"
>> > increase -- and lasted until an undetermined time (very hard to tell
>> > from mtrs) but I'd estimate ~02:00 PST on 03/08 (today).
>> >
>> > The issue then appeared to start back up again ~07:00 PST, though it's
>> > hard to give an exact time (seems sort of a gradual increase, thus hard
>> > to pinpoint). It's been ongoing since.
>> >
>> > The loss varies from 3% to 20%, but you can definitely "feel" it across
>> > an SSH session, so it's not ICMP prio.
>> >
>> > I will make myself clear: it's very hard to "show" someone the way this
>> > problem manifests itself, because the packet loss will vary all over the
>> > place between different hops. It *definitely* starts at a particular
>> > point and "trickles down", but due to the fact that the loss is a
>> > smaller percentage, there are times where a hop will suddenly show 0%.
>> > TL;DR -- You'd really have to see a longer log (say, an hour's worth) to
>> > be able to say "ah yes, this really is a problem" and not blow it off as
>> > ICMP prio.
>> >
>> > And as usual, there's one of those "mystery routers" (hop #8 in the
>> > first example) that peering providers looooooove to use as a scapegoat
>> > when it comes to shifting blame, ex. provider A says "that's a device
>> > owned by provider B", provider B says "that device is provider A's
>> > responsibility", and neither side does anything about the issue.
>> > However I should note that the "mystery router" usually does show some
>> > degree of loss even when this issue isn't occurring (likely ICMP prio on
>> > the device), but that makes it even more difficult to determine where
>> > the issue begins.
>> >
>> >
>> > src IP: 76.102.14.35 (Comcast; Mountain View, CA)
>> > dst IP: 204.109.61.174 (RootBSD; Dallas, TX)
>> >
>> > === Sat Mar 8 16:22:00 PST 2014 (1394324520)
>> > Start: Sat Mar 8 16:22:00 2014
>> > HOST: icarus.home.lan                                                   Loss%  Snt  Rcv   Last    Avg   Best   Wrst
>> >  1.|-- gw.home.lan (192.168.1.1)                                         0.0%   30   30    0.4    0.3    0.2    0.4
>> >  2.|-- 76.102.12.1                                                       0.0%   30   30    8.0    8.8    8.0   12.2
>> >  3.|-- te-0-2-0-5-ur06.santaclara.ca.sfba.comcast.net (68.86.249.253)    0.0%   30   30    8.2    9.0    8.2   16.5
>> >  4.|-- te-1-1-0-1-ar01.oakland.ca.sfba.comcast.net (69.139.198.94)       0.0%   30   30   11.9   12.2   10.1   15.0
>> >  5.|-- be-90-ar01.sfsutro.ca.sfba.comcast.net (68.85.155.14)             0.0%   30   30   12.0   12.5   10.1   15.1
>> >  6.|-- he-3-8-0-0-cr01.sanjose.ca.ibone.comcast.net (68.86.94.85)        0.0%   30   30   13.0   14.0   11.8   18.0
>> >  7.|-- pos-0-3-0-0-pe01.11greatoaks.ca.ibone.comcast.net (68.86.87.18)   0.0%   30   30   15.6   17.2   15.3   19.8
>> >  8.|-- 192.205.37.1                                                     70.0%   30    9   54.8   67.7   53.6  102.2
>> >  9.|-- cr2.sffca.ip.att.net (12.122.86.202)                             13.3%   30   26   65.2   63.5   61.0   65.7
>> > 10.|-- cr2.la2ca.ip.att.net (12.122.31.133)                              6.7%   30   28   63.3   63.5   60.9   75.2
>> > 11.|-- cr2.dlstx.ip.att.net (12.122.28.177)                              3.3%   30   29   65.2   63.7   61.1   69.7
>> > 12.|-- ggr6.dlstx.ip.att.net (12.122.138.113)                            6.7%   30   28   60.3   64.5   59.9  153.6
>> > 13.|-- 12.90.228.14                                                      6.7%   30   28   60.3   60.7   60.2   62.5
>> > 14.|-- border1.pc1-bbnet1.dal004.pnap.net (216.52.191.19)                3.3%   30   29   60.3   60.2   59.8   60.5
>> > 15.|-- giglinx-60.border1.dal004.pnap.net (216.52.189.46)                3.3%   30   29   59.9   60.2   59.8   61.4
>> > 16.|-- 204.109.62.46                                                     6.7%   30   28   60.1   60.5   60.1   62.7
>> > 17.|-- mambo.koitsu.org (204.109.61.174)                                 3.3%   30   29   60.7   60.9   60.1   63.4
>> > === END
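
One common way to read a report like the one above is to ignore loss at
a hop unless it persists through every later hop to the destination.
Loss that shows up at a single router and then vanishes is usually that
router deprioritizing ICMP replies. The Python sketch below applies that
rule to the forward-path rows, copied verbatim from the report. The
function names parse_report and sustained_loss_onset are hypothetical,
and the result is a heuristic starting point, not proof of where the
fault lies.

```python
import re

# Forward-path rows from the report above (icarus.home.lan -> mambo.koitsu.org).
REPORT = """\
1.|-- gw.home.lan (192.168.1.1) 0.0% 30 30 0.4 0.3 0.2 0.4
2.|-- 76.102.12.1 0.0% 30 30 8.0 8.8 8.0 12.2
3.|-- te-0-2-0-5-ur06.santaclara.ca.sfba.comcast.net (68.86.249.253) 0.0% 30 30 8.2 9.0 8.2 16.5
4.|-- te-1-1-0-1-ar01.oakland.ca.sfba.comcast.net (69.139.198.94) 0.0% 30 30 11.9 12.2 10.1 15.0
5.|-- be-90-ar01.sfsutro.ca.sfba.comcast.net (68.85.155.14) 0.0% 30 30 12.0 12.5 10.1 15.1
6.|-- he-3-8-0-0-cr01.sanjose.ca.ibone.comcast.net (68.86.94.85) 0.0% 30 30 13.0 14.0 11.8 18.0
7.|-- pos-0-3-0-0-pe01.11greatoaks.ca.ibone.comcast.net (68.86.87.18) 0.0% 30 30 15.6 17.2 15.3 19.8
8.|-- 192.205.37.1 70.0% 30 9 54.8 67.7 53.6 102.2
9.|-- cr2.sffca.ip.att.net (12.122.86.202) 13.3% 30 26 65.2 63.5 61.0 65.7
10.|-- cr2.la2ca.ip.att.net (12.122.31.133) 6.7% 30 28 63.3 63.5 60.9 75.2
11.|-- cr2.dlstx.ip.att.net (12.122.28.177) 3.3% 30 29 65.2 63.7 61.1 69.7
12.|-- ggr6.dlstx.ip.att.net (12.122.138.113) 6.7% 30 28 60.3 64.5 59.9 153.6
13.|-- 12.90.228.14 6.7% 30 28 60.3 60.7 60.2 62.5
14.|-- border1.pc1-bbnet1.dal004.pnap.net (216.52.191.19) 3.3% 30 29 60.3 60.2 59.8 60.5
15.|-- giglinx-60.border1.dal004.pnap.net (216.52.189.46) 3.3% 30 29 59.9 60.2 59.8 61.4
16.|-- 204.109.62.46 6.7% 30 28 60.1 60.5 60.1 62.7
17.|-- mambo.koitsu.org (204.109.61.174) 3.3% 30 29 60.7 60.9 60.1 63.4
"""

# hop number, host (may contain a space before the IP), loss %, sent, received
ROW = re.compile(r"^\s*(\d+)\.\|--\s+(.+?)\s+(\d+(?:\.\d+)?)%\s+(\d+)\s+(\d+)\s")

def parse_report(text):
    """Return a list of (hop, host, loss_percent) tuples from mtr report rows."""
    hops = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            hops.append((int(m.group(1)), m.group(2), float(m.group(3))))
    return hops

def sustained_loss_onset(hops):
    """Return the earliest hop from which every hop through the destination
    shows loss, or None if the destination itself shows none."""
    onset = None
    for hop in reversed(hops):
        if hop[2] <= 0.0:
            break
        onset = hop
    return onset

hops = parse_report(REPORT)
print(sustained_loss_onset(hops))  # (8, '192.205.37.1', 70.0)
print(hops[-1][2])                 # 3.3 (loss seen at the destination)
```

On this sample the sustained loss begins at hop 8, the handoff into
AT&T. Hop 8's own 70% is far above anything downstream, so most of that
figure is probably ICMP deprioritization on that device, while the few
percent that carries through to hop 17 is what the SSH session feels.
A single 30-packet report is a small sample, so the same check is best
run over a longer log.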
>> >
>> >
>> > src IP: 204.109.61.174 (RootBSD; Dallas, TX)
>> > dst IP: 76.102.14.35 (Comcast; Mountain View, CA)
>> >
>> > === Sat Mar 8 16:22:00 PST 2014 (1394324520)
>> > Start: Sat Mar 8 16:22:00 2014
>> > HOST: mambo.koitsu.org                                                  Loss%  Snt  Rcv   Last    Avg   Best   Wrst
>> >  1.|-- 204.109.61.173                                                    0.0%   30   30    0.5    1.3    0.4   15.0
>> >  2.|-- 204.109.62.45                                                     0.0%   30   30    0.5    0.5    0.3    1.2
>> >  3.|-- border1.ge1-6.giglinx-60.dal004.pnap.net (216.52.189.45)          0.0%   30   30    0.5    0.6    0.4    4.7
>> >  4.|-- core3.pc1-bbnet1.ext1a.dal.pnap.net (216.52.191.41)               0.0%   30   30    0.9    1.0    0.9    1.2
>> >  5.|-- dax-edge-03.inet.qwest.net (67.133.189.93)                        0.0%   30   30    0.6    2.0    0.5   22.8
>> >  6.|-- 63-235-82-234.dia.static.qwest.net (63.235.82.234)                0.0%   30   30    1.4    1.3    1.0    1.7
>> >  7.|-- be-13-cr01.dallas.tx.ibone.comcast.net (68.86.82.141)             0.0%   30   30    1.3    2.7    1.0    4.9
>> >  8.|-- he-0-14-0-0-cr01.losangeles.ca.ibone.comcast.net (68.86.85.141)   0.0%   30   30   35.6   33.6   31.8   35.7
>> >  9.|-- he-1-8-0-0-ar01.oakland.ca.sfba.comcast.net (68.86.89.54)         3.3%   30   29   52.8   53.5   51.5   55.5
>> > 10.|-- te-0-4-0-5-ur06.santaclara.ca.sfba.comcast.net (68.86.143.97)     0.0%   30   30   52.1   52.2   51.9   52.3
>> > 11.|-- te-6-0-acr03.santaclara.ca.sfba.comcast.net (68.86.249.66)        6.7%   30   28   53.0   53.0   52.8   53.8
>> > 12.|-- c-76-102-14-35.hsd1.ca.comcast.net (76.102.14.35)                 3.3%   30   29   60.4   60.5   59.9   63.8
>> > === END
>> >
>> > --
>> > | Jeremy Chadwick jdc at koitsu.org |
>> > | UNIX Systems Administrator http://jdc.koitsu.org/ |
>> > | Making life hard for others since 1977. PGP 4BD6C0CB |
>> >
>> > _______________________________________________
>> > Outages mailing list
>> > Outages at outages.org
>> > https://puck.nether.net/mailman/listinfo/outages