[outages] Comcast<->AT&T packet loss (possibly within CA)

Jeremy Chadwick jdc at koitsu.org
Sat Mar 8 22:52:33 EST 2014


1. Thanks -- the problem is that in my experience Company X will blame
Company Y for the device, but the device is owned/maintained by Company
X, and this nonsense goes on for about a week before someone finally
owns up to something (by which time the problem is usually gone).  It's
a depressing and sad modus operandi; sometimes I think it's done
intentionally (stall tactic).

2. No, because I don't think it's necessary -- when I can clearly "feel"
the slowdown via SSH (which is TCP-based) the issue isn't related to
ICMP prio.  Plus, showing a network provider hping results doesn't
necessarily convince them of anything if they're unfamiliar with the
tool.  That's been my experience anyway.  It'd be akin to giving them
packet captures and a 10-page write-up showing how a TCP packet with
PSH+ACK and sequence number 123456789 wasn't seen by the remote end
until 2-3 retries.
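
For readers who do want a TCP-based number without hping, a minimal
sketch follows. It is not what was run in this thread and it is not
equivalent to hping: it only times full TCP handshakes using the
standard library. The function names, the 2-second timeout, and the
900 ms "slow" threshold are illustrative assumptions. A handshake that
takes about a second longer than the usual round trip often means the
first SYN was lost and retransmitted by the kernel.

```python
import socket
import time
from typing import Callable, Dict, List, Optional


def tcp_connect_rtt(host: str, port: int, timeout: float = 2.0) -> Optional[float]:
    """Return the TCP handshake time in milliseconds, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        # Timeouts, refusals, and unreachable-network errors all count as a miss.
        return None


def run_probes(probe: Callable[[], Optional[float]], count: int) -> List[Optional[float]]:
    """Call probe() count times and collect the results in order."""
    return [probe() for _ in range(count)]


def summarize(results: List[Optional[float]], slow_ms: float = 900.0) -> Dict[str, object]:
    """Summarize probe results in roughly the shape of an mtr line.

    slow_ms is an assumed threshold: handshakes at or above it probably
    needed at least one SYN retransmit.
    """
    sent = len(results)
    ok = [r for r in results if r is not None]
    return {
        "sent": sent,
        "rcv": len(ok),
        "loss_pct": 100.0 * (sent - len(ok)) / sent if sent else 0.0,
        "slow": sum(1 for r in ok if r >= slow_ms),
        "avg_ms": sum(ok) / len(ok) if ok else None,
    }


# Example (requires network access; host and port are up to the reader):
#   results = run_probes(lambda: tcp_connect_rtt("mambo.koitsu.org", 22), 30)
#   print(summarize(results))
```

Counting slow handshakes separately matters because, with a timeout
longer than the retransmit interval, a lost SYN shows up as a slow
success rather than as a failure.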

3. No I haven't, because the process would be significantly more
convoluted than that.  This is what would have to happen, starting with
the forward path:

- I'd have to open a ticket with Comcast through standard 800-COMCAST
  means, i.e. complaining to someone in the Philippines about packet
  loss (read: likelihood of someone screwing this up: 99%)
- Comcast would have to hand it off to the Comcast NOC
- Comcast NOC would have to care enough to open a ticket with AT&T

For the reverse path:

- I'd have to open a ticket with my VPS provider, RootBSD
- RootBSD would have to open a ticket with InterNAP (assuming they have
  a direct relationship with them; it may be more convoluted, e.g. they
  may have to open a ticket with their co-lo provider, who then opens a
  ticket with InterNAP)
- InterNAP would have to care enough to open a ticket with
  Qwest/Centurylink
- Qwest/Centurylink would have to care enough to open a ticket with
  Comcast

Historically I've mailed things of this nature to outages at outages.org
because there are lurkers on the list who quietly go behind the scenes
and start trying to fix/rectify things.  Other times it's purely about
bringing to light something that's happening on the Internet in hopes
that one or more of the involved peers are, in a roundabout way,
publicly shamed for not having better monitoring.

P.S. -- Issue is still ongoing and appears worse than before (at least
now there aren't sporadic times of 0% loss at intermediary hops).
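
One way to make reports like the ones quoted below easier to read is to
look for the first hop from which loss persists all the way to the
destination, since loss at a single intermediate hop that does not
carry forward is usually just ICMP prio on that device. The following
is a rough sketch under assumptions: the regular expression targets the
"mtr --report" style layout seen in this thread, the names are
hypothetical, and a single 30-packet report can still mislead when a
hop happens to show 0%, so it is best run over many reports.

```python
import re
from typing import List, NamedTuple, Optional

# Matches lines such as " 8.|-- 192.205.37.1   70.0%   30   9 ...",
# optionally prefixed by email quote markers ("> > ").
HOP_RE = re.compile(r"^(?:>\s*)*\s*(\d+)\.\|--\s+(.+?)\s+(\d+(?:\.\d+)?)%")


class Hop(NamedTuple):
    number: int
    host: str
    loss_pct: float


def parse_report(text: str) -> List[Hop]:
    """Extract (hop number, host, loss%) from an mtr text report."""
    hops = []
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if m:
            hops.append(Hop(int(m.group(1)), m.group(2), float(m.group(3))))
    return hops


def first_persistent_loss(hops: List[Hop]) -> Optional[Hop]:
    """Return the earliest hop in the unbroken run of lossy hops that
    ends at the destination, or None if the destination shows no loss."""
    candidate = None
    for hop in reversed(hops):
        if hop.loss_pct <= 0.0:
            break
        candidate = hop
    return candidate
```

Applied to the forward-path report in the original message, this
heuristic points at hop 8 (192.205.37.1), the "mystery router".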

-- 
| Jeremy Chadwick                                   jdc at koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

On Sat, Mar 08, 2014 at 06:27:15PM -0800, Michael Smith wrote:
> A couple of things
> 
> - Hop 8's IP is an AT&T address, so it's likely an interface on an AT&T router, since you're headed towards it in your traceroute (next_hop).
> - Have you tried something like hping that will allow you to use TCP for your test?
> - Have you contacted InterNAP and told them to open a ticket with AT&T using the data you have?
> 
> Mike
> 
> 
> On Mar 8, 2014, at 4:53 PM, Jeremy Chadwick <jdc at koitsu.org> wrote:
> 
> > Since roughly Friday, I've been seeing what appears to be packet loss
> > somewhere within the Comcast/AT&T network mesh.  Source and destination IPs
> > are provided below as well, ditto with some mtrs from src->dst and
> > dst->src.  I keep periodic mtrs (both directions) going all the way back
> > to 03/04.  I can make all of those logs available if asked.
> > 
> > The issue started on 03/07 @ 21:33 PST suddenly -- not a "gradual"
> > increase -- and lasted until an undetermined time (very hard to tell
> > from mtrs) but I'd estimate ~02:00 PST on 03/08 (today).
> > 
> > The issue then appeared to start back up again ~07:00 PST, though it's
> > hard to give an exact time (seems sort of a gradual increase, thus hard
> > to pinpoint).  It's been ongoing since.
> > 
> > The loss varies from 3% to 20%, but you can definitely "feel" it across
> > an SSH session, so it's not ICMP prio.
> > 
> > I will make myself clear: it's very hard to "show" someone the way this
> > problem manifests itself, because the packet loss will vary all over the
> > place between different hops.  It *definitely* starts at a particular
> > point and "trickles down", but due to the fact that the loss is a
> > smaller percentage, there are times where a hop will suddenly show 0%.
> > TL;DR -- You'd really have to see a longer log (say, an hour's worth) to
> > be able to say "ah yes, this really is a problem" and not blow it off as
> > ICMP prio.
> > 
> > And as usual, there's one of those "mystery routers" (hop #8 in the
> > first example) that peering providers looooooove to use as a scapegoat
> > when it comes to shifting blame, ex. provider A says "that's a device
> > owned by provider B", provider B says "that device is provider A's
> > responsibility", and neither side does anything about the issue.
> > However I should note that the "mystery router" usually does show some
> > degree of loss even when this issue isn't occurring (likely ICMP prio on
> > the device), but that makes it even more difficult to determine where
> > the issue begins.
> > 
> > 
> > src IP: 76.102.14.35   (Comcast; Mountain View, CA)
> > dst IP: 204.109.61.174 (RootBSD; Dallas, TX)
> > 
> > === Sat Mar  8 16:22:00 PST 2014  (1394324520)
> > Start: Sat Mar  8 16:22:00 2014
> > HOST: icarus.home.lan                                                 Loss%   Snt   Rcv  Last   Avg  Best  Wrst
> >  1.|-- gw.home.lan (192.168.1.1)                                        0.0%    30    30   0.4   0.3   0.2   0.4
> >  2.|-- 76.102.12.1                                                      0.0%    30    30   8.0   8.8   8.0  12.2
> >  3.|-- te-0-2-0-5-ur06.santaclara.ca.sfba.comcast.net (68.86.249.253)   0.0%    30    30   8.2   9.0   8.2  16.5
> >  4.|-- te-1-1-0-1-ar01.oakland.ca.sfba.comcast.net (69.139.198.94)      0.0%    30    30  11.9  12.2  10.1  15.0
> >  5.|-- be-90-ar01.sfsutro.ca.sfba.comcast.net (68.85.155.14)            0.0%    30    30  12.0  12.5  10.1  15.1
> >  6.|-- he-3-8-0-0-cr01.sanjose.ca.ibone.comcast.net (68.86.94.85)       0.0%    30    30  13.0  14.0  11.8  18.0
> >  7.|-- pos-0-3-0-0-pe01.11greatoaks.ca.ibone.comcast.net (68.86.87.18)  0.0%    30    30  15.6  17.2  15.3  19.8
> >  8.|-- 192.205.37.1                                                    70.0%    30     9  54.8  67.7  53.6 102.2
> >  9.|-- cr2.sffca.ip.att.net (12.122.86.202)                            13.3%    30    26  65.2  63.5  61.0  65.7
> > 10.|-- cr2.la2ca.ip.att.net (12.122.31.133)                             6.7%    30    28  63.3  63.5  60.9  75.2
> > 11.|-- cr2.dlstx.ip.att.net (12.122.28.177)                             3.3%    30    29  65.2  63.7  61.1  69.7
> > 12.|-- ggr6.dlstx.ip.att.net (12.122.138.113)                           6.7%    30    28  60.3  64.5  59.9 153.6
> > 13.|-- 12.90.228.14                                                     6.7%    30    28  60.3  60.7  60.2  62.5
> > 14.|-- border1.pc1-bbnet1.dal004.pnap.net (216.52.191.19)               3.3%    30    29  60.3  60.2  59.8  60.5
> > 15.|-- giglinx-60.border1.dal004.pnap.net (216.52.189.46)               3.3%    30    29  59.9  60.2  59.8  61.4
> > 16.|-- 204.109.62.46                                                    6.7%    30    28  60.1  60.5  60.1  62.7
> > 17.|-- mambo.koitsu.org (204.109.61.174)                                3.3%    30    29  60.7  60.9  60.1  63.4
> > === END
> > 
> > 
> > src IP: 204.109.61.174 (RootBSD; Dallas, TX)
> > dst IP: 76.102.14.35   (Comcast; Mountain View, CA)
> > 
> > === Sat Mar  8 16:22:00 PST 2014  (1394324520)
> > Start: Sat Mar  8 16:22:00 2014
> > HOST: mambo.koitsu.org                                                Loss%   Snt   Rcv  Last   Avg  Best  Wrst
> >  1.|-- 204.109.61.173                                                   0.0%    30    30   0.5   1.3   0.4  15.0
> >  2.|-- 204.109.62.45                                                    0.0%    30    30   0.5   0.5   0.3   1.2
> >  3.|-- border1.ge1-6.giglinx-60.dal004.pnap.net (216.52.189.45)         0.0%    30    30   0.5   0.6   0.4   4.7
> >  4.|-- core3.pc1-bbnet1.ext1a.dal.pnap.net (216.52.191.41)              0.0%    30    30   0.9   1.0   0.9   1.2
> >  5.|-- dax-edge-03.inet.qwest.net (67.133.189.93)                       0.0%    30    30   0.6   2.0   0.5  22.8
> >  6.|-- 63-235-82-234.dia.static.qwest.net (63.235.82.234)               0.0%    30    30   1.4   1.3   1.0   1.7
> >  7.|-- be-13-cr01.dallas.tx.ibone.comcast.net (68.86.82.141)            0.0%    30    30   1.3   2.7   1.0   4.9
> >  8.|-- he-0-14-0-0-cr01.losangeles.ca.ibone.comcast.net (68.86.85.141)  0.0%    30    30  35.6  33.6  31.8  35.7
> >  9.|-- he-1-8-0-0-ar01.oakland.ca.sfba.comcast.net (68.86.89.54)        3.3%    30    29  52.8  53.5  51.5  55.5
> > 10.|-- te-0-4-0-5-ur06.santaclara.ca.sfba.comcast.net (68.86.143.97)    0.0%    30    30  52.1  52.2  51.9  52.3
> > 11.|-- te-6-0-acr03.santaclara.ca.sfba.comcast.net (68.86.249.66)       6.7%    30    28  53.0  53.0  52.8  53.8
> > 12.|-- c-76-102-14-35.hsd1.ca.comcast.net (76.102.14.35)                3.3%    30    29  60.4  60.5  59.9  63.8
> > === END
> > 
> > -- 
> > | Jeremy Chadwick                                   jdc at koitsu.org |
> > | UNIX Systems Administrator                http://jdc.koitsu.org/ |
> > | Making life hard for others since 1977.             PGP 4BD6C0CB |
> > 
> > _______________________________________________
> > Outages mailing list
> > Outages at outages.org
> > https://puck.nether.net/mailman/listinfo/outages


