[outages] Cox -> nLayer connectivity issues

Wed Dec 19 20:22:08 EST 2012

I can confirm that things are looking much better from here as well. In total we had 2 client reports of related issues, 1 of which has confirmed that it has cleared, and arin.net is loading at the same speed through nLayer<->Cox<->ARIN as it does on the other connections I have access to.

-----Original Message-----
From: outages-bounces at outages.org [mailto:outages-bounces at outages.org] On Behalf Of Jeremy Chadwick
Sent: Wednesday, December 19, 2012 5:41 PM
To: Cary Wiedemann
Cc: outages at outages.org
Subject: Re: [outages] Cox -> nLayer connectivity issues

Likewise, if folks need a west-coast destination to test ICMP against (but not UDP or TCP), you can use mine: 206.125.172.42.  My VPS provider
(arpnetworks) peers with Mzima, who peers with nLayer.

Earlier I was noticing that packets destined to 192.149.252.75 would consistently (100% of the time) not elicit an ICMP time-exceeded response from Cox's router (hop #7 below), but packets destined to
192.149.252.76 did elicit ICMP time-exceeded.

As of a few minutes ago, that behaviour has changed.  How things look right now:

$ traceroute -n -P icmp 192.149.252.75
traceroute to 192.149.252.75 (192.149.252.75), 64 hops max, 72 byte packets
 1  206.125.172.41  10.156 ms  4.368 ms  1.405 ms
 2  67.199.135.101  8.529 ms  0.638 ms  0.711 ms
 3  69.174.121.74  6.212 ms  1.863 ms  1.847 ms
 4  69.31.127.129  0.702 ms  0.696 ms  0.465 ms
 5  69.31.127.138  2.067 ms  2.010 ms  1.954 ms
 6  69.31.127.230  0.695 ms  0.772 ms  2.848 ms
 7  68.1.1.5  70.971 ms  104.494 ms  76.750 ms
 8  * * *
 9  * * *
10  98.172.152.14  80.197 ms  72.742 ms  82.425 ms
11  192.149.252.131  72.426 ms  72.590 ms  82.698 ms
12  192.149.252.75  82.766 ms  73.078 ms  72.656 ms

$ traceroute -n -P icmp 192.149.252.76
traceroute to 192.149.252.76 (192.149.252.76), 64 hops max, 72 byte packets
 1  206.125.172.41  4.025 ms  20.726 ms  4.288 ms
 2  67.199.135.101  8.299 ms  0.667 ms  0.442 ms
 3  69.174.121.74  4.435 ms  1.767 ms  1.699 ms
 4  69.31.127.129  0.470 ms  0.710 ms  0.490 ms
 5  69.31.127.138  1.940 ms  1.996 ms  1.860 ms
 6  69.31.127.230  0.675 ms  0.719 ms  0.714 ms
 7  68.1.1.7  70.829 ms  70.958 ms  70.773 ms
 8  * * *
 9  * * *
10  98.172.152.14  72.634 ms  105.463 ms  89.318 ms
11  192.149.252.131  82.523 ms  72.406 ms  72.402 ms
12  192.149.252.76  82.755 ms  73.740 ms  72.935 ms

How they looked before (for packets destined to 192.149.252.75), and again, this was 100% reproducible (skipping right to TTL 6):

$ traceroute -n -f 6 -P icmp 192.149.252.75
traceroute to 192.149.252.75 (192.149.252.75), 64 hops max, 72 byte packets
 6  69.31.127.230  0.759 ms  0.785 ms  0.717 ms
 7  * * *
 8  * * *
 9  * * *
10  98.172.152.14  78.109 ms  74.391 ms  81.120 ms
^C

I can only speculate about what transpired there (possibly some device with a misbehaving load-balancing hash algorithm?), and maybe that's related to the issue reported in this thread.
Unsure.
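
To illustrate the load-balancing guess, here is a minimal Python sketch of hash-based next-hop selection. Everything in it is an assumption for illustration: the names pick_next_hop and reachable are hypothetical, CRC32 is only a stand-in for whatever hash a real device uses, and the two hop-7 addresses from the traces above serve purely as labels for two parallel paths. Nothing here is confirmed about Cox's or nLayer's equipment.

```python
import ipaddress
import zlib

# Hypothetical pair of parallel next hops. The two addresses seen at
# hop 7 in the traces are used here only as labels.
NEXT_HOPS = ["68.1.1.5", "68.1.1.7"]


def pick_next_hop(src: str, dst: str, next_hops=NEXT_HOPS) -> str:
    """Choose a next hop by hashing the (src, dst) address pair."""
    key = ipaddress.IPv4Address(src).packed + ipaddress.IPv4Address(dst).packed
    # CRC32 is an assumption; real devices use vendor-specific hashes.
    return next_hops[zlib.crc32(key) % len(next_hops)]


def reachable(src: str, dst: str, broken_hop: str) -> bool:
    """Model one member of the group silently dropping traffic."""
    return pick_next_hop(src, dst) != broken_hop


if __name__ == "__main__":
    src = "206.125.172.42"
    for dst in ("192.149.252.75", "192.149.252.76"):
        print(dst, "->", pick_next_hop(src, dst))
```

Because the choice is deterministic per address pair, a single faulty path in such a group would fail 100% of the time for the destinations that hash onto it while neighbouring addresses keep working. That would at least be consistent with the per-address behaviour seen here, but it remains speculation.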

-- 
| Jeremy Chadwick                                   jdc at koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

On Wed, Dec 19, 2012 at 07:17:53PM -0500, Cary Wiedemann wrote:
> All,
> 
> I knew I should have checked this list before opening tickets far and 
> wide.  I've been experiencing this issue since before 5:30pm EST and 
> just wanted to report that ICMP *IS* affected for me, but only for 
> certain IP addresses.  TCP seems to be intermittently affected.
> 
> I host a server at InfoRelay with network 69.169.88.16/28.  From a Cox 
> Communications optical internet circuit I can ping 69.169.88.20 and 
> .21, but not .22 .23 or .24.
> 
> chantilly-asa# ping 69.169.88.20
> Type escape sequence to abort.
> Sending 5, 100-byte ICMP Echos to 69.169.88.20, timeout is 2 seconds:
> !!!!!
> Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/10 ms
> 
> chantilly-asa# ping 69.169.88.21
> Type escape sequence to abort.
> Sending 5, 100-byte ICMP Echos to 69.169.88.21, timeout is 2 seconds:
> !!!!!
> Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/10 ms
> 
> chantilly-asa# ping 69.169.88.22
> Type escape sequence to abort.
> Sending 5, 100-byte ICMP Echos to 69.169.88.22, timeout is 2 seconds:
> ?????
> Success rate is 0 percent (0/5)
> 
> chantilly-asa# ping 69.169.88.23
> Type escape sequence to abort.
> Sending 5, 100-byte ICMP Echos to 69.169.88.23, timeout is 2 seconds:
> ?????
> Success rate is 0 percent (0/5)
> 
> A good trace looks like this (first hop obscured):
> 
> C:\>tracert 69.169.88.20
> 
> Tracing route to schneller.carywiedemann.com [69.169.88.20] over a maximum of 30 hops:
> 
>   1     1 ms     1 ms     1 ms  wsip-174-000-000-000-dc.dc.cox.net [174.000.000.000]
>   2     1 ms     1 ms     1 ms  mrfddsrj01gex070003.rd.dc.cox.net [68.100.0.141]
>   3     2 ms     2 ms     2 ms  68.1.4.139
>   4    11 ms     6 ms     4 ms  xe-5-0-7.ar1.iad1.us.nlayer.net [69.31.10.81]
>   5   203 ms   204 ms   209 ms  as33597.xe-3-0-5-304.ar1.iad1.us.nlayer.net [69.31.10.70]
>   6     3 ms     3 ms     3 ms  cr1.iad1.inforelay.net [66.231.176.9]
>   7     4 ms     5 ms     3 ms  cr1.iad4.inforelay.net [66.231.177.66]
>   8     3 ms     3 ms     3 ms  schneller.carywiedemann.com [69.169.88.20]
> 
> While a trace to 69.169.88.22 dies after hop 3:
> C:\>tracert 69.169.88.22
> 
> Tracing route to fairfaxunderground.com [69.169.88.22] over a maximum of 30 hops:
> 
>   1     1 ms     1 ms     1 ms  wsip-174-000-000-000.dc.dc.cox.net [174.000.000.000]
>   2     1 ms     1 ms     1 ms  mrfddsrj01gex070003.rd.dc.cox.net [68.100.0.141]
>   3     *        2 ms     2 ms  68.1.4.139
>   4     *        *        *     Request timed out.
>   5     *        *        *     Request timed out.
>   6     *        *        *     Request timed out.
>   7     *        *        *     Request timed out.
> 
> Although TCP connections still work, they're highly intermittent.  I 
> haven't had a successful ICMP echo reply from 69.169.88.22 or 
> 69.169.88.23 from a Cox connection via nLayer for several hours.
> 
> I'm asking both Cox and InfoRelay to depeer from nLayer.
> 
> Feel free to use my server as an ICMP target.
> 
> - Cary
> 
> On Wed, Dec 19, 2012 at 6:53 PM, Corey Quinn <corey at sequestered.net> wrote:
> 
> > Also in LA here.
> >
> > traceroute to arin.net (192.149.252.76), 30 hops max, 60 byte packets
> >  1  10.201.69.1 (10.201.69.1)  0.227 ms  0.233 ms  0.232 ms
> >  2  * * *
> >  3  xe-7-2-0.mpr1.lax112.us.above.net (64.125.170.97)  1.424 ms  1.396 ms  1.366 ms
> >  4  above-cox-1.lax12.us.above.net (64.125.13.10)  1.331 ms above-cox-2.lax12.us.above.net (64.125.13.14)  1.411 ms above-cox-1.lax12.us.above.net (64.125.13.10)  1.386 ms
> >  5  mrfddsrj02-ae0.0.rd.dc.cox.net (68.1.1.7)  67.585 ms mrfddsrj01-ae0.0.rd.dc.cox.net (68.1.1.5)  67.667 ms  67.783 ms
> >  6  * * *
> >  7  * * *
> >  8  wsip-98-172-152-14.dc.dc.cox.net (98.172.152.14)  79.017 ms  69.115 ms  69.117 ms
> >  9  * * *
> > 10  * * *
> > 11  * * *
> >
> >
> >
> > On Dec 19, 2012, at 3:50 PM, Jake Mertel <jake at nobistech.net> wrote:
> >
> > Something else that just clicked, I have been having a number of 
> > issues reaching arin.net today from one of my servers in Los Angeles 
> > that uses nLayer as its upstream. Request response times are between 
> > 20 and 40 seconds as opposed to 2 to 4 seconds on our office 
> > connection. Looking at my trace from LA, we are going 
> > LA<->Cox<->ARIN.
> >
> > C:\Users\jake>tracert arin.net
> >
> > Tracing route to arin.net [192.149.252.76] over a maximum of 30 hops:
> >
> >   1    <1 ms     1 ms    <1 ms  v403.er01.lax.ubiquity.io [72.37.224.129]
> >   2     1 ms     5 ms     1 ms  xe-1-0-3.ar1.lax2.us.nlayer.net [69.31.127.45]
> >   3    <1 ms    <1 ms    <1 ms  ae1-80g.cr1.lax1.us.nlayer.net [69.31.127.129]
> >   4     2 ms     5 ms     2 ms  ae2-50g.ar1.lax1.us.nlayer.net [69.31.127.142]
> >   5    <1 ms    <1 ms    <1 ms  as22773.ae12.ar1.lax1.us.nlayer.net [69.31.127.230]
> >   6    70 ms   111 ms    70 ms  mrfddsrj01-ae0.0.rd.dc.cox.net [68.1.1.5]
> >   7     *        *        *     Request timed out.
> >   8     *        *        *     Request timed out.
> >   9    72 ms    73 ms    82 ms  wsip-98-172-152-14.dc.dc.cox.net [98.172.152.14]
> >  10    72 ms    72 ms    82 ms  host-252-131.arin.net [192.149.252.131]
> >  11     *        *        *     Request timed out.
> >  12     *        *        *     Request timed out.
> >  13     *        *        *     Request timed out.
> >  14     *        *        *     Request timed out.
> >  15     *        *        *     Request timed out.
> >
> >
> > From: outages-bounces at outages.org [mailto:outages-bounces at outages.org] On Behalf Of Jake Mertel
> > Sent: Wednesday, December 19, 2012 4:45 PM
> > To: 'Brandon Whaley'; 'outages at outages.org'
> > Subject: Re: [outages] Cox -> nLayer connectivity issues
> >
> > We have received a report of similar issues today. The client has 
> > servers with us in several locations where we use nLayer and/or 
> > PacketExchange, and his monitoring system is on a network that uses 
> > Cox as its preferred upstream. He shut down his Cox upstream and 
> > didn't have any issues reaching the servers over his backup 
> > provider. The issues were sporadic and did not affect all protocols: 
> > ICMP pings worked, snmpwalk was fine, but UDP traces were dying 
> > somewhere on the reverse path. Seems to be very similar to what you 
> > are seeing.
> >
> > From: outages-bounces at outages.org [mailto:outages-bounces at outages.org] On Behalf Of Brandon Whaley
> > Sent: Wednesday, December 19, 2012 4:25 PM
> > To: outages at outages.org
> > Subject: [outages] Cox -> nLayer connectivity issues
> >
> > We've been seeing intermittent TCP/UDP connectivity issues from Cox 
> > Communications in Virginia to any location that routes over nLayer.  
> > UDP traceroutes are fine, but DNS lookups time out for minutes at a 
> > time, then work again for ~5 minutes before repeating the problem.  
> > ICMP is never affected during the outages.
> >
> > traceroute to 198.46.80.1 (198.46.80.1), 30 hops max, 60 byte packets
> >  1  router36f24c.local (192.168.14.1)  0.560 ms  0.531 ms  0.753 ms
> >  2  wsip-174-77-92-169.hr.hr.cox.net (174.77.92.169)  2.532 ms  2.581 ms  3.009 ms
> >  3  172.21.224.153 (172.21.224.153)  3.995 ms  4.067 ms  4.111 ms
> >  4  172.21.249.101 (172.21.249.101)  4.185 ms  4.396 ms  4.561 ms
> >  5  172.21.249.73 (172.21.249.73)  4.916 ms  4.900 ms  5.128 ms
> >  6  172.21.249.18 (172.21.249.18)  5.517 ms  5.124 ms  5.043 ms
> >  7  ip-216-54-33-22.coxfiber.net (216.54.33.22)  210.486 ms  210.477 ms  210.466 ms
> >  8  68.1.4.139 (68.1.4.139)  221.950 ms  222.460 ms  232.268 ms
> >  9  * xe-5-0-7.ar1.iad1.us.nlayer.net (69.31.10.81)  226.332 ms  226.866 ms
> > 10  as54641.xe-9-0-1.ar1.iad1.us.nlayer.net (69.31.31.42)  223.718 ms  225.083 ms  225.744 ms
> > 11  198.46.80.1 (198.46.80.1)  225.706 ms  226.670 ms  226.698 ms
> >
> > Is anyone from Cox on the list who can investigate/contact me?
> >
> > --
> > Best Regards,
> > Brandon W.
> > _______________________________________________
> > Outages mailing list
> > Outages at outages.org
> > https://puck.nether.net/mailman/listinfo/outages
