[c-nsp] weird BGP stuff

Paul Stewart paul at paulstewart.org
Tue Jun 22 19:36:38 EDT 2010


Hey folks...

 

I'm looking for a second set of eyes here ;)  We have a pair of 7606 boxes
that have been handling hundreds of BGP sessions for a long time now with no
problems (well, performance, but I'll leave that alone).

 

We added a Juniper MX480 into the mix recently and now seem to have a
routing issue that I can't pinpoint.

 

Here's a quick rundown to get started, using a remote site that is reachable
from other providers and, we're confident, should be reachable from us:

 

traceroute to 216.166.249.148 (216.166.249.148), 30 hops max, 40 byte packets
 1  dis1-rtr-mb-vl10.nexicom.net (216.168.115.177)  0.468 ms  0.477 ms  0.543 ms
 2  core2-rtr-to-ge4-12-vl4.nexicom.net (98.124.0.226)  8.803 ms  8.866 ms  8.941 ms
 3  * * *
 4  * * *
 5  * * *
 6  * * *
 7  * * *

 

So dis1 is a 6500 and core2 in this case is one of the BGP-speaking 7606s I
was talking about.  Traffic just stops at 98.124.0.226 or the next hop; it's
unclear which.  So, using this destination as an example, I jump onto core2
and do a lookup:

 

core2-rtr-to#sh ip bgp 216.166.249.148
BGP routing table entry for 216.166.248.0/21, version 315975
Paths: (2 available, best #1, table Default-IP-Routing-Table)
  Advertised to update-groups:
     11         13         17         18         19         22         23
  6939 22561
    209.51.163.145 from 98.124.59.17 (76.75.100.59)
      Origin IGP, localpref 100, valid, internal, best
      Community: 11666:1000 11666:1006
  6939 22561
    209.51.163.145 from 98.124.59.25 (76.75.100.59)
      Origin IGP, localpref 100, valid, internal
      Community: 11666:1000 11666:1006

 

You'll see two paths, both valid and both learned from an iBGP neighbour.
The neighbour the best path came from, 98.124.59.17, is valid and reachable.
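
For anyone reading the lookup above, here is a minimal Python sketch of how the path lines break down. In this output format each path line reads "<next hop> from <neighbour> (<router ID>)", so the BGP next hop here is 209.51.163.145 and 98.124.59.17 is the iBGP neighbour the path was learned from. The helper names (parse_path, covers) are made up for illustration and are not part of any router tooling; the sketch only re-reads the text pasted in this post.

```python
import ipaddress
import re

# Path lines copied from the "sh ip bgp" output in this post.
PATH_LINES = [
    "209.51.163.145 from 98.124.59.17 (76.75.100.59)",
    "209.51.163.145 from 98.124.59.25 (76.75.100.59)",
]

# Layout of a path line: <next hop> from <neighbour> (<router ID>)
PATH_RE = re.compile(r"^\s*(\S+) from (\S+) \((\S+)\)\s*$")


def parse_path(line):
    """Split one path line into next hop, neighbour and router ID."""
    match = PATH_RE.match(line)
    if match is None:
        raise ValueError(f"not a path line: {line!r}")
    next_hop, neighbour, router_id = match.groups()
    return {"next_hop": next_hop, "neighbour": neighbour, "router_id": router_id}


def covers(prefix, address):
    """Return True if the address falls inside the prefix."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(prefix)


if __name__ == "__main__":
    # Confirm the /21 shown in the lookup really covers the test destination.
    print(covers("216.166.248.0/21", "216.166.249.148"))
    for line in PATH_LINES:
        print(parse_path(line))
```

Keeping the next hop and the neighbour apart matters when checking reachability, since the two addresses can sit in different places in the network.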

 

If I run a traceroute directly on the core2 7606 box I get timeouts:

 

core2-rtr-to#traceroute 216.166.249.148

Type escape sequence to abort.
Tracing the route to 216-166-249-148.clec.peknil.commercial.madisonriver.net (216.166.249.148)

  1  *  *  *
  2  *  *

 

Finally, on the MX480 where this transit provider connects, I run a
traceroute and it's perfect:

 

paul@core1.toronto1> traceroute 216.166.249.148
traceroute to 216.166.249.148 (216.166.249.148), 30 hops max, 40 byte packets
 1  gige-g2-20.core1.tor1.he.net (209.51.163.145)  0.458 ms  0.401 ms  0.294 ms
 2  10gigabitethernet1-2.core1.nyc5.he.net (72.52.92.165)  21.863 ms  22.573 ms  24.961 ms
 3  10gigabitethernet1-4.core1.nyc1.he.net (72.52.92.153)  27.827 ms  18.939 ms  25.197 ms
 4  198.32.160.19 (198.32.160.19)  16.381 ms  16.543 ms  16.427 ms
 5  bb-nycmny83-jx9-02-ae0-0.core.centurytel.net (208.110.248.114)  27.572 ms  16.578 ms  16.591 ms
     MPLS Label=521136 CoS=0 TTL=1 S=1
 6  bb-chcgilwu-jx9-02-ae4-0.core.centurytel.net (208.110.248.69)  38.239 ms  38.107 ms  38.254 ms
     MPLS Label=570289 CoS=0 TTL=1 S=1
 7  bb-mrghmoqa-jx9-02-xe-1-1-0.core.lightcore.net (206.51.69.45)  60.820 ms  45.567 ms  45.416 ms
     MPLS Label=656386 CoS=0 TTL=1 S=1
 8  bb-peknilxd-jm1-01-ge-0-1-0-298.core.lightcore.net (206.51.69.238)  51.356 ms  51.256 ms  51.440 ms
 9  peknil-coe-ci7507-01.grics.net (64.40.75.4)  54.189 ms  53.656 ms  54.102 ms
10  209-102-183-102.nworla.commercial.madisonriver.net (209.102.183.102)  63.918 ms  60.269 ms  60.593 ms

 

 

So why is it failing from the Cisco to the Juniper?  I'm pulling my hair
(what I have left) out on this ... and it's only happening to a handful of
routes that we are aware of so far....

 

Thanks,

 

Paul

 


