[c-nsp] 7600 HFIB bug?

Thu Jul 28 19:01:32 EDT 2011

I'm not an MPLS person, but I know how I would start to debug it. I would save the current config and then set up the simplest possible configuration (no NAT, minimum MPLS, no hold-queues, no RSVP, no PIM, no routing, just a single p2p link). Test the hell out of it. Then set up some static routes and push some traffic through it. Then add the features back one by one, testing heavily at each step.
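
A rough sketch of that stripped-down starting point, assuming the POS4/1/0 interface from the config quoted below with only the addressing and physical-layer lines kept. The APS lines are left in on the assumption that the circuit is protected, and the static route uses placeholder addresses; adjust both to the real setup.

```
! Baseline for testing: NAT, IS-IS, PIM, MPLS, RSVP and hold-queue lines removed
interface POS4/1/0
 description * TO SPO * ACTIVE
 ip address X.X.X.X 255.255.255.252
 crc 32
 pos framing sdh
 pos scramble-atm
 aps group 20
 aps working 1
!
! Once the bare link passes traffic, add a static route (placeholder prefix and next hop)
ip route X.X.X.X 255.255.255.0 X.X.X.X
```

From there, each removed line goes back one at a time, with a traffic test after every change.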

If you run into a bug, you might want to look at the latest SRE train.

From: Persio Pucci [mailto:persio at gmail.com]
Sent: Thursday, July 28, 2011 3:53 PM
To: Matthew Huff
Cc: cisco-nsp at puck.nether.net
Subject: Re: [c-nsp] 7600 HFIB bug?

Matthew,

clear arp, clear ip route, clear cef: nothing helps. I have even reloaded the SP and Rio routers during a window, tired of this, and still it won't work.

I am running 12.2(33r)SRB4 on a RSP720-3CXL-GE. The interfaces have MPLS running, and multicast, but they had both before as well, when it worked.

Before it broke, MPLS was on both the to_SP and to_NY interfaces, and I had an MPLS-TE tunnel from SP to NY. When it broke, the tunnel would not work and I had to remove it, remove MPLS from the to_NY interface, and make Rio a BGP hop for both SP and NY to resume communications.

The weirdest part is that when we first brought up the 7600, it was working OK. But then we had a hit on our Rio/SP circuit, and when it came back, it never worked again.

This is the to_SP interface

interface POS4/1/0
 description * TO SPO * ACTIVE
 ip address X.X.X.X 255.255.255.252
 ip nat outside
 ip router isis
 ip pim sparse-dense-mode
 mpls traffic-eng tunnels
 mpls ldp discovery transport-address X.X.X.X
 mpls label protocol ldp
 mpls ip
 crc 32
 pos framing sdh
 pos scramble-atm
 aps group 20
 aps working 1
 hold-queue 4096 in
 hold-queue 4096 out
 ip rsvp bandwidth 100000 100000
end

This is to NY

interface GigabitEthernet1/2
 description * TO NY*1 *
 ip address X.X.X.X 255.255.255.252 secondary
 ip address X.X.X.X 255.255.255.240
 ip access-group 123 out
 no ip redirects
 ip router isis
 load-interval 30
 mpls mtu 1524
 mpls traffic-eng tunnels
 mpls ldp discovery transport-address X.X.X.X
 mpls label protocol ldp
 mpls ip
 spanning-tree link-type point-to-point
 hold-queue 4096 in
 ip rsvp bandwidth 100000 100000
end

ACL 123 is the one I have in place in the meantime, punting the packets I really need to get through:

access-list 123 permit ip any X.X.X.X 0.0.0.255 log
access-list 123 permit ip X.X.X.X 0.0.0.255 any log
access-list 123 permit ip any host X.X.X.X log
access-list 123 permit ip host X.X.X.X any log
access-list 123 permit ip any X.X.X.X 0.0.0.3 log
access-list 123 permit ip X.X.X.X 0.0.0.3 any log
access-list 123 permit ip any any

On Thu, Jul 28, 2011 at 4:37 PM, Matthew Huff <mhuff at ox.com> wrote:
It's very possible the FIB is incorrect, but that should be correctable by doing a "clear arp" and a "clear ip route *". What IOS are you running and what sup engine do you have? Also, what does "show ip cef exact-route source_ip dest_ip" show?
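
To compare what software CEF believes with what the hardware has programmed, a command sequence along these lines may help. This is a sketch only: the "show mls cef" forms are the Sup720/RSP720-era syntax, their availability and exact arguments may differ by release, and X.X.X.X, source_ip and dest_ip are placeholders.

```
! Software (RP) view of the route and forwarding decision
show ip route X.X.X.X
show ip cef X.X.X.X detail
show ip cef exact-route source_ip dest_ip
!
! Hardware (PFC) view of the same destination; syntax assumed, check your release
show mls cef lookup X.X.X.X detail
show mls cef exact-route source_ip dest_ip
!
! Label state for the prefix, since MPLS is enabled on these interfaces
show mpls forwarding-table X.X.X.X
```

If the software and hardware outputs disagree on the next hop or adjacency for the same destination, that points at the hardware FIB rather than at routing.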

Is there anything else "interesting" configured, e.g. MPLS or PBR? I.e., what does the interface config look like?

----
Matthew Huff             | 1 Manhattanville Rd
Director of Operations   | Purchase, NY 10577
OTA Management LLC       | Phone: 914-460-4039
aim: matthewbhuff        | Fax:   914-460-4139

-----Original Message-----
From: cisco-nsp-bounces at puck.nether.net [mailto:cisco-nsp-bounces at puck.nether.net] On Behalf Of Persio Pucci
Sent: Thursday, July 28, 2011 3:23 PM
To: cisco-nsp at puck.nether.net
Subject: [c-nsp] 7600 HFIB bug?

Hi all. I am new to the list and this is my first post. :)

Trying to get to the bottom of a situation, sans-TAC. Long story short, for
context's sake, I had a 7300 that was replaced by a 7600 at my Rio de Janeiro
site connecting to SP and NY.

(SP --- RIO --- NY)

Everything was working fine by the time we were finishing replacing the box,
when our circuit to Sao Paulo was hit and stayed down for about 6 hours.
When the circuit came back up, some communication to NY was simply not
working: the SP router could not reach, for whatever reason, IP addresses
that had been reachable after replacing the box and before the hit. It used
to work over a TE tunnel, which I had to remove, making Rio a BGP hop to get
things working while I tried to figure out wtf was going on. Ever since, I
can ping NY's IP address from Rio but not from SP, although all routing is
in place (ISIS) and all CEF entries are there.

Well, after a few weeks working on this when time allowed, I came to an
intriguing situation today, while working with the help of a friend. I was
trying to debug this by using a permit ACL with log-input on the Rio
interfaces to see what was going on. When I applied the ACL on the
interfaces (permit ip x x log-input, permit ip any any), things started
working, and I was again able to ping from SP to NY. If I remove the ACL, I
can no longer ping NY from SP.
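
A minimal sketch of that kind of debug ACL, narrowed to one source/destination pair so that only the test traffic hits the logging entry. The ACL name DEBUG-PUNT, the host addresses, the choice of interface, and the outbound direction are all assumptions made for illustration.

```
! Hypothetical named ACL: only the test flow matches the log-input entry
ip access-list extended DEBUG-PUNT
 permit ip host X.X.X.X host X.X.X.X log-input
 permit ip any any
!
! Applied outbound on the NY-facing interface (direction assumed)
interface GigabitEthernet1/2
 ip access-group DEBUG-PUNT out
```

Keeping the logged entry this narrow limits how much traffic is sent to the CPU while testing.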

It seems like something is borked on the 7600, because the packets won't go
through if they are CEF switched, but they will when they are punted to the
CPU for the logging. Looks like some FIB/HFIB issue that is beyond
my comprehension.

Any ideas besides going to TAC? Tks!
_______________________________________________
cisco-nsp mailing list  cisco-nsp at puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/