[c-nsp] MPLS TE tunnel not working after inserting a hop
Ross Halliday
ross.halliday at wtccommunications.ca
Thu Feb 12 12:03:11 EST 2015
Hi list,
Long time no write, but I see the 6500 chatter is alive and well. That's good because I'd like to pick your brains...
I currently take care of an MPLS network running on a pile of SUP720-3B Cisco 6509s (Well, there's an NPE-G2 in there, but that's not important) running a mix of 12.2(33)SXI4a, 15.1(2)SY, and 15.1(2)SY2. The underlying IGP is ISIS, and LDP is the primary method of setting up LSPs.
We have two sites which we run some fun VoIP stuff on that are directly connected with fiber. These two 6509s also peer into the "ring". For the purpose of this discussion, the network was laid out like this:
      P202---P203
     /           \
  P201            P7
   |              |
  P4------\       P0--PE0
    \      \     /     |
     P3-----P2--/      |
              \       /
               P6---PE6
Where PE0 and PE6 are the two 6509s that have the VoIP gear. The requirement from the VoIP equipment vendor was that we provide a "primary" and "protect" path for heartbeat between some equipment at each site, and tie them together with simple EoMPLS port x-connects. Instead of doing a couple of tunnels with FRR, I built two TE tunnels using distinct paths. The "primary" tunnel just goes between the two PEs. The "protect" tunnel goes all the way around the ring. This is done with explicit paths of routers' Loopback0s on Tunnel interfaces and works great. Very cookie-cutter setup.
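For anyone skimming, here is a minimal sketch of that kind of setup as seen from PE0. It is not the production config: the path name, the addresses (documentation ranges), the attachment interface, the VC ID, and the use of a pseudowire-class with preferred-path to pin the x-connect to the tunnel are all assumptions for illustration. Tunnel12299 is borrowed from the debugs further down; mapping it to the "protect" path is also an assumption.

```
! Hypothetical explicit path: one next-address per router Loopback0, in ring order
ip explicit-path name PROTECT-RING enable
 next-address 192.0.2.10    ! P0   (assumed address)
 next-address 192.0.2.17    ! P7   (assumed address)
 next-address 192.0.2.203   ! P203 (assumed address)
 next-address 192.0.2.202   ! P202 (assumed address)
 next-address 192.0.2.201   ! P201 (assumed address)
 next-address 192.0.2.14    ! P4   (assumed address)
 next-address 192.0.2.12    ! P2   (assumed address)
 next-address 192.0.2.16    ! P6   (assumed address)
 next-address 192.0.2.6     ! PE6  (assumed address)
!
! TE tunnel that follows the explicit path above
interface Tunnel12299
 ip unnumbered Loopback0
 tunnel mode mpls traffic-eng
 tunnel destination 192.0.2.6
 tunnel mpls traffic-eng path-option 1 explicit name PROTECT-RING
!
! Assumed way of steering the EoMPLS pseudowire onto that tunnel
pseudowire-class PW-PROTECT
 encapsulation mpls
 preferred-path interface Tunnel12299 disable-fallback
!
! Port-mode x-connect on a hypothetical attachment interface, hypothetical VC ID
interface GigabitEthernet1/1
 xconnect 192.0.2.6 12299 pw-class PW-PROTECT
```

With a layout like this, inserting a router into the path means adding one more next-address line in the right position on both head-ends.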
So... the problem:
Yesterday we turned up our first 10 gigabit link in our core. Since we're limited on available fiber and using "regular" X2s, we had to do away with the direct link between P2 and P4, and wound up with P2-P3 and P3-P4. P2-P3 is the original gigabit stuff we were using before. P3-P4 is with drivers in 6708-10GE cards:
      P202---P203
     /           \
  P201            P7
   |              |
  P4              P0--PE0
    \            /     |
     P3-----P2--/      |
              \       /
               P6---PE6
The basic migration went fine and all dynamic LSPs negotiated through LDP seem to have come back up. Or rather - everything works as expected.
However, after adding the new hop into the explicit-path list, the TE tunnels are not coming back up. I've been staring at debugs for the last day or so and I do not understand why. On both PEs, debug for MPLS traffic engineering path lookup and verification shows the correct next hops, all interface addresses, everything. The path validates and is approved. I have been receiving an error message that tunnel setup timed out.
From PE0:
Feb 12 11:37:31.806: TE-SIG-LM: __PE6 IP___225->__PE0 IP___12299 {7}: received DELETE RESV request
Feb 12 11:37:31.806: TE-SIG-LM: __PE6 IP___225->__PE0 IP___12299 {7}: path previous hop is 10.42.9.74 (Gi6/2)
Feb 12 11:37:31.806: LSP-TUNNEL-LABELS: tunnel __PE6 IP___225->__PE0 IP___12299 {7}: fabric UNPROGRAM request
Feb 12 11:37:31.806: LSP-TUNNEL-LABELS: tunnel __PE6 IP___225->__PE0 IP___12299 {7}: unprogramming label implicit-null on input interface GigabitEthernet6/2
Feb 12 11:37:31.806: TE-EVENTS-LBLS: descriptor D541A4: continuing "Unprogram" request
Feb 12 11:37:31.806: TE-EVENTS-LBLS: descriptor D541A4: set "Fabric State" to, none
Feb 12 11:37:31.806: TE-EVENTS-LBLS: descriptor D541A4: succeeded "Unprogram" request
Feb 12 11:37:31.806: TE-EVENTS-HE: tunnel __PE6 IP___225->__PE0 IP___12299 {7}: "Connected" -> "Disconnected"
Feb 12 11:37:31.806: LSP-TUNNEL-LABELS: tunnel __PE6 IP___225->__PE0 IP___12299 {7}: fabric UNPROGRAM reply
Feb 12 11:37:31.806: TE-SIG-LM: __PE6 IP___225->__PE0 IP___12299 {7}: sending DELETE RESV reply
Feb 12 11:37:31.806: TE-SIG-LM: __PE6 IP___225->__PE0 IP___12299 {7}: received RESV DESTROY event
Feb 12 11:37:31.806: TE-SIG-LM: __PE6 IP___225->__PE0 IP___12299 {7}: RSVP tail-end close
Feb 12 11:37:31.806: TE-EVENTS-HE: __PE6 IP___225->__PE0 IP___12299 {7}: Deleting tspt_desct_t tunnel
Feb 12 11:37:31.806: TE-EVENTS-HE: tunnel __PE6 IP___225->__PE0 IP___12299 {7}: "Disconnected" -> "Dead"
Feb 12 11:37:31.806: TE-SIG-LM: __PE6 IP___225->__PE0 IP___12299 {7}: received PATH TAIL DELETION event
Feb 12 11:37:31.806: TE-SIG-LM: tunnel path/reservation teardown failed: Tunnel not found (state may have been deleted already)
...
Feb 12 11:39:12.390: TE-SIG-HE: Tunnel12299 [228]: setup timed out (unprotected)
Feb 12 11:39:12.390: %MPLS_TE-5-TUN: Tun12299: installed LSP nil for 12299_228 (popt 1), setup timed out
Feb 12 11:39:12.390: %MPLS_TE-5-TUN: Tun12299: LSP path change nil for 12299_228, setup timed out
Feb 12 11:39:12.390: TE-SIG-HE: Tunnel12299 [228]->__PE6 IP__: RSVP head-end close
Feb 12 11:39:12.390: TE-SIG-HE: Tunnel12299 [0]: Attempting to activate
Feb 12 11:39:12.394: TE-SIG-HE: Tunnel12299 [229]->__PE6 IP__: RSVP head-end open
Feb 12 11:39:12.394: TE-SIG-HE: Tunnel12299 [229]: Activation succeeded
Feb 12 11:39:12.394: %MPLS_TE-5-TUN: Tun12299: installed LSP 12299_229 (popt 1) for nil, got 1st feasible path opt
Feb 12 11:39:12.578: TE-SIG-LM: __PE6 IP___227->__PE0 IP___12299 {7}: received NEW PATH TAIL ARRIVAL event
Feb 12 11:39:12.578: TE-EVENTS-HE: Allocating tspt_desct_t tunnel, id: 1451
Feb 12 11:39:12.578: TE-SIG-LM: __PE6 IP___227->__PE0 IP___12299 {7}: RSVP tail-end open
Feb 12 11:39:12.578: TE-SIG-LM:__PE6 IP___227->__PE0 IP___12299 {7} NEW PATH TAIL ARRIVAL event handled successfully
Feb 12 11:39:12.582: TE-SIG-LM: __PE6 IP___227->__PE0 IP___12299 {7}: received ADD RESV request
Feb 12 11:39:12.582: TE-SIG-LM: __PE6 IP___227->__PE0 IP___12299 {7}: path previous hop is 10.42.9.74 (Gi6/2)
Feb 12 11:39:12.582: LSP-TUNNEL-LABELS: tunnel __PE6 IP___227->__PE0 IP___12299 {7}: fabric PROGRAM request
Feb 12 11:39:12.582: LSP-TUNNEL-LABELS: tunnel __PE6 IP___227->__PE0 IP___12299 {7}: programming label implicit-null on input interface GigabitEthernet6/2
Feb 12 11:39:12.582: TE-EVENTS-LBLS: descriptor D541A4: continuing "Program" request
Feb 12 11:39:12.582: TE-EVENTS-LBLS: descriptor D541A4: set "Fabric State" to, enabled
Feb 12 11:39:12.582: TE-EVENTS-LBLS: descriptor D541A4: succeeded "Program" request
Feb 12 11:39:12.582: TE-EVENTS-HE: tunnel __PE6 IP___227->__PE0 IP___12299 {7}: "Dead" -> "Connected"
Feb 12 11:39:12.582: LSP-TUNNEL-LABELS: tunnel __PE6 IP___227->__PE0 IP___12299 {7}: fabric PROGRAM reply
Feb 12 11:39:12.582: TE-SIG-LM: __PE6 IP___227->__PE0 IP___12299 {7}: sending ADD RESV reply
From PE6:
Feb 12 11:36:53.118: TE-SIG-HE: Tunnel12299: Activation failed, reason: Generic error
...
Feb 12 11:39:12.493: TE-SIG-HE: Tunnel12299 [226]: setup timed out (unprotected)
Feb 12 11:39:12.493: %MPLS_TE-5-TUN: Tun12299: installed LSP nil for 12299_226 (popt 1), setup timed out
Feb 12 11:39:12.493: %MPLS_TE-5-TUN: Tun12299: LSP path change nil for 12299_226, setup timed out
Feb 12 11:39:12.493: TE-SIG-HE: Tunnel12299 [226]->__PE0 IP__: RSVP head-end close
Feb 12 11:39:12.493: TE-SIG-HE: Tunnel12299 [0]: Attempting to activate
Feb 12 11:39:12.493: TE-SIG-HE: Tunnel12299 [227]->__PE0 IP__: RSVP head-end open
Feb 12 11:39:12.497: TE-SIG-HE: Tunnel12299 [227]: Activation succeeded
Feb 12 11:39:12.497: %MPLS_TE-5-TUN: Tun12299: installed LSP 12299_227 (popt 1) for nil, got 1st feasible path opt
I've tried tearing down and recreating the tunnels and explicit paths. I also created a second tunnel to mess with, using a new explicit-path and tunnel interface, and it does the same thing. I made a "short" path just for testing that went PE0-P0-P2-P6-PE6 (and vice-versa), and that works just fine. So considering that everything *ELSE* works, and everything worked *BEFORE* introducing P3, I'm assuming something is wrong with P3 itself, or one of its links.
Obviously, the cause is "Generic error".
RSVP working while LSP creation fails leads me to believe that there's some sort of label-related problem. Looking at the brief TE tunnel list on each end hints that there is a unidirectional issue:
voip-(PE0)-c6509#sh mpls traffic-eng tun br
Signalling Summary:
LSP Tunnels Process: running
Passive LSP Listener: running
RSVP Process: running
Forwarding: enabled
Periodic reoptimization: every 3600 seconds, next in 106 seconds
Periodic FRR Promotion: Not Running
Periodic auto-bw collection: every 300 seconds, next in 159 seconds
P2P TUNNELS/LSPs:
TUNNEL NAME                      DESTINATION      UP IF      DOWN IF    STATE/PROT
voip-(PE0)-c6509_t12100          __PE6 IP__       -          Gi5/1      up/up
voip-(PE0)-c6509_t12200          __PE6 IP__       -          Gi5/1      up/up
v                                v                v          v          v
voip-(PE0)-c6509_t12299          __PE6 IP__       -          -          up/down
^                                ^                ^          ^          ^
voip-(PE0)-c6509_t19922          __PE6 IP__       -          Gi6/2      up/up
voip-(PE6)-c6509_t12100          __PE0 IP__       Gi5/1      -          up/up
voip-(PE6)-c6509_t12200          __PE0 IP__       Gi5/1      -          up/up
v                                v                v          v          v
voip-(PE6)-c6509_t12299          __PE0 IP__       Gi6/2      -          up/up
^                                ^                ^          ^          ^
voip-(PE6)-c6509_t19922          __PE0 IP__       Gi6/2      -          up/up
Displayed 4 (of 4) heads, 0 (of 0) midpoints, 4 (of 4) tails
voip-(PE6)-c6509#sh mpls traffic-eng tun br
Signalling Summary:
LSP Tunnels Process: running
Passive LSP Listener: running
RSVP Process: running
Forwarding: enabled
Periodic reoptimization: every 3600 seconds, next in 3231 seconds
Periodic FRR Promotion: Not Running
Periodic auto-bw collection: every 300 seconds, next in 281 seconds
P2P TUNNELS/LSPs:
TUNNEL NAME                      DESTINATION      UP IF      DOWN IF    STATE/PROT
voip-(PE6)-c6509_t12100          __PE0 IP__       -          Gi5/1      up/up
voip-(PE6)-c6509_t12200          __PE0 IP__       -          Gi5/1      up/up
v                                v                v          v          v
voip-(PE6)-c6509_t12299          __PE0 IP__       -          -          up/down
^                                ^                ^          ^          ^
voip-(PE6)-c6509_t19922          __PE0 IP__       -          Gi6/1      up/up
voip-(PE0)-c6509_t12100          __PE6 IP__       Gi5/1      -          up/up
voip-(PE0)-c6509_t12200          __PE6 IP__       Gi5/1      -          up/up
voip-(PE0)-c6509_t19922          __PE6 IP__       Gi6/1      -          up/up
Displayed 4 (of 4) heads, 0 (of 0) midpoints, 3 (of 3) tails
Note the marked absence of voip-(PE0)-c6509_t12299 on PE6... and for some reason PE0 thinks voip-(PE6)-c6509_t12299 is up???
To the best of my ability I cannot find any configuration issues with ISIS, LDP, MPLS, or the interface configurations on the router P3. Everything has TE enabled where it should be, MTU tests good, ISIS wide metrics are enabled, and dynamic LDP-managed traffic in VRFs works great. Most of the configuration is just copy-paste with different IPs.
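To make "TE enabled where it should be" concrete, this is roughly the set of pieces a midpoint like P3 needs for RSVP-TE LSPs to transit it. It is a generic sketch, not P3's actual config: the interface name is hypothetical, IS-IS level-2 is an assumption, and the real RSVP bandwidth values are not shown in this post.

```
! Global: turn on TE tunnel signalling on the box
mpls traffic-eng tunnels
!
! IGP: wide metrics plus TE extensions flooded into IS-IS
router isis
 metric-style wide
 mpls traffic-eng router-id Loopback0
 mpls traffic-eng level-2          ! assumed level
!
! Per core-facing link (hypothetical interface name for the new P3-P4 10G link)
interface TenGigabitEthernet1/1
 ip router isis
 mpls ip                           ! LDP for the dynamic LSPs
 mpls traffic-eng tunnels          ! allow TE LSPs over this link
 ip rsvp bandwidth                 ! no value given, so the IOS default reservable share applies
```

A link missing either of the last two lines on one end would still carry LDP traffic fine, which is why the LDP-signalled VRF traffic working does not by itself prove the TE side is complete.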
So all I have to go on is:
- Something's wrong with P3
- Something's wrong with the 6708-10GE card(s) or their DFCs
- 12.2(33)SXI4a does not like the 6708-10GE cards or their DFCs
After trying everything else I can think of, my next plan of attack is to "downgrade" the new 10 gig link between P3 and P4 and go back to SFPs off of the SUPs. After that it's upgrade IOS time.
Can anybody here think of a reason this could be happening? Can send configs and debug output etc, figured this email was long enough as it is.
Thanks muchly!
Ross