[c-nsp] MPLS LDP and BGP Neighbor flapping constantly

Justin Shore justin at justinshore.com
Thu Mar 5 01:34:25 EST 2009


This afternoon I stumbled across a problem with an LDP session between a
7613 and a 7201.  Actually both LDP and iBGP were flapping every 10
seconds or so.  Both interfaces are configured for MPLS, LDP, and IS-IS
(with authentication and BFD, though BFD isn't enabled on the interface
itself yet), with an interface MTU of 9000 and a CLNS MTU of 1496.
Nothing too fancy.  Globally, both systems are configured with LDP as
the label protocol, LDP graceful-restart, no mpls ip propagate-ttl, and
the LDP router-ID forced to a loopback:

# 7201
mpls label protocol ldp
no mpls ip propagate-ttl
mpls ldp graceful-restart
mpls ldp router-id Loopback0 force

# 7613
mls mpls tunnel-recir
mpls traffic-eng tunnels
mpls ldp graceful-restart
no mpls ip propagate-ttl
mpls label protocol ldp
mpls ldp router-id Loopback0 force

This morning at 7:05 the router stopped responding to SNMP queries for
about 15 minutes.  The load was about 13 beforehand.  Cacti shows the
load doubling in the 10 minutes before the 15-minute gap.  When it came
back the load was just shy of 50 and stayed there for about 30 minutes.
After that it held at around 30-35 for the next 7.5 hours, until I
noticed the BGP flapping and shut down the peer for troubleshooting.
The load then dropped back to around 16, still higher than it was
before this morning's hiccup.  I'm at a loss to adequately explain why
the load has been so high.  My best guess is that the 30-35 load was
due to the BGP flapping and the slightly elevated load now is due to
the LDP flapping.

Does anyone know how to troubleshoot an LDP neighbor flapping issue?  The
7613 is logging this:

730278: Mar  4 20:43:48.696 CST: LDP GR: Received FT Sess TLV from 
10.64.0.34:0  (fl 0x1, rs 0x0, rconn 0, rcov 120000)
730279: Mar  4 20:43:48.696 CST: LDP GR: MFI cutover wait delay = 
600000, Forwarding State Hold Timer = 600000
730280: Mar  4 20:43:48.696 CST: LDP GR: searching for down nbr record 
(10.64.0.34:0, 10.64.0.178)
730281: Mar  4 20:43:48.696 CST: LDP GR: Added FT Sess TLV (Rconn 
120000, Rcov 0) to INIT msg to 10.64.0.34:0

The 7201 is logging this:

054705: Mar  5 00:28:19.599 CST: LDP GR: GR session 10.64.0.20:0:: lost
054706: Mar  5 00:28:19.599 CST: LDP GR: down nbr 10.64.0.20:0:: created 
[1 total]
054707: Mar  5 00:28:19 CST: %LDP-5-GR: GR session 10.64.0.20:0 (inst. 
3): interrupted--recovery pending
054708: Mar  5 00:28:19.599 CST: LDP GR: GR session 10.64.0.20:0:: 
bindings retained
054709: Mar  5 00:28:19.599 CST: LDP GR: down nbr 10.64.0.20:0:: state 
change (None -> Reconnect-Wait)
054710: Mar  5 00:28:19.599 CST: LDP GR: down nbr 10.64.0.20:0:: 
reconnect timer started [120000 msecs]
054711: Mar  5 00:28:19.599 CST: LDP GR: down nbr 10.64.0.20:0:: added 
to bindings task queue [1 entries]
054712: Mar  5 00:28:19 CST: %LDP-5-NBRCHG: LDP Neighbor 10.64.0.20:0 
(0) is DOWN (Received error notification from peer: Shut down)

054713: Mar  5 00:28:25.923 CST: LDP GR: searching for down nbr record 
(10.64.0.20:0, 10.64.0.179)
054714: Mar  5 00:28:25.923 CST: LDP GR: search for down nbr record 
(10.64.0.20:0, 10.64.0.179) returned 10.64.0.20:0
054715: Mar  5 00:28:25.923 CST: LDP GR: Added FT Sess TLV (Rconn 0, 
Rcov 120000) to INIT msg to 10.64.0.20:0
054716: Mar  5 00:28:25.947 CST: LDP GR: Received FT Sess TLV from 
10.64.0.20:0  (fl 0x1, rs 0x0, rconn 120000, rcov 0)
054717: Mar  5 00:28:25.947 CST: LDP GR: GR session 10.64.0.20:0:: 
established
054718: Mar  5 00:28:25.947 CST: LDP GR: GR session 10.64.0.20:0:: found 
down nbr 10.64.0.20:0
054719: Mar  5 00:28:25.947 CST: LDP GR: down nbr 10.64.0.20:0:: 
reconnect timer stopped
054720: Mar  5 00:28:25.947 CST: LDP GR: down nbr 10.64.0.20:0:: state 
change (Reconnect-Wait -> Recovering)
054721: Mar  5 00:28:25.947 CST: LDP GR: down nbr 10.64.0.20:0:: 
recovery timer started [1 msecs]
054722: Mar  5 00:28:25 CST: %LDP-5-GR: GR session 10.64.0.20:0 (inst. 
4): starting graceful recovery
054723: Mar  5 00:28:25 CST: %LDP-5-NBRCHG: LDP Neighbor 10.64.0.20:0 
(4) is UP
054724: Mar  5 00:28:25.951 CST: LDP GR: down nbr 10.64.0.20:0:: 
recovery timer expired
054725: Mar  5 00:28:25 CST: %LDP-5-GR: GR session 10.64.0.20:0 (inst. 
4): completed graceful recovery
054726: Mar  5 00:28:25.951 CST: LDP GR: down nbr 10.64.0.20:0:: 
destroying record [0 left]
054727: Mar  5 00:28:25.951 CST: LDP GR: down nbr 10.64.0.20:0:: state 
change (Recovering -> Delete-Wait)

054728: Mar  5 00:28:28.091 CST: LDP GR: Tagcon querying for up to 12 
bindings update tasks [table 0]
054729: Mar  5 00:28:28.091 CST: LDP GR: down nbr 10.64.0.20:0:: 
requesting bindings DEL for {10.64.0.20:0, 3}
054730: Mar  5 00:28:28.091 CST: LDP GR: down nbr 10.64.0.20:0:: removed 
from bindings task queue [0 entries]
054731: Mar  5 00:28:28.091 CST: LDP GR: Requesting 1 bindings update 
tasks [0 left in queue]

10.64.0.20 is a loopback on the 7613 and 10.64.0.34 is a loopback on the 
7201.
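
To put a number on how long each LDP outage lasts, the %LDP-5-NBRCHG
lines can be paired up with a small script.  What follows is only a
minimal Python sketch: it assumes each syslog message has been rejoined
onto a single line (the archive wraps them), that the timestamps look
like the 7201 lines above, and that the year is 2009 since syslog does
not print one.  The function names are made up for illustration.

```python
import re
from datetime import datetime

# Matches IOS syslog lines such as:
# 054712: Mar  5 00:28:19 CST: %LDP-5-NBRCHG: LDP Neighbor 10.64.0.20:0 (0) is DOWN (...)
NBRCHG = re.compile(
    r"(?P<ts>[A-Z][a-z]{2}\s+\d+\s+\d{2}:\d{2}:\d{2})(?:\.\d+)?\s+\w+:\s+"
    r"%LDP-5-NBRCHG: LDP Neighbor (?P<nbr>\S+) \(\d+\) is (?P<state>UP|DOWN)"
)

def ldp_transitions(lines, year=2009):
    """Return (timestamp, neighbor, state) for every NBRCHG line.

    The year is an assumption: IOS syslog timestamps here do not carry one.
    """
    events = []
    for line in lines:
        m = NBRCHG.search(line)
        if m:
            # Collapse the double space IOS uses to pad single-digit days.
            stamp = " ".join(m.group("ts").split())
            ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
            events.append((ts, m.group("nbr"), m.group("state")))
    return events

def outage_seconds(events):
    """Pair each DOWN with the next UP for the same neighbor."""
    down_at = {}
    outages = []
    for ts, nbr, state in events:
        if state == "DOWN":
            down_at[nbr] = ts
        elif nbr in down_at:
            outages.append((nbr, (ts - down_at.pop(nbr)).total_seconds()))
    return outages

# The two NBRCHG messages from the 7201 log above, unwrapped.
sample = [
    "054712: Mar  5 00:28:19 CST: %LDP-5-NBRCHG: LDP Neighbor 10.64.0.20:0 "
    "(0) is DOWN (Received error notification from peer: Shut down)",
    "054723: Mar  5 00:28:25 CST: %LDP-5-NBRCHG: LDP Neighbor 10.64.0.20:0 "
    "(4) is UP",
]
print(outage_seconds(ldp_transitions(sample)))
```

On the two messages above this reports a 6-second gap between DOWN and
UP.  Run over a longer log capture, the same pairing would show whether
the flap period is steady or drifting.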

I do have some interface errors which I also can't explain.  They do not 
appear to be incrementing though.  7613:

GigabitEthernet9/1 is up, line protocol is up (connected)
   Hardware is C6k 1000Mb 802.3, address is 001a.3063.0a80 (bia 
001a.3063.0a80)
   Description: TO 2821-2.dc Gi0/0
   Internet address is 10.64.0.179/31
   MTU 9000 bytes, BW 1000000 Kbit, DLY 10 usec,
      reliability 255/255, txload 1/255, rxload 1/255
   Encapsulation ARPA, loopback not set
   Keepalive set (10 sec)
   Full-duplex, 1000Mb/s
   input flow-control is off, output flow-control is off
   Clock mode is auto
   ARP type: ARPA, ARP Timeout 04:00:00
   Last input 00:00:02, output 00:00:00, output hang never
   Last clearing of "show interface" counters never
   Input queue: 0/75/1936665/7581 (size/max/drops/flushes); Total output 
drops: 4
   Queueing strategy: fifo
   Output queue: 0/40 (size/max)
   5 minute input rate 49000 bits/sec, 17 packets/sec
   5 minute output rate 56000 bits/sec, 24 packets/sec
   L2 Switched: ucast: 52903876 pkt, 3771470311 bytes - mcast: 15056043 
pkt, 1653756471 bytes
   L3 in Switched: ucast: 80170438 pkt, 12709078926 bytes - mcast: 0 
pkt, 0 bytes mcast
   L3 out Switched: ucast: 185161821 pkt, 36022953056 bytes mcast: 0 
pkt, 0 bytes
      150040994 packets input, 30087625055 bytes, 0 no buffer
      Received 15660647 broadcasts (0 IP multicasts)
      30 runts, 4247159 giants, 0 throttles
      1929071 input errors, 68 CRC, 0 frame, 13 overrun, 0 ignored
      0 watchdog, 0 multicast, 0 pause input
      0 input packets with dribble condition detected
      257650143 packets output, 64726258058 bytes, 0 underruns
      2 output errors, 0 collisions, 2 interface resets
      0 babbles, 0 late collision, 0 deferred
      0 lost carrier, 0 no carrier, 0 PAUSE output
      0 output buffer failures, 0 output buffers swapped out

7201:
GigabitEthernet0/0 is up, line protocol is up
   Hardware is MV64460 Internal MAC, address is 0023.5ee9.ac1b (bia 
0023.5ee9.ac1b)
   Description: TO 7613-2.clr Gi9/1
   Internet address is 10.64.0.178/31
   MTU 9000 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
      reliability 255/255, txload 1/255, rxload 1/255
   Encapsulation ARPA, loopback not set
   Keepalive set (10 sec)
   Full-duplex, 1000Mb/s, media type is RJ45
   output flow-control is XON, input flow-control is unsupported
   ARP type: ARPA, ARP Timeout 04:00:00
   Last input 00:00:00, output 00:00:00, output hang never
   Last clearing of "show interface" counters never
   Input queue: 0/75/3951/0 (size/max/drops/flushes); Total output drops: 6
   Queueing strategy: fifo
   Output queue: 0/40 (size/max)
   5 minute input rate 45000 bits/sec, 19 packets/sec
   5 minute output rate 64000 bits/sec, 13 packets/sec
      51466122 packets input, 1916487584 bytes, 0 no buffer
      Received 1891956 broadcasts, 0 runts, 0 giants, 0 throttles
      5 input errors, 0 CRC, 0 frame, 0 overrun, 5 ignored
      0 watchdog, 2247902 multicast, 0 pause input
      0 input packets with dribble condition detected
      32927369 packets output, 1549013167 bytes, 0 underruns
      8 output errors, 0 collisions, 1 interface resets
      23 unknown protocol drops
      23 unknown protocol drops
      0 babbles, 0 late collision, 0 deferred
      8 lost carrier, 0 no carrier, 0 pause output
      0 output buffer failures, 0 output buffers swapped out
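
To confirm whether the error counters are really static, two captures of
"show interface" taken some time apart can be diffed.  The following is
a minimal Python sketch under a few assumptions: the counter names are
the ones printed in the output above, the helper names are made up, and
the second snapshot in the example is invented purely to exercise the
code (it is not real data from either router).

```python
import re

# Counter labels as they appear in the "show interface" output above.
COUNTERS = ("runts", "giants", "input errors", "CRC",
            "overrun", "ignored", "output errors")

def parse_counters(text):
    """Extract {label: value} for the counters of interest."""
    found = {}
    for name in COUNTERS:
        # "<number> <label>", e.g. "4247159 giants" or "68 CRC".
        m = re.search(r"(\d+) " + re.escape(name) + r"\b", text)
        if m:
            found[name] = int(m.group(1))
    return found

def incrementing(before, after):
    """Return only the counters whose value changed, with the delta."""
    a, b = parse_counters(before), parse_counters(after)
    return {k: b[k] - a[k] for k in a if k in b and b[k] != a[k]}

# First snapshot: taken from the 7613 output above.
before = """30 runts, 4247159 giants, 0 throttles
1929071 input errors, 68 CRC, 0 frame, 13 overrun, 0 ignored
2 output errors, 0 collisions, 2 interface resets"""

# Second snapshot: HYPOTHETICAL values, only to demonstrate the diff.
after = """30 runts, 4247200 giants, 0 throttles
1929071 input errors, 68 CRC, 0 frame, 13 overrun, 0 ignored
2 output errors, 0 collisions, 2 interface resets"""

print(incrementing(before, after))
```

An empty result would mean none of the listed counters moved between the
two captures.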


Any thoughts as to what's going on here?  I can't tell for certain which
of the two routers is causing LDP and BGP to drop.  Knowing that would
help me narrow my troubleshooting focus.  The 7600 is running SRB1 and
the 7201 is running 12.4(15)T7.

Thanks
  Justin


