[c-nsp] MPLS LDP and BGP Neighbor flapping constantly
David Freedman
david.freedman at uk.clara.net
Thu Mar 5 13:05:52 EST 2009
You appear to have a high number of input queue drops and input errors.
Granted, the counters have never been cleared, but do you have any PPS
graphs of the link between these two boxes? I would suspect that a
traffic spike or a link fault is causing control messages to be dropped,
and that this is the cause here.
Dave.
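
Since the counters have never been cleared, one capture cannot show
whether the drops are still happening. A minimal sketch of comparing
two captures of "show interfaces" taken some minutes apart follows. The
function names and the three-line sample are illustrative assumptions,
and the regular expressions only cover the counter layout seen in the
output quoted below.

```python
import re

# Counters as they appear in the IOS "show interfaces" output in this thread.
# Other platforms or releases may lay these lines out differently.
INPUT_QUEUE = re.compile(r"Input queue: \d+/\d+/(\d+)/\d+")
INPUT_ERRORS = re.compile(r"(\d+) input errors")
PACKETS_IN = re.compile(r"(\d+) packets input")


def parse_counters(show_interface_text):
    """Extract cumulative input counters from one 'show interfaces' capture."""
    counters = {}
    for name, pattern in (
        ("input_queue_drops", INPUT_QUEUE),
        ("input_errors", INPUT_ERRORS),
        ("packets_input", PACKETS_IN),
    ):
        match = pattern.search(show_interface_text)
        if match is None:
            raise ValueError(f"counter not found: {name}")
        counters[name] = int(match.group(1))
    return counters


def counter_deltas(earlier, later):
    """Return how much each counter grew between two captures."""
    return {name: later[name] - earlier[name] for name in earlier}


# Three lines taken from the 7613 Gi9/1 output quoted in this thread.
SAMPLE_7613 = """\
Input queue: 0/75/1936665/7581 (size/max/drops/flushes); Total output drops: 4
150040994 packets input, 30087625055 bytes, 0 no buffer
1929071 input errors, 68 CRC, 0 frame, 13 overrun, 0 ignored
"""

baseline = parse_counters(SAMPLE_7613)
print(baseline)
```

Passing the baseline and a later capture to counter_deltas() shows
whether the drops and errors are growing while the sessions flap.
Deltas of zero would point away from the link itself.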
Justin Shore wrote:
> This afternoon I stumbled across a problem with an LDP session between a
> 7613 and a 7201. Actually both LDP and iBGP were flapping every 10
> seconds or so. I had both interfaces configured for MPLS, LDP, IS-IS
> (with AUTH and BFD though BFD isn't enabled on the interface itself yet)
> with an interface MTU of 9000 and CLNS MTU of 1496. Nothing too fancy.
> The systems as a whole are configured with MPLS graceful-restart, LDP,
> no mpls ip propagate-ttl, and LDP router-ID on a loopback:
>
> # 7201
> mpls label protocol ldp
> no mpls ip propagate-ttl
> mpls ldp graceful-restart
> mpls ldp router-id Loopback0 force
>
> # 7613
> mls mpls tunnel-recir
> mpls traffic-eng tunnels
> mpls ldp graceful-restart
> no mpls ip propagate-ttl
> mpls label protocol ldp
> mpls ldp router-id Loopback0 force
>
> This morning at 7:05 the router stopped responding to SNMP queries for
> about 15m. The load was about 13 before. Cacti shows the load doubling
> in the 10m prior to the 15m of nothing. When it came back the load was
> just shy of 50 and stayed there for about 30m. After that it stayed at
> around 30-35 for the next 7.5hrs before I noticed the BGP flapping issue
> and shut down the peer for troubleshooting. The load dropped back to
> around 16, higher than it was before the hiccup this morning. I'm at a
> loss to adequately explain why the load has been so jacked. I think the
> 30-35 load was because of the BGP flapping and the slightly higher load now is
> due to the LDP flapping issue. That's my best guess.
>
> Anyone know how to troubleshoot an LDP neighbor flapping issue? The 7613
> is logging this:
>
> 730278: Mar 4 20:43:48.696 CST: LDP GR: Received FT Sess TLV from
> 10.64.0.34:0 (fl 0x1, rs 0x0, rconn 0, rcov 120000)
> 730279: Mar 4 20:43:48.696 CST: LDP GR: MFI cutover wait delay =
> 600000, Forwarding State Hold Timer = 600000
> 730280: Mar 4 20:43:48.696 CST: LDP GR: searching for down nbr record
> (10.64.0.34:0, 10.64.0.178)
> 730281: Mar 4 20:43:48.696 CST: LDP GR: Added FT Sess TLV (Rconn
> 120000, Rcov 0) to INIT msg to 10.64.0.34:0
>
> The 7201 is logging this:
>
> 054705: Mar 5 00:28:19.599 CST: LDP GR: GR session 10.64.0.20:0:: lost
> 054706: Mar 5 00:28:19.599 CST: LDP GR: down nbr 10.64.0.20:0:: created
> [1 total]
> 054707: Mar 5 00:28:19 CST: %LDP-5-GR: GR session 10.64.0.20:0 (inst.
> 3): interrupted--recovery pending
> 054708: Mar 5 00:28:19.599 CST: LDP GR: GR session 10.64.0.20:0::
> bindings retained
> 054709: Mar 5 00:28:19.599 CST: LDP GR: down nbr 10.64.0.20:0:: state
> change (None -> Reconnect-Wait)
> 054710: Mar 5 00:28:19.599 CST: LDP GR: down nbr 10.64.0.20:0::
> reconnect timer started [120000 msecs]
> 054711: Mar 5 00:28:19.599 CST: LDP GR: down nbr 10.64.0.20:0:: added
> to bindings task queue [1 entries]
> 054712: Mar 5 00:28:19 CST: %LDP-5-NBRCHG: LDP Neighbor 10.64.0.20:0
> (0) is DOWN (Received error notification from peer: Shut down)
>
> 054713: Mar 5 00:28:25.923 CST: LDP GR: searching for down nbr record
> (10.64.0.20:0, 10.64.0.179)
> 054714: Mar 5 00:28:25.923 CST: LDP GR: search for down nbr record
> (10.64.0.20:0, 10.64.0.179) returned 10.64.0.20:0
> 054715: Mar 5 00:28:25.923 CST: LDP GR: Added FT Sess TLV (Rconn 0,
> Rcov 120000) to INIT msg to 10.64.0.20:0
> 054716: Mar 5 00:28:25.947 CST: LDP GR: Received FT Sess TLV from
> 10.64.0.20:0 (fl 0x1, rs 0x0, rconn 120000, rcov 0)
> 054717: Mar 5 00:28:25.947 CST: LDP GR: GR session 10.64.0.20:0::
> established
> 054718: Mar 5 00:28:25.947 CST: LDP GR: GR session 10.64.0.20:0:: found
> down nbr 10.64.0.20:0
> 054719: Mar 5 00:28:25.947 CST: LDP GR: down nbr 10.64.0.20:0::
> reconnect timer stopped
> 054720: Mar 5 00:28:25.947 CST: LDP GR: down nbr 10.64.0.20:0:: state
> change (Reconnect-Wait -> Recovering)
> 054721: Mar 5 00:28:25.947 CST: LDP GR: down nbr 10.64.0.20:0::
> recovery timer started [1 msecs]
> 054722: Mar 5 00:28:25 CST: %LDP-5-GR: GR session 10.64.0.20:0 (inst.
> 4): starting graceful recovery
> 054723: Mar 5 00:28:25 CST: %LDP-5-NBRCHG: LDP Neighbor 10.64.0.20:0
> (4) is UP
> 054724: Mar 5 00:28:25.951 CST: LDP GR: down nbr 10.64.0.20:0::
> recovery timer expired
> 054725: Mar 5 00:28:25 CST: %LDP-5-GR: GR session 10.64.0.20:0 (inst.
> 4): completed graceful recovery
> 054726: Mar 5 00:28:25.951 CST: LDP GR: down nbr 10.64.0.20:0::
> destroying record [0 left]
> 054727: Mar 5 00:28:25.951 CST: LDP GR: down nbr 10.64.0.20:0:: state
> change (Recovering -> Delete-Wait)
>
> 054728: Mar 5 00:28:28.091 CST: LDP GR: Tagcon querying for up to 12
> bindings update tasks [table 0]
> 054729: Mar 5 00:28:28.091 CST: LDP GR: down nbr 10.64.0.20:0::
> requesting bindings DEL for {10.64.0.20:0, 3}
> 054730: Mar 5 00:28:28.091 CST: LDP GR: down nbr 10.64.0.20:0:: removed
> from bindings task queue [0 entries]
> 054731: Mar 5 00:28:28.091 CST: LDP GR: Requesting 1 bindings update
> tasks [0 left in queue]
>
> 10.64.0.20 is a loopback on the 7613 and 10.64.0.34 is a loopback on the
> 7201.
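
To put numbers on the flapping, the %LDP-5-NBRCHG messages can be
paired up to measure how long each session stays down. The sketch
below is only a rough illustration: the function name is made up, the
year has to be supplied because the log timestamps omit it, and the
two sample lines are the 7201 messages above with the mail client's
line wraps rejoined.

```python
import re
from datetime import datetime

# Matches e.g. "Mar 5 00:28:19 CST: %LDP-5-NBRCHG: LDP Neighbor 10.64.0.20:0 (0) is DOWN"
NBRCHG = re.compile(
    r"(\w{3}) +(\d+) (\d{2}:\d{2}:\d{2})(?:\.\d+)? \w+: "
    r"%LDP-5-NBRCHG: LDP Neighbor (\S+) \(\d+\) is (UP|DOWN)"
)


def outage_durations(log_lines, year):
    """Pair each DOWN with the next UP per neighbor; return (neighbor, seconds)."""
    went_down = {}
    outages = []
    for line in log_lines:
        match = NBRCHG.search(line)
        if match is None:
            continue  # not a neighbor state change message
        month, day, clock, neighbor, state = match.groups()
        # The year is an assumption passed in by the caller.
        stamp = datetime.strptime(
            f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
        )
        if state == "DOWN":
            went_down[neighbor] = stamp
        elif neighbor in went_down:
            delta = stamp - went_down.pop(neighbor)
            outages.append((neighbor, delta.total_seconds()))
    return outages


# The two NBRCHG messages from the 7201 log quoted above, unwrapped.
SAMPLE_7201 = [
    "054712: Mar 5 00:28:19 CST: %LDP-5-NBRCHG: LDP Neighbor 10.64.0.20:0 "
    "(0) is DOWN (Received error notification from peer: Shut down)",
    "054723: Mar 5 00:28:25 CST: %LDP-5-NBRCHG: LDP Neighbor 10.64.0.20:0 "
    "(4) is UP",
]

print(outage_durations(SAMPLE_7201, year=2009))
# prints [('10.64.0.20:0', 6.0)]
```

Run over a longer capture, the list of durations and the gaps between
them would show whether the flaps are periodic, which tends to suggest
a timer expiring rather than random loss.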
>
> I do have some interface errors which I also can't explain. They do not
> appear to be incrementing though. 7613:
>
> GigabitEthernet9/1 is up, line protocol is up (connected)
> Hardware is C6k 1000Mb 802.3, address is 001a.3063.0a80 (bia
> 001a.3063.0a80)
> Description: TO 2821-2.dc Gi0/0
> Internet address is 10.64.0.179/31
> MTU 9000 bytes, BW 1000000 Kbit, DLY 10 usec,
> reliability 255/255, txload 1/255, rxload 1/255
> Encapsulation ARPA, loopback not set
> Keepalive set (10 sec)
> Full-duplex, 1000Mb/s
> input flow-control is off, output flow-control is off
> Clock mode is auto
> ARP type: ARPA, ARP Timeout 04:00:00
> Last input 00:00:02, output 00:00:00, output hang never
> Last clearing of "show interface" counters never
> Input queue: 0/75/1936665/7581 (size/max/drops/flushes); Total output
> drops: 4
> Queueing strategy: fifo
> Output queue: 0/40 (size/max)
> 5 minute input rate 49000 bits/sec, 17 packets/sec
> 5 minute output rate 56000 bits/sec, 24 packets/sec
> L2 Switched: ucast: 52903876 pkt, 3771470311 bytes - mcast: 15056043
> pkt, 1653756471 bytes
> L3 in Switched: ucast: 80170438 pkt, 12709078926 bytes - mcast: 0 pkt,
> 0 bytes mcast
> L3 out Switched: ucast: 185161821 pkt, 36022953056 bytes mcast: 0 pkt,
> 0 bytes
> 150040994 packets input, 30087625055 bytes, 0 no buffer
> Received 15660647 broadcasts (0 IP multicasts)
> 30 runts, 4247159 giants, 0 throttles
> 1929071 input errors, 68 CRC, 0 frame, 13 overrun, 0 ignored
> 0 watchdog, 0 multicast, 0 pause input
> 0 input packets with dribble condition detected
> 257650143 packets output, 64726258058 bytes, 0 underruns
> 2 output errors, 0 collisions, 2 interface resets
> 0 babbles, 0 late collision, 0 deferred
> 0 lost carrier, 0 no carrier, 0 PAUSE output
> 0 output buffer failures, 0 output buffers swapped out
>
> 7201:
> GigabitEthernet0/0 is up, line protocol is up
> Hardware is MV64460 Internal MAC, address is 0023.5ee9.ac1b (bia
> 0023.5ee9.ac1b)
> Description: TO 7613-2.clr Gi9/1
> Internet address is 10.64.0.178/31
> MTU 9000 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
> reliability 255/255, txload 1/255, rxload 1/255
> Encapsulation ARPA, loopback not set
> Keepalive set (10 sec)
> Full-duplex, 1000Mb/s, media type is RJ45
> output flow-control is XON, input flow-control is unsupported
> ARP type: ARPA, ARP Timeout 04:00:00
> Last input 00:00:00, output 00:00:00, output hang never
> Last clearing of "show interface" counters never
> Input queue: 0/75/3951/0 (size/max/drops/flushes); Total output drops: 6
> Queueing strategy: fifo
> Output queue: 0/40 (size/max)
> 5 minute input rate 45000 bits/sec, 19 packets/sec
> 5 minute output rate 64000 bits/sec, 13 packets/sec
> 51466122 packets input, 1916487584 bytes, 0 no buffer
> Received 1891956 broadcasts, 0 runts, 0 giants, 0 throttles
> 5 input errors, 0 CRC, 0 frame, 0 overrun, 5 ignored
> 0 watchdog, 2247902 multicast, 0 pause input
> 0 input packets with dribble condition detected
> 32927369 packets output, 1549013167 bytes, 0 underruns
> 8 output errors, 0 collisions, 1 interface resets
> 23 unknown protocol drops
> 0 babbles, 0 late collision, 0 deferred
> 8 lost carrier, 0 no carrier, 0 pause output
> 0 output buffer failures, 0 output buffers swapped out
>
>
> Any thoughts as to what's going on here? I can't tell for certain which
> of the 2 routers is causing LDP and BGP to drop. Knowing that would
> help me narrow my troubleshooting focus. The 7600 is running SRB1 and
> the 7201 is running 12.4(15)T7.
>
> Thanks
> Justin
>
> _______________________________________________
> cisco-nsp mailing list cisco-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/cisco-nsp
> archive at http://puck.nether.net/pipermail/cisco-nsp/
>