[c-nsp] Sporadic loss of LDP neighbor ...
Garry
gkg at gmx.de
Mon Dec 12 02:38:56 EST 2011
Hi *,
I've been fighting this problem for quite a while, need some ideas from
the collective intelligence ...
One of our backbone locations has multiple routers that have worked fine
for quite a while ... during the last couple of months, we've been
experiencing sporadic failures in the LAN for which I've not been able
to pinpoint any logical cause ...
Basic setup is this ... currently, three 7200 routers (2x NPE300 VXR
[BB1 & 2], 1x NPE150 [BB3] for a couple of L2TP wireless links). We've
added an ASR1002F [Core1] to that as the new primary router for the location
about a year ago (running a 300M link to our core uplink, 1G dark fiber
link to another backbone location). All of our backbone is running with
MPLS enabled (multiple VRFs for MPLS-VPNs). Everything was fine up until
something like 2-3 months ago (don't have an exact date, otherwise it
might be easier to get some correlations to other changes in the configs
or infrastructure). Then it started with sporadic losses of the LAN
interconnections, like this: (log excerpt from BB2)
Dec 11 22:59:31: %LDP-5-NBRCHG: LDP Neighbor [BB1]:0 is DOWN (Received
error notification from peer: Holddown time expired)
Dec 11 22:59:52: %LDP-5-NBRCHG: LDP Neighbor [BB3]:0 is DOWN (Discovery
Hello Hold Timer expired)
Dec 11 23:00:00: %LDP-5-NBRCHG: LDP Neighbor [BB3] is UP
Dec 11 23:00:27: %LDP-5-NBRCHG: LDP Neighbor [BB1]:0 is UP
These interruptions (at least going by the timestamps between down and
up) sometimes last only 3-4 seconds; the BB1 one above, at almost a
minute, is just about the longest I've seen to date. Of course this
disrupts routing to a certain degree ... sometimes badly enough to take
down iBGP/eBGP multihop connections.
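To put numbers on the down/up gaps, here is a rough Python sketch that pairs each DOWN with the next UP for the same neighbor. It only uses the four log lines from the excerpt above (wrapped lines rejoined). The function name `outage_durations` and the `year` default are my own assumptions for illustration; the syslog timestamps carry no year, so 2011 is taken from the date of this post.

```python
import re
from datetime import datetime

# Log lines copied from the BB2 excerpt, with wrapped lines rejoined.
LOG = """\
Dec 11 22:59:31: %LDP-5-NBRCHG: LDP Neighbor [BB1]:0 is DOWN (Received error notification from peer: Holddown time expired)
Dec 11 22:59:52: %LDP-5-NBRCHG: LDP Neighbor [BB3]:0 is DOWN (Discovery Hello Hold Timer expired)
Dec 11 23:00:00: %LDP-5-NBRCHG: LDP Neighbor [BB3] is UP
Dec 11 23:00:27: %LDP-5-NBRCHG: LDP Neighbor [BB1]:0 is UP
"""

# Captures: timestamp, neighbor ID (a ":0" style suffix is optional), new state.
PATTERN = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d): %LDP-5-NBRCHG: "
    r"LDP Neighbor (\S+?)(?::\d+)? is (UP|DOWN)"
)


def outage_durations(log_text, year=2011):
    """Return (neighbor, down_time, seconds_down) for each DOWN/UP pair.

    `year` is an assumption: the log timestamps do not include one.
    """
    down_since = {}
    outages = []
    for line in log_text.splitlines():
        match = PATTERN.match(line)
        if not match:
            continue
        stamp, neighbor, state = match.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if state == "DOWN":
            down_since[neighbor] = when
        elif neighbor in down_since:
            start = down_since.pop(neighbor)
            outages.append((neighbor, start, int((when - start).total_seconds())))
    return outages


for neighbor, start, seconds in outage_durations(LOG):
    print(f"{neighbor}: down at {start:%H:%M:%S} for {seconds}s")
```

For the excerpt this gives 8 seconds for [BB3] and 56 seconds for [BB1]. Run over a longer log, the same pairing would show whether the outages cluster at particular times of day.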
Now, at two other backbone locations, we have more or less the identical
setup, without any of these problems. I've already compared interface
configs, but everything seems identical (apart from IP addresses, of
course). The problem here is that it's impossible to analyze the cause:
for one, the failures occur without any predictable interval, and
they're too short for me to react to the loss of connection in time ...
I've tried activating some debugs on the router, but couldn't get any
helpful information out of them (at least nothing I could identify).
We've recently added an ASR1001 to the site, which (together with the
1002F) will be used to replace two 7200 routers, and already moved about
half of the existing VLANs of the site (~20 of the 40+) to the ASRs.
That didn't change much, though the frequency of the interruptions
dropped to maybe once every 2 or 3 days (from 1-2 per day). One thing I
did notice is that BB1 is involved most of the time; in one or two cases
out of three BB2 also loses its LDP connection at the same time, while
BB3 usually shows no problems reaching either of the Core routers. BB1
and BB2 will also lose connectivity to each other most of the time,
albeit not always. In attempting to locate the cause, we already moved
BB1 to the same switch as Core1&2, with no results. Needless to say,
there are no disruptions on Layer 2, at least as far as can be seen in
the logs.
If these problems had manifested themselves when we installed the first
ASR, I'd say it's something in the IOS versions that might be
incompatible, but everything ran fine for something like 9 months, so
that shouldn't be it. I've tried going through config diffs from 4-6
months ago and now, but couldn't find any changes that should break MPLS
on the LAN layer.
Does anybody have any idea what might be causing this, or what I should
look into to get to the cause of this problem?
Here are some excerpts from the router configs:
BB1:
interface GigabitEthernet3/0
 mtu 1500
 no ip redirects
 ip route-cache flow
 negotiation auto
 mpls label protocol ldp
 tag-switching mtu 1520
 tag-switching ip
BB2: identical settings
Core1:
interface GigabitEthernet0/0/0
 no ip redirects
 ip flow ingress
 negotiation auto
 mpls ip
 mpls label protocol ldp
 mpls mtu 1520
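Since the two platforms spell the same commands differently, comparing the stanzas by eye is error-prone. Below is a minimal Python sketch of a normalized comparison. It assumes `tag-switching` is just the older spelling of the corresponding `mpls` interface commands and that command order within a stanza does not matter. The helper names `normalize` and `compare` are hypothetical, and the input is only the two excerpts above.

```python
# Interface stanzas copied from the BB1 and Core1 excerpts.
BB1 = """\
interface GigabitEthernet3/0
 mtu 1500
 no ip redirects
 ip route-cache flow
 negotiation auto
 mpls label protocol ldp
 tag-switching mtu 1520
 tag-switching ip
"""

CORE1 = """\
interface GigabitEthernet0/0/0
 no ip redirects
 ip flow ingress
 negotiation auto
 mpls ip
 mpls label protocol ldp
 mpls mtu 1520
"""


def normalize(stanza):
    """Return the set of sub-commands in a stanza, ignoring the interface
    line and rewriting legacy 'tag-switching' keywords to 'mpls'.

    Assumption: 'tag-switching X' and 'mpls X' mean the same thing here.
    """
    commands = set()
    for line in stanza.splitlines():
        line = line.strip()
        if not line or line.startswith("interface "):
            continue
        if line.startswith("tag-switching "):
            line = "mpls " + line[len("tag-switching "):]
        commands.add(line)
    return commands


def compare(a, b):
    """Return (commands only in a, commands only in b), each sorted."""
    set_a, set_b = normalize(a), normalize(b)
    return sorted(set_a - set_b), sorted(set_b - set_a)


only_bb1, only_core1 = compare(BB1, CORE1)
print("only on BB1:  ", only_bb1)
print("only on Core1:", only_core1)
```

With these two excerpts the MPLS lines match after normalization; what remains is `ip route-cache flow` and `mtu 1500` on BB1 versus `ip flow ingress` on Core1. The `mtu 1500` line may simply be a default that the other platform does not display, so it is not necessarily a real difference.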
Thanks, Garry