[c-nsp] MPLS LDP and BGP Neighbor flapping constantly
Justin Shore
justin at justinshore.com
Mon Mar 9 12:30:33 EDT 2009
This message slipped through the cracks, but it gives me a chance to
post an update on the problem.
I worked with TAC to troubleshoot the issue last week. The TAC engineer
also noticed the giants on the 7600's side. He tried sending large
ICMPs through to the 7600 from the 7201. Nothing over 1508 bytes would
pass, even though the interface MTU was 9000 on both sides (and the IP
MTU followed). Even ICMPs sent WITHOUT DF set failed. We dropped the
MTU to 1500 and suddenly we could send large ICMPs that needed to be
fragmented. Very weird. It gets weirder, though.
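For anyone following along, here is roughly what fragmentation should produce once the MTU is 1500. This is a minimal Python sketch of plain IPv4 fragmentation per RFC 791, not anything IOS-specific. It assumes the ping size is the whole IP datagram (20-byte header, no options), and the function name fragment() is made up for illustration.

```python
# Sketch of IPv4 fragmentation (RFC 791) for one oversized ICMP echo.
# Assumptions: 'size' is the whole IP datagram (20-byte header, no
# options) and it is fragmented for an IP MTU of 'mtu'.
# fragment() is a hypothetical helper, not an IOS feature.

IP_HEADER = 20  # bytes, IPv4 header without options


def fragment(size: int, mtu: int) -> list[tuple[int, int, bool]]:
    """Return (offset_in_8_byte_units, fragment_total_length, more_fragments)
    for each fragment of a 'size'-byte datagram sent over 'mtu'."""
    payload = size - IP_HEADER
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    max_chunk = (mtu - IP_HEADER) // 8 * 8
    fragments = []
    offset = 0
    while True:
        chunk = min(max_chunk, payload)
        payload -= chunk
        # The MF flag is set on every fragment except the last one.
        fragments.append((offset // 8, chunk + IP_HEADER, payload > 0))
        offset += chunk
        if payload == 0:
            break
    return fragments


if __name__ == "__main__":
    # A 9000-byte ping squeezed through a 1500-byte IP MTU.
    for frag in fragment(9000, 1500):
        print(frag)
```

Run as-is, it prints seven fragments: six of 1500 bytes and a final 120-byte fragment at offset 1110 (in 8-byte units).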
Before calling TAC, I upgraded the code on another 7201 that's
dual-homed to both 7613s in the core. As soon as I reloaded that 7201,
LDP on it also started flapping to BOTH 7600s (the original 7201 was
only single-homed to one 7600). BGP appears to be unaffected on this
7201. So now I have two 7201s with constantly flapping LDP neighbors.
The second 7201 also can't ping either 7600 with large ICMPs. However,
and this is weird, BOTH 7600s can ping the loopback on the 7201 with
9000-byte ICMPs.
Writing that last sentence got me thinking. I was pinging from
the 7201s to Lo0 on the 7600s. Large ICMPs weren't getting there, and
giants were logged on the incoming L3 interface on the 7600s. I can
ping from the second 7201 to the directly-connected interface on either
7600 with large ICMPs; they are not dropped and no giants are logged.
Even though it can send large frames to the directly-connected
interface, it can't send them to the loopback. I don't believe that's normal.
From the 7600 I can turn around and ping the loopback on the 2nd 7201
with jumbo frames without any problems. It's like MTU is only being
honored in one direction.
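To compare ping sizes against the giants counters, it helps to work out the size of the frame that actually hits the wire. A rough Python sketch follows, assuming Ethernet II framing (14-byte header, 4-byte FCS), an optional 4-byte 802.1Q tag and 4 bytes per MPLS label. frame_size(), is_giant() and max_frame are hypothetical names, and the exact size at which a 7600 line card counts a giant is platform specific and not modeled here.

```python
# Rough Ethernet frame size for an IP packet of a given size.
# Assumptions (not from the thread): Ethernet II framing, an optional
# 802.1Q tag, 4 bytes per MPLS label; preamble and inter-frame gap are
# not counted. Function names are hypothetical.

ETH_HEADER = 14  # destination MAC + source MAC + EtherType
FCS = 4          # frame check sequence
DOT1Q_TAG = 4    # 802.1Q VLAN tag, if present
MPLS_LABEL = 4   # one MPLS label stack entry


def frame_size(ip_packet: int, dot1q: bool = False, labels: int = 0) -> int:
    """Bytes from the destination MAC through the FCS for one IP packet."""
    tag = DOT1Q_TAG if dot1q else 0
    return ETH_HEADER + tag + labels * MPLS_LABEL + ip_packet + FCS


def is_giant(ip_packet: int, max_frame: int,
             dot1q: bool = False, labels: int = 0) -> bool:
    """Hypothetical check: True if the frame is larger than max_frame bytes.
    The real giant threshold depends on the platform and its MTU config."""
    return frame_size(ip_packet, dot1q, labels) > max_frame


if __name__ == "__main__":
    # Untagged frame size, then the same packet carrying one MPLS label.
    for size in (1500, 1508, 9000):
        print(size, frame_size(size), frame_size(size, labels=1))
```

For example, frame_size(1500) is 1518 and frame_size(9000) is 9018, while each MPLS label on the packet adds another 4 bytes.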
This one confuses me, and it smells like a bug. I'm running SRB1
on both 7600s and was running different 12.4(15)Tn releases on the
7201s. They are both now running 12.4(24)T. I'll drop one of them back
to an early 12.4(15)Tn tonight to troubleshoot if I have to. The
problem occurred on the first 7201 without a code change and didn't occur
on the second until after the code change and reboot.
Any thoughts?
Justin
David Freedman wrote:
> You appear to have a high number of input queue drops and input errors;
> granted, the counters have never been cleared. Do you have any PPS
> graphs of the link between these two boxes? I would suspect a traffic
> spike or a link fault is causing control messages to be dropped.