[c-nsp] MPLS LDP and BGP Neighbor flapping constantly

Justin Shore justin at justinshore.com
Mon Mar 9 12:30:33 EDT 2009


This message slipped through the cracks.  It does prompt me to give an 
update on the problem, though.

I worked with TAC to troubleshoot the issue last week.  The TAC engineer 
also noticed the giants on the 7600's side.  He tried sending large 
ICMPs through to the 7600 from the 7201.  Nothing over 1508 would pass 
even though the interface MTU was 9000 on both sides (and the IP MTU 
followed).  Even sending ICMPs WITHOUT the DF bit set still resulted in a 
failure.  We dropped the MTU to 1500 and suddenly we could send large 
ICMPs that needed to be fragged.  Very weird.  It gets weirder though.
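
For anyone who wants to pin down a size ceiling like the 1508 above 
without stepping through sizes by hand, here is a minimal sketch of a 
binary search over ping sizes.  The probe function is hypothetical: it 
stands in for whatever actually sends one ping of a given size and 
reports success, and the 64 and 9000 bounds are only assumptions taken 
from the MTU mentioned here.  It also assumes that if one size passes, 
every smaller size passes too.

```python
from typing import Callable


def largest_passing_size(probe: Callable[[int], bool],
                         low: int = 64, high: int = 9000) -> int:
    """Return the largest size in [low, high] for which probe(size) is True.

    Assumes pass/fail is monotonic in size.  Returns low - 1 when even
    the smallest size fails.
    """
    if not probe(low):
        return low - 1
    # Invariant: probe(low) passed; the answer lies somewhere in [low, high].
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always progresses
        if probe(mid):
            low = mid        # mid passes, so the answer is at least mid
        else:
            high = mid - 1   # mid fails, so the answer is below mid
    return low


if __name__ == "__main__":
    # Hypothetical stand-in for a real ping: everything above 1508 is dropped.
    def fake_probe(size: int) -> bool:
        return size <= 1508

    print(largest_passing_size(fake_probe))
```

With a real probe wired in, running it once in each direction would 
show whether the ceiling differs between the two sides.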

Prior to calling TAC I upgraded the code on another 7201 that's 
dual-homed to both 7613s in the core.  As soon as I reloaded that 7201 
LDP on it also started flapping to BOTH 7600s (the original 7201 was 
only single-homed to one 7600).  BGP appears to be unaffected on this 
7201.  So now I have 2 7201s with constantly flapping LDP neighbors. 
The 2nd 7201 also can't ping either 7600 with large ICMPs.  However, and 
this is weird, BOTH 7600s can ping the loopback on the 7201 with 9000 
byte ICMPs.

When I wrote that last sentence it got me thinking.  I was pinging from 
the 7201s to Lo0 on the 7600s.  Large ICMPs weren't getting there and 
giants were logged on the incoming L3 interface on the 7600s.  I can 
ping from the 2nd 7201 to the directly-connected interface on either 
7600 with large ICMPs and they are not dropped and no giants are logged. 
Even though it can send large frames to the directly-connected 
interface, it can't send them to the loopback.  I don't believe that's normal. 
From the 7600 I can turn around and ping the loopback on the 2nd 7201 
with jumbo frames without any problems.  It's like MTU is only being 
honored in one direction.

This is a confusing one to me that smells like a bug.  I'm running SRB1 
on both 7600s and was running different 12.4(15)Tn releases on the 
7201s.  They are both now running 12.4(24)T.  I'll drop one of them back 
to an early 12.4(15)Tn tonight to troubleshoot if I have to.  The 
problem occurred on the 1st 7201 without a code change and didn't occur 
on the 2nd until after the code change and reboot.

Any thoughts?
  Justin


David Freedman wrote:
> You appear to have a high number of input queue drops and input errors,
> granted the counters have never been cleared, do you have any PPS
> graphs of the link between these two boxes? I would suspect a traffic
> spike or link fault causing control messages to be dropped being the
> cause here.


