[c-nsp] Ethernet Freezeup

Andre Beck cisco-nsp at ibh.net
Thu Apr 10 07:36:33 EDT 2008


Re,

On Wed, Apr 09, 2008 at 10:54:47PM -0400, Ed Ravin wrote:
> On Tue, Apr 08, 2008 at 08:36:57PM +0200, Andre Beck wrote:
> > On Tue, Apr 08, 2008 at 10:35:36AM -0500, jon.hartman at verizon.com wrote:
> > > Is it possible that your interface is getting wedged?
> > > 
> > > http://www.cisco.com/en/US/products/hw/iad/ps397/products_tech_note09186a0
> > > 0800a7b85.shtml
> > 
> > Hard to say without having a "sh int fa0/0" from when the issue hit. The
> > description says that only a reload would clear this kind of problem,
> > but it's old and things may have changed. My Fa0/0 input queue looks like
> > 
> >   Input queue: 0/75/0/2 (size/max/drops/flushes); Total output drops: 0
> > 
> > and I ponder what the two flushes may be. I did indeed have exactly two
> > occasions of the interface hanging that could be cleaned with a clear int.
> 
> Compare that with my 7200 :
> 
>   Input queue: 0/75/19755/291735 (size/max/drops/flushes); Total output drops: 715217
>   ...
>    Received 23535684 broadcasts, 0 runts, 233 giants, 4480 throttles
>    568580 input errors, 0 CRC, 0 frame, 396581 overrun, 171629 ignored
> 
> That's after around 5 weeks of uptime.  We had a DoS attack a couple of
> weeks ago, that might explain the crazy numbers.

Meanwhile my Fa0/0 says

   Input queue: 0/75/0/6 (size/max/drops/flushes); Total output drops: 0

without the issue having hit since. Obviously the flush counter isn't
related 1:1 to these stalls. It seems rather to correspond to the
so-called collisions counted by the interface:

     277062433 packets output, 1899213686 bytes, 5 underruns
     5 output errors, 5 collisions, 1 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier
     0 output buffer failures, 0 output buffers swapped out

Now if I only knew what constitutes a "collision" on an interface that
has CSMA/CD disabled...
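
To see whether flushes and collisions really keep moving together, the
relevant fields can be pulled out of saved "sh int" output and compared
between two snapshots. A minimal Python sketch, assuming the output
format quoted above; the names parse_counters and delta are made up for
illustration and how the text gets collected is left open:

```python
import re

# Matches the "Input queue" line as printed by "sh int" in this thread.
QUEUE_RE = re.compile(
    r"Input queue:\s*(\d+)/(\d+)/(\d+)/(\d+)\s*\(size/max/drops/flushes\)"
)
# Matches a few of the output-side counters of interest.
COUNTER_RE = re.compile(r"(\d+) (collisions|interface resets|underruns)")


def parse_counters(show_int_text):
    """Extract flushes, collisions, resets and underruns from 'sh int' text."""
    counters = {}
    m = QUEUE_RE.search(show_int_text)
    if m:
        counters["flushes"] = int(m.group(4))
    for value, name in COUNTER_RE.findall(show_int_text):
        counters[name] = int(value)
    return counters


def delta(before, after):
    """Per-counter difference between two snapshots (hypothetical helper)."""
    return {k: after[k] - before[k] for k in after if k in before}


# Sample taken from the counters quoted in this mail.
SAMPLE = """\
   Input queue: 0/75/0/6 (size/max/drops/flushes); Total output drops: 0
     277062433 packets output, 1899213686 bytes, 5 underruns
     5 output errors, 5 collisions, 1 interface resets
"""

if __name__ == "__main__":
    print(parse_counters(SAMPLE))
```

If delta() between two snapshots shows flushes and collisions rising by
the same amount, that would support the correlation; it wouldn't explain
what the counter actually means.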
 
> BTW, it's not memory, neither of my two routers that have the problem
> are memory constrained nor do they have a lot of routes.

Good to know. One change I made together with replacing an NPE200 by an
NPE225 was to let a partial table (prefix le 22) flow in, so I wasn't
entirely sure that the memory use of BGP wouldn't be a factor.
 
> > Further, just giving it a clear int when it is running normally doesn't
> > increment that counter. When it strikes again (hopefully auto-healed by my
> > new EEM applet) and that counter increments, it's probably indeed an input
> > queue overrun (wedged).
> 
> Will the EEM applet leave something in your log when it resets the
> interface?  Otherwise, if the auto-heal happens fast enough, you might
> not know that it kicked in.

It drops a critical syslog message, and a "show track" would also tell me
that it had kicked in. As always with closely monitored Heisenbugs, of
course it hasn't triggered since.
 
> > BTW, there's also a chance of the switch being involved.
> 
> I've checked this a couple of times and never found anything.
> Also, the two routers affected are in wildly disparate environments.

Ok, so what's common in *all* these cases is the NPE225. Exactly what
I fear is the culprit.

Thanks,
Andre.
-- 
   Real men don't make backups of their mail. They just send it out
    on the Internet and let the secret services do the hard work.

-> Andre Beck    +++ ABP-RIPE +++      IBH IT-Service GmbH, Dresden <-

