[c-nsp] 7200 PA-GE interface resets

Thu Oct 21 22:33:09 EDT 2004

On Wed, Oct 20, 2004 at 02:28:51PM +1300, Beatty Lane-Davis wrote:
> Thanks Bruce,
> 
> > Hmm, both of those DDTS's you reference are about stuck interface hold
> > queues, not interfaces that spontaneously reset.   Got a 
> > version number?
> 
> 12.2(15)T2 and I'm hoping to be upgrading that to 6.3 in the immediate
> future ;-)

If you upgrade go to the latest 12.3 mainline on CCO.

> 
> > The interface resets would occur for failure to transmit 
> > after a number of retries.  Any number of reasons for that, 
> > including the media, bug, lack of resources, etc.  Any 
> > messages in the log prior to the interface resetting?
> 
> Nothing.  And actually, looking again, it doesn't appear that the
> interface down message showed up in the router's logs, but it definitely
> showed up on the switch.  Very strange, what could cause the two to
> disagree on something so fundamental?

So, without looking at the data, here are my thoughts.
I'd suspect your underlying problem is the resets.
If we are resetting the chip (that doesn't always mean
the interface gets reset) then if you are switching packets
the rx and tx interrupts would surely be disabled for
that brief amount of time and you would see ignores and input
drops.  I've seen that before.  Now as to why we are resetting
the interface I'm not sure.  In looking at the bug you mentioned
CSCdt37135
Externally found catastrophic defect: Resolved (R)
output queue hangs 40/40 on 7200 GigE interface

it's fixed in your code...but from looking at it there was
a condition where the txring getting full would stick the interface
and that fix was to correct that condition by forcing the
chip to reset.  What I'm not sure about is if that is where
your resets are coming from.

Could you do this:

clear counters
sh clock
sh int gig <whatever>
sh controllers gig <whatever>

I want to see if we can tell if the txring is getting full:

e.g.:

  rxring(128)=0xF3CF000, shadow=0x61482C7C, head=70, rx_buf_size=512
  txring(256)=0xF3D0000, shadow=0x61482EA8, head=77, tail=77
                                            ^^^^^^^^^^^^^^^^
  tx_int_txdw=0, tx_int_txqe=0, rx_int_rxdmt0=0, rx_int_rxt0=0
  tx_count=256, txring_full=0, rx_max=0, filtered_pak=0
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
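
If you end up grabbing that output repeatedly, a small script can pull out
the underlined fields so you can compare them between samples. This is only
a rough sketch: the helper name parse_txring is made up for illustration, the
regexes assume the line layout matches the sample above (it may differ
between IOS releases), and it only extracts the values without interpreting
them.

```python
import re

# Sample lines copied from the output above; real output may differ by release.
SAMPLE = """\
  rxring(128)=0xF3CF000, shadow=0x61482C7C, head=70, rx_buf_size=512
  txring(256)=0xF3D0000, shadow=0x61482EA8, head=77, tail=77
  tx_int_txdw=0, tx_int_txqe=0, rx_int_rxdmt0=0, rx_int_rxt0=0
  tx_count=256, txring_full=0, rx_max=0, filtered_pak=0
"""


def parse_txring(text):
    """Extract the tx ring fields from 'sh controllers' text.

    Hypothetical helper: returns a dict of integers, or None if any of
    the expected fields is missing from the text.
    """
    # Ring size, head and tail all sit on the txring(...) line.
    ring = re.search(r"txring\((\d+)\)=.*?head=(\d+), tail=(\d+)", text)
    # The two counters sit on a later line.
    count = re.search(r"\btx_count=(\d+)", text)
    full = re.search(r"\btxring_full=(\d+)", text)
    if not (ring and count and full):
        return None
    return {
        "ring_size": int(ring.group(1)),
        "head": int(ring.group(2)),
        "tail": int(ring.group(3)),
        "tx_count": int(count.group(1)),
        "txring_full": int(full.group(1)),
    }


if __name__ == "__main__":
    print(parse_txring(SAMPLE))
```

Run it against a capture taken right after "clear counters" and again after
the next reset, then compare txring_full and head/tail between the two.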

> 
> > The "mostly switched in and out the same GE interface" would 
> > grab my attention WRT the ignores.  I'm wondering if that 
> > interface is starved for buffers given that inbound and 
> > outbound packets are fighting for the same resources and 
> > queue allocations rather than the load being shared across a 
> > separate inbound and outbound interface.
> 
> Sorry, should have gotten into this in the first message, TAC's
> explanation for the ignored's essentially boiled down to a
> head-of-lineish blocking issue.  On the other side of this box is an ATM
> STM-1 with a heap of low-rate PVC's hanging off it.  TAC's explanation
> is that when the ATM interface has to buffer packets and does so by
> seizing particle buffers from the gig interface causing ignored to
> increment there.  
> I asked the same questions of endless TAC escalation people: is the
> traffic vector part of the problem here?  They assured me that no, this
> should work fine, but...
> 
> Thoughts?

Well, I don't 100% understand in theory exactly how that would
happen, but I do think it's possible here.
I just can't remember how to tell whether that's it or not.  Let me
ask around.

Are you seeing a bunch of drops on the ATM side?

Rodney
