[c-nsp] VIP/PA question
rodunn at cisco.com
Thu Oct 21 22:10:33 EDT 2004
I agree with you 99%: this is not rx buffering.
What I suspect is some form of txacc (transmit
accumulator) loss. The common symptom when that
happens is that pings from the router work but pings
through it do not. Why? Because we reserve 2 (or 3)
txaccs from the RSP to send messages down to the VIP,
so you need 3 or more for transit dCEF traffic
to work. You can check this by doing:
sh contr cbus
and comparing the value against the txlimit.
75xx_#sh contr cbus | incl txlimit
Serial3/0/0/1:1, txq E8001A00, txacc E8000082 (value 5), txlimit 5
The value decrements as the link is loaded, so here it
isn't loaded at all. When you load the link the value will
drop. This is how the ingress VIPs know the egress
interface is congested; this is where the backpressure
happens.
The value should NEVER be greater than the txlimit. If it
is, that's a problem.
Now, if the value goes low (say 2 or 3), you stop routing
traffic over the link, and it doesn't go back up to the
txlimit, then you have lost an accumulator. That's a problem.
You could try bumping it up with tx-queue-limit to see if
traffic starts passing again:
75xx_#sh contr cbus | incl txlimit
*Oct 22 01:27:05: %SYS-5-CONFIG_I: Configured from console
Serial3/0/0/1:1, txq E8001A00, txacc E8000082 (value 10), txlimit 10
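For reference, the bump above would be applied with something
like the following (interface name taken from the output above;
pick whatever limit you want to test with):

75xx_#conf t
75xx_(config)#interface Serial3/0/0/1:1
75xx_(config-if)#tx-queue-limit 10
75xx_(config-if)#end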
The process to find lost txacc's is pretty complicated so
you would need to open a TAC case.
Now, there is one gotcha here. I usually don't like pointing
at hardware unless I'm positive. But I have seen 3 times proven
where this type of problem happens because of bad hardware.
In the cases I worked on, the only interface that
would show funky values for "value" was the :0 channel-group
on the T3. And believe it or not the problem was actually
a bad ingress VIP. It took me a week long trip to South America
to see it myself before I would believe it. :)
12.0(27)S1 is pretty new. I know we've had some txacc loss
bugs especially around the 25S range.
Externally found moderate defect: Resolved (R)
Loss of Txacc after OIR
Externally found severe defect: Resolved (R)
losing txacc and buffers in mpls code path on VIP
How often is this happening?
Could you monitor the 'sh contr cbus' and let me know
if you can find a pattern?
Are you running MLPPP at all?
If you see one stop switching transit traffic, go
one router back and send 100 packets over the link
with a timeout of 0, and see whether 100 output
drops increment. Also capture the 'sh contr cbus | incl
<interface>'. Then bump the tx-queue-limit up by 5 or
10 and see if your transit pings work.
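To sketch that test from the upstream router (the target
address here is hypothetical; use the far side of the suspect
link):

upstream_#ping
Protocol [ip]:
Target IP address: 192.0.2.1
Repeat count [5]: 100
Timeout in seconds [2]: 0

Then on the 75xx check the egress interface:

75xx_#sh int Serial3/0/0/1:1 | incl drops
75xx_#sh contr cbus | incl Serial3/0/0/1:1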
One more thing to note. If you start looking at the
'sh contr cbus' output and you notice that an interface has
a txlimit > 15 (say 20) and it's constantly sitting
at say 18 or so, don't worry about it. That's because
we do something special to speed up the txacc handling
for high speed interfaces. But on low speed ones, with
txlimits of say 5, when there is no traffic they should
always go back to the txlimit.
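In other words, a high speed interface showing something like
the following (hypothetical interface and values) is normal
even when idle:

75xx_#sh contr cbus | incl txlimit
POS2/0/0, txq ..., txacc ... (value 18), txlimit 20

It's only the low speed interfaces that should sit exactly at
their txlimit when unloaded.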
Let me know what you find.
On Mon, Oct 04, 2004 at 04:48:43AM -0700, Pete Templin wrote:
> Oleksandr Pantus wrote:
> > Take a look at:
> > http://www.cisco.com/warp/public/63/vip_cpu_rxbuffering.html
> Already familiar with this, and I don't think it applies here. The
> outgoing interfaces are not congested (outbound rates of 40-100kbps on a
> T1), yet the router appears to be dropping most or all traffic headed
> out those interfaces once the "magic trigger" is pulled.
> I'm starting to suspect the PA-MC-2T3. We had an episode where EVERY
> T1/frac T1 dropped line protocol (PPP or HDLC) simultaneously. A week
> later, we had the scenario mentioned above, and it's recurred twice
> since then. It's always been the same slot/PA that's been affected, and
> the companion PA-FE-TX doesn't appear to be afflicted at all.
> Any other thoughts?
> Thanks for the help!
> cisco-nsp mailing list cisco-nsp at puck.nether.net
> archive at http://puck.nether.net/pipermail/cisco-nsp/