[c-nsp] FE ignored errors

Jon Lewis jlewis at lewis.org
Mon Dec 20 13:54:33 EST 2004


On Mon, 20 Dec 2004, Rodney Dunn wrote:

> Exactly what I said in my other email.  There are situations where
> the RSP can do more work than the VIPs combined.

In our case, I don't think that's an option...at least not with our
RSP4s.  We ran some of the 7500s without dCEF by accident for a couple of
weeks (someone turned it off while troubleshooting and forgot to turn it
back on), and they really don't seem to handle large routing updates (like
one transit provider suddenly going away) very well while the RSP is
trying to CEF switch ~50 Mbit/s.

Based on the tests I did last night (flapping a BGP session at 4am), I
really don't think there was enough aggregate traffic coming into our
network for there to be bursts large enough to cause large numbers of
ignores, so I don't have much faith in that explanation.

What seems far more likely to me is that when there are large numbers of
routing updates (i.e. one of our BGP transits flaps, and suddenly 60k
routes "change") the VIPs are too busy receiving FIB updates from the RSP
and are either failing to drain output packets from MEMD or are failing to
move RX buffered packets from VIP particle buffers to MEMD as MEMD becomes
available.

I don't know how to prove this, but it seems much more likely to me given
the low traffic levels at 4am.  If everything is chugging along fine with
3 transits (one on each 7500) and we take down one transit, shifting
traffic onto the other two, we only see large numbers of ignores during
the resulting BGP updates.  Once BGP has stabilized, ignores stop
incrementing...at least in the short term.  If it were "bursty traffic
overloading the VIP", I'd expect to continue to see ignores as long as one
of the transits is down since that puts more traffic on the remaining two.

It might be interesting to graph and compare bgpPeerInUpdates
(.1.3.6.1.2.1.15.3.1.10.peer-ip) against input errors (ignores).  I'll
have to look into doing that.  I suspect we'll find that the ignores
typically coincide with bursts of routing updates.
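
A rough sketch of that comparison, assuming both counters have already
been polled at the same interval (by whatever SNMP poller is in use) and
saved as plain lists of readings.  The sample numbers below are made up
purely for illustration, and the helper names are hypothetical.  It turns
the raw counter readings into per-interval deltas (allowing for Counter32
wrap) and computes a Pearson correlation between the two series:

```python
from math import sqrt

COUNTER32_MOD = 2 ** 32  # bgpPeerInUpdates is a Counter32 and can wrap


def deltas(samples, modulus=COUNTER32_MOD):
    """Per-interval increases of a counter, tolerating a single wrap."""
    return [(b - a) % modulus for a, b in zip(samples, samples[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is flat."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


# Hypothetical readings, one per polling interval (made-up numbers).
in_updates = [1000, 1005, 1010, 31010, 61010, 61015, 61020]
ignored = [40, 40, 40, 900, 1700, 1700, 1700]

update_deltas = deltas(in_updates)
ignore_deltas = deltas(ignored)
print(update_deltas)
print(ignore_deltas)
print(round(pearson(update_deltas, ignore_deltas), 3))
```

A coefficient near 1 would be consistent with ignores tracking update
bursts, though it would not by itself prove the VIPs are the bottleneck.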

In the meantime, if my assumptions are correct, are there things that can
be done to mitigate this?  I actually found a URL last night where it was
suggested that too many routing updates could bog down the router.  It
suggested things like shrinking the TCP window to slow down the updates.
It's kind of funny, because normally we want BGP to converge as quickly as
possible.

Are there other settings that might keep the VIP from ignoring network
receive interrupts for too long?

----------------------------------------------------------------------
 Jon Lewis                   |  I route
 Senior Network Engineer     |  therefore you are
 Atlantic Net                |
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________

