[nsp] HSRP failing under high CPU load...and other issues

jlewis at lewis.org jlewis at lewis.org
Sat Nov 1 11:43:55 EST 2003


On Fri, 31 Oct 2003, Gert Doering wrote:

> I've seen this, and it coincided with EIGRP flapping and an increasing
> number of "input ignore" packets on the Catalyst FE port where the
> HSRP master was connected to.  No specific pattern to it - some days 
> it did not happen at all, some days we had 10 flaps.
> 
> Cisco TAC wasn't able to solve the problem, but eventually the HSRP
> master rebooted due to a memory parity error, and since then, the

We've been having a bad week for routers.  First we had the previously 
mentioned 7206vxr NPE300 HSRP failure.  I think I tracked this down to 
simple packet overload.  It's the router I'd previously posted about that 
was experiencing lots of underruns and some output drops.  It seems it 
just couldn't (or barely could) handle routing for a few LANs and one of 
our 100Mb (FastE) transit circuits, plus a little policy routing (Nachi 
mitigation) and access-lists (Slammer and Cisco DoS packets).

The other day when HSRP failed, one of our T3 customers had apparently 
made a change to their network and started sending us an extra 15-20 kpps, 
most of which wanted to go out through the router that was just barely 
getting by.  I was going to contact the customer to see what was up, but I 
noticed this was actually normal for them...and that they'd just had a few 
weeks of very low usage.

So this takes me back to my recent question of what to expect of NPE300s
and what to go to next, since we seem to have outgrown them on some of our
routers.  We have a 7513 (dual RSP4, VIP2-50s) that seems to be far more
capable than the 7206vxr NPE300 running in the same POP.  It's currently
routing twice as much traffic as the VXR ever did, and though the slots
with FE or OC3 transit circuits may run up to around 50% CPU at peak time, 
the router overall doesn't look like it's about to fall over.

This has me thinking that perhaps the distributed processing we get from
VIPs makes the 7500 platform capable of handling several times the traffic
of a 7206vxr.  I'm considering replacing some of our VXRs with 7507s.  
Management is a little concerned that the 7500 platform is long in the
tooth, but these seem like they'll give us easier growth/upgrade options
with faster/bigger RSPs and VIPs available.  We'd probably still
start with RSP4/VIP2-50s, but I assume we could move to VIP4-80s for a
big performance boost if/when we need it.

Yesterday we had another 7206vxr NPE300 lock up.  Once we got it power
cycled, it crashed and reloaded every few minutes, apparently because main
memory had gone bad (solar flare induced?).  We swapped out the NPE
with a spare, and that seems to have fixed it.

After we got that router back up with the spare NPE, we had MPLS VPN 
issues.  One of our routers apparently would pass MPLS VPN tagged packets 
in only one direction.  When I tried pinging VPN customers through it, I 
noticed it was complaining:
%TAGCON-3-LCLTAG_ALLOC: Cannot allocate local tag
Does this mean it was out of available local tags?  I only saw ~1000 tags 
in use, but it had used numbers up to the end of the range and didn't seem 
to be going back and reusing lower tag numbers.  After clearing this 
router's BGP sessions, the situation cleared up.
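
For illustration only, here is a toy model of that symptom: an allocator
that hands out tag numbers in increasing order and never reuses freed
ones.  This is an assumption about the kind of behaviour that would match
what was observed, not a description of how IOS actually allocates tags.
The class name, the function name, and the tag range (16-115) are all
hypothetical.

```python
class MonotonicTagAllocator:
    """Toy allocator: tags go up only; freed numbers are never reused."""

    def __init__(self, low, high):
        self.high = high          # last usable tag number (hypothetical range)
        self.next_tag = low       # next number to hand out
        self.in_use = set()       # tags currently bound to something

    def allocate(self):
        if self.next_tag > self.high:
            # Range exhausted, even if most earlier tags have been freed.
            raise RuntimeError("Cannot allocate local tag")
        tag = self.next_tag
        self.next_tag += 1
        self.in_use.add(tag)
        return tag

    def free(self, tag):
        # The tag is released, but its number is never handed out again.
        self.in_use.discard(tag)


def simulate_churn(low=16, high=115, live=10):
    """Churn bindings (e.g. flapping routes) while keeping only `live`
    tags bound at once.  Returns (tags in use, highest tag handed out)
    at the moment allocation first fails."""
    alloc = MonotonicTagAllocator(low, high)
    held = []
    try:
        while True:
            held.append(alloc.allocate())
            if len(held) > live:
                alloc.free(held.pop(0))   # release the oldest binding
    except RuntimeError:
        return len(alloc.in_use), alloc.next_tag - 1


if __name__ == "__main__":
    in_use, highest = simulate_churn()
    print(f"allocation failed with {in_use} tags in use, highest tag {highest}")
```

With the default (made-up) numbers, allocation fails while only 10 tags
are bound, because the counter has reached the end of the range.  An
allocator that reused freed numbers would not fail under the same churn.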

 
----------------------------------------------------------------------
 Jon Lewis *jlewis at lewis.org*|  I route
 Senior Network Engineer     |  therefore you are
 Atlantic Net                |  
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________


