[c-nsp] Problems with 7500 router crashing

Ejay Hire ejay.hire at isdn.net
Thu Jul 22 13:49:23 EDT 2004


The way I read this, the VIP for the gig interface in slot
1 is running out of memory.  The VIP is a
"mini-router-on-a-card" that does CEF and dCEF processing
and makes forwarding decisions without pushing packets to
the CPU.  The FIB (Forwarding Information Base) is the VIP's
version of the routing table.  That is what it was trying to
allocate memory for when the allocation failed.
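
To pull the key numbers out of the MALLOCFAIL entries in the syslog
quoted later in this message, something like the following might do.
This is a minimal sketch in Python, not part of the original thread;
the function name parse_mallocfail and the regular expressions are
illustrative assumptions based only on the log lines shown here, not an
official description of the message format.

```python
import re

# Sample copied from the syslog quoted later in this message
# (wrapped lines rejoined).
SAMPLE = """\
> CET: %IPC-5-SLAVELOG: VIP-SLOT1:
> %SYS-2-MALLOCFAIL: Memory allocation of 65556 bytes failed from 0x60113CF8, alignment 16
> Pool: Processor  Free: 92976  Cause: Memory fragmentation
> Alternate Pool: None  Free: 0  Cause: No Alternate pool
> -Process= "CEF IPC Background", ipl= 2, pid= 36
"""

# Patterns are assumptions derived from the sample above.
SLOT_RE = re.compile(r"%IPC-5-SLAVELOG: (VIP-SLOT\d+):")
MALLOC_RE = re.compile(r"%SYS-2-MALLOCFAIL: Memory allocation of (\d+) bytes failed")
POOL_RE = re.compile(r"^Pool: (\S+)\s+Free: (\d+)\s+Cause: (.+?)\s*$")
PROC_RE = re.compile(r'-Process= "([^"]+)"')


def parse_mallocfail(text):
    """Return one dict per MALLOCFAIL entry found in text."""
    events = []
    slot = None
    current = None
    for raw in text.splitlines():
        # Drop email quote markers and surrounding whitespace.
        line = raw.lstrip("> ").strip()
        m = SLOT_RE.search(line)
        if m:
            slot = m.group(1)
            continue
        m = MALLOC_RE.search(line)
        if m:
            current = {"slot": slot, "requested": int(m.group(1))}
            events.append(current)
            continue
        if current is None:
            continue
        m = POOL_RE.match(line)  # "Alternate Pool:" lines do not match
        if m:
            current["pool"] = m.group(1)
            current["free"] = int(m.group(2))
            current["cause"] = m.group(3)
            continue
        m = PROC_RE.search(line)
        if m:
            current["process"] = m.group(1)
    return events


for event in parse_mallocfail(SAMPLE):
    print(event)
```

Run over the whole log, this makes it easy to see which slot and
process are failing and how the free memory changes between entries.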

Since router1 handles all of the traffic at least once before
router2 touches it, it makes sense that router2 would be
unaffected.

Unfortunately, I've never pushed VIPs hard enough to
hit this, so here are some off-the-top-of-my-head
thoughts.

If this were a router and not a VIP failing, I'd use "show
proc mem sorted 5min" to see what was using all the memory.
I expect it would be the CEF or FIB process.

Here are the ways I've broken smaller routers like this in
the past:
... Something badgered up in redistribution sent a full
BGP table into OSPF.
... A software bug (I've only seen this once, in an old
version of code).
... Computer(s) in a wormfest generating tens of thousands of
packets and filling NAT or state tables.
... Dynamic creation of Virtual-Access interfaces instead of
pre-cloned interfaces.

Here are a couple of things that might help.

1. HSRP's cousin, GLBP.  This replaces HSRP and will allow
rtr1 and rtr2 to share the load, with seamless failover in
either direction.

2. Temporarily disable CEF on the interfaces in this slot.  For a
full BGP table, the CEF tables can be over 30 MB.  The router
will slow down a little, but you'll put out the fire until
you fix the problem.

Last but not least, do a "show bootflash:".  Sometimes the
crashdump files end up there if there is enough free space.
In this case I don't think it will help much, because you
know what's running out of memory, just not why.
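
On the "why": two of the three MALLOCFAIL entries in the quoted log
give "Memory fragmentation" as the cause even though more bytes are
free (92976) than were requested (65556), while the entry with only
41916 bytes free says "Not enough free memory".  The toy sketch below
(Python, not from the original thread) shows how both can arise.  The
free-block sizes are made up for illustration and the model is a
deliberate simplification, not how the IOS allocator actually works.

```python
def malloc_failure_cause(free_blocks, request):
    """Return None if the request fits in one block, else a cause string.

    free_blocks: sizes in bytes of the separate free chunks (hypothetical).
    request: size in bytes of the single contiguous allocation wanted.
    """
    if any(block >= request for block in free_blocks):
        return None  # some contiguous block is big enough
    if sum(free_blocks) >= request:
        # Enough bytes in total, but split across chunks that are too small.
        return "Memory fragmentation"
    return "Not enough free memory"


# Hypothetical layout: three chunks that add up to the 92976 bytes
# reported as free in the first log entry.
fragmented = [40000, 30000, 22976]
print(malloc_failure_cause(fragmented, 65556))

# Hypothetical layout: a single chunk of 41916 bytes, as in the second entry.
print(malloc_failure_cause([41916], 65556))
```

Either way the VIP is very close to empty, so the fragmentation
entries are better read as a symptom of the pool being nearly
exhausted than as a separate problem.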

Good Luck,
Ejay



> -----Original Message-----
> From: cisco-nsp-bounces at puck.nether.net
> [mailto:cisco-nsp-bounces at puck.nether.net] On Behalf Of Olav Langeland
> Sent: Thursday, July 22, 2004 9:52 AM
> To: NSP List
> Subject: [c-nsp] Problems with 7500 router crashing
> 
> Hi,
> 
> We have a problem with one of our border routers apparently crashing
> randomly. Our setup is 2 uplinks with one Cisco 7513 for each uplink,
> doing full eBGP up and iBGP between the routers, nothing more fancy.
> They share an HSRP IP on the inside, so all traffic goes to router1
> before it goes to either router2 or the internet. The hardware is/was
> 100% identical, yet router2 has been stable as a rock. We have had
> crashes with both IOS 12.2 and 12.3, so it doesn't seem version
> related.
> 
> Router1 crashed about a year ago; we didn't get any logs or find
> anything interesting when we got it back up. It was stable after that
> until recently, when it crashed several times late one Friday, so I
> ended up switching it off. We got some syslogs this time, bits of
> which are included below. We borrowed a Cisco 7505 chassis and changed
> an RSP card, but noticed the router rebooted once a couple of days ago
> with more or less the same error. Most of the crashes have been simple
> reboots; other times the CPU went to 100%, forcing a reboot.
> 
> Did some checking with the Output Interpreter, which didn't help much.
> It listed some bug IDs that didn't seem related and also the always
> helpful "The failure was caused by a software defect. Note that this
> is a bus error crash and can also be hardware related." ...
> I found some pages on cisco.com regarding the %FIB-3-FIBDISABLE error,
> which relates to IPC running out of memory. I did what one page
> suggested and increased the allocated memory, but that didn't help. Is
> all this caused by faulty hardware (replacing the VIP4-80 in slot1
> with a VIP4-5 this weekend and switching back to 7513 chassis),
> hardware that can't cope with the traffic, or an IOS configuration
> issue?
> 
> Any hints appreciated!
> 
> More info:
> --show ver--
> System returned to ROM by bus error at PC 0x4051C618, address 0x5A7
> --end--
> 
> --show stacks--
> Stack trace from system failure:
> FP: 0x42B748B8, RA: 0x4051C618
> FP: 0x42B748F8, RA: 0x402F16EC
> FP: 0x42B74960, RA: 0x402E9034
> FP: 0x42B749A8, RA: 0x40272F50
> --end--
> 
> --begin syslog--
> CET: %IPC-5-SLAVELOG: VIP-SLOT1:
> %SYS-2-MALLOCFAIL: Memory allocation of 65556 bytes failed from 0x60113CF8, alignment 16
> Pool: Processor  Free: 92976  Cause: Memory fragmentation
> Alternate Pool: None  Free: 0  Cause: No Alternate pool
> -Process= "CEF IPC Background", ipl= 2, pid= 36
> -Traceback= 60118D80 60119EF8 60113D00 603F9D10 603FA598 603FA820 603DC7B8 603E1388 603E1C9C 603E8510 603F5680 603EED94 603EF07C 603EF40C 603EFAA4
> %IPC-5-SLAVELOG: VIP-SLOT1:
> %SYS-2-MALLOCFAIL: Memory allocation of 65556 bytes failed from 0x60113CF8, alignment 16
> Pool: Processor  Free: 41916  Cause: Not enough free memory
> Alternate Pool: None  Free: 0  Cause: No Alternate pool
> -Process= "CEF IPC Background", ipl= 0, pid= 36
> -Traceback= 60118D80 60119EF8 60113D00 603F9C00 603DC380 603E1B98 603E8510 603F5680 603EED94 603EF07C 603EF40C 603EFAA4
> %FIB-3-FIBDISABLE: Fatal error, slot 1: no memory
> %IPC-5-SLAVELOG: VIP-SLOT1:
> %SYS-2-MALLOCFAIL: Memory allocation of 65556 bytes failed from 0x60113CF8, alignment 16
> Pool: Processor  Free: 82196  Cause: Memory fragmentation
> Alternate Pool: None  Free: 0  Cause: No Alternate pool
> -Process= "CEF IPC Background", ipl= 0, pid= 36
> --end--
> 
> --show tech--
> ------------------ show controllers cbus ------------------
>   slot0: VIP4-50 RM5271, hw 2.02, sw 22.20, ccb F800FF10, cmdq E8000080, vps 8192
>     software loaded from system
>     IOS (tm) VIP Software (SVIP-DW-M), Version 12.3(5b), RELEASE SOFTWARE (fc1)
>     ROM Monitor version 103.0
>     POS0/0/0, applique is SONET
>       gfreeq E8000170, lfreeq E8000180 (4512 bytes)
>       rxlo 4, rxhi 81, rxcurr 4, maxrxcurr 43
>       txq E8001A00, txacc E8001A02 (value 80), txlimit 81
>     FastEthernet0/1/0, addr 0000.0c62.e308 (bia 0000.0c62.e308)
>       gfreeq E8000150, lfreeq E8000188 (1600 bytes)
>       rxlo 4, rxhi 530, rxcurr 0, maxrxcurr 0
>       txq E8001A08, txacc E8001A0A (value 0), txlimit 530
>   slot1: VIP4-80 RM7000, hw 2.01, sw 22.20, ccb F800FF20, cmdq E8000088, vps 8192
>     software loaded from system
>     IOS (tm) VIP Software (SVIP-DW-M), Version 12.3(5b), RELEASE SOFTWARE (fc1)
>     ROM Monitor version 103.0
>     GigabitEthernet1/0/0, addr 0000.0c62.e320 (bia 0000.0c62.e320)
>       gfreeq E8000150, lfreeq E8000190 (1600 bytes)
>       rxlo 4, rxhi 795, rxcurr 6, maxrxcurr 369
>       txq E8001A48, txacc E8001A4A (value 529), txlimit 530
> --end--
> 
> 
> /olav langeland
> 


