[c-nsp] Cisco 12000: linecard disabled due to not enough ram - VRRP stays active - how to avoid this nasty behavior

Sun Nov 6 03:53:16 EST 2005

On (2005-11-06 09:33 +0100), gstammw at gmx.net wrote:

> Is there anything one can do to circumvent this behavior? The Cisco
> should really stop sending the VRRP heartbeats.

 I think the answer is no; the GSR is really lacking at the edge. The
problem you're experiencing is known to Cisco. The same problem exists
for IS-IS, but luckily for IS-IS there is a fix: 'external overload
signalling'.
 It completely escapes me why the GSR is so poor as an edge box these
days: no RPF in their best edge line card (E3), the sad state of
port-channels and so forth. I do hope that most development effort has
gone to IOS-XR, or at least that's what I keep telling myself so I don't
get depressed looking at the 7600 being cheaper and, even from an
operator's point of view, more complete.
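
 Until there is a fix, one could at least detect the condition from
syslog instead of waiting for a human to notice. Below is a minimal
sketch in Python, not anything Cisco provides: it assumes unwrapped
syslog lines in the format quoted below, and the function name
stuck_vrrp_groups and the regexes are my own. It reports VRRP groups
that went Master on a slot after dCEF was disabled there.

```python
import re

# Patterns modelled only on the log lines quoted in this thread (assumption).
FIB_DISABLE = re.compile(r"%FIB-2-FIBDISABLE: Fatal error, slot (\d+)")
VRRP_MASTER = re.compile(
    r"%VRRP-6-STATECHANGE: (\S+) Grp (\d+) state \S+\s*->\s*Master"
)
IFACE_SLOT = re.compile(r"^[A-Za-z]+(\d+)/")  # "Gi2/0.200" -> slot 2


def stuck_vrrp_groups(log_lines):
    """Return (interface, group) pairs that became VRRP Master on a
    slot for which a FIBDISABLE message was seen earlier in the log."""
    dead_slots = set()
    stuck = []
    for line in log_lines:
        fib = FIB_DISABLE.search(line)
        if fib:
            dead_slots.add(int(fib.group(1)))
            continue
        vrrp = VRRP_MASTER.search(line)
        if vrrp:
            iface, group = vrrp.group(1), int(vrrp.group(2))
            slot = IFACE_SLOT.match(iface)
            if slot and int(slot.group(1)) in dead_slots:
                stuck.append((iface, group))
    return stuck


# Two lines from the quoted log, joined back onto single lines.
SAMPLE = [
    "Nov  6 08:15:02.308 CET: %FIB-2-FIBDISABLE: Fatal error, slot 2: no memory",
    "Nov  6 08:15:05.452 CET: %VRRP-6-STATECHANGE: Gi2/0.200 Grp 200 state Backup -> Master",
]

if __name__ == "__main__":
    for iface, group in stuck_vrrp_groups(SAMPLE):
        print(f"VRRP group {group} on {iface} is Master on a dCEF-disabled slot")
```

 What to do with the result (page someone, or shut the interface as
you did by hand) is left open; the sketch only flags the condition.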

> #CISCO12000:Show log
> Nov  6 08:15:02.308 CET: %FIB-2-FIBDISABLE: Fatal error, slot 2: no memory
> SLOT 2:Nov  6 08:15:02.228 CET: %SYS-2-MALLOCFAIL: Memory allocation of
> 65556 bytes failed from 0x400CE06C, alignment 16 
> Pool: Processor  Free: 121296  Cause: Memory fragmentation 
> Alternate Pool: None  Free: 0  Cause: No Alternate pool 
>           
> -Process= "CEF LC IPC Background", ipl= 0, pid= 57
> -Traceback= 400D328C 400D5690 400CE074 40E508DC 40E15DD8 40E20768 40E2B17C
> 40E40648 40E380A4 40E38348 40E38714 40E39444
> SLOT 2:Nov  6 08:15:02.300 CET: %FIB-3-NOMEM: Malloc Failure, disabling DCEF
> on linecard
> Nov  6 08:15:05.452 CET: %VRRP-6-STATECHANGE: Gi2/0.200 Grp 200 state Backup
> -> Master
> Nov  6 08:15:32.648 CET: %BGP-5-ADJCHANGE: neighbor x.x.x.2 Down BGP
> Notification sent
> Nov  6 08:15:32.648 CET: %BGP-3-NOTIFICATION: sent to neighbor x.x.x.2 4/0
> (hold time expired) 0 bytes 
> Nov  6 08:15:39.980 CET: %OSPF-5-ADJCHG: Process 1, Nbr x.x.x.2 on
> GigabitEthernet2/0.991 from FULL to DOWN, Neighbor Down: Ded
> Nov  6 08:16:32.941 CET: %BGP-5-ADJCHANGE: neighbor x.x.x.113 Down BGP
> Notification sent
> Nov  6 08:16:32.941 CET: %BGP-3-NOTIFICATION: sent to neighbor x.x.x.113 4/0
> (hold time expired) 0 bytes 
> Nov  6 08:17:49.061 CET: %BGP-5-ADJCHANGE: neighbor x.x.x.242 Down BGP
> Notification sent
> Nov  6 08:17:49.061 CET: %BGP-3-NOTIFICATION: sent to neighbor x.x.x.242 4/0
> (hold time expired) 0 bytes 
> As one can see the communication with the outside world has partly stopped.
> BGP is down and the Cisco even thinks that it can now become vrrp-master for
> Gi2/0.200. Terrible!
> 
> The cisco was still sending vrrp-heartbeats until I shut the linecard
> manually down: then vrrp got disabled too and the backup router kicked in.
> #CISCO12000:Show log
> Nov  6 08:45:45.840 CET: %VRRP-6-STATECHANGE: Gi2/0.200 Grp 200 state Master
> -> Init
> Nov  6 08:45:45.840 CET: %VRRP-6-STATECHANGE: Gi2/0.201 Grp 201 state Master
> -> Init
> Nov  6 08:45:45.844 CET: %VRRP-6-STATECHANGE: Gi2/0.202 Grp 202 state Master
> -> Init
> Nov  6 08:45:45.844 CET: %VRRP-6-STATECHANGE: Gi2/0.203 Grp 203 state Master
> -> Init
> Nov  6 08:45:45.844 CET: %VRRP-6-STATECHANGE: Gi2/0.204 Grp 204 state Master
> -> Init
> Nov  6 08:45:45.844 CET: %VRRP-6-STATECHANGE: Gi2/0.207 Grp 207 state Master
> -> Init
> Nov  6 08:45:45.848 CET: %VRRP-6-STATECHANGE: Gi2/0.208 Grp 208 state Master
> -> Init
> Nov  6 08:45:45.848 CET: %VRRP-6-STATECHANGE: Gi2/0.210 Grp 210 state Master
> -> Init
> Nov  6 08:45:45.848 CET: %VRRP-6-STATECHANGE: Gi2/0.211 Grp 211 state Master
> -> Init
> Nov  6 08:45:47.836 CET: %LINK-5-CHANGED: Interface GigabitEthernet2/0,
> changed state to administratively down
> Nov  6 08:45:48.836 CET: %LINEPROTO-5-UPDOWN: Line protocol on Interface
> GigabitEthernet2/0, changed state to down
> Nov  6 08:45:50.404 CET: %SYS-5-CONFIG_I: Configured from console by console
> 
> 
> 
> The annoying thing is that my second "backup" router sees bgp and ospf to
> the cisco go down but still seems to be receiving ospf-heartbeats:
> #BACKUPROUTER: show log
> Nov  6 08:15:53:N:OSPF: originate LSA, rid x.x.x.2, area 0.0.0.0, LSA type
> 5, LSA id x.y.139.200, LSA router id x.x.x.2
> Nov  6 08:15:53:N:OSPF: originate LSA, rid x.x.x.2, area 0.0.0.0, LSA type
> 5, LSA id x.y.139.208, LSA router id x.x.x.2
> Nov  6 08:15:53:N:OSPF: originate LSA, rid x.x.x.2, area 0.0.0.0, LSA type
> 5, LSA id x.y.139.224, LSA router id x.x.x.2
> Nov  6 08:15:53:N:OSPF: originate LSA, rid x.x.x.2, area 0.0.0.0, LSA type
> 5, LSA id x.y.143.0, LSA router id x.x.x.2
> Nov  6 08:15:53:N:OSPF: originate LSA, rid x.x.x.2, area 0.0.0.0, LSA type
> 5, LSA id x.y.148.0, LSA router id x.x.x.2
> Nov  6 08:15:48:N:OSPF: originate LSA, rid x.x.x.2, area 0.0.0.0, LSA type
> 5, LSA id x.y.143.0, LSA router id x.x.x.2
> Nov  6 08:15:48:N:OSPF: originate LSA, rid x.x.x.2, area 0.0.0.0, LSA type
> 5, LSA id a.b.121.0, LSA router id x.x.x.2
> Nov  6 08:15:48:N:OSPF: originate LSA, rid x.x.x.2, area 0.0.0.0, LSA type
> 5, LSA id a.b.110.0, LSA router id x.x.x.2
> Nov  6 08:15:48:N:OSPF: originate LSA, rid x.x.x.2, area 0.0.0.0, LSA type
> 5, LSA id a.b.168.0, LSA router id x.x.x.2
> Nov  6 08:15:48:N:OSPF: originate LSA, rid x.x.x.2, area 0.0.0.0, LSA type
> 5, LSA id a.b.148.0, LSA router id x.x.x.2
> Nov  6 08:15:48:N:OSPF: originate LSA, rid x.x.x.2, area 0.0.0.0, LSA type
> 5, LSA id a.b.234.0, LSA router id x.x.x.2
> Nov  6 08:15:48:N:OSPF: originate LSA, rid x.x.x.2, area 0.0.0.0, LSA type
> 5, LSA id a.b.240.0, LSA router id x.x.x.2
> Nov  6 08:15:42:N:OSPF: originate LSA, rid x.x.x.2, area x.y.128.0, LSA type
> 1, LSA id x.x.x.2, LSA router id x.x.x.2
> Nov  6 08:15:42:N:OSPF: nbr state changed, rid x.x.x.2, nbr addr x.x.x.17,
> nbr rid x.x.x.1, state initializing, rcv event 1-WayReceived
> Nov  6 08:15:39:N:BGP: Peer x.x.x.1 DOWN (Hold Timer Expired)
> 
> It then takes until 8:45 - that's when I shut down the interface on the
> cisco - to kick in:
> #BACKUPROUTER: show log
> Nov  6 08:46:23:N:OSPF: nbr state changed, rid x.x.x.2, nbr addr x.y.17, nbr
> rid x.x.x.1, state down, rcv event NeighborGoingDown
> Nov  6 08:46:23:N:OSPF: nbr state changed, rid x.x.x.2, nbr addr x.y.131.17,
> nbr rid x.x.x.1, state initializing, rcv event Inactivity Timer Expires
> Nov  6 08:45:48:N:VRRP: VRRP intf state changed, intf v211, vrid 211, state
> master
> Nov  6 08:45:48:N:VRRP: VRRP intf state changed, intf v210, vrid 210, state
> master
> Nov  6 08:45:48:N:VRRP: VRRP intf state changed, intf v208, vrid 208, state
> master
> Nov  6 08:45:48:N:VRRP: VRRP intf state changed, intf v207, vrid 207, state
> master
> Nov  6 08:45:48:N:VRRP: VRRP intf state changed, intf v204, vrid 204, state
> master
> Nov  6 08:45:48:N:VRRP: VRRP intf state changed, intf v203, vrid 203, state
> master
> Nov  6 08:45:48:N:VRRP: VRRP intf state changed, intf v202, vrid 202, state
> master
> Nov  6 08:45:48:N:VRRP: VRRP intf state changed, intf v201, vrid 201, state
> master
> 
> I really don't get why bgp and ospf fail as they should but vrrp stays
> active. Is there *anything* I can do about it? I really don't want this to
> happen again.
> 
> 
> 
> This is so nasty. On paper we have a nice hot-failover redundancy
> concept, but in reality a human needs to get involved :-(
> 
> Thanks for your help in advance.
> 
> Best regards,
> Gunther
> 
> 
> By the way: you shouldn't be using linecards or route processors with
> less than 512 MB of RAM nowadays...
> 

-- 
  ++ytti