[c-nsp] Having a fun time with a GSR!

Drew Weaver drew.weaver at thenap.com
Sun Apr 16 06:59:50 EDT 2006


Hi there, we had an old GSR lying about in storage and thought we'd try
to use it to light up some transit. We are now thinking about putting it
back in storage like the arc of the covenant.

At any rate perhaps you could help me figure out what the issue is here:

First, on Friday we got everything configured in time to start an early
holiday weekend whooo! About an hour after everyone left I got a page,
the new router is acting funny:

Apr 14 12:47:50.509 EST: %BGP-5-ADJCHANGE: neighbor 10.1.0.9 Up
SLOT 1:Apr 14 14:29:18.800 EST: %ALIGN-3-SPURIOUS: Spurious memory
access made at 0x40D057D8 reading 0x14
SLOT 1:Apr 14 14:29:18.800 EST: %ALIGN-3-TRACE: -Traceback= 40D057D8
40A8940C 408C810C 400F4D78 00000000 00000000 00000000 00000000
SLOT 1:Apr 14 14:30:18.780 EST: %ALIGN-3-SPURIOUS: Spurious memory
access made at 0x40D057D8 reading 0x14
SLOT 1:Apr 14 14:30:18.780 EST: %ALIGN-3-TRACE: -Traceback= 40D057D8
406BEC00 408BD740 408C7F7C 400F4D78 00000000 00000000 00000000

So I'm like.. "Ok" it seems fine except for that error message.. well it
was, except no traffic would pass through anything on slot 1; which
happened to be an FE-8 card. So, we reloaded, it came back up fine;
through investigation learned that SPURIOUS is a software bug.. no
biggie!

We updated to the latest: 12.0(32)S1 and everything seemed pretty
good... until 1AM this morning when (the router which is having problems
we will refer to as R1, and the router it peers with we will refer to as
R0). When a transit provider connected to R0 decided to do a little
scheduled maintenance and the BGP connection was dropped:

This is what it looked like on the R0 end:

Apr 16 00:09:14.981 EST: %BGP-5-ADJCHANGE: neighbor transitproviderIP
Down BGP Notification sent
Apr 16 00:09:14.981 EST: %BGP-3-NOTIFICATION: sent to neighbor
transitproviderip 4/0(hold time expired) 0 bytes

-- this is link to R1 --

Apr 16 00:10:19.489 EST: %GRPGE-6-SYNC_LOSS: Interface
GigabitEthernet4/0: Loss of Sync
Apr 16 00:10:21.509 EST: %LINK-3-UPDOWN: Interface GigabitEthernet4/0,
changed state to down
Apr 16 00:10:21.513 EST: %OSPF-5-ADJCHG: Process 1, Nbr 10.1.0.2 on
GigabitEthernet4/0 from FULL to DOWN, Neighbor Down: Interface down or
detached
Apr 16 00:10:22.733 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface
GigabitEthernet4/0, changed state to down
Apr 16 00:10:37.313 EST: %GRPGE-6-AUTONEG_STATE: Interface
GigabitEthernet4/0: Link OK - autonegotiation complete
Apr 16 00:10:39.313 EST: %LINK-3-UPDOWN: Interface GigabitEthernet4/0,
changed state to up
Apr 16 00:10:40.313 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface
GigabitEthernet4/0, changed state to up

(this is what it looked like on R1 simultaneously, ignore the hour
difference the timezone is not adjusted for daylight) The transit
provider begins maintenance; and then 65 seconds later on R1 this
happens:

Apr 16 01:09:38.146 EST: %RP-3-FABRIC_UNI: Unicast send timed out  (1)
Apr 16 01:09:43.182 EST: %RP-3-FABRIC_UNI: Unicast send timed out  (1)
Apr 16 01:09:48.262 EST: %RP-3-FABRIC_UNI: Unicast send timed out  (1)
Apr 16 01:09:55.094 EST: %RP-3-FABRIC_UNI: Unicast send timed out  (1)
Apr 16 01:10:00.098 EST: %RP-3-FABRIC_UNI: Unicast send timed out  (1)
Apr 16 01:10:05.306 EST: %RP-3-FABRIC_UNI: Unicast send timed out  (1)
Apr 16 01:10:05.510 EST: %OSPF-5-ADJCHG: Process 1, Nbr 10.1.122.161 on
FastEthernet1/5 from FULL to DOWN, Neighbor Down: Dead timer expired
Apr 16 01:10:07.798 EST: %OSPF-5-ADJCHG: Process 1, Nbr 10.1.122.177 on
FastEthernet1/1 from FULL to DOWN, Neighbor Down: Dead timer expired
Apr 16 01:10:10.426 EST: %RP-3-FABRIC_UNI: Unicast send timed out  (1)
Apr 16 01:10:11.426 EST: %OSPF-5-ADJCHG: Process 1, Nbr 10.1.113.177 on
FastEthernet1/2 from FULL to DOWN, Neighbor Down: Dead timer expired
Apr 16 01:10:13.242 EST: %FIB-2-FIBDISABLE: Fatal error, slot 1: IPC
Failure: timeout
Apr 16 01:10:13.438 EST: %RP-4-RSTSLOT: Resetting the card in the slot:
1,Event:CEF failure

Apr 16 01:10:13.458 EST: %LINK-5-CHANGED: Interface FastEthernet1/0,
changed state to administratively down
Apr 16 01:10:13.466 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface
FastEthernet1/0, changed state to down
Apr 16 01:10:13.470 EST: %LINK-5-CHANGED: Interface FastEthernet1/1,
changed state to administratively down
Apr 16 01:10:13.490 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface
FastEthernet1/1, changed state to down
Apr 16 01:10:13.490 EST: %LINK-5-CHANGED: Interface FastEthernet1/2,
changed state to administratively down
Apr 16 01:10:13.898 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface
FastEthernet1/2, changed state to down
Apr 16 01:10:14.298 EST: %OSPF-5-ADJCHG: Process 1, Nbr 10.1.122.249 on
FastEthernet1/3 from FULL to DOWN, Neighbor Down: Dead timer expired
Apr 16 01:10:14.298 EST: %OSPF-5-ADJCHG: Process 1, Nbr 10.1.21.73 on
FastEthernet1/6 from FULL to DOWN, Neighbor Down: Dead timer expired
Apr 16 01:10:14.298 EST: %LINK-5-CHANGED: Interface FastEthernet1/3,
changed state to administratively down
Apr 16 01:10:14.722 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface
FastEthernet1/3, changed state to down
Apr 16 01:10:15.126 EST: %LINK-5-CHANGED: Interface FastEthernet1/4,
changed state to administratively down
Apr 16 01:10:15.338 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface
FastEthernet1/4, changed state to down
Apr 16 01:10:15.594 EST: %LINK-5-CHANGED: Interface FastEthernet1/5,
changed state to administratively down
Apr 16 01:10:15.806 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface
FastEthernet1/5, changed state to down
Apr 16 01:10:16.018 EST: %LINK-5-CHANGED: Interface FastEthernet1/6,
changed state to administratively down
Apr 16 01:10:16.226 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface
FastEthernet1/6, changed state to down
Apr 16 01:10:19.094 EST: %FIB-2-FIBDISABLE: Fatal error, slot 2: IPC
Failure: timeout
Apr 16 01:10:19.214 EST: %RP-4-RSTSLOT: Resetting the card in the slot:
2,Event:
 CEF failure

Apr 16 01:10:19.218 EST: %LINK-5-CHANGED: Interface GigabitEthernet2/0,
changed state to administratively down
Apr 16 01:10:19.222 EST: %OSPF-5-ADJCHG: Process 1, Nbr 10.1.0.9 on
GigabitEthernet2/0 from FULL to DOWN, Neighbor Down: Interface down or
detached
Apr 16 01:10:19.426 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface
GigabitEthernet2/0, changed state to down
Apr 16 01:10:19.854 EST: %LINK-3-UPDOWN: Interface GigabitEthernet2/0,
changed state to down
Apr 16 01:10:35.934 EST: %RPGE-6-AUTONEG_STATE: Interface
GigabitEthernet2/0: Link OK - autonegotiation complete
SLOT 2:00:00:14: %SYS-5-RESTART: System restarted --
Cisco Internetwork Operating System Software
IOS (tm) GS Software (GLC1-LC-M), Version 12.0(32)S1, RELEASE SOFTWARE
(fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2006 by cisco Systems, Inc.
Compiled Thu 09-Feb-06 14:08 by tinhuang
Apr 16 01:10:37.110 EST: %RPGE-6-SYNC_LOSS: Interface
GigabitEthernet2/0: Loss of Sync
Apr 16 01:10:37.318 EST: %RPGE-6-AUTONEG_STATE: Interface
GigabitEthernet2/0: Link OK - autonegotiation complete
Apr 16 01:10:37.610 EST: %LINK-3-UPDOWN: Interface GigabitEthernet2/0,
changed state to up
Apr 16 01:10:38.610 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface
GigabitEthernet2/0, changed state to up
Apr 16 01:11:17.230 EST: %LINK-3-UPDOWN: Interface FastEthernet1/0,
changed state to down
Apr 16 01:11:17.230 EST: %LINK-3-UPDOWN: Interface FastEthernet1/1,
changed state to down
Apr 16 01:11:17.230 EST: %LINK-3-UPDOWN: Interface FastEthernet1/2,
changed state to down
Apr 16 01:11:17.230 EST: %LINK-3-UPDOWN: Interface FastEthernet1/3,
changed state to down
Apr 16 01:11:17.230 EST: %LINK-3-UPDOWN: Interface FastEthernet1/4,
changed state to down
Apr 16 01:11:17.230 EST: %LINK-3-UPDOWN: Interface FastEthernet1/5,
changed state to down
Apr 16 01:11:17.230 EST: %LINK-3-UPDOWN: Interface FastEthernet1/6,
changed state to down
Apr 16 01:11:21.006 EST: %OSPF-5-ADJCHG: Process 1, Nbr 10.1.0.9 on
GigabitEthernet2/0 from LOADING to FULL, Loading Done
SLOT 1:00:00:23: %SYS-5-RESTART: System restarted --
Cisco Internetwork Operating System Software
IOS (tm) GS Software (GLC1-LC-M), Version 12.0(32)S1, RELEASE SOFTWARE
(fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2006 by cisco Systems, Inc.
Compiled Thu 09-Feb-06 14:08 by tinhuang
Apr 16 01:11:44.406 EST: %LINK-3-UPDOWN: Interface FastEthernet1/0,
changed state to up
Apr 16 01:11:44.558 EST: %LINK-3-UPDOWN: Interface FastEthernet1/1,
changed state to up
Apr 16 01:11:44.742 EST: %LINK-3-UPDOWN: Interface FastEthernet1/2,
changed state to up
Apr 16 01:11:44.838 EST: %LINK-3-UPDOWN: Interface FastEthernet1/3,
changed state to up
Apr 16 01:11:44.990 EST: %LINK-3-UPDOWN: Interface FastEthernet1/4,
changed state to up
Apr 16 01:11:45.142 EST: %LINK-3-UPDOWN: Interface FastEthernet1/5,
changed state to up
Apr 16 01:11:45.334 EST: %LINK-3-UPDOWN: Interface FastEthernet1/6,
changed state to up
Apr 16 01:11:45.410 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface
FastEthernet1/0, changed state to up
Apr 16 01:11:45.558 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface
FastEthernet1/1, changed state to up
Apr 16 01:11:45.742 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface
FastEthernet1/2, changed state to up
Apr 16 01:11:45.914 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface
FastEthernet1/3, changed state to up
Apr 16 01:11:45.990 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface
FastEthernet1/4, changed state to up
Apr 16 01:11:46.142 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface
FastEthernet1/5, changed state to up
Apr 16 01:11:46.334 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface
FastEthernet1/6, changed state to up
Apr 16 01:12:27.958 EST: %OSPF-5-ADJCHG: Process 1, Nbr 10.1.113.177 on
FastEthernet1/2 from LOADING to FULL, Loading Done
Apr 16 01:12:27.974 EST: %OSPF-5-ADJCHG: Process 1, Nbr 10.1.122.161 on
FastEthernet1/5 from LOADING to FULL, Loading Done
Apr 16 01:12:28.910 EST: %OSPF-5-ADJCHG: Process 1, Nbr 10.1.122.177 on
FastEthernet1/1 from LOADING to FULL, Loading Done
Apr 16 01:12:29.114 EST: %OSPF-5-ADJCHG: Process 1, Nbr 10.1.122.249 on
FastEthernet1/3 from LOADING to FULL, Loading Done
Apr 16 01:12:29.542 EST: %OSPF-5-ADJCHG: Process 1, Nbr 10.1.21.73 on
FastEthernet1/6 from LOADING to FULL, Loading Done

Sorry that this is so long winded :-)

Here is the config:

cisco 12012/GRP (R5000) processor (revision 0x05) with 524288K bytes of
memory R5000 CPU at 200Mhz, Implementation 35, Rev 2.1, 512KB L2 Cache

Here is what I am thinking might be happening:

-Transit provider loads new IOS image onto router connected to R0 and
reloads.
-Since R0 is holding something like 700,000 BGP routes from transit
providers, when transit provider that drops goes down obviously the
table gets resent to R1.
-For some reason R1 runs out of memory, CEF fails, card can no longer
pass packets, everything wigs out, slot 1 and and 2 must reload.

The problem I am having with the above theory is that when I do a sh ip
bgp sum I see for transit providers on r0 each with 180,000 routes, and
when I do it on R1, I see the roughly 180,000 from R0, and the roughly
180,000 from the 1 transit provider which is directly connected to it.
So, if they're both identical routers, same ram, same everything and one
is getting 360,000 routes, and one is getting 800,000 I'm not sure why
the one with 360,000 would be the one with a problem with RAM.

It could be that we've just come to the end of the GRP-B's usefulness,
has anyone run into a similar situation? Is 32(6)s a decent rev? I was
thinking of trying .28 and 29 also as I've heard they are good as well.

The one I have had most luck with honestly was .25 s6. but that is the
one which displayed the spurious memory access (although the error above
is probably just a cleaner way of handling the spurious memory access
[any insight here?]).

Sorry this is so long.

Hope you all have a happy easter.

Thanks,
Andrew




More information about the cisco-nsp mailing list