[c-nsp] 7513 CBUS mash-up provoked by RANCID
Ed Ravin
eravin at panix.com
Tue Dec 19 00:27:12 EST 2006
My shop uses RANCID to archive our router configs. Every day, its
scripts log into our routers, run a few commands to see the router
config and environment information, and save all that in a CVS
repository.
One fine day a month ago, RANCID sent us email saying that all our
interfaces in slot 0 had been removed from the config, as if they'd
been pulled out. The router logs show this:
.Nov 14 10:13:23 EST: %DBUS-3-DBUSINTERR: Slot 0, Internal Error
.Nov 14 10:13:23 EST: %LB-5-CHAN_MEMBER_OUT: FastEthernet0/0/0 taken out of Port-channel1
.Nov 14 10:13:23 EST: %LB-5-CHAN_MEMBER_OUT: FastEthernet0/0/0 taken out of Port-channel1
.Nov 14 10:13:59 EST: %CBUS-3-CMDTIMEOUT: Cmd timed out, CCB 0xF800FFA0, slot 8, cmd code 2
Nov 14 10:13:25 166.84.143.9/166.84.143.9 21384: .Nov 14 10:13:23 EST: %DBUS-3-DBUSINTERR: Slot 0, Internal Error
-Traceback= 4032C744 404B1A5C 404B2330 404A962C 404B83B4 401A1E44 401A0A14 401A48DC 401A57A4 401A8290 4039BD64 404A0DEC 404AF9E0 404B0020 404A17FC
.Nov 14 10:13:59 EST: %CBUS-3-CCBCMDFAIL1: Controller 8, cmd (61 0x00000008) failed (0x8010)
.Nov 14 10:13:59 EST: %CBUS-3-CCBCMDFAIL1: Controller 8, cmd (36 0x00000060) failed (0x8010)
.Nov 14 10:13:59 EST: %CBUS-3-ADDRFILTR: Interface FastEthernet8/1/0, address filter write command failed, code 0x8010
-Traceback= 4032C744 404B7844 404B8044 404B83BC 401A1E44 401A0A14 401A48DC 401A57A4 401A8290 4039BD64 404A0DEC 404AF9E0 404B0020 404A17FC
.Nov 14 10:13:59 EST: %CBUS-3-CCBCMDFAIL1: Controller 8, cmd (36 0x0000FFFF) failed (0x8010)
.Nov 14 10:13:59 EST: %CBUS-3-CCBCMDFAIL1: Controller 8, cmd (36 0x00000060) failed (0x8010)
.Nov 14 10:13:59 EST: %CBUS-3-CCBCMDFAIL1: Controller 8, cmd (36 0x0000FFFF) failed (0x8010)
.Nov 14 10:13:59 EST: %CBUS-3-CCBCMDFAIL1: Controller 8, cmd (36 0x00000100) failed (0x8010)
.Nov 14 10:13:59 EST: %CBUS-3-CCBCMDFAIL1: Controller 8, cmd (36 0x00000100) failed (0x8010)
.Nov 14 10:13:59 EST: %CBUS-3-CCBCMDFAIL1: Controller 8, cmd (36 0x00000100) failed (0x8010)
.Nov 14 10:13:59 EST: %CBUS-3-CCBCMDFAIL1: Controller 8, cmd (36 0x00000100) failed (0x8010)
.Nov 14 10:13:59 EST: %CBUS-3-CCBCMDFAIL1: Controller 8, cmd (36 0x00000100) failed (0x8010)
.Nov 14 10:13:59 EST: %CBUS-3-CCBCMDFAIL1: Controller 8, cmd (36 0x00000100) failed (0x8010)
.Nov 14 10:13:59 EST: %LB-5-CHAN_MEMBER_IN: FastEthernet0/0/0 added as member-2 to Port-channel1
.Nov 14 10:13:59 EST: %LB-5-CHAN_MEMBER_OUT: FastEthernet8/1/0 taken out of Port-channel1
.Nov 14 10:13:59 EST: %LB-5-CHAN_MEMBER_OUT: FastEthernet8/1/0 taken out of Port-channel1
.Nov 14 10:13:59 EST: %SYS-3-CPUHOG: Task ran for 10828 msec (257/150), process = OIR Handler, PC = 404A158C.
-Traceback= 404A1594
And then about a minute later:
.Nov 14 10:14:59 EST: %DBUS-3-WEDGED: Line card in slot 8 is wedged
.Nov 14 10:15:37 EST: %LB-5-CHAN_MEMBER_IN: FastEthernet8/1/0 added as member-2 to Port-channel1
.Nov 14 10:15:37 EST: %SYS-3-CPUHOG: Task ran for 11732 msec (52/14), process = OIR Handler, PC = 404A158C.
-Traceback= 404A1594
After that, the router seems to have found its slot again. When I
looked at the router config, the slot 0 interfaces were back in there.
As near as I can tell, one of RANCID's diagnostic commands provoked
the CBUS stall, and when RANCID subsequently read the config, pieces
of it were missing since the router was still trying to figure out
which hardware was working and which wasn't.
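One way to keep a snapshot like this out of the archive is to check the
router's log for bus errors that overlap the collection run before
trusting the config that was read. The following is only a rough sketch
of that idea, not anything RANCID does: the function names, the choice
of CBUS and DBUS as the facilities to watch, and the two-minute margin
are all assumptions made for illustration. The log lines carry no year,
so the caller has to supply one.

```python
import re
from datetime import datetime, timedelta

# Matches IOS log lines in the format shown above, for example:
# ".Nov 14 10:13:23 EST: %DBUS-3-DBUSINTERR: Slot 0, Internal Error"
LOG_RE = re.compile(
    r"^[.*]?(?P<ts>[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d) \w+: "
    r"%(?P<facility>[A-Z0-9_]+)-(?P<sev>\d)-(?P<mnemonic>[A-Z0-9_]+):"
)

# Assumption: these are the facilities worth treating as "hardware is flapping".
SUSPECT_FACILITIES = {"CBUS", "DBUS"}


def bus_errors(lines, year):
    """Return (timestamp, message-id) pairs for CBUS/DBUS log lines."""
    events = []
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m["facility"] not in SUSPECT_FACILITIES:
            continue
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        events.append((ts, f"{m['facility']}-{m['sev']}-{m['mnemonic']}"))
    return events


def snapshot_is_suspect(events, run_start, run_end,
                        margin=timedelta(minutes=2)):
    """True if any bus error falls inside the collection run (plus a margin before it)."""
    return any(run_start - margin <= ts <= run_end for ts, _ in events)
```

A wrapper around the collector could call `snapshot_is_suspect` with the
start and end times of its own run and hold back the commit (or retry
later) when it returns true.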
The commands RANCID ran to provoke all the above excitement are
listed below. Note that many of them aren't supported by the 7513,
but RANCID believes it should just spit out everything it knows and
sort the results later.
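To narrow down which command might be the trigger, it helps to know
which entries in the table the 7513 actually accepts. The sketch below
illustrates the "send everything, sort later" approach in Python; it is
not RANCID's code (RANCID is Perl and dispatches each command to a
handler routine). `collect`, `is_rejected`, and the `run_command`
callable are hypothetical names, and the rejection markers are the
usual IOS parser error prefixes.

```python
# Prefixes the IOS parser prints when it refuses a command.
REJECT_MARKERS = (
    "% Invalid input",
    "% Incomplete command",
    "% Ambiguous command",
    "% Unknown command",
)


def is_rejected(output):
    """True if any line of the output starts with an IOS parser error prefix."""
    return any(
        line.lstrip().startswith(REJECT_MARKERS)
        for line in output.splitlines()
    )


def collect(run_command, commands):
    """Run every command in order; split them into accepted and rejected.

    run_command is a hypothetical callable that sends one command to the
    router and returns its output as a string.
    """
    accepted = {}   # command -> output, for commands the router executed
    rejected = []   # commands the parser refused
    for cmd in commands:
        output = run_command(cmd)
        if is_rejected(output):
            rejected.append(cmd)
        else:
            accepted[cmd] = output
    return accepted, rejected
```

Only the commands that end up in `accepted` were really executed, so
they are the candidates for whatever provoked the stall.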
Anybody have any ideas why this happened? I noticed a similar
CBUS squall a few days ago. We've been using RANCID for over a year
and this is the first time I've seen this kind of behavior.
@commandtable=(
{'admin show version' => "ShowVersion"},
{'show version' => "ShowVersion"},
{'show redundancy secondary' => "ShowRedundancy"},
{'show idprom backplane' => "ShowIDprom"},
{'show install active' => "ShowInstallActive"},
{'admin show env all' => "ShowEnv"},
{'show env all' => "ShowEnv"},
{'show rsp chassis-info' => "ShowRSP"},
{'show gsr chassis' => "ShowGSR"},
{'show boot' => "ShowBoot"},
{'show bootvar' => "ShowBoot"},
{'show variables boot' => "ShowBoot"},
{'show flash' => "ShowFlash"},
{'dir /all nvram:' => "DirSlotN"},
{'dir /all bootflash:' => "DirSlotN"},
{'dir /all slot0:' => "DirSlotN"},
{'dir /all disk0:' => "DirSlotN"},
{'dir /all slot1:' => "DirSlotN"},
{'dir /all disk1:' => "DirSlotN"},
{'dir /all slot2:' => "DirSlotN"},
{'dir /all disk2:' => "DirSlotN"},
{'dir /all harddisk:' => "DirSlotN"},
{'dir /all harddiska:' => "DirSlotN"},
{'dir /all harddiskb:' => "DirSlotN"},
{'dir /all sup-bootflash:'=> "DirSlotN"}, # cat 6500-ios
{'dir /all sup-microcode:'=> "DirSlotN"}, # cat 6500-ios
{'dir /all slavenvram:' => "DirSlotN"},
{'dir /all slavebootflash:' => "DirSlotN"},
{'dir /all slaveslot0:' => "DirSlotN"},
{'dir /all slavedisk0:' => "DirSlotN"},
{'dir /all slaveslot1:' => "DirSlotN"},
{'dir /all slavedisk1:' => "DirSlotN"},
{'dir /all slaveslot2:' => "DirSlotN"},
{'dir /all slavedisk2:' => "DirSlotN"},
{'dir /all slavesup-bootflash:'=> "DirSlotN"}, # cat 7609
{'dir /all sec-nvram:' => "DirSlotN"},
{'dir /all sec-bootflash:' => "DirSlotN"},
{'dir /all sec-slot0:' => "DirSlotN"},
{'dir /all sec-disk0:' => "DirSlotN"},
{'dir /all sec-slot1:' => "DirSlotN"},
{'dir /all sec-disk1:' => "DirSlotN"},
{'dir /all sec-slot2:' => "DirSlotN"},
{'dir /all sec-disk2:' => "DirSlotN"},
{'show controllers' => "ShowContAll"},
{'show controllers cbus' => "ShowContCbus"},
{'show diagbus' => "ShowDiagbus"},
{'admin show diag' => "ShowDiag"},
{'show diag' => "ShowDiag"},
{'show module' => "ShowModule"}, # cat 6500-ios
{'show spe version' => "ShowSpeVersion"},
{'show c7200' => "ShowC7200"},
{'show vtp status' => "ShowVTP"},
{'show vlan' => "ShowVLAN"},
{'show running-config' => "WriteTerm"},
{'write term' => "WriteTerm"},
);