[c-nsp] The mechanics of SSO

Ross Vandegrift ross at kallisti.us
Wed May 6 15:53:11 EDT 2009


Hey guys,

Today, due to what appears to be a major problem in SXF13, we
experienced back-to-back crashes that took out both SUPs in a 6500
within the time it takes one to boot.  A TAC case is open.

According to the crashinfo droppings left along the way, we
experienced three crashes:

1) module 6 is active SUP, IOS crashes at 13:43
2) module 5 takes over, IOS crashes at 13:52
3) module 6 is still booting, IOS crashes at 13:52

The third crash is the perplexing one.  The RP crashinfo logs:
	00:07:25: %CPU_MONITOR-STDBY-3-PEER_EXCEPTION: CPU_MONITOR peer has failed due to exception , reset by [6/0]

	%Software-forced reload

The SP crashinfo says:
	00:00:04: %PFREDUN-6-STANDBY: Initializing as STANDBY processor
	[snip usual bootup messages]
	00:01:39: SP-STDBY: SP: Currently running ROMMON from F1 region
	00:01:42: %DIAG-SP-STDBY-6-RUN_MINIMUM: Module 6: Running Minimal Diagnostics...
	00:02:03: %DIAG-SP-STDBY-6-DIAG_OK: Module 6: Passed Online Diagnostics
	00:07:24: %PFREDUN-SP-STDBY-6-STANDBY: Failure of ACTIVE detected, STANDBY not ready and reset

	%Software-forced reload
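
As a sanity check, the uptime stamps in the SP log can be lined up against the wall-clock times above. This is only a rough sketch of the arithmetic in Python: the helper name is made up for illustration, and it assumes module 6 started rebooting exactly at 13:43:00 on the day of this post, which the logs do not actually state.

```python
from datetime import datetime, timedelta

def uptime_to_delta(stamp: str) -> timedelta:
    """Convert an IOS-style uptime stamp (HH:MM:SS) into a timedelta."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

# Assumption: module 6 began rebooting at the moment of the first crash.
first_crash = datetime(2009, 5, 6, 13, 43)

diag_ok = uptime_to_delta("00:02:03")  # %DIAG-SP-STDBY-6-DIAG_OK
reset = uptime_to_delta("00:07:24")    # "STANDBY not ready and reset"

# Earliest wall-clock time at which the standby could have been reset.
estimated_reset = first_crash + reset
# How long the standby had been past diagnostics but still "not ready".
since_diag = reset - diag_ok

print(estimated_reset.strftime("%H:%M:%S"))  # 13:50:24
print(since_diag)                            # 0:05:21
```

The estimate lands a bit before the 13:52 crash of module 5, which would be consistent with the uptime counter starting some time after the reset. The more interesting number is the second one: module 6 had passed diagnostics more than five minutes earlier and was still not ready to take over.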


I guess this means there is a point in the bootup process where a
supervisor that is booting as a STANDBY cannot become ACTIVE without
restarting?

My guess is that this window is the time during which the config is
being loaded from the ACTIVE module.  Can anyone confirm?  Is there
anything that can make this window smaller (compressed configs,
maybe)?

-- 
Ross Vandegrift
ross at kallisti.us

"If the fight gets hot, the songs get hotter.  If the going gets tough,
the songs get tougher."
	--Woody Guthrie

