[c-nsp] The mechanics of SSO

Charles Wyble charles at thewybles.com
Wed May 6 16:29:11 EDT 2009


Ouch... nasty race condition, from the looks of it. Those little corner 
cases are oh so very sharp.



Ross Vandegrift wrote:
> Hey guys,
> 
> Today, due to what appears to be a major problem in SXF13, we
> experienced two sequential crashes, taking out both SUPs in a 6500
> within the time it takes to boot.  A TAC case is in progress.
> 
> According to the crashinfo droppings left along the way, we
> experienced three crashes:
> 
> 1) module 6 is active SUP, IOS crashes at 13:43
> 2) module 5 takes over, IOS crashes at 13:52
> 3) module 6 is still booting, IOS crashes at 13:52
> 
> The third crash is the perplexing one.  The RP crashinfo logs:
> 	00:07:25: %CPU_MONITOR-STDBY-3-PEER_EXCEPTION: CPU_MONITOR peer has failed due to exception , reset by [6/0]
> 
> 	%Software-forced reload
> 
> The SP crashinfo says:
> 	00:00:04: %PFREDUN-6-STANDBY: Initializing as STANDBY processor
> 	[snip usual bootup messages]
> 	00:01:39: SP-STDBY: SP: Currently running ROMMON from F1 region
> 	00:01:42: %DIAG-SP-STDBY-6-RUN_MINIMUM: Module 6: Running Minimal Diagnostics...
> 	00:02:03: %DIAG-SP-STDBY-6-DIAG_OK: Module 6: Passed Online Diagnostics
> 	00:07:24: %PFREDUN-SP-STDBY-6-STANDBY: Failure of ACTIVE detected, STANDBY not ready and reset
> 
> 	%Software-forced reload
> 
> 
> I guess this means there is a point in the bootup process where a
> supervisor that is booting as a STANDBY cannot become ACTIVE without
> restarting?
> 
> My guess is that this period is during the time the config is being
> loaded from the ACTIVE module.  Can anyone confirm?  Are there things
> that can make this potential window smaller? (compressed configs,
> maybe)
> 
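For what it's worth, the uptime stamps in the SP crashinfo quoted above put a
rough number on that window. Below is a minimal Python sketch that parses
those five stamped lines and reports how long the standby had been up when it
passed diagnostics and when it reset. The helper names (parse_uptime_log,
seconds_at) are made up for illustration and are not part of any Cisco
tooling; the log lines are copied from the quoted message.

```python
import re

# Uptime-stamped lines copied from the SP crashinfo quoted above.
SP_LOG = """\
00:00:04: %PFREDUN-6-STANDBY: Initializing as STANDBY processor
00:01:39: SP-STDBY: SP: Currently running ROMMON from F1 region
00:01:42: %DIAG-SP-STDBY-6-RUN_MINIMUM: Module 6: Running Minimal Diagnostics...
00:02:03: %DIAG-SP-STDBY-6-DIAG_OK: Module 6: Passed Online Diagnostics
00:07:24: %PFREDUN-SP-STDBY-6-STANDBY: Failure of ACTIVE detected, STANDBY not ready and reset
"""

# hh:mm:ss uptime prefix followed by the message text.
STAMP = re.compile(r"^(\d+):(\d{2}):(\d{2}): (.*)$")


def parse_uptime_log(text):
    """Return a list of (uptime_seconds, message) for each stamped line."""
    events = []
    for line in text.splitlines():
        match = STAMP.match(line.strip())
        if match:
            hours, minutes, seconds, message = match.groups()
            uptime = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
            events.append((uptime, message))
    return events


def seconds_at(events, needle):
    """Return the uptime of the first message containing needle, or None."""
    for uptime, message in events:
        if needle in message:
            return uptime
    return None


events = parse_uptime_log(SP_LOG)
diag_ok = seconds_at(events, "DIAG_OK")
reset = seconds_at(events, "Failure of ACTIVE detected")
print(f"diagnostics passed at {diag_ok}s, reset at {reset}s, gap {reset - diag_ok}s")
```

This only measures the timeline: module 6 had been up 7m24s, and was more
than five minutes past its diagnostics, when it was still reported as not
ready. It says nothing about what the supervisor was doing during that gap,
so the config-sync theory remains a guess.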

