[c-nsp] The mechanics of SSO
Charles Wyble
charles at thewybles.com
Wed May 6 16:29:11 EDT 2009
Ouch... a nasty race condition, from the looks of it. Those little corner
cases are oh so very sharp.
Ross Vandegrift wrote:
> Hey guys,
>
> Today, due to what appears to be a major problem in SXF13, we
> experienced two back-to-back crashes, taking out both SUPs in a 6500
> within the time it takes to boot. A TAC case is open.
>
> According to the crashinfo droppings left along the way, we
> experienced three crashes:
>
> 1) module 6 is the active SUP; IOS crashes at 13:43
> 2) module 5 takes over; IOS crashes at 13:52
> 3) module 6 is still booting; IOS crashes at 13:52
>
> The third crash is the perplexing one. The RP crashinfo logs:
> 00:07:25: %CPU_MONITOR-STDBY-3-PEER_EXCEPTION: CPU_MONITOR peer has failed due to exception , reset by [6/0]
>
> %Software-forced reload
>
> The SP crashinfo says:
> 00:00:04: %PFREDUN-6-STANDBY: Initializing as STANDBY processor
> [snip usual bootup messages]
> 00:01:39: SP-STDBY: SP: Currently running ROMMON from F1 region
> 00:01:42: %DIAG-SP-STDBY-6-RUN_MINIMUM: Module 6: Running Minimal Diagnostics...
> 00:02:03: %DIAG-SP-STDBY-6-DIAG_OK: Module 6: Passed Online Diagnostics
> 00:07:24: %PFREDUN-SP-STDBY-6-STANDBY: Failure of ACTIVE detected, STANDBY not ready and reset
>
> %Software-forced reload
>
>
> I guess this means there is a point in the bootup process where a
> supervisor that is booting as a STANDBY cannot become ACTIVE without
> restarting?
>
> My guess is that this window falls while the config is being
> loaded from the ACTIVE module. Can anyone confirm? Are there things
> that can make this potential window smaller? (compressed configs,
> maybe)
>
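The behavior the SP crashinfo reports ("Failure of ACTIVE detected, STANDBY not ready and reset") can be pictured as a tiny decision rule. What follows is a toy Python model of that rule, not Cisco's implementation. The three stage names are hypothetical, and the SYNCING stage stands in for Ross's guess that the unsafe window is the config transfer from the ACTIVE. The only grounded part is the outcome: a standby that is not yet ready resets instead of taking over.

```python
from enum import Enum, auto


class StandbyStage(Enum):
    # Hypothetical coarse stages, NOT Cisco's internal state names.
    BOOTING = auto()      # image loading, minimal diagnostics
    SYNCING = auto()      # assumed: config/state being copied from the ACTIVE
    STANDBY_HOT = auto()  # fully synchronized and able to take over


def on_active_failure(stage: StandbyStage) -> str:
    """Return what the standby does when it detects that the ACTIVE failed.

    Mirrors the SP log line "STANDBY not ready and reset": only a fully
    synchronized standby switches over; anything earlier reloads itself.
    """
    if stage is StandbyStage.STANDBY_HOT:
        return "switchover: become ACTIVE"
    return "not ready: reset"


if __name__ == "__main__":
    for stage in StandbyStage:
        print(f"{stage.name:12} -> {on_active_failure(stage)}")
```

In this model, the window Ross asks about is simply the time the standby spends outside STANDBY_HOT. Anything that shortens BOOTING or SYNCING would narrow it, but whether compressed configs actually help on a 6500 is the open question in the post, not something the sketch answers.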
More information about the cisco-nsp
mailing list