[c-nsp] Catalyst 6500 Supervisor Engine Redundancy

Siva Valliappan svalliap at cisco.com
Mon Oct 30 16:20:05 EST 2006


just a couple of comments inline -

cheers
.siva

On Mon, 30 Oct 2006, Sam Stickland wrote:

> Lasher, Donn wrote:
>> I would only offer one caveat as to Redundancy mode discussions.
>>
>> For software-related failures, SSO may actually hurt you more than it
>> helps.
>>
>> SSO, at least in my experience in the past (was SRM as I recall), is a
>> complete "mirror" of one proc to the other. This means any memory
>> corruption issues, stack problems, IOS issues, that may cause the first
>> Proc to crash, may then crash the other proc as well, leading to a
>> chassis reboot. Badness.
>>

SSO is not a complete mirror.  Rather, selected information is checkpointed
to the standby supervisor.  The standby supervisor does not see all the
control plane messages (route updates, logins, etc).

This means that, depending on the cause, the event that brought down the
active supervisor is unlikely to bring down the standby as well.  The
typical failure reasons for the active are one-off or rare events (e.g. a
specific race condition or some code path executed by some process).

Since we are only checkpointing specific data (interface state,
forwarding table, etc), it is likely that whatever caused the active to
fail will not occur on the standby.  The exception is a specific set of
cases - e.g. a specific packet that exercises some defect, or some other
recurring event that exercises the same defective code path.

My understanding is that most failures fall in the first category and very
few failures occur in the second category.  Hence SSO provides protection
in most cases.

For the latter case (exercising the same code path that causes a failure),
there's no difference between RPR+ and SSO, since you will run into it
in both cases (if the failure was being caused by going down a specific
code path).
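
To make the two categories concrete, here is a rough toy model in Python.
It is only a sketch of the argument above, not how IOS implements SSO: the
class, the function names and the "defect" flag are all made up for
illustration, and the checkpointed keys are just the two examples I
mentioned (interface state, forwarding table).

```python
from dataclasses import dataclass, field

# Toy model only. All names here are hypothetical; the checkpointed keys are
# the two examples from the text, not an exhaustive list of what SSO syncs.
CHECKPOINTED_KEYS = ("interface_state", "forwarding_table")


@dataclass
class Supervisor:
    name: str
    state: dict = field(default_factory=dict)
    alive: bool = True

    def handle(self, event: dict) -> None:
        """Apply one event; an event flagged 'defect' crashes this supervisor."""
        if not self.alive:
            return
        if event.get("defect"):
            self.alive = False
            return
        self.state.update(event.get("updates", {}))


def checkpoint(active: Supervisor, standby: Supervisor) -> None:
    """Copy only the selected keys to the standby, not the full state."""
    for key in CHECKPOINTED_KEYS:
        if key in active.state:
            standby.state[key] = active.state[key]


def switchover(standby: Supervisor, later_events: list) -> bool:
    """Standby becomes active and handles later events; True if it survives."""
    for event in later_events:
        standby.handle(event)
    return standby.alive


def run(trigger_recurs: bool) -> Supervisor:
    """Crash the active once, fail over, and return the (former) standby."""
    active, standby = Supervisor("active"), Supervisor("standby")
    active.handle({"updates": {
        "interface_state": {"Gi1/1": "up"},
        "forwarding_table": {"10.0.0.0/8": "Gi1/1"},
        "login_sessions": ["admin"],  # control plane detail, never checkpointed
    }})
    checkpoint(active, standby)
    trigger = {"defect": True}
    active.handle(trigger)  # the active crashes here
    # A one-off trigger (e.g. a race) is gone after the crash; a recurring one
    # (e.g. the same packet arriving again) hits the new active too.
    if trigger_recurs:
        later = [trigger]
    else:
        later = [{"updates": {"interface_state": {"Gi1/1": "up"}}}]
    switchover(standby, later)
    return standby


if __name__ == "__main__":
    for recurs in (False, True):
        sb = run(recurs)
        print(f"trigger recurs={recurs}: standby alive={sb.alive}, "
              f"keys={sorted(sb.state)}")
```

In this sketch the standby only ever holds the checkpointed keys, so a
one-off trigger on the active never reaches it.  When the trigger recurs,
the new active goes down no matter how it was brought up, which is why the
model has no separate RPR+ case.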


>> RPR+, while taking longer to fail over compared to SSO, avoids those
>> issues, by being "warm" but not "hot" standby.
>>
>> For hardware related failures on the other hand, SSO > RPR*
>>
> Can anyone else offer any thoughts on this subject? From our perspective
> we hardly ever see any supervisors fail (linecards are of course a
> different story), but quite a lot of software related crashes, so it
> sounds like SSO isn't a great solution from _our_ perspective (YMMV).
>
> The described "double failure" mechanism above certainly sounds
> plausible - has anyone got any real-world experience of it?
>
> S
> _______________________________________________
> cisco-nsp mailing list  cisco-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/cisco-nsp
> archive at http://puck.nether.net/pipermail/cisco-nsp/
>

