[c-nsp] VSS - Horror stories, show-stoppers, other personal experience?

Fri Jun 17 12:22:09 EDT 2011

All,

We have been running it for two years now as the core switch in our server farm.
No problems so far.
Evidence:

XXXXX uptime is 2 years, 16 weeks, 2 days, 5 hours, 54 minutes
Uptime for this control processor is 2 years, 10 weeks, 6 days, 3 hours, 51 minutes
Time since XXXXX switched to active is 2 years, 10 weeks, 6 days, 3 hours, 44 minutes
System returned to ROM by Stateful Switchover at 14:10:04 CEST Thu Apr 2 2009 (SP by Admin requested switchover during ISSU)
System restarted at 14:14:04 CEST Thu Apr 2 2009
System image file is "sup-bootdisk:/s72033-ipservicesk9_wan-mz.122-33.SXI1.bin"
Last reload reason: Unknown reason

We are running an old version (by today's standards :-), but no problems so
far. We are not running complicated things on this switch, just multiple
VLANs, MEC, OSPF and 1 VRF.
My only concern is an upgrade: I need ISSU to work, otherwise I face big
downtime (i.e. 10 minutes). I haven't done this in production yet.
If it ain't broken, don't fix it...

PS. I ran failover tests before implementing and tuned the config. I can
switch over to the other chassis with practically zero impact; to be exact,
200-300 ms of packet loss at most.

Some tips: use at least the two 10GE ports on the supervisor for the VSL and
make sure it is physically redundant. Use LACP on MEC port-channels (don't
use "trunk mode on"), use NSF in your routing protocol, use fast hellos for
split-brain (dual-active) detection, and follow the design guide carefully.
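
As a rough illustration of the tips above, a config fragment along these
lines covers the fast-hello, LACP and NSF parts. It is only a sketch, not the
config from this switch: the domain ID, interface numbers, port-channel number
and OSPF process ID are all made up, the VSL itself is left out, and the exact
commands should be checked against the design guide for your release.

```
! Illustrative only: domain ID, interfaces and channel numbers are hypothetical
switch virtual domain 100
 dual-active detection fast-hello
!
! Dedicated link between the two chassis used for fast-hello detection
interface GigabitEthernet1/2/48
 dual-active fast-hello
!
interface GigabitEthernet2/2/48
 dual-active fast-hello
!
! MEC members, one per chassis, bundled with LACP (active) instead of static "on"
interface TenGigabitEthernet1/1/1
 channel-group 20 mode active
!
interface TenGigabitEthernet2/1/1
 channel-group 20 mode active
!
! NSF for the routing protocol so forwarding continues across a switchover
router ospf 1
 nsf
```

The neighbours on the other side of the MEC and the OSPF adjacencies need
matching LACP and NSF-aware settings for this to help during a switchover.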

regards,
Geert

2011/6/17 Mike G <geezyx at gmail.com>

> Thank you again Andrew, I appreciate the info!
>
> Has anyone else run into problems?  Have you heard of others having
> problems?
>
> Or, have you had a really successful VSS implementation and have no
> complaints?
>
> I would love to hear some more opinions on VSS.
>
> -MikeG
>
>
> On Thu, Jun 16, 2011 at 5:25 PM, Andrew Miehs <andrew at 2sheds.de> wrote:
>
> > On Friday, June 17, 2011, Mike G <geezyx at gmail.com> wrote:
> > > Thanks for the great feedback Andrew!  Did you ever discover the cause of
> > > the crash?  Also, was the 20 second outage due to the delay in the
> > > active-hot sup taking over or was it something else?
> >
> > IIRC, the crash logs reported a cache corruption, and the 20 seconds
> > were spanning tree settling down. I will ask my colleague in 2 weeks
> > when he gets back from holidays. Don't forget, however, that we are running
> > an old release on these boxes (SXI6/SXJ is current?), and it only
> > happened after more than 1 year of uptime. Not being an ISP means we can
> > get away with running older software longer.
> >
> > We did open a TAC case but unfortunately were not in a position to
> > replicate the problem.
> >
> > Andrew
> >