[c-nsp] Redundancy vs. Paranoia

Thu May 12 22:00:45 EDT 2005

Comments inline.

On 5/12/05 1:44 PM, "John Neiberger" <John.Neiberger at efirstbank.com> wrote:

> I'm toying around with a handful of designs and I'm trying to get a
> better feel for the level of redundancy that would be considered sane so
> I thought I'd check here for some opinions. The designs in question
> generally deal with 6500s, 7600s, and 7200s, and the goal is to design a
> redundant routing and switching system with excellent failover
> characteristics. However, things can quickly get out of hand and I think
> they end up becoming more complex than necessary.
> 
> Here's one of the things I'm pondering: how do I decide which is
> "better", a single 6513 with dual sups and dual power supplies or two
> 6513s? At what point do you jump from a single box to two boxes? Does it
> make sense to even bother with making two separate boxes fully
> redundant?

As always, it depends.  =)    But there is a reason the saying "don't put
all your eggs in one basket" exists.

If the single box with dual sups and dual power supplies is used strictly
for connecting hosts that are not dual-homed, then I would say go for the
extra power supplies, but maybe skip the extra sup engines.  (IMHO, they
almost never fail as gracefully in production as they do during testing.
Something freezes ungracefully, or a card stalls, FUBAR'ing the switch.)
If this is an infrastructure switch/router, then I'd deploy in pairs.

Look at Google...  They use lots of small, off-the-shelf PCs to run their
search engine.  I'd bet they also use the same philosophy of "lots-n-lots of
smaller network devices" to aggregate everything as well.

Redundant power supplies are good, because those suckers do burn out. It is
much less of a hassle to replace one than to have Bad Things Happen on your
network because you only had one power supply in a core router/switch.  My
goal is to make sure the failure of any one device cannot isolate part of
the network.  I should be able to lose half of my network devices and
continue operations once the dust settles.
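One way to sanity-check the "no single device isolates anything" goal is to model the topology as a graph and knock out one device at a time. Below is a minimal Python sketch, assuming a hypothetical topology of two core boxes with two dual-homed access switches (all device names are made up for illustration). Hosts are not modeled, so a single-homed host behind a dead access switch is still down no matter what.

```python
from typing import Dict, List, Set

# Hypothetical topology: two core boxes, each access switch uplinked to both.
TOPOLOGY: Dict[str, Set[str]] = {
    "core1": {"core2", "access1", "access2"},
    "core2": {"core1", "access1", "access2"},
    "access1": {"core1", "core2"},
    "access2": {"core1", "core2"},
}


def is_connected(graph: Dict[str, Set[str]], removed: Set[str]) -> bool:
    """True if every surviving device can still reach every other one."""
    alive = [n for n in graph if n not in removed]
    if not alive:
        return True
    seen = {alive[0]}
    stack = [alive[0]]
    while stack:  # iterative depth-first walk over links between survivors
        node = stack.pop()
        for peer in graph[node]:
            if peer not in removed and peer not in seen:
                seen.add(peer)
                stack.append(peer)
    return len(seen) == len(alive)


def single_points_of_failure(graph: Dict[str, Set[str]]) -> List[str]:
    """Devices whose loss alone splits the remaining devices apart."""
    return [n for n in graph if not is_connected(graph, {n})]


if __name__ == "__main__":
    print(single_points_of_failure(TOPOLOGY))   # []
    # Hypothetical "one big box" design for comparison.
    one_box = {
        "core": {"access1", "access2"},
        "access1": {"core"},
        "access2": {"core"},
    }
    print(single_points_of_failure(one_box))    # ['core']
```

With the single core box that both access switches hang off, the check flags `core` as the one device you cannot afford to lose: the eggs-in-one-basket case.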

> I've got a 6513 as a core switch (L3, native IOS) and a 7513 as our
> core router for WAN and mainframe connectivity. Once our need for a CIP
> goes away (within a year) I've been toying with the idea of replacing
> the 7513 with two 7204VXRs. I need to terminate two DS3s and an ATM T1,
> so that part of the design is fairly simple.
> 
> On the other side of the room I have a single 6513 with dual sups and
> my boss wants me to consider getting an additional 6513 for redundancy,
> and he wants them to be designed in such a way that they are both active
> for various tasks. So, now I'm faced with having multiple 6513s and
> multiple 7204VXRs.
> 
> A new idea that just occurred to me is that I could replace everything
> with two 7609s that would house modules for WAN connectivity and
> security, and have a fiber gig module that breaks out to some high
> performance 48-port 10/100/1000 switches for our data center servers.
> Those switches could have an uplink to each 7609 for redundancy. I just
> started pondering this new design so I haven't really thought it
> through, but it might be easier to implement initially, easier to
> maintain in the long run, and it would actually be a simpler, more
> elegant design, which I like.

I won't shed any tears to see SVIs go away... ;)  I do like the idea of
keeping the WAN routers on their own boxes, though.  Not to mention you get
to save those slots for other cards.  It would be a bit of a waste to burn
two 6500/7600 slots across two chassis on what you could accomplish with
four 10/100/1000BASE-T Ethernet ports.

> 
> Any thoughts? How much paranoia is too much? :)  And how much
> redundancy is too much?

As soon as the administration and operation of the redundancy begin to have
a detrimental effect on the traffic passing over the network, I think it is
too much.  A perfect example of this is stuck-in-active (SIA) routes in
EIGRP caused by too much redundancy on the WAN.  Beyond a very small number
of sites, a fully-meshed WAN becomes a real disaster.
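The arithmetic behind that: a full mesh of n sites needs n(n-1)/2 circuits, and every router ends up with n-1 EIGRP neighbors that it has to query, and wait on for replies, when it loses a route with no feasible successor. Hub-and-spoke needs only n-1 circuits. A quick Python sketch of the growth, assuming one circuit (and so one adjacency) per site pair:

```python
def full_mesh_links(sites: int) -> int:
    """Circuits needed so every site has a direct link to every other site."""
    return sites * (sites - 1) // 2


def hub_and_spoke_links(sites: int) -> int:
    """Circuits needed when every remote site connects only to one hub."""
    return max(sites - 1, 0)


if __name__ == "__main__":
    for n in (3, 5, 10, 20, 50):
        # In a full mesh each router also has n-1 neighbors to query.
        print(f"{n:>3} sites: full mesh {full_mesh_links(n):>5} circuits, "
              f"{n - 1:>2} neighbors per router; "
              f"hub-and-spoke {hub_and_spoke_links(n):>2} circuits")
```

At 20 sites that works out to 190 circuits for the full mesh versus 19 for hub-and-spoke, and every one of those extra adjacencies is another place for a query to get stuck.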

The TYPE of redundancy is also very important...  ACTIVE/ACTIVE, while kinda
cool to play with, usually ends up causing more problems than it solves.
The wide deployment of HSRP clearly shows how effective active/standby
technology can be when used correctly.  BIG-IP load balancers deployed as
active/standby pairs also illustrate this.  GLBP is going to have an uphill
battle, IMHO.
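Part of why active/standby works so well is that the election at the heart of HSRP is simple: the router with the highest priority becomes active (default priority 100), the highest interface IP address breaks ties, and the next one in line becomes standby. Below is a toy Python model of just that election. It ignores hello/hold timers and preemption; preemption is off by default, so on a real router a recovered higher-priority box does not take the active role back unless preempt is configured. Router names and addresses are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Router:
    name: str
    ip: IPv4Address        # interface address on the HSRP segment
    priority: int = 100    # HSRP default priority


def elect(routers: Sequence[Router]) -> Tuple[Optional[Router], Optional[Router]]:
    """Return (active, standby): highest priority wins, highest IP breaks ties.

    Toy model only: no timers, no preemption, no virtual MAC handling.
    """
    ranked = sorted(routers, key=lambda r: (r.priority, r.ip), reverse=True)
    active = ranked[0] if ranked else None
    standby = ranked[1] if len(ranked) > 1 else None
    return active, standby


if __name__ == "__main__":
    # Hypothetical pair of gateways sharing one virtual IP.
    rtr_a = Router("rtr-a", IPv4Address("10.0.0.2"), priority=110)
    rtr_b = Router("rtr-b", IPv4Address("10.0.0.3"))
    active, standby = elect([rtr_a, rtr_b])
    print(active.name, standby.name)   # rtr-a rtr-b
    # rtr-a dies: rtr-b is the only candidate left and becomes active.
    active, standby = elect([rtr_b])
    print(active.name, standby)        # rtr-b None
```

Exactly one box forwards at a time and the other just waits, which is the property that keeps this kind of redundancy easy to reason about.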

-Brant 

> Thanks,
> John
> --
> _______________________________________________
> cisco-nsp mailing list  cisco-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/cisco-nsp
> archive at http://puck.nether.net/pipermail/cisco-nsp/