[j-nsp] SRX Active/Active

Sun Jun 26 23:54:43 EDT 2016

Hi Aaron,

On Sun, Jun 26, 2016 at 2:08 PM, Aaron Dewell <aaron.dewell at gmail.com>
wrote:

>
> Hi Brian,
>
> Those all are good mitigation steps for an RG0 failover.  There are some
> caveats about graceful restart on the SRXs, but those should have been
> fixed a while ago.  Just to be sure, I’d get a recommendation on Junos
> version from your local SE.
>
> Another option is to not use a cluster at all, and make them active/active
> via routing protocols.  Then, a control plane failure only kills one side.
> But then failovers are stateless, which has more impact.
>
> However, control plane failures are rare, so it’s chasing a very small
> probability in the end.
>
> Aaron
>
>
Wow, very good idea.  I actually didn't consider using them independently.
This might be a good way to achieve our goals.  Like you mention, I will be
trading one type of failure for another.  But I like this idea since I'd
have more control with standard routing protocols.

Thanks.
/bs

> On Jun 26, 2016, at 12:40 PM, Brian Spade <bitkraft at gmail.com> wrote:
>
> Hi Aaron,
>
> On Sun, Jun 26, 2016 at 11:19 AM, Aaron Dewell <aaron.dewell at gmail.com>
> wrote:
> >
> > You are correct - RG0 will always be active/passive.  A full control
> > plane failover will always be painful.
> >
> > SRX active/active is more about the interfaces in use.  You can arrange
> > for half of your traffic to prefer FW1 vs. FW2 and achieve active/active in
> > that way so you’ll take less of a hit when an interface fails (or a
> > neighbor device goes down).  So that’s really what you are protecting
> > against, which seems like you’ve done that.
> >
>
> Thanks for your feedback.  It will be a lot of configuration, but was
> thinking I could do the following to limit RG0 failure (or southbound Core
> failure):
>
>    - /31 transit VLAN per link (per VRF).  So the total number of /31
>    transits needed will be 4 * # of VRFs (28 /31s in my case).
>    - Graceful restart configured on the SRX to limit RG0 failure.
>    - Core1 failure (or Core2 failure) should be limited with graceful
>    restart and all uplinks having OSPF adjacencies.
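
The transit count in the first bullet can be checked with a short sketch.
This is only an illustration: 28 /31s at 4 per VRF implies 7 VRFs, and the
4 links are taken to be the FW/Core full mesh described in the thread.  The
transit block 10.255.0.0/26 and the VRF names are made-up placeholders, not
values from the thread.

```python
import ipaddress

# Assumptions (hypothetical, not from the thread): transit block and VRF names.
TRANSIT_BLOCK = ipaddress.ip_network("10.255.0.0/26")
VRFS = [f"VRF{i}" for i in range(1, 8)]  # 7 VRFs, since 28 / 4 = 7

# Full mesh between the two firewalls and the two cores: 4 links.
LINKS = [("FW1", "Core1"), ("FW1", "Core2"), ("FW2", "Core1"), ("FW2", "Core2")]


def plan_transits(block, vrfs, links):
    """Assign one /31 per (VRF, link) pair, in order, from the given block."""
    subnets = block.subnets(new_prefix=31)
    plan = []
    for vrf in vrfs:
        for fw, core in links:
            net = next(subnets)  # raises StopIteration if the block is too small
            # A /31 has exactly two addresses: one for each end of the link.
            plan.append((vrf, fw, core, net, net[0], net[1]))
    return plan


plan = plan_transits(TRANSIT_BLOCK, VRFS, LINKS)
print(f"{len(plan)} transit /31s needed")
for vrf, fw, core, net, fw_ip, core_ip in plan[:4]:
    print(f"{vrf} {fw}<->{core} {net} fw={fw_ip} core={core_ip}")
```

A /26 holds 32 /31s, so it covers the 28 needed with a little room to spare.
Each (VRF, link) pair would also carry its own OSPF adjacency in this design.
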
>
> Anyways, just wondering your thoughts on this.  I will probably just have
> to lab it to see how it performs.
>
> If active/active is not a good way, I might have to add in two MX border
> routers... That seems like a waste since I just need a default route via
> BGP.
>
> Thanks.
> /bs
>
> >> On Jun 26, 2016, at 12:15 PM, Brian Spade <bitkraft at gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> I'm trying to figure out the best way to set up an SRX cluster as
> >> active/active.  I have attached a diagram of the topology, but it's a
> >> full mesh of links.  The ISP links are local interfaces and the
> >> southbound interfaces to the core routers are reth's.  Core1 is HSRP
> >> primary for all VLANs.  FW1 is primary for RG1 and FW2 is primary for
> >> RG2.  The IGP is OSPF, but I have many VRFs that are connected to the FW
> >> with transit VLANs to bind the sub-interface to virtual router & zone.
> >>
> >> The issue I have is Core2 has no active OSPF neighbors in this setup.
> >> Therefore, if Core1 fails, there will be a control plane outage as Core2
> >> establishes OSPF adjacencies.
> >>
> >> So I'm thinking it might be better to remove the reth's and use local
> >> interfaces on the FW/CORE links.  This way I can have a full mesh of
> >> OSPF adjacencies and no control plane loss when Core1 fails.
> >>
> >> Does anyone have thoughts on this or recommend the best way to achieve
> >> this active/active full mesh setup?  If there's good reason to not use
> >> active/active, I'd welcome the feedback.
> >>
> >> Thanks.
> >> /bs