[j-nsp] Optimal BFD settings for BGP-signaled VPLS?

Phil Bedard philxor at gmail.com
Mon Jan 17 15:56:26 EST 2011


Oops I meant forwarding plane in my original post.   In some of the older
hardware where it's a centralized function and not independent of the
control plane, it helps catch those failures as well.   I believe this
covers most Juniper hardware, unless I'm mistaken.

Phil 

From:  Keegan Holley <keegan.holley at sungard.com>
Date:  Mon, 17 Jan 2011 15:22:01 -0500
To:  Thedin Guruge <thedin at gmail.com>
Cc:  Phil Bedard <philxor at gmail.com>, <juniper-nsp at puck.nether.net>, Clarke
Morledge <chmorl at wm.edu>
Subject:  Re: [j-nsp] Optimal BFD settings for BGP-signaled VPLS?

I agree except for using the IGP and RSVP for failure detection.  RSVP and
OSPF/ISIS run in the control plane and BFD is designed to run in the
forwarding plane.  Running BFD will diagnose issues where the control plane
is working but the forwarding plane is not.
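For reference, a rough Junos sketch of per-neighbor BFD on an iBGP session (group name, neighbor address, and timers are placeholders, using the 300 ms / 3-multiplier values discussed in this thread):

```
protocols {
    bgp {
        group internal-rr {
            type internal;
            neighbor 192.0.2.1 {
                bfd-liveness-detection {
                    minimum-interval 300;   /* ms, transmit and receive */
                    multiplier 3;           /* misses before declaring down */
                }
            }
        }
    }
}
```

Note that BFD for a loopback-to-loopback iBGP session is multihop: it follows the routed path between the loopbacks, so it detects forwarding breakage anywhere along that path rather than on a single link.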

On Mon, Jan 17, 2011 at 3:13 PM, Thedin Guruge <thedin at gmail.com> wrote:
> Hi,
> 
> What I gather is that you have LDP implemented at the MPLS level and the edge
> routers are dual-homed to core routers. Why not consider running LDP over
> RSVP? The RSVP LSPs would only be per-link LSPs on the P-PE links, so RSVP
> would provide sub-second failure detection without a messy full-mesh RSVP
> setup. Of course this relies on fast link-down detection at the physical
> level as well as by the IGP, but it lets you opt out of BFD for BGP.
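> A minimal Junos sketch of that approach, assuming one per-link RSVP LSP to
> each directly connected core router (interface, LSP name, and address are
> placeholders; ldp-tunneling lets LDP ride the RSVP LSP):
> 
> ```
> protocols {
>     rsvp {
>         interface ge-0/0/0.0;      /* the P-PE link */
>     }
>     mpls {
>         label-switched-path pe1-to-p1 {
>             to 192.0.2.1;          /* directly connected core router */
>             ldp-tunneling;         /* tunnel LDP over this RSVP LSP */
>         }
>     }
>     ldp {
>         interface lo0.0;           /* targeted LDP over the LSP */
>     }
> }
> ```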
> 
> Thedin
> 
> On Mon, Jan 17, 2011 at 4:34 AM, Phil Bedard <philxor at gmail.com> wrote:
> 
>> > If BGP stability is the main goal, do not use BFD with your BGP sessions.
>> > Are you using site multi-homing with the connected CE devices or are they
>> > all single-homed?  I don't know your topology but there may be some
>> > instances where you would want to run BFD for BGP notification with
>> > multi-homing.
>> >
>> > What hardware are you using?  We are using 300x3 everywhere and while we
>> > have seen some isolated false positives, things have been relatively
>> > stable.
>> >
>> > Also, I would look at the types of failures you sustain on a regular
>> > basis.  BFD doesn't make restoration faster, it lets you catch issues
>> > which may not have otherwise been caught like control plane issues. If you
>> > do not have a history of that maybe BFD isn't really necessary and may
>> > cause more problems than it solves.  Link failures and most node failures
>> > (which cause links to go dark) trigger routing protocol events much faster
>> > than BFD.  We use it because the routers were keeping the physical links
>> > up during a reboot and would eventually start dropping traffic.
>> >
>> > Phil
>> >
>> > On 1/14/11 9:39 PM, "Clarke Morledge" <chmorl at wm.edu> wrote:
>> >
>>> > >I am trying to determine the optimal Bidirectional Forwarding Detection
>>> > >(BFD) settings for BGP auto-discovery and layer-2 signaling in a VPLS
>>> > >application.
>>> > >
>>> > >To simplify things, assume I am running LDP for building dynamic-only
>>> > >LSPs, as opposed to RSVP.  Assume I am running IS-IS as the IGP with BFD
>>> > >enabled on that, too, interconnecting all of the P and PE routers in the
>>> > >MPLS cloud.  I am following the Juniper recommendation of 300 ms minimum
>>> > >interval with 3 misses before calling a BFD down event.
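>>> > >With that recommendation, the IS-IS side would look roughly like this
>>> > >(interface name is a placeholder):
>>> > >
>>> > >```
>>> > >protocols {
>>> > >    isis {
>>> > >        interface ge-0/0/0.0 {
>>> > >            bfd-liveness-detection {
>>> > >                minimum-interval 300;   /* 300 ms x 3 = 900 ms detection */
>>> > >                multiplier 3;
>>> > >            }
>>> > >        }
>>> > >    }
>>> > >}
>>> > >```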
>>> > >
>>> > >The network design has a small set of core routers, each one of these
>>> > >routers serves as a BGP route reflector.  All of the core routers have
>>> > >inter-meshed connections.  Each core router is only one hop away from the
>>> > >other.
>>> > >
>>> > >On the periphery, I have perhaps dozens of distribution routers.  Each
>>> > >distribution router is  directly connected to two or more core routers.
>>> > >Each distribution router has a BGP session to these core routers;
>>> > >therefore, each distribution router is connected to two route reflectors
>>> > >for redundancy.
>>> > >
>>> > >Given that above, what type of minimum interval BFD setting and miss
>>> > >count
>>> > >would you configure?  I want to try to get to a sub-second convergence
>>> > >during node/link failure, but I do not want to tune BFD too tight and
>>> > >potentially introduce unnecessary flapping.  I am willing to suffer some
>>> > >sporadic loss to the layer-2 connectivity within the VPLS cloud in the
>>> > >event of a catastrophe, etc, for a few seconds, but I don't want to
>>> > >unnecessarily tear down BGP sessions and wait some 20 to 60 seconds or so
>>> > >until BGP rebuilds and redistributes L2 information.
>>> > >
>>> > >For some time now, I have been playing with 3000 ms interval with 3
>>> > >misses
>>> > >(that's 9 seconds) as what I thought was a conservative estimate.
>>> > >However, I have run into cases where there has been enough router churn
>>> > >for various reasons to unnecessarily trip a BFD down event.  My hunch is
>>> > >that this "router churn" is due to buggy JUNOS code, but I don't have
>>> > >proof of that yet.  Nevertheless, I want the BGP infrastructure to stay
>>> > >solid and ride through transient events in a redundant network.
>>> > >
>>> > >Are there any factors that I am missing or not thinking thoroughly enough
>>> > >about when considering optimal BFD settings?
>>> > >
>>> > >Thanks.
>>> > >
>>> > >Clarke Morledge
>>> > >College of William and Mary
>>> > >Information Technology - Network Engineering
>>> > >Jones Hall (Room 18)
>>> > >Williamsburg VA 23187
>>> > >_______________________________________________
>>> > >juniper-nsp mailing list juniper-nsp at puck.nether.net
>>> > >https://puck.nether.net/mailman/listinfo/juniper-nsp
>> >
>> >




