[j-nsp] Optimal BFD settings for BGP-signaled VPLS?

Sun Jan 16 10:34:10 EST 2011

If BGP stability is the main goal, do not use BFD with your BGP sessions.
Are you using site multi-homing with the connected CE devices or are they
all single-homed?  I don't know your topology but there may be some
instances where you would want to run BFD for BGP notification with
multi-homing.   

What hardware are you using?  We are using 300 ms x 3 (300 ms minimum
interval, multiplier of 3) everywhere, and while we have seen some
isolated false positives, things have been relatively stable.
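
For reference, the worst-case BFD detection time is roughly the
negotiated interval multiplied by the miss count (multiplier).  The short
Python sketch below only illustrates that arithmetic for the two settings
mentioned in this thread.  It assumes both ends use the same interval, and
the function name is made up for illustration; it is not a Junos command
or API.

```python
def bfd_detection_time_ms(min_interval_ms: int, multiplier: int) -> int:
    """Approximate BFD detection time in milliseconds.

    Assumes both peers are configured with the same minimum interval,
    so the negotiated interval equals min_interval_ms.
    """
    if min_interval_ms <= 0 or multiplier <= 0:
        raise ValueError("interval and multiplier must be positive")
    return min_interval_ms * multiplier


# The two settings discussed in this thread: 300 ms x 3 and 3000 ms x 3.
for interval, mult in [(300, 3), (3000, 3)]:
    total = bfd_detection_time_ms(interval, mult)
    print(f"{interval} ms x {mult} = {total} ms")
```

So 300 ms x 3 detects a failure in a little under a second, while
3000 ms x 3 takes about 9 seconds.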

Also, I would look at the types of failures you sustain on a regular
basis.  BFD doesn't make restoration faster; it lets you catch issues
that might not otherwise be caught, such as control plane issues.  If you
do not have a history of those, maybe BFD isn't really necessary and may
cause more problems than it solves.  Link failures and most node failures
(which cause links to go dark) trigger routing protocol events much faster
than BFD.  We use it because the routers were keeping the physical links
up during a reboot and would eventually start dropping traffic.

Phil 

On 1/14/11 9:39 PM, "Clarke Morledge" <chmorl at wm.edu> wrote:

>I am trying to determine the optimal Bidirectional Forwarding Detection
>(BFD) settings for BGP auto-discovery and layer-2 signaling in a VPLS
>application.
>
>To simplify things, assume I am running LDP for building dynamic-only
>LSPs, as opposed to RSVP.  Assume I am running IS-IS as the IGP with BFD
>enabled on that, too, interconnecting all of the P and PE routers in the
>MPLS cloud.  I am following the Juniper recommendation of 300 ms minimum
>interval with 3 misses before calling a BFD down event.
>
>The network design has a small set of core routers, each of which serves
>as a BGP route reflector.  All of the core routers have inter-meshed
>connections.  Each core router is only one hop away from the others.
>
>On the periphery, I have perhaps dozens of distribution routers.  Each
>distribution router is directly connected to two or more core routers.
>Each distribution router has a BGP session to these core routers;
>therefore, each distribution router is connected to two route reflectors
>for redundancy.
>
>Given the above, what minimum-interval BFD setting and miss count would
>you configure?  I want to try to get to sub-second convergence during
>node/link failure, but I do not want to tune BFD too tight and
>potentially introduce unnecessary flapping.  I am willing to suffer some
>sporadic loss of layer-2 connectivity within the VPLS cloud for a few
>seconds in the event of a catastrophe, etc., but I don't want to
>unnecessarily tear down BGP sessions and wait some 20 to 60 seconds or so
>until BGP rebuilds and redistributes L2 information.
>
>For some time now, I have been playing with a 3000 ms interval with 3
>misses (that's 9 seconds) as what I thought was a conservative estimate.
>However, I have run into cases where there has been enough router churn
>for various reasons to unnecessarily trip a BFD down event.  My hunch is
>that this "router churn" is due to buggy JUNOS code, but I don't have
>proof of that yet.  Nevertheless, I want the BGP infrastructure to stay
>solid and ride through transient events in a redundant network.
>
>Are there any factors that I am missing or not thinking thoroughly enough
>about when considering optimal BFD settings?
>
>Thanks.
>
>Clarke Morledge
>College of William and Mary
>Information Technology - Network Engineering
>Jones Hall (Room 18)
>Williamsburg VA 23187