Re: Nasty iBGP design issue...

From: Paul Schultz (khyron@ninjageek.org)
Date: Tue Jul 03 2001 - 10:18:55 EDT


If the main problem is between IBGP sessions within a POP, why would
you need to leave the holdtime at the default of 3 minutes on your
intra-pop sessions? BGP timers need to be longer than the worst-case
IGP convergence time, so if you can determine that in the event of a
failure your IGP will do its magic within 30 seconds, then leaving the
BGP holdtime at 180 seconds is just too long.
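
To put rough numbers on that, here is a minimal sketch of the sizing
arithmetic. The function name and the 1.5x safety factor are my own
assumptions, not anything a router implements; the keepalive is set to
one third of the holdtime, which is the usual convention (the
180-second default normally pairs with 60-second keepalives).

```python
import math


def pick_bgp_timers(worst_igp_convergence_s, safety_factor=1.5):
    """Return (keepalive, holdtime) in seconds.

    Hypothetical helper: sizes the holdtime as the worst-case IGP
    convergence time multiplied by an assumed safety factor.
    """
    if worst_igp_convergence_s <= 0:
        raise ValueError("IGP convergence time must be positive")
    if safety_factor <= 1:
        raise ValueError("safety factor must be > 1 so holdtime exceeds convergence")

    holdtime = math.ceil(worst_igp_convergence_s * safety_factor)
    # A non-zero BGP holdtime may not be shorter than 3 seconds.
    holdtime = max(holdtime, 3)
    # Conventional ratio: keepalive is one third of the holdtime.
    keepalive = max(1, holdtime // 3)
    return keepalive, holdtime


# An IGP that converges within 30 seconds in the worst case:
print(pick_bgp_timers(30))
```

With a 30-second worst case this lands on a 45-second holdtime, which is
the ballpark I mention further down. Pick your own safety factor based
on how much you trust your convergence measurements.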

The fact that IBGP doesn't immediately tear down sessions is a very good
thing. In many designs each BGP-speaking router (using loopbacks for all
IBGP communications, of course) will have multiple ways for other BGP
routers to reach its loopback address (like 2 fast/gig ethernet links
going to different VLANs for intra-pop sessions, and multiple
backbone links for inter-pop), so if a direct link fails, the BGP session
stays up as long as the IGP finds an alternate route to that loopback
before the holdtime expires, which may take 10-30 seconds.
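
As a rough illustration of why the session rides through a link
failure, here is a deliberately simplified model. It assumes the hold
timer restarts on every keepalive received and that the link dies just
before the next keepalive was due, so only (holdtime - keepalive)
seconds remain. It ignores TCP retransmission behaviour entirely, so
treat it as a conservative sketch rather than how any particular
implementation behaves. The function name is hypothetical.

```python
def session_survives_link_failure(holdtime_s, keepalive_s, igp_reconvergence_s):
    """Conservative check: does the IGP heal before the hold timer expires?

    Worst case assumed: the last keepalive arrived almost a full
    keepalive interval before the link failed, so the time remaining
    on the hold timer is holdtime_s - keepalive_s.
    """
    if keepalive_s >= holdtime_s:
        raise ValueError("keepalive must be shorter than holdtime")
    worst_case_time_left = holdtime_s - keepalive_s
    return igp_reconvergence_s < worst_case_time_left


# Default timers, alternate path found in 30 seconds: session stays up.
print(session_survives_link_failure(180, 60, 30))
# Aggressive timers with the same 30-second reconvergence: no margin left.
print(session_survives_link_failure(45, 15, 30))
```

Under this pessimistic model, very short holdtimes only make sense if
the IGP really does find the alternate path well inside the window.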

Reduce the holdtimes to whatever you feel gives you enough time for link
failures to heal (the IGP finding a different route to the remote
loopback) while still giving you an "acceptable" convergence time if the
whole router goes down. Like I said, if you have a very good IGP
implementation you may be able to bring your holdtimes down to 40-45
seconds, but keep in mind that the risk of blackholing a few routes for a
few seconds is much better than having a BGP session flap completely when
it shouldn't have (between two routers that should have found an
alternate IGP route to the remote loopback). It's your decision as to
which you would rather risk.
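
To see the two risks side by side, a small sketch like the following
can tabulate candidate holdtimes. It assumes a dead router is only
noticed when the hold timer expires, so the worst-case blackhole is
roughly the holdtime itself, and it reports how much headroom each
candidate leaves over the worst-case IGP convergence. The function and
field names are made up for illustration.

```python
def holdtime_tradeoff(candidate_holdtimes_s, worst_igp_convergence_s):
    """Return one row per candidate holdtime, sorted ascending.

    worst_blackhole_s: time traffic may be blackholed if the peer dies
                       right after its last keepalive (about one holdtime).
    igp_margin_s:      headroom over worst-case IGP convergence; a value
                       at or below zero means link failures can flap
                       the session.
    """
    rows = []
    for holdtime in sorted(candidate_holdtimes_s):
        rows.append({
            "holdtime_s": holdtime,
            "worst_blackhole_s": holdtime,
            "igp_margin_s": holdtime - worst_igp_convergence_s,
        })
    return rows


for row in holdtime_tradeoff([180, 45, 30], worst_igp_convergence_s=30):
    print(row)
```

A shorter holdtime shrinks the blackhole window and the IGP margin at
the same rate, which is exactly the trade-off you have to decide on.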

Hope this helps..

Paul

On Tue, 3 Jul 2001, Pegg Damon wrote:

> Right then, here's one that's getting right on my pectorals - maybe Cisco
> dev. have an answer since I'm hitting a brick wall.
>
> I have a serious problem with failover speeds within iBGP for dual-homed
> customers, from a carrier's perspective. EBGP supports
> fast-external-fallover by default so that a connection failure between
> provider and customer borders, or a failure of customer border equipment
> itself, causes instantaneous teardown of the bgp session and thus instant
> triggering of route-removal. Wonderbar!
> However, there is an inherent problem with iBGP that causes serious
> blackholing of traffic and route convergence lag where the edge device
> itself fails, or connectivity to the rest of the bgp mesh becomes
> unavailable.
>
> Picture a fairly standard scenario of edge route-reflector clients each
> dual-homed physically and logically(iBGP) to two core/distribution layer
> Route-reflectors, typically within the same cluster, with iBGP sessions
> configured over loopback addresses. Now add to this a customer who, for
> resilience sake, connects to router A in NYC and router B in Oslo, with each
> router connected as described above. Now, suppose my router A fails.
> Admittedly this is not an everyday occurrence for customers but it certainly
> does happen. My NYC core router, C, fails to tear down the session until
> bgp timers reach the determined length, default being three minutes. Also,
> since the loopback address of the edge device is usually reachable via a
> supernet (bgp accepts any route-match other than default for validating
> next-hop) the routes in the bgp table are not invalidated. Consequently,
> although the customer has a perfectly functional connection in Oslo, traffic
> normally travelling via NYC is blackholed! Since router crashes are a
> not-entirely uncommon cause of customer outage this seems very odd,
> especially since the affected edge device will usually have reloaded and
> begun to reestablish bgp sessions shortly after the routing has finally
> converged correctly.
>
> Now, the problem also occurs when the next-hop actually becomes unreachable,
> such as where the loopback addresses are carried only as /32 routes and are
> part of no other supernet, which seems ridiculous to me. Cisco's official
> answer so far is that the routes received from a neighbor are only
> invalidated based on next-hop by updates. Interesting answer but I never
> see too many updates over sessions that are down, or from boxes that are
> dead! So, as above, router C maintains a session, keeps all routes received
> from that neighbor valid and blackholes traffic since it doesn't have an
> effective igp route to the next-hop.
>
> The above wouldn't be too bad but it even occurs when the iBGP peerings are
> done over connected interfaces rather than loopbacks, which in the detailed
> design doesn't have any other negative impact. Even when the interface goes
> down the session is kept up, regardless of whether or not any route remains
> for the address of the peer.
>
> Without amending the timers wholesale between core and edge, what other
> alternatives are there..?
>
> Damon.
>


