Nasty iBGP design issue...

From: Pegg Damon (Damon.Pegg@carrier1.com)
Date: Tue Jul 03 2001 - 09:20:25 EDT


Right then, here's one that's getting right on my pectorals - maybe Cisco
dev. have an answer since I'm hitting a brick wall.

I have a serious problem with failover speeds within iBGP for dual-homed
customers, from a carrier's perspective. eBGP supports
fast-external-fallover by default, so that a connection failure between
provider and customer borders, or a failure of the customer border equipment
itself, causes instantaneous teardown of the BGP session and thus instant
triggering of route removal. Wunderbar!
However, there is an inherent problem with iBGP that causes serious
blackholing of traffic and route convergence lag where the edge device
itself fails, or connectivity to the rest of the BGP mesh becomes
unavailable.

Picture a fairly standard scenario of edge route-reflector clients, each
dual-homed physically and logically (iBGP) to two core/distribution layer
route reflectors, typically within the same cluster, with iBGP sessions
configured over loopback addresses. Now add to this a customer who, for
resilience's sake, connects to router A in NYC and router B in Oslo, with each
router connected as described above. Now, suppose my router A fails.
Admittedly this is not an everyday occurrence for customers, but it certainly
does happen. My NYC core router, C, fails to tear down the session until
the BGP hold timer expires, the default being three minutes. Also,
since the loopback address of the edge device is usually reachable via a
supernet (BGP accepts any route match other than default for validating
the next-hop), the routes in the BGP table are not invalidated. Consequently,
although the customer has a perfectly functional connection in Oslo, traffic
normally travelling via NYC is blackholed! Since router crashes are a
not entirely uncommon cause of customer outage this seems very odd,
especially since the affected edge device will usually have reloaded and
begun to re-establish BGP sessions shortly after the routing has finally
converged correctly.
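
To put a rough number on the "three minutes" above, here is a small Python
sketch of the detection window. It assumes the usual Cisco IOS defaults of a
60-second keepalive and a 180-second hold time, that the dead router sends
nothing further, and it ignores timer jitter. The function name is made up
for illustration and is not part of any router software.

```python
# Illustrative sketch only: detection_window() is a hypothetical helper.
KEEPALIVE_S = 60   # assumed default keepalive interval, in seconds
HOLD_S = 180       # assumed default hold time (the "three minutes" above)


def detection_window(keepalive_s, hold_s):
    """Return (best, worst) seconds from a silent peer failure to hold-timer expiry.

    The hold timer restarts whenever a KEEPALIVE or UPDATE is received.
    Worst case: the peer dies just after sending a keepalive, so the full
    hold time has to run out. Best case: it dies just before the next
    keepalive was due, so one keepalive interval has already elapsed.
    """
    if keepalive_s <= 0 or hold_s < keepalive_s:
        raise ValueError("expected 0 < keepalive <= hold time")
    return hold_s - keepalive_s, hold_s


best, worst = detection_window(KEEPALIVE_S, HOLD_S)
print(f"traffic can be blackholed for roughly {best} to {worst} seconds")
```

Under those assumptions, shrinking the window means shrinking both timers
together, which is exactly the wholesale timer change I would rather avoid.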

Now, the problem also occurs when the next-hop actually becomes unreachable,
such as where the loopback addresses are carried only as /32 routes and are
part of no other supernet, which seems ridiculous to me. Cisco's official
answer so far is that the routes received from a neighbor are only
invalidated based on next-hop by updates. Interesting answer, but I never
see too many updates over sessions that are down, or from boxes that are
dead! So, as above, router C maintains the session, keeps all routes received
from that neighbor valid and blackholes traffic, since it doesn't have an
effective IGP route to the next-hop.
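
The two next-hop cases can be sketched in Python as a plain longest-prefix
lookup that refuses to use the default route. This is only a model of the
check as I understand it, not Cisco's implementation; the function name and
the addresses (taken from the 192.0.2.0/24 documentation range) are invented
for the example.

```python
import ipaddress


def resolve_next_hop(next_hop, rib, allow_default=False):
    """Return the longest-match prefix in rib covering next_hop, or None.

    Hypothetical model: any covering route except the default is treated
    as good enough to consider the BGP next-hop reachable.
    """
    addr = ipaddress.ip_address(next_hop)
    best = None
    for prefix in rib:
        net = ipaddress.ip_network(prefix)
        if net.prefixlen == 0 and not allow_default:
            continue  # the default route does not validate a next-hop
        if addr in net and (best is None or net.prefixlen > best.prefixlen):
            best = net
    return best


loopback = "192.0.2.1"  # made-up loopback of the failed edge router

# Case 1: the /32 has gone from the IGP but a covering supernet remains.
with_supernet = ["192.0.2.0/24", "0.0.0.0/0"]
# Case 2: loopbacks are carried only as /32s, so only the default is left.
host_routes_only = ["0.0.0.0/0"]

print("supernet case:", resolve_next_hop(loopback, with_supernet))
print("/32-only case:", resolve_next_hop(loopback, host_routes_only))
```

In the first case the lookup still succeeds, so the routes stay valid. In the
second it fails, yet on the box the routes are still not pulled, because
nothing appears to re-run the check until an update arrives.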

The above wouldn't be too bad, but it even occurs when the iBGP peerings are
done over connected interfaces rather than loopbacks, which in the detailed
design has no other negative impact. Even when the interface goes
down the session is kept up, regardless of whether or not any route remains
for the address of the peer.

Without amending the timers wholesale between core and edge, what other
alternatives are there?

Damon.


