[c-nsp] delay eBGP sessions on startup?

Mon Nov 23 02:46:56 EST 2009

Hi,

so I'm now following the design that everybody claims is "best" (loopbacks
in OSPF, everything else in BGP), and I've found a few corner cases that
are seriously worse than "customer routes in OSPF".

Number one - consider the following (simplified) network:

Upstream 1 <---> ISP-Router 1 <---> ISP-Router 2 <---> Upstream 2
                                         |
                                     Customer X

Both ISP-Routers announce the ISP's aggregate (let's call it 200.1.0.0/16)
to their respective upstream providers (static route to null0, "network"
statement).  This needs to be done, to make sure that the aggregate is
always visible, even if one of the routers is down.

Customer X uses addresses from 200.1.0.0/16, let's give him 200.1.1.1/32.

So, when "ISP-Router 1" boots, the following happens, more or less in
this order:

 1. bootup complete

 2. OSPF neighbor establishes with ISP-Router 2

 3. eBGP-Session to "Upstream 1" establishes, 200.1.0.0/16 is announced
    (only a single prefix is announced outbound)

 4. iBGP-Session to "ISP-Router 2" establishes, 200k prefixes start
    propagating ISP-R2 -> ISP-R1 (full table at ISP-R2)

 5. Traffic starts flowing from "Upstream 1" to "ISP-Router 1"
    (because the Upstream router is installing the 200.1.0.0/16 route 
    right away)

 6. <20-60 seconds delay>

 7. ISP-R1 has processed all the BGP prefixes from ISP-R2, has built a
    FIB, and programmed everything in its hardware forwarding engines.

 8. Traffic from "Upstream 1" to "Customer X" can be forwarded properly

The crucial element here is: between items "5" and "8", packets
coming from "Upstream 1" to "Customer X" are *dropped*, because ISP-R1
does not yet have full internal reachability information, but is already
announcing reachability for the aggregate to "Upstream 1".

The 20-60 seconds delay comes from the fact that even if the eBGP and iBGP
sessions are established at roughly the same time, the eBGP session only 
has to announce one single prefix ("instantaneous"), while the iBGP session 
will see ~200k prefixes, "Customer X" being just one of them, fairly far down
at the end (200.1.1.1/32).
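
To put rough numbers on that window, here is a small Python sketch.  It
assumes ISP-R1 works through the iBGP table in order and at a constant
rate, and it ignores the time needed to program the FIB into hardware;
both are simplifications.  The function name and the 0.95 "position in
the table" are made up for illustration; only the ~200k prefixes and the
20-60 seconds come from the observations above.

```python
TOTAL_PREFIXES = 200_000  # approximate full-table size quoted above


def seconds_until_usable(full_table_seconds, position_fraction):
    """Estimate when a prefix becomes usable on ISP-R1.

    full_table_seconds: time to process the whole iBGP table (20-60 s observed)
    position_fraction:  how far into the table the prefix sits (0.0 .. 1.0);
                        this value is an assumption, not a measurement
    """
    if not 0.0 <= position_fraction <= 1.0:
        raise ValueError("position_fraction must be between 0 and 1")
    rate = TOTAL_PREFIXES / full_table_seconds        # prefixes per second
    prefixes_before = TOTAL_PREFIXES * position_fraction
    return prefixes_before / rate


if __name__ == "__main__":
    # Customer X assumed to sit about 95% of the way through the table.
    for full in (20, 60):
        print(f"full table in {full}s -> customer route after "
              f"{seconds_until_usable(full, 0.95):.1f}s")
```

Under these assumptions the eBGP announcement (one prefix) is effectively
at t=0, so the returned value is roughly the length of the black-hole
window for that customer.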

So - now I'm wondering if it's only me?  Shouldn't this problem bite other
folks as well?

The "other" design (customer routes in IGP) doesn't suffer from it, as
the IGP has usually finished converging before BGP starts.  But we don't
want customer routes in the IGP.

One possible solution would be to have a knob that tells IOS "delay bringing
up eBGP sessions and/or announcement of routes on eBGP sessions for <n>
seconds after initial BGP startup".  This would make sure that iBGP has
converged before eBGP starts, and no transient black-holing is seen.
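
The effect I would expect from such a knob can be sketched in a few
lines of Python.  This is only a model of the timing, not an existing
IOS feature; the function and parameter names are hypothetical, and it
assumes the upstream starts sending traffic the moment the aggregate is
announced.

```python
def blackhole_window(ibgp_converged_at, ebgp_announce_delay):
    """Seconds during which Upstream 1 sends traffic ISP-R1 cannot forward.

    ibgp_converged_at:   seconds after BGP startup until all iBGP prefixes
                         are processed and programmed into the FIB
    ebgp_announce_delay: seconds after BGP startup before the aggregate is
                         announced on the eBGP session (the wished-for knob)
    """
    if ibgp_converged_at < 0 or ebgp_announce_delay < 0:
        raise ValueError("times must be non-negative")
    # Traffic is dropped only while the aggregate is announced
    # but iBGP has not finished converging.
    return max(0.0, float(ibgp_converged_at - ebgp_announce_delay))


if __name__ == "__main__":
    for delay in (0, 30, 90):
        print(f"delay {delay:>2}s -> black hole for "
              f"{blackhole_window(60, delay):.0f}s")
```

So with the worst case of 60 seconds seen here, any delay of 60 seconds
or more would close the window completely, at the price of the router
taking that much longer to attract traffic after a reboot.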

Is that possible?  I have googled and stared at the command-line help for
a while, but couldn't find anything useful.

Routers in question are 6500s with SXI2a.

gert
-- 
USENET is *not* the non-clickable part of WWW!
                                                           //www.muc.de/~gert/
Gert Doering - Munich, Germany                             gert at greenie.muc.de
fax: +49-89-35655025                        gert at net.informatik.tu-muenchen.de