Single Router BGP Convergence

From: Howard C. Berkowitz (hcb@clark.net)
Date: Fri Jan 26 2001 - 10:14:40 EST

Here's an updated version of my draft on single-router BGP
convergence. Does this belong more in IRTF-RR with the global
convergence work or in BMWG? The intention of this paper is to
complement the global BGP convergence work in IRTF-RR and the
router throughput methodology in BMWG.

Network Working Group                                       H. Berkowitz
Internet-Draft                                           Nortel Networks
                                                            January 2001

Benchmarking Methodology for Exterior Routing Convergence

Status of this Memo

    This memo provides information for the Internet community. It does
    not specify an Internet standard of any kind. Distribution of this
    memo is unlimited.

Abstract

    This document defines a specific set of tests that vendors can use to
    measure and report the convergence performance of BGP-4 processes. It does
    not consider the forwarding performance of such routers once they have
    converged. A separate document will define convergence in interior routing.
    This memo will consider changes in forwarding performance while a router
    is reconverging, but RFC 2544 remains the methodology document for
    benchmarking forwarding performance.

1. Introduction

    This document defines a specific set of tests that implementers can use to
    measure and report the convergence performance of BGP routers. It does
    not consider the forwarding performance of such routers once they have
    converged, with the caveat that the effect of the reconvergence process on
    forwarding performance can be considered.

    Indeed, the techniques here are appropriate for pure route servers as
    well as for devices that do both path determination and packet
    forwarding. The results of these tests will provide the user with
    comparable data from different vendors with which to evaluate these
    devices. RFC 2544 remains the methodology document for forwarding
    performance.

Labovitz, Ahuja, et al. have done, and are continuing to do, valuable
work on Internet-wide convergence. Their measurements, however,
reflect a wide range of factors affecting convergence, including
media speeds, propagation times, policies, etc. Whenever possible,
terminology in this document is consistent with Labovitz et al.

Their presentation does not formalize the definition of convergence,
and, in any case, there appear to be several useful meanings of "BGP
convergence time." Lack of standard terminology both makes research
results difficult to compare and generates FUD for Internet operators
and consumers.

Existing benchmarking documents, such as RFC 2544, focus on
forwarding performance rather than convergence.

2. Requirements

In this document, the words that are used to define the significance
of each particular requirement are capitalized. These words are:

       * "MUST" This word, or the words "REQUIRED" and "SHALL" mean that
          the item is an absolute requirement of the specification.

       * "SHOULD" This word or the adjective "RECOMMENDED" means that
          there may exist valid reasons in particular circumstances to
          ignore this item, but the full implications should be
          understood and the case carefully weighed before choosing a
          different course.

       * "MAY" This word or the adjective "OPTIONAL" means that this
          item is truly optional. One vendor may choose to include the
          item because a particular marketplace requires it or because it
          enhances the product, for example; another vendor may omit the
          same item.

    An implementation is not compliant if it fails to satisfy one or more
    of the MUST requirements for the protocols it implements. An
    implementation that satisfies all the MUST and all the SHOULD
    requirements for its protocols is said to be "unconditionally
    compliant"; one that satisfies all the MUST requirements but not all
    the SHOULD requirements for its protocols is said to be
    "conditionally compliant".

3. Workloads and Scenarios

Providing useful convergence information for BGP routers depends
significantly on the intended use of the router, since workload will
affect any router. Workload is principally the size of the full
routing table and the number of BGP peers, but also includes
additional processing such as route filtering, flap dampening,
authentication, etc.

Not all BGP routers are intended for the same applications. This
section presents some representative scenarios, but, in practice, the
tester of a given router will need to develop workload parameters
that are appropriate for the intended purpose. The goal of this
specification is not to prescribe numeric values for these
parameters, but simply to identify the parameters and require them to
accompany a compliant test report.

A given test report MUST include:

       * Number of routes to be in the converged routing table of the
          device under test (DUT)

       * Number of eBGP peers and, for each peer, the number of routes
          to be received and to be advertised

       * Number of iBGP peers
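
As a minimal sketch, the required report parameters could be captured
in a small record so that an incomplete report is easy to detect. The
class names, field names, and route counts below are illustrative
assumptions and are not defined by this memo.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EbgpPeer:
    """Per-peer workload; field names are hypothetical."""
    name: str
    routes_received: int    # routes the DUT receives from this peer
    routes_advertised: int  # routes the DUT advertises to this peer


@dataclass
class WorkloadReport:
    """Workload parameters that accompany a test report."""
    converged_routes: int   # routes in the DUT's converged routing table
    ibgp_peers: int
    ebgp_peers: List[EbgpPeer] = field(default_factory=list)

    def missing_parameters(self) -> List[str]:
        """Return names of parameters that are absent or invalid."""
        problems = []
        if self.converged_routes <= 0:
            problems.append("converged_routes")
        if self.ibgp_peers < 0:
            problems.append("ibgp_peers")
        for peer in self.ebgp_peers:
            if peer.routes_received < 0 or peer.routes_advertised < 0:
                problems.append(f"ebgp_peers[{peer.name}]")
        return problems


# Placeholder numbers only; the memo does not prescribe values.
report = WorkloadReport(
    converged_routes=100_000,
    ibgp_peers=0,
    ebgp_peers=[
        EbgpPeer("TR1", routes_received=100_000, routes_advertised=0),
        EbgpPeer("TR2", routes_received=100_000, routes_advertised=0),
        EbgpPeer("TR3", routes_received=0, routes_advertised=100_000),
    ],
)
```

A report would be treated as complete when missing_parameters()
returns an empty list.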

The number of routes will vary with the proposed application.
Realistic numbers should be based on the size of a current
default-free routing table (exclusive of internal routes). This
table is referred to as the DFRT, and the number of routes it contains
as NDFRT. It is the Routing Information Base (RIB) of the DUT.
Depending on the router implementation, one or more Forwarding
Information Bases (FIB) may need to be generated from the RIB before
a router can advertise and forward at full speed.

Be aware that many service providers will have substantial numbers of
internal and non-aggregated customer routes, so the routing table of
a large provider's core router could very well contain 1.5 NDFRT or
more routes. Smaller RIBs may be used with routers explicitly
intended for edge use with defaults, provided the assumptions are
cited.
Appendix A presents some scenarios for typical BGP applications.

4. Types of Convergence

Two significantly different types of convergence time tend to be
lumped together in product specifications. The first is the time
needed for a BGP speaker to build a full table after initialization,
or for a particular peering session to rebuild its table after a hard
reset. The second is the time needed for a router to respond to a new
announcement or withdrawal.

4.1 Reference Configuration

For tests when the number of peers is not a performance parameter of
interest, use the configuration in Figure 1:

TR1==========+---------+==========TR3
 |           |         |
 D1          |         |
 |           |   DUT   |
TR2==========|         |
             +---------+

              Figure 1

D1 is a prefix reachable by both TR1 and TR2. It is assumed that
neither TR1 nor TR2 is the originating AS for the announcement of D1.

More complex peering arrangements will involve up to n Test Routers,
as shown in Figure 2. It is recommended that the Figure 1
configuration always be tested as a baseline, and then additional
reports made that show the effect on performance of increasing the
number of peers.

TR1==========+---------+==========TR3
 |           |         |
 D1          |         |
 |           |   DUT   |
TR2==========|         |
             |         |
    ...      |         |
TRn==========+---------+

              Figure 2

Interface speeds will be specified as part of the test report. At
least 100 Mbps is recommended, so media delays are not a significant
component of the convergence time.

In the absence of other route selection criteria, TR1 shall have an
IP address that makes it most preferred.
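
A rough sketch of how a test harness might check that the chosen
addresses make TR1 most preferred. It assumes the final tiebreaker
selects the numerically lowest identifier, as in the RFC 1771
tie-breaking rules; the function name and the documentation-range
addresses are hypothetical.

```python
import ipaddress
from typing import Dict


def most_preferred(peers: Dict[str, str]) -> str:
    """Return the peer whose IPv4 identifier is numerically lowest."""
    return min(peers, key=lambda name: int(ipaddress.IPv4Address(peers[name])))


# Hypothetical addresses from the documentation range.
peers = {"TR1": "192.0.2.1", "TR2": "192.0.2.2"}
preferred = most_preferred(peers)
```

The comparison is numeric rather than textual, so 10.0.0.9 sorts
ahead of 10.0.0.10.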

4.2 Events in the Convergence Process

[Ahuja 2000a] defines the events:

       Tup -- A new route is advertised
       Tdown -- A route is withdrawn (i.e. single-homed failure)
       Tshort -- Advertise a shorter/better ASPath (i.e. primary path repaired)
       Tlong -- Advertise a longer/worse ASPath (i.e. primary path fails)

In this paper, the meanings of Tup and Tdown are preserved and
extended from [Ahuja 2000a]. The notation Tup(TRx) means a Tup event
advertised by test router TRx to the router being tested (i.e., the
DUT).
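
For test scripts, the event notation might be modelled as below. This
is only a sketch: the EventKind and Event names are assumptions, and
only the four events quoted from [Ahuja 2000a] are included.

```python
from dataclasses import dataclass
from enum import Enum


class EventKind(Enum):
    TUP = "Tup"        # a new route is advertised
    TDOWN = "Tdown"    # a route is withdrawn
    TSHORT = "Tshort"  # a shorter/better ASPath is advertised
    TLONG = "Tlong"    # a longer/worse ASPath is advertised


@dataclass(frozen=True)
class Event:
    kind: EventKind
    source: str  # test router that sends the event toward the DUT

    def __str__(self) -> str:
        # Render in the Tup(TRx) notation used in this memo.
        return f"{self.kind.value}({self.source})"


withdrawal = Event(EventKind.TDOWN, "TR1")
```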

The sense of the Tshort and Tlong events is also preserved, but the
basic criterion for selecting a "better" route is the final
tiebreaker defined in RFC1771, the router ID. As a consequence, this
memorandum uses the events Tbetter, Tworse, and Tbest.

While ASPath is quite likely to be the most common tiebreaker in the
operational Internet, it is not actually part of the RFC-defined
route preference algorithm. AS path prepending is another widely used
but nonstandard means of influencing route preference, although
questions have been raised regarding its scalability in an
ever-growing Internet.

5. Measurement

Measurements can be defined either as internal or external. Internal
measurements examine the RIB/FIB of the DUT. While they are more
accurate in principle, they require measurement hooks in the
implementation, as described in [Trotter].

External measurements start with a stimulus from one or more
"upstream" routers and end with a specific event causing an
advertisement to be sent to a "downstream" peer. In the reference
configuration above, external measurements are defined with respect
to TR3 as the downstream router.
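
An external measurement reduces to the difference between two
timestamps in a capture. The sketch below assumes a hypothetical log
of (timestamp, router, prefix) observations taken at the test
routers; the log format and function name are not part of this memo.

```python
from typing import Iterable, Optional, Tuple

# One observed UPDATE: (timestamp in seconds, test router at which it
# was observed, prefix carried in the UPDATE).
Observation = Tuple[float, str, str]


def external_convergence_time(
    log: Iterable[Observation],
    prefix: str,
    upstream: str,
    downstream: str = "TR3",
) -> Optional[float]:
    """Time from the upstream stimulus to the first downstream advertisement.

    Returns None if either end of the measurement was not observed.
    """
    start = None
    for timestamp, router, seen_prefix in sorted(log):
        if seen_prefix != prefix:
            continue
        if start is None:
            if router == upstream:
                start = timestamp  # stimulus sent toward the DUT
        elif router == downstream:
            return timestamp - start  # DUT advertised to the downstream peer
    return None


# Placeholder timestamps, not measured data.
log = [
    (10.0, "TR1", "192.0.2.0/24"),
    (10.25, "TR3", "192.0.2.0/24"),
]
elapsed = external_convergence_time(log, "192.0.2.0/24", upstream="TR1")
```

Downstream observations that precede the stimulus are ignored, so a
stale advertisement does not end the measurement early.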

6. eBGP tests

All routers in this configuration have a policy of ADVERTISE
ALL/ACCEPT ALL [RPSL]. Tests with prefix filtering, community-based
preferences, authentication, etc., as well as performance under flap
are TBD.

Not all eBGP applications are alike. While the tests in this section
are applicable to a wide range of configurations, testers may select
configurations that are most relevant to the intended product use.
Such configurations include:

    1. Interprovider peering, characterized by an exchange of customer routes,
       which, in the case of major providers, may be in the tens of thousands
       of routes but smaller than the default-free table.

    2. Transit services, where the transit customer advertises a relatively
       small number of routes toward the provider, but variously may take
       full default-free routes, customer routes, or default only from the
       provider.

6.1 eBGP Initial Convergence

While this is relatively simple to measure, and often is the basis of
product specifications, it is operationally far less significant than
reconvergence after changes. A "carrier-grade" router should not
initialize often, and the soft reset option reduces the need to
rebuild views. The initialization time, therefore, can be amortized
over a long period of time and may disappear into the noise when
compared to reconvergence.

6.1.1 Initial Convergence Time

The test begins with OPEN requests sent from TR1 and TR2 to the DUT.
Each Test Router sends a standard routing table of TBD routes.

The test ends when the DUT begins to advertise the last route in the
routing table to TR3.
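
One possible way to compute this interval from a capture at TR3,
assuming the harness records when each prefix was first advertised.
The function and parameter names are hypothetical.

```python
from typing import Dict, Iterable, Optional


def initial_convergence_time(
    open_sent_at: float,
    first_advertised_at: Dict[str, float],
    expected_prefixes: Iterable[str],
) -> Optional[float]:
    """Time from the OPEN until the last expected route is first advertised.

    Returns None while any expected route has not yet been seen at TR3.
    """
    expected = set(expected_prefixes)
    if not expected or not expected.issubset(first_advertised_at):
        return None
    last_seen = max(first_advertised_at[prefix] for prefix in expected)
    return last_seen - open_sent_at


# Placeholder timestamps, not measured data.
seen_at_tr3 = {"192.0.2.0/24": 3.0, "198.51.100.0/24": 5.5}
elapsed = initial_convergence_time(1.0, seen_at_tr3, seen_at_tr3.keys())
```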

6.2 eBGP Reconvergence

For all of these measurements, report any route filters,
authentication, and reverse path verification used.

6.2.1 Time to Add Newly Advertised Route

The DUT has been initialized, with no path to D. Measurement time
begins when TR1 announces D to the DUT.

Measurement time stops when the DUT advertises D to TR3.

6.2.1.2 Time to Begin Forwarding to D

Prior to TR1 advertising D, TR2 attempts to forward to TR3 via the
DUT. Measurement time ends when TR3 receives a TR1-originated packet
via the DUT.

6.2.2 Time to Change to Alternate Path after Withdrawal

The DUT has been initialized and has paths to D via both TR1 and TR2.
TR1's path is preferred, but TR1 withdraws it with Tdown(TR1).
Reconvergence occurs when the TR2-advertised path becomes active.

Measurement time stops when the DUT advertises D to TR3.
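
The expected behavior of the DUT in this test can be illustrated with
a toy model. This is a sketch under the assumption that preference is
a fixed ordering of peers; it does not represent any real BGP
implementation, and the DutRib name is hypothetical.

```python
from typing import Dict, List, Optional, Set


class DutRib:
    """Toy RIB: tracks which peers currently offer each prefix."""

    def __init__(self, preference: List[str]) -> None:
        self.preference = preference           # most preferred peer first
        self.offers: Dict[str, Set[str]] = {}  # prefix -> peers offering it

    def announce(self, peer: str, prefix: str) -> None:
        self.offers.setdefault(prefix, set()).add(peer)

    def withdraw(self, peer: str, prefix: str) -> None:
        self.offers.get(prefix, set()).discard(peer)

    def active_path(self, prefix: str) -> Optional[str]:
        """Return the most preferred peer still offering the prefix."""
        offered = self.offers.get(prefix, set())
        for peer in self.preference:
            if peer in offered:
                return peer
        return None


rib = DutRib(preference=["TR1", "TR2"])
rib.announce("TR1", "D")
rib.announce("TR2", "D")
before = rib.active_path("D")  # TR1's path is preferred
rib.withdraw("TR1", "D")       # Tdown(TR1)
after = rib.active_path("D")   # TR2's path becomes active
```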

6.2.2.2 Time to Begin Forwarding to D

Prior to TR1 advertising D, TR2 attempts to forward to TR3 via the
DUT. Measurement time ends when TR3 receives a TR1-originated packet
via the DUT.

6.2.3 Time to Reconverge after Sequential Withdraw and New Announcement

The DUT has been initialized and has a path to D1 via TR1, not TR2.
Simultaneously, TR1 sends Tdown(TR1) and TR2 announces the new route
with Tbest(TR2).

Measurement time stops when the DUT advertises D1 to TR3.

6.2.3.2 Time to Begin Forwarding to D

Prior to TR1 advertising D, TR2 attempts to forward to TR3 via the
DUT. Measurement time ends when TR3 receives a TR1-originated packet
via the DUT.

7. iBGP

7.1 Mesh tests

Repeat the eBGP tests of Section 6 with the same topologies, but with
all routers within the same AS. The test report shall show the
specific test configuration(s). It is highly desirable that the
result show the effect of increasing the number of peers on routing
performance.

7.2 Route Reflector tests

TR1==========+---------+==========TR3
 |           |         |
 D1          |         |
 |           |   DUT   |
TR2==========|         |
             |         |
    ...      |         |
TRn==========+---------+

7.2.1 DUT as Route Reflector

The DUT acts as the cluster server in a single-server cluster. Let
TR1 and TR2 be clients of the DUT, and repeat the tests of Section 6.

7.2.2 DUT as Route Reflector in Multiple-Reflector Cluster

The DUT acts as one of the cluster servers in a multi-server
cluster. TRn will be the additional server. There will be iBGP
peering between TRn and DUT, between DUT and TR1, between TRn and
TR1, between DUT and TR2, and between TRn and TR2. Let TR1 and TR2 be
clients of the DUT, and repeat the tests of Section 6.

7.2.3 DUT as Route Reflector Client

The DUT acts as a client in a single-server cluster. Let TR1 be the
cluster reflector. TR2, and additional routers as desired, serve as
clients. Test results shall state the number of clients.

7.2.4 DUT as Route Reflector Client in Multiple-Reflector Cluster

The DUT acts as one of the clients in a multi-server cluster. TRn
will be the additional server. There will be iBGP peering between
TR1 and TRn, between DUT and TR1, between DUT and TRn, between TR2
and TR1, and between TR2 and TRn.
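
The peering lists in the multi-server cases follow one pattern: the
servers peer with each other, and every server peers with every
client. A small sketch (the helper name is an assumption) that
generates the session list for a test configuration:

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set


def cluster_sessions(
    reflectors: Iterable[str], clients: Iterable[str]
) -> Set[FrozenSet[str]]:
    """iBGP sessions for a cluster: reflector pairs plus reflector-client pairs."""
    reflectors = list(reflectors)
    clients = list(clients)
    sessions = {frozenset(pair) for pair in combinations(reflectors, 2)}
    sessions |= {frozenset((r, c)) for r in reflectors for c in clients}
    return sessions


# DUT as one of two reflectors, with TR1 and TR2 as clients.
as_reflector = cluster_sessions(["DUT", "TRn"], ["TR1", "TR2"])
# DUT as a client, with TR1 and TRn as the reflectors.
as_client = cluster_sessions(["TR1", "TRn"], ["DUT", "TR2"])
```

Each case yields five sessions, matching the peerings listed for the
two multi-server configurations.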

8. Modifiers

    It might be useful to know the DUT performance under a number of
    conditions; some of these conditions are noted below. The reported
    results SHOULD include as many of these conditions as the test
    equipment is able to generate. The suite of tests SHOULD be first
    run without any modifying conditions and then repeated under each of
    the conditions separately.
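
The run order described above can be sketched as a simple plan
builder. The test and modifier labels are hypothetical placeholders
chosen for illustration.

```python
from typing import Iterable, List, Optional, Tuple


def build_test_plan(
    tests: Iterable[str], modifiers: Iterable[str]
) -> List[Tuple[str, Optional[str]]]:
    """Baseline run of every test first, then one pass per modifier."""
    tests = list(tests)
    plan: List[Tuple[str, Optional[str]]] = [(test, None) for test in tests]
    for modifier in modifiers:
        # Each modifying condition is applied on its own, never combined.
        plan.extend((test, modifier) for test in tests)
    return plan


tests = ["initial-convergence", "add-route", "withdraw-to-alternate"]
modifiers = ["ingress-filter", "route-flap", "communities"]
plan = build_test_plan(tests, modifiers)
```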

8.1 Filters

8.1.1 Representative Customer Ingress Filtering

Following the principles of [RFC 2827], perform the eBGP tests with a
filter to accept a single prefix from TR1, while being sent a
10-route table and a full (TBD) table.
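
A minimal sketch of the single-prefix filter applied to a 10-route
table. The prefixes are arbitrary private-range examples and the
function name is an assumption; an exact-match filter is assumed.

```python
import ipaddress
from itertools import islice
from typing import Iterable, List


def accept_only(offered: Iterable[str], allowed: str) -> List[str]:
    """Keep only routes that exactly match the single allowed prefix."""
    allowed_net = ipaddress.ip_network(allowed)
    return [r for r in offered if ipaddress.ip_network(r) == allowed_net]


# Ten /24 routes carved from 10.0.0.0/16, standing in for the table
# sent by TR1.
ten_route_table = [
    str(net)
    for net in islice(ipaddress.ip_network("10.0.0.0/16").subnets(new_prefix=24), 10)
]
accepted = accept_only(ten_route_table, "10.0.3.0/24")
```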

8.2 Bursty Traffic/Route Flap

Let TRF be a router that will generate only flapping routes.

TR1==========+---------+==========TR3
 |           |         |
 D1          |         |
 |           |   DUT   |
TR2==========|         |
             |         |
    ...      |         |
TRF==========+---------+

8.2.1 Flap Isolation Test

TRF will advertise a continuously flapping route. Repeat the eBGP
convergence tests.

8.2.2 Flap Rejection Tests

Repeat eBGP Reconvergence Tests while one route in the TR1 peering
flaps continuously.
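
A continuously flapping route can be modelled as an alternating
schedule of announcements and withdrawals. In the sketch below, the
period, the event count, and the tuple layout are assumptions; the
memo does not specify a flap rate.

```python
from typing import List, Tuple


def flap_events(
    prefix: str, start: int, period: int, count: int
) -> List[Tuple[int, str, str]]:
    """Alternate announce/withdraw for one prefix at a fixed period (seconds)."""
    events = []
    for i in range(count):
        action = "announce" if i % 2 == 0 else "withdraw"
        events.append((start + i * period, action, prefix))
    return events


# Hypothetical schedule: one state change every 5 seconds.
schedule = flap_events("203.0.113.0/24", start=0, period=5, count=4)
```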

8.3 Communities

8.3.1 Community-based Acceptance

Perform the eBGP tests with a filter to accept TBD prefixes tagged
with community XXX, sent as part of a full (TBD) table.
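
Community-based acceptance amounts to selecting the routes whose
community set contains the configured value. In this sketch the
community strings and prefixes are hypothetical stand-ins; the memo
leaves the actual community (XXX) and prefix count unspecified.

```python
from typing import Dict, List, Set


def accept_by_community(routes: Dict[str, Set[str]], community: str) -> List[str]:
    """Return prefixes tagged with the given community, in sorted order."""
    return sorted(p for p, tags in routes.items() if community in tags)


# Hypothetical table: prefix -> communities attached to the route.
table = {
    "192.0.2.0/24": {"65000:100"},
    "198.51.100.0/24": {"65000:200"},
    "203.0.113.0/24": {"65000:100", "65000:300"},
}
accepted = accept_by_community(table, "65000:100")
```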

8.3.2 Community Advertising

Perform the eBGP advertising tests, but with a community YYY added to
the advertised routes.

9. Security Considerations

Security issues are not addressed in this document.

10. Acknowledgements

Thanks to Francis Ovenden for review and Abha Ahuja for encouragement.

11. References

    [Ahuja 2000a] "An Experimental Study of Delayed Internet Routing
        Convergence." Abha Ahuja, Farnam Jahanian, Abhijit Bose, Craig
        Labovitz. RIPE 37 - Routing WG.
    [RFC 2439] "BGP Route Flap Damping." C. Villamizar, R. Chandra, R.
        Govindan. November 1998.
    [RFC 2544] "Benchmarking Methodology for Network Interconnect
        Devices." S. Bradner, J. McQuaid. March 1999.
    [RFC 2622] "Routing Policy Specification Language (RPSL)." C.
        Alaettinoglu, C. Villamizar, E. Gerich, D. Kessens, D. Meyer, T.
        Bates, D. Karrenberg, M. Terpstra. June 1999.
    [RFC 2827] "Network Ingress Filtering: Defeating Denial of Service
        Attacks which employ IP Source Address Spoofing." P. Ferguson, D.
        Senie. May 2000.
    [RFC 2918] "Route Refresh Capability for BGP-4." E. Chen.
    [Trotter] "Terminology for Forwarding Information Base (FIB)
        based Router Performance Benchmarking." Work in Progress, IETF
        draft-ietf-bmwg-fib-term-00.txt

12. Author's Address

    Howard Berkowitz
    Nortel Networks
    5012 S. 25th St
    PO Box 6897
    Arlington VA 22206

    Phone: +1 703 998-5819 (ESN 451-5819)
    Fax: +1 703 998-5058
    EMail: hberkowi@nortelnetworks.com
           hcb@clark.net

Full Copyright Statement

    This document and translations of it may be copied and furnished to
    others, and derivative works that comment on or otherwise explain it
    or assist in its implementation may be prepared, copied, published
    and distributed, in whole or in part, without restriction of any
    kind, provided that the above copyright notice and this paragraph are
    included on all such copies and derivative works. However, this
    document itself may not be modified in any way, such as by removing
    the copyright notice or references to the Internet Society or other
    Internet organizations, except as needed for the purpose of
    developing Internet standards in which case the procedures for
    copyrights defined in the Internet Standards process must be
    followed, or as required to translate it into languages other than
    English.

    The limited permissions granted above are perpetual and will not be
    revoked by the Internet Society or its successors or assigns.

    This document and the information contained herein is provided on an
    "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
    TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
    BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
    HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
    MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

