[c-nsp] Filtering Routes with Private AS Numbers in the AS Path

Thu Mar 22 18:02:41 EDT 2012

On 20/03/2012 23:17, Ivan wrote:
> For filtering private as numbers (64512-65535) using an as-path
> access-list there are a few options I have seen:

Ivan,

this is quite an interesting question.  Obviously, the best way to model it
is by testing it out in a lab environment with cisco gear, but given that
this is slightly difficult to quantify accurately, I took the liberty of
modelling the regexps against the perl regexp library. If you're interested
in the theory of how regexps work, there's lots about it on the internet.

> 1). All in one line
> ip as-path access-list 66 permit
> _(6451[2-9]|645[2-9][0-9]|64[6-9][0-9][0-9]|65[0-4][0-9][0-9]|655[0-2][0-9]|6553[0-5])_
> 
> 2). The above modified hopefully to be "better" in terms or regexp
> processing but perhaps not readability
> ip as-path access-list 66 permit
> _6(4(5(1[2-9]|[2-9][0-9])|[6-9][0-9][0-9])|5([0-4][0-9][0-9]|5([0-2][0-9]|3[0-5])))_
> 
> 3). Separate lines
> ip as-path access-list 66 permit _6451[2-9]_
> ip as-path access-list 66 permit _645[2-9][0-9]_
> ip as-path access-list 66 permit _64[6-9][0-9][0-9]_
> ip as-path access-list 66 permit _65[0-4][0-9][0-9]_
> ip as-path access-list 66 permit _655[0-2][0-9]_
> ip as-path access-list 66 permit _6553[0-5]_
> 
> I would appreciate any feedback as to which is the least CPU intensive and
> if there is a better way to optimise 2 above.

The first thing to note is that #3 is basically the same as #1 except that
there are two possible ways of implementing the regexp->DFA transformation.
Either it can be done as a single DFA with possible optimisation or else as
a implementation with multiple DFAs.  The latter will certainly be slower.
 The former can be no faster than #1 in the best case.  So #3 is more
readable but certainly no more efficient than #1, pretty much regardless of
exactly how cisco handle it.  The real question then is how #2 is processed
in comparison to #1.

In terms of how simple regexps like this are processed, there is really
only one algorithmic method: you transform the RE into a finite-state
automaton using the as path as input (note that for more complicated
regexps, this is categorically not the case and there are some interesting
edge cases which cause the perl re lib to undergo catastrophic
time-constraint failure).  There are a couple of different ways of doing
this transformation and data input, but for a simple regexp like yours with
no back-references, while there might be implementation differences which
will make one library faster than another, my suspicion is that you can do
a comparative analysis using a different RE library to make comparisons
about which expression is more efficient than which.

I created a perl model to check the two regexps 100 times each, for each
$as between 0 and 65535:

> for ($j=0; $j < 100; $j++) {
>  for ($i=0; $i < 65536; $i++) {
>    if ($i =~ /<regexp>/) {
>       $match++;
>    }
>  }  
> }   

The wall-clock execution time was measured for 60 runs for each regexp and
the 10 fastest times were compared for each regexp, in order to eliminate
cpu jitter.  The average wall clock time for the fastest 10 runs were:

#1: 4.114 seconds
#2: 2.202 seconds

i.e. #2 is about ~47% more efficient than #1.

Checking against a sequential list of asns isn't a very realistic means of
figuring out what's going to happen in real life.  So for the next run, I
took a dfz dump and pulled out all the ASN paths from it and pushed this
text file through the following code:

> for ($j=0; $j < 5; $j++) { 
>   open (INPUT, "dfz.txt");
>   while (<INPUT>) {
>     chomp;
>     if (/\b<regexp>\b/) {
>        $match++;
>     }
>   }
>   close (INPUT);
> }

"\b" is the perl equivalent of cisco's "_" character, and is necessary to
stop arbitrary sequence matching.  This produced the following average
fastest top-10 results:

#1: 5.085 seconds
#2: 1.803 seconds

i.e. #2 is about ~65% more efficient than #1.

You can see from these results that regexp #2 is much faster in general for
perl, and based on the fact that all regexp libraries use similar
algorithms I would suspect that you're going to see similar results for the
IOS implementation.

Several other bgp as-path filtering implementations (e.g. JUNOS and BIRD)
score a major win against cisco in this regard, as they can perform
numerical matching, which turns the input algorithm from:

text input -> regexp parser -> match yes/no
text input -> convert to numeric -> numeric comparison -> match yes/no

The latter is obviously much faster than the former.

hth,
Nick