[c-nsp] FRR Recovery Time

gladston at br.ibm.com gladston at br.ibm.com
Sun Jan 15 15:19:39 EST 2006


Hi,

If you have measured FRR recovery time, were your lab results consistent 
with the 50msec stated in theory? 
We measured it using two tools, and the results are inconsistent with 
the expected value of 50msec.

This is the network:

     RA1--------------------------RB1
      |                            |
      |                            |
     RA2--------------------------RB2
 
 The main tunnel is RA1-RB2
 The backup tunnel is RA1-RA2-RB2-RB1
The links are 155Mbps between RA1-RA2 and RB1-RB2 and Giga between RA1-RA2 
and RB1-RB2

Before failing the link RA1-RA2, we checked the output of the Cisco MPLS 
commands to confirm that the main tunnel requested protection and that 
protection was ready on both sides, RA1 and RB1. We also confirmed with 
Cisco commands that, after the main link failed, FRR protected the traffic 
using the second link. We also ran traceroute from a Linux box before and 
after the failure.
We avoided debug commands like "debug mpls lfib fast-reroute events" for 
timing the failure recovery, because we are interested in measuring 
when the recovery of end-to-end IP communication is seen by the source and 
destination applications. 

The first tool:
It generates 1000 packets per second (1 packet every millisecond). 
It counts the packets lost during the FRR process.
It multiplies the number of lost packets by 1msec.

This gives us an approximate time for FRR.
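
As a rough illustration of the arithmetic above, here is a minimal sketch in Python. The function names and the sequence-number variant are my own assumptions for illustration, not part of the tool we used; the second function assumes each test packet carries an increasing sequence number, which our tool may or may not do.

```python
def outage_ms_from_loss(lost_packets: int, rate_pps: int = 1000) -> float:
    """Approximate outage: lost packets times the inter-packet interval."""
    if rate_pps <= 0:
        raise ValueError("rate_pps must be positive")
    return lost_packets * 1000.0 / rate_pps


def longest_gap_ms(received_seq: list[int], rate_pps: int = 1000) -> float:
    """Outage estimated from the longest run of missing sequence numbers.

    Hypothetical variant: assumes the sender numbers its packets 0, 1, 2, ...
    """
    ordered = sorted(set(received_seq))
    longest = 0
    for prev, cur in zip(ordered, ordered[1:]):
        # Packets strictly between two received sequence numbers were lost.
        longest = max(longest, cur - prev - 1)
    return longest * 1000.0 / rate_pps


if __name__ == "__main__":
    # 481 packets lost at 1000 packets per second
    print(outage_ms_from_loss(481))
```

At 1000 packets per second the estimate cannot be finer than about 1msec, and it counts every lost packet as outage time, so unrelated loss during the test would inflate it.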

The second tool: 
The second tool is not really a tool; it is a timer in an application that 
uses the network for IP connectivity. This automatic timer is 200msec and 
counts down while there is no IP communication between the source 
application and the destination application (the application is the Nokia 
MSS/MGW solution). If FRR takes more than 200msec to recover IP 
communication, an alarm is logged on the system indicating that SCTP 
communication failed.

Comparing the results:
These were the results using the first tool:
70msec
481msec
411msec
371msec

After each test we compared the first tool's result with the alarms on 
the second tool. Whenever the first tool indicated less than 200msec, 
there was no alarm. Whenever the first tool indicated more than 200msec, 
there was an alarm on the second tool.
This confirmed to us that the approximate time indicated by the first tool 
is accurate enough for our purpose.
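
The cross-check can be written down as a small sketch in Python. The four values are the first-tool results listed above and 200msec is the application timer; the names SCTP_TIMER_MS and alarm_expected are made up here for illustration.

```python
SCTP_TIMER_MS = 200  # application timer described above

# First-tool results from this post, in msec.
measured_ms = [70, 481, 411, 371]


def alarm_expected(outage_ms: float, timer_ms: float = SCTP_TIMER_MS) -> bool:
    """True if the outage outlasts the timer, so an SCTP alarm should be logged."""
    return outage_ms > timer_ms


if __name__ == "__main__":
    for ms in measured_ms:
        print(f"{ms}msec -> alarm expected: {alarm_expected(ms)}")
```

Only the 70msec run stays under the timer, which matches the alarms we saw on the second tool.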

Nevertheless, as this differs from the documentation, which states 
50msec for FRR operation, we would like to double-check the results.
We used two ways to fail the link:
    -disconnect the POS fiber manually
    -shut down the POS interface (using the command POS ais-shut)
As the time with the interface shutdown increased to 800msec, we discarded 
the tests using that method and now use only the first one, 
manually disconnecting the fiber.

Your comments and recommendations are more than welcome.

I am wondering if there is any timer on Cisco that can be configured, such 
as "carrier-delay ms x", to improve the FRR time. However, as opposed to 
that command, which introduces some delay, I would like to reduce the FRR 
time if possible and test again.

Happy New Year.


Cordially,
------------------------------------------------------------------
Alaerte Gladston Vidali
IBM Global Services - SO
Tel.55+11+2121-2879   Fax:55+11+2121-2449

