[c-nsp] FRR Recovery Time
gladston at br.ibm.com
Sun Jan 15 15:19:39 EST 2006
Hi,
If you have measured FRR recovery time, did your lab results match the
50msec stated in theory?
We measured it using two tools. However, the results are inconsistent with
the expected value of 50msec.
This is the network:
RA1--------------------------RB1
 |                            |
 |                            |
RA2--------------------------RB2
The main tunnel is RA1-RB2
The backup tunnel is RA1-RA2-RB2-RB1
The links are 155Mbps between RA1-RA2 and RB1-RB2 and Giga between RA1-RA2
and RB1-RB2
Before failing the link RA1-RA2, we checked the results of Cisco MPLS
commands to confirm that the main tunnel requested protection and that
protection was ready on both sides, RA1 and RB1. We also confirmed using
Cisco commands that after failing the main link FRR protected the link
using the second link. We also used traceroute from a Linux box before and
after the failure.
We avoided debug commands like "debug mpls lfib fast-reroute events" to
check the failure recovery time because we are interested in measuring
when the recovery of end-to-end IP communication is seen by the source and
destination applications.
First tool:
Generates 1000 packets per second (1 packet every millisecond).
Counts the packets lost during the FRR process.
Multiplies the number of lost packets by 1msec.
This gives us an approximate time for FRR.
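
The calculation done by the first tool can be sketched in Python as below.
This is only an illustration under assumptions, not the actual tool: the
function names, the use of sequence numbers, and the example loss window
are hypothetical.

```python
def estimate_outage_ms(sent_count, received_seqs, interval_ms=1.0):
    """Estimate the outage as (number of lost packets) x (send interval)."""
    lost = sent_count - len(set(received_seqs))
    return lost * interval_ms


def longest_gap_ms(received_seqs, interval_ms=1.0):
    """Longest run of consecutive missing sequence numbers, in msec."""
    seqs = sorted(set(received_seqs))
    longest = 0
    for prev, cur in zip(seqs, seqs[1:]):
        # Packets missing between two received sequence numbers.
        longest = max(longest, cur - prev - 1)
    return longest * interval_ms


if __name__ == "__main__":
    # Hypothetical run: 1000 packets sent at 1 per msec,
    # sequence numbers 300..369 lost during the failover.
    received = [s for s in range(1000) if not 300 <= s < 370]
    print(estimate_outage_ms(1000, received))  # 70.0
    print(longest_gap_ms(received))            # 70.0
```

The total loss count treats every lost packet as part of the failover. The
longest-gap variant is one possible way to ignore stray losses elsewhere in
the run, assuming packets carry sequence numbers.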
Second tool:
The second tool is not really a tool; it is a timer in an application that
uses the network for IP connectivity. This automatic timer is 200msec and
counts down when there is no IP communication between the source
application and the destination application (the application is the Nokia
MSS/MGW solution). If FRR takes more than 200msec to recover IP
communication, an alarm is logged on the system indicating that SCTP
communication failed.
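
The behaviour of such a timer can be modelled roughly as below. This is a
simplified sketch, not how the Nokia MSS/MGW or SCTP actually implements
it: the function name and the list of arrival timestamps are assumptions
made for illustration.

```python
def watchdog_alarms(arrival_times_ms, timeout_ms=200.0):
    """Return (start_of_silence, gap) for every silence longer than timeout_ms.

    arrival_times_ms: packet arrival times in msec, in increasing order.
    """
    alarms = []
    for prev, cur in zip(arrival_times_ms, arrival_times_ms[1:]):
        gap = cur - prev
        if gap > timeout_ms:
            # The timer would have expired before traffic resumed.
            alarms.append((prev, gap))
    return alarms


if __name__ == "__main__":
    # Hypothetical traces: a 70 msec silence and a 481 msec silence.
    print(watchdog_alarms([0, 10, 20, 90, 100]))  # []
    print(watchdog_alarms([0, 10, 491, 500]))     # [(10, 481)]
```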
Comparing the results:
These were the results using the first tool:
70msec
481msec
411msec
371msec
For each measurement from the first tool we checked the alarms on the
second tool. When the first tool indicated less than 200msec there was no
alarm. When the first tool indicated more than 200msec there was an alarm
on the second tool.
This confirmed to us that the approximate time indicated by the first tool
is accurate enough for our purpose.
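
The cross-check between the two tools can be written down as a small
Python sketch. The four values are the measurements listed above; the
constant and function names are hypothetical, and the 50msec figure is the
expected FRR time, not a measured one.

```python
FRR_TARGET_MS = 50   # recovery time stated in the documentation
APP_TIMER_MS = 200   # application timer that raises the SCTP alarm

# Values reported by the first tool (lost packets x 1 msec).
measurements_ms = [70, 481, 411, 371]


def classify(ms):
    """Compare one measurement against the FRR target and the app timer."""
    return {
        "over_frr_target": ms > FRR_TARGET_MS,
        "alarm_expected": ms > APP_TIMER_MS,
    }


if __name__ == "__main__":
    for ms in measurements_ms:
        result = classify(ms)
        print(f"{ms} msec: over target={result['over_frr_target']}, "
              f"alarm expected={result['alarm_expected']}")
```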
Nevertheless, as this differs from the documentation, which states
50msec for FRR operation, we would like to double-check the results.
We used two ways to fail the link:
-disconnecting the POS fiber manually
-shutting down the POS interface (using the command POS ais-shut)
As the time with the interface shutdown increased to 800msec, we discarded
the tests using this way of failing the link and used only the first way,
manually disconnecting the fiber.
Your comments and recommendations are more than welcome.
I am wondering if there is any timer on Cisco that can be configured, such
as "carrier-delay ms x", to improve FRR time. However, unlike that
command, which introduces some delay, I would like to reduce the FRR
time if possible and test again.
Happy New Year.
Cordially,
------------------------------------------------------------------
Alaerte Gladston Vidali
IBM Global Services - SO
Tel.55+11+2121-2879 Fax:55+11+2121-2449