[c-nsp] FRR Recovery Time

Mon Jan 23 20:39:26 EST 2006

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

gladston at br.ibm.com wrote:
> Thanks a lot Oliver,
> 
> From what I read on Cisco pages, only these alarms trigger the line 
> protocol down:
> -Section loss of signal          SLOS
> -Section loss of frame           SLOF
> -Line alarm indication signal    LAIS
>  
> And these alarms do not trigger the line protocol down by default (but can 
> do so if 'pos delay triggers path ...' is configured):
> -Path alarm indication signal    PAIS
> -Path remote defect indication   PRDI
> -Path loss of pointer
> 
> We are only receiving LRDI alarms on RB1. I could not find a reference for 
> this alarm. Do you know whether this specific alarm should trigger the 
> line protocol down, or whether there is a specific command to make it do so?
> 
> Cordially,
> ------------------------------------------------------------------
> Alaerte Gladston Vidali
> IBM Global Services - SO
> Tel.55+11+2121-2879   Fax:55+11+2121-2449
> 
> 
> 
> 
> "Oliver Boehmer (oboehmer)" <oboehmer at cisco.com> 
> 23-01-2006 09:25
> 
> To
> Alaerte Gladston Vidali/Brazil/IBM at IBMBR
> cc
> <cisco-nsp at puck.nether.net>
> Subject
> RE: [c-nsp] FRR Recovery Time
> 
> 
> 
> 
> 
> 
> gladston at br.ibm.com <mailto:gladston at br.ibm.com> wrote on Sunday,
> January 22, 2006 9:59 PM:
> 
>> Sorry it took so long to get the requested information. It was
>> necessary to wait for a maintenance window.
>>
>> There is new information. We used RSVP hellos to detect the
>> failure, and this test revealed that FRR is working pretty well. The
>> problem is that without RSVP hellos the POS alarms are not enough to
>> deactivate the interface on the remote end router and have
>> bidirectional communication recovered.
>>
>> The failure is simulated by disconnecting the fiber on the POS interface
>> of RA1.
>>
>> I am wondering if the carrier has some configuration that does not
>> let the POS alarms arrive at RB1 when the fiber on RA1 is
>> disconnected.  Any feedback on this is really appreciated.
> 
> Well, if your carrier can't deliver any remote alarms to RB1, this is
> bad (and is actually *very* strange) and will prevent you from doing
> FRR. Did you collect "show controller pos" while the link was down?
> 
> Can you try
> 
> int pos x/x
> carrier-delay ms 0
> pos ais-shut
> pos delay triggers line 100
> pos delay triggers path 100
> pos report lrdi
> pos report lais
> pos report prdi
> pos report pais
> pos report slos
> pos report slof
> 
> on your POS link to see if it makes a difference? If your POS link is
> unprotected, you could also tune down delay/line triggers to zero.
> 
> In case it doesn't work out, the best you can do in this case is to tune
> your OSPF so RB1 will be notified about the failure at RA1 using LSA
> updates. This can be done in sub-seconds as well by tuning down
> carrier-delay to 0 sec on your link and tuning down OSPF's exp-backoff
> timer for SPF/LSA updates, for example:
> 
> router ospf 1
> timers throttle spf 50 20 5000
> timers throttle lsa all 0 20 5000
> timers lsa arrival 15
> timers pacing flood 15
> 

Do you know what the underlying transport is from the provider, i.e. a real
SONET connection, DWDM, something else?

The situation you are describing sounds a lot like a "back to back" fiber
connection or router-to-router over DWDM.  In that situation LRDI may be
masking PRDI when you disconnect the remote end.  Here is a writeup that
explains the situation and some alternatives:

"One interesting situation is when S-LOS is seen on R1, as R2 will see both
L-RDI and P-RDI as R1 is both LTE and PTE. Since L-RDI explicitly disallows
any resulting action to be taken upon receipt, R2 will not drop its
interface as a result. This can potentially lead to a situation where R1's
interface is down, however R2's interface is still up and forwarding. Of
course any Layer 2 keepalive (like HDLC provides) will time out and declare
the link down, typically in 30 seconds depending on the configured timers.
However since a number of operators disable these Layer 2 keepalives, that
may not prevent this situation. What can be done to address this? There
are several approaches that can be taken, each addressing this from a
different perspective:

* Turn on Path Triggers: Since P-RDI will bring an interface down with path
triggers enabled, this can be used to cause a quick response and will drop
the interface. The interesting twist is that L-RDI will mask out the P-RDI
under normal operation per GR-253. Since the pos triggers are handled at
the defect level, they are processed prior to the alarm masking and the
interface will still drop according to the configured delay time.

* Enable Layer 2 Keepalives: This option will cause the interface on R2
to time out after 3 keepalives are missed. This is typically 30 seconds
total (3x10) and not generally recommended as a tool for fast link
convergence tuning.

* Enable a Link-State Routing Protocol: When the interface on R1 is brought
down due to the S-LOS, a link state message will be sent immediately. Even
though the interface on R2 may still be up, when the link state message is
received throughout the area SPF will be run and the link will be removed
from the topology as it fails the two-way connectivity check. This will
prevent the network from trying to route through that simplex scenario."
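
To make the first two options concrete, here is a rough sketch of what they
would look like on the RB1 side. The path-trigger line is the same one Oliver
posted above; the interface number is a placeholder, and the 100 ms delay and
10 second keepalive are only illustrative values, not recommendations:

  interface POS x/x
   ! act on path-level defects (PAIS, PRDI, path LOP) after 100 ms
   pos delay triggers path 100
   ! HDLC keepalive every 10 s; link declared down after 3 misses (~30 s)
   keepalive 10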

Bidirectional Forwarding Detection (BFD) could also be an option here if it
is supported in the version(s) you are running.
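
If it is available to you, a minimal sketch might look like the following.
The 50 ms timers and the multiplier of 3 are illustrative assumptions, OSPF
process 1 is borrowed from Oliver's example, and whether BFD runs on your
POS linecards in your release is something you would need to confirm:

  interface POS x/x
   ! send/expect BFD control packets every 50 ms, declare down after 3 misses
   bfd interval 50 min_rx 50 multiplier 3
  !
  router ospf 1
   ! register OSPF as a BFD client on all OSPF-enabled interfaces
   bfd all-interfaces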

- --
=========
bep

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFD1YVOE1XcgMgrtyYRAiWoAJoCsaghhAsyk5cdVGOwyd70cr+GwACg0pgp
fHp00/4Y1kitdIwG9T8Ko9g=
=bIJ8
-----END PGP SIGNATURE-----