[j-nsp] Failure detection on Ge interfaces

Filippo Cugini filippo.cugini at cnit.it
Thu Oct 6 12:46:50 EDT 2005


Hi Josef,
thanks for your effort. A couple of questions:
1. is there any delay between the loss of light due to a failure (e.g. fiber 
cut) and the invocation of the interrupt link down?

2. (and most important) could you please also check the internal behavior of 
the MPLS recovery mechanisms?
(e.g., failure of a GE link that carries just one LSP configured with a 
standby secondary path)
In my tests, recovery starts after a delay that has the same distribution as 
the link-down detection delay (from 0.1 to 1 sec!)
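
For reference, a spread like this is what one would expect if the link state were sampled periodically rather than signalled by an interrupt: the detection delay is then simply the time from the failure to the next sample. The Python sketch below is an idealized model only. The function name and the 1-second period are assumptions for illustration, and it ignores any fixed processing overhead, so its lower bound is close to 0 rather than the 0.1 sec seen in the tests.

```python
import math


def polling_detection_delay(failure_time: float, period: float = 1.0) -> float:
    """Time from a failure until the next poll, with polls at k * period."""
    next_poll = math.ceil(failure_time / period) * period
    return next_poll - failure_time


# Hypothetical failure instants spread evenly across one polling period.
samples = [i / 1000 for i in range(1, 1000)]
delays = [polling_detection_delay(t) for t in samples]
mean_delay = sum(delays) / len(delays)

print(f"min={min(delays):.3f}s max={max(delays):.3f}s mean={mean_delay:.3f}s")
```

Under this model the delay is spread evenly over the polling period, with a mean of about half the period.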

Thanx
   Filippo



----- Original Message ----- 
From: "Josef Buchsteiner" <josefb at juniper.net>
To: "Filippo Cugini" <filippo.cugini at cnit.it>
Cc: <juniper-nsp at puck.nether.net>
Sent: Thursday, October 06, 2005 4:31 PM
Subject: Re: [j-nsp] Failure detection on Ge interfaces


> Filippo,
>
>
>        below are some measurements with time-stamps. Yes, traps and
>        interface status via the CLI are not interrupt driven; however,
>        all the lower layers, where it matters most, are.
>
>        From here you can see that the time between link-down detection
>        and PFE local repair is 20 msec.
>
>        hope this helps
>        Josef
>
>
>
>        1)16:01:33.716188 interrupt link down is invoked
>        2)16:01:33.716341 PIC driver processes the link-state change
>        3)16:01:33.716540 chassis manager sends the flap to the kernel
>        4)16:01:33.736    PFE receives from the kernel the request to destroy the route
>
>          (This is already the point where local repair happens with FRR)
>
>        5)16:01:33.752958 the direct route is moved into reject mode
>        6)16:01:33.763124 all is finished.
>
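
The intervals between the steps in the timeline above can be worked out directly from the timestamps. The short Python sketch below does that arithmetic. The event labels are paraphrased from the list and the variable names are illustrative only.

```python
from datetime import datetime

# Timestamps copied from the numbered timeline; labels are paraphrased.
events = [
    ("interrupt link down invoked", "16:01:33.716188"),
    ("PIC driver processes link-state change", "16:01:33.716341"),
    ("chassis manager sends flap to kernel", "16:01:33.716540"),
    ("PFE told by kernel to destroy route", "16:01:33.736"),
    ("direct route moved into reject mode", "16:01:33.752958"),
    ("all finished", "16:01:33.763124"),
]


def parse(ts: str) -> datetime:
    # %f accepts 1 to 6 fractional digits, so ".736" is read as 736000 us.
    return datetime.strptime(ts, "%H:%M:%S.%f")


t0 = parse(events[0][1])
elapsed_ms = [(parse(ts) - t0).total_seconds() * 1000 for _, ts in events]

for (label, _), ms in zip(events, elapsed_ms):
    print(f"{ms:7.3f} ms  {label}")
```

Step 4 comes out about 19.8 ms after step 1, which matches the 20 msec quoted for local repair, and the whole sequence completes about 46.9 ms after the interrupt.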
>
>        Everything else, i.e. sending the trap to mib2d, which in turn
>        passes it to the SNMP daemon, which then sends the trap to the
>        client, is really not interrupt driven.
>
>
>
>
> <-- on the FEB (time zones are a bit screwed up, but that's minor)
>
> [Oct  6 14:01:33.716 LOG: Info] MAC: Link-down, time 15:01:33.716188, ge_isr
> [Oct  6 14:01:33.716 LOG: Info] MAC: Link-down, time 15:01:33.716341, ge_gem_process_linkstate
> [Oct  6 14:01:33.716 LOG: Debug] IFD: ge-0/1/0, activated alarm 0x00000001
> [Oct  6 14:01:33.716 LOG: Debug] IFD: ge-0/1/0, sent flap msg to RE, Down state
>
> <-- the PFE now already gets the information from the kernel to destroy
> <-- the interface information
>
> [Oct  6 14:01:33.736 LOG: Debug] topo_disconnect:[87.0.0.1] from [nh(Local,351)]
> [Oct  6 14:01:33.736 LOG: Debug] topo_action:<disconnect> [87.0.0.1]
> [Oct  6 14:01:33.736 LOG: Debug] topo_action:Done..
> [Oct  6 14:01:33.736 LOG: Debug] topo_disconnect:Done. Status = 0
> [Oct  6 14:01:33.736 LOG: Debug] topo_connect:[87.0.0.1] . Connecting to [nh(Reject,14)]
> [Oct  6 14:01:33.736 LOG: Debug] topo_action:<connect> [87.0.0.1]
> [Oct  6 14:01:33.736 LOG: Debug] topo_action:Done..
> [Oct  6 14:01:33.736 LOG: Debug] topo_connect:Done. Status = 0
> [Oct  6 14:01:33.736 LOG: Debug] topo_destroy:[87.0.0.0/30](refs = 0)
> [Oct  6 14:01:33.737 LOG: Debug] topo_destroy:[87.0.0.0](refs = 0)
> [Oct  6 14:01:33.737 LOG: Debug] topo_destroy:[87.0.0.3](refs = 0)
>
> <-- kernel gets the interface changes from rpd
>
> Oct  6 16:01:33.747459 task_process_events: recv ready for KRT Ifstate
> Oct  6 16:01:33.747642 task_receive_packet: task KRT Ifstate socket 6 length 212
> Oct  6 16:01:33.747702 KRT Ifstate: recv len 212 v63 seq 149 CHANGE ifdev devindex 132 flags 0xb 0xc001 gen 15
> Oct  6 16:01:33.747882 EVENT <UpDown> ge-0/1/0.0 index 72 <Broadcast Multicast> address #0 0.5.85.dc.44.1f
> Oct  6 16:01:33.750953 enqueue ifdest down -- kqp 0x871d000: op add queue low-add
> Oct  6 16:01:33.751009 task_timer_reset: reset KRT_Route Install (Timeout)
> Oct  6 16:01:33.751054 task_job_create_background: create prio 5 job Route Install for task KRT
> Oct  6 16:01:33.751089 krt_q_schedule reset sleep, go for immediate job
> Oct  6 16:01:33.751371 EVENT <UpDown> index 132 <Broadcast Multicast> address #0 0.5.85.dc.44.1f
> Oct  6 16:01:33.751974 rt_flash_update_callback: flash KRT (inet.0) start
> Oct  6 16:01:33.752059 enqueue krt_flash_rth -- kqp 0x871d050: op add queue high-add 87.0.0.1/32 -> {}
> Oct  6 16:01:33.752120 task_timer_reset: reset KRT_Route Install (Timeout)
> Oct  6 16:01:33.752163 task_job_delete: delete background job Route Install for task KRT
> Oct  6 16:01:33.752209 task_job_create_background: create prio 3 job Route Install for task KRT
> Oct  6 16:01:33.752243 krt_q_schedule reset sleep, go for immediate job
> Oct  6 16:01:33.752291 rt_flash_update_callback: flash KRT (inet.0) done
> Oct  6 16:01:33.752823 background dispatch running job Route Install for task KRT
> Oct  6 16:01:33.752863 krt_q_run starting...
>
> <-- the request to move the local route into reject mode
>
> Oct  6 16:01:33.752958 KRT Request: send len 88 v63 seq 0 ADD route/user af 2 table 0 addr 87.0.0.1 nhop-type reject filtidx 0
> Oct  6 16:01:33.760498 dequeue complete -- kqp 0x871d050: op add queue high-add 87.0.0.1/32 -> {}
> Oct  6 16:01:33.760609 task_alloc: allocated task block for KRT IF Change priority 60
> Oct  6 16:01:33.760765 task_get_socket: domain AF_Route  type SOCK_SEQPACKET  protocol 0  socket 33
> Oct  6 16:01:33.760842 task_set_option_internal: task KRT IF Change socket 33 option RecvBuffer(0) value 524288
> Oct  6 16:01:33.760893 task_set_option_internal: task KRT IF Change socket 33 option SendBuffer(1) value 524288
> Oct  6 16:01:33.760944 task_set_option_internal: task KRT IF Change socket 33 option NonBlocking(8) value 1
> Oct  6 16:01:33.760987 task_set_socket: task KRT IF Change socket 33
> Oct  6 16:01:33.761049 task_create: KRT IF Change  socket 33
> Oct  6 16:01:33.761125 KRT IF Change: send len 44 v59 seq 0 GET ifdest devindex 132 index 72  af 2 prefix 87.0.0.0 up
> Oct  6 16:01:33.761219 task_receive_packet: task KRT IF Change socket 33 length 56
> Oct  6 16:01:33.761287 KRT IF Change: recv len 56 v59 seq 0 GET ifdest devindex 132 index 72 ge-0/1/0 af 2 prefix 87.0.0.0 up
> Oct  6 16:01:33.761346 KRT IF Change: send len 44 v59 seq 0 CHANGE ifdest devindex 132 index 72  af 2 prefix 87.0.0.0 down
> Oct  6 16:01:33.762573 krt_if_change_state: processed 1 prefix(es) for 1 subunits of ifd 132 ifl 72
> Oct  6 16:01:33.762665 task_delete: deleting task KRT IF Change
> Oct  6 16:01:33.762706 task_close: close socket 33 task KRT IF Change
> Oct  6 16:01:33.762817 task_reset_socket: task KRT IF Change socket 33
> Oct  6 16:01:33.762899 task_job_delete_task: deleting all jobs for task KRT IF Change
> Oct  6 16:01:33.762941 task_job_deleted_task: no jobs found for task KRT IF Change
>
>
> Oct  6 16:01:33.763124 dequeue complete -- kqp 0x871d000: op add queue low-add
> Oct  6 16:01:33.763192 krt_q_run end success 2 routes
> Oct  6 16:01:33.763243 task_job_delete: delete background job Route Install for task KRT
> Oct  6 16:01:33.764121 task_timer_reset: reset KRT_Route Install (Timeout)
> Oct  6 16:01:33.764203 krt_q_schedule cancel sleep
> Oct  6 16:01:33.764244 background dispatch completed job Route Install for task KRT
> Oct  6 16:01:33.764578 task_process_events: recv ready for KRT Ifstate
>
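
To compare timestamps across a trace like the one above, the lines can be parsed mechanically. The Python sketch below assumes the "Mon D HH:MM:SS.ffffff message" layout seen in the rpd trace lines, tolerates leading ">" quote markers, and uses three lines copied from the trace. The regular expression and function names are illustrative and not part of any Juniper tool.

```python
import re

# Matches lines such as "Oct  6 16:01:33.747459 task_process_events: ...",
# optionally preceded by mail quote markers ("> ").
TRACE_RE = re.compile(
    r"^(?:>\s*)*[A-Z][a-z]{2}\s+\d{1,2}\s+(\d{2}):(\d{2}):(\d{2})\.(\d{1,6})\s+(.*)$"
)


def parse_trace_line(line):
    """Return (microseconds since midnight, message), or None if no match."""
    m = TRACE_RE.match(line)
    if m is None:
        return None
    hh, mm, ss, frac, msg = m.groups()
    micros = int(frac.ljust(6, "0"))  # ".736" means 736000 microseconds
    total = ((int(hh) * 60 + int(mm)) * 60 + int(ss)) * 1_000_000 + micros
    return total, msg


sample = [
    "> Oct  6 16:01:33.747459 task_process_events: recv ready for KRT Ifstate",
    "> Oct  6 16:01:33.752958 KRT Request: send len 88 v63 seq 0 ADD route/user "
    "af 2 table 0 addr 87.0.0.1 nhop-type reject filtidx 0",
    "> Oct  6 16:01:33.763192 krt_q_run end success 2 routes",
]

parsed = [p for p in (parse_trace_line(line) for line in sample) if p is not None]
start = parsed[0][0]
for micros, msg in parsed:
    print(f"+{(micros - start) / 1000:7.3f} ms  {msg[:40]}")

span_us = parsed[-1][0] - start  # microseconds from first to last sample line
```

Working in integer microseconds avoids floating-point rounding when subtracting timestamps that differ by only a few microseconds.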
> Tuesday, October 4, 2005, 3:37:30 PM, you wrote:
> FC>
> FC>
> FC> Josef,
> FC>  we measured the loss of light by tapping part of the signal and using an
> FC>  external photodiode. At the same time we collected the SNMP messages sent by
> FC>  the router (and also polled with the XML Junoscript) referring to the
> FC>  operational status (up/down) of the GE interface. Results show that the M10
> FC>  (Junos 7.1) checks the operational status of the GE interface once a second.
> FC>  The measured delay was confirmed by evaluating the performance of the MPLS
> FC>  recovery solutions.
> FC>  I can send you a more detailed document if you want.
> FC>  Thank you for your effort
> FC>  Filippo
> FC>
> FC>
> FC>
> FC>
> FC>
> FC>  ----- Original Message -----
> FC>  From: "Josef Buchsteiner" <josefb at juniper.net>
> FC>  To: "Filippo Cugini" <filippo.cugini at cnit.it>
> FC>  Cc: <juniper-nsp at puck.nether.net>
> FC>  Sent: Tuesday, October 04, 2005 2:37 PM
> FC>  Subject: Re: [j-nsp] Failure detection on Ge interfaces
> FC>
> FC>
> >> Filippo,
> >>
> >> [ last question I promise]
> >>
> >> Can you illustrate to me how you performed the
> >> check/measurement that the link down is not interrupt
> >> driven?
> >>
> >> The reason I'm interested in this is that we recently
> >> fixed a problem on the 1 or 2 port GE SFP PIC or the IQ GE
> >> PIC where link-down detection was done in polling cycles.
> >> This was fixed in PR62682 and the fix is available in
> >> versions 7.3R3 7.2R3 7.3R2 7.1R4 and higher.
> >>
> >>
> >>
> >> thanks
> >> josef
> >>
> >> Monday, October 3, 2005, 5:25:28 PM, you wrote:
> >> FC>
> >> FC>
> >> FC> Josef,
> >> FC> we have PE-1GE-LX-B cards (1 port Gigabit Ethernet PIC, LX Optics (1310nm,
> >> FC> single mode fiber) with PIC ejector)
> >> FC> Thanx
> >> FC> Filippo
> >> FC>
> >> FC>
> >> FC>
> >> FC> ----- Original Message -----
> >> FC> From: "Josef Buchsteiner" <josefb at juniper.net>
> >> FC> To: "Filippo Cugini" <filippo.cugini at cnit.it>
> >> FC> Cc: <juniper-nsp at puck.nether.net>
> >> FC> Sent: Monday, October 03, 2005 4:57 PM
> >> FC> Subject: Re: [j-nsp] Failure detection on Ge interfaces
> >> FC>
> >> FC>
> >> >> Filippo,
> >> >> is this a PIC with SFP connectors?
> >> >> thanks
> >> >> Josef
> >> >>
> >> >> Monday, October 3, 2005, 2:38:24 PM, you wrote:
> >> >>
> >> >> FC>
> >> >> FC>
> >> >> FC> Hi all,
> >> >> FC> we have M10 routers with GE LX optical interfaces running OSPF and MPLS (Junos 7.1).
> >> >> FC> We evaluated the M10 behavior in case of failure (fiber cut).
> >> >> FC> We have noticed that the routers check the operational status of the GE interface
> >> >> FC> once a second, resulting in a significant failure detection time (from
> >> >> FC> 0.1 to 1 sec).
> >> >> FC> Is it possible to force the router to check the status more frequently?
> >> >> FC> Thanx in advance
> >> >> FC> Filippo
> >> FC>
> >> FC>
> >> FC>
> >>
> >>
> FC>
> FC>
> FC>
>
> 