[j-nsp] Failure detection on Ge interfaces

Josef Buchsteiner josefb at juniper.net
Thu Oct 6 10:31:41 EDT 2005


Filippo,


        below are some measurements with time-stamps. Yes traps and
        interface status via the cli is not interrupt driven however
        all the layers down where it is most important are.

        From here you can see the the time from link-down detection
        and PFE local-repair is 20msec.

        hope this helps
        Josef



        1)16:01:33.716188 interrupt link down is invoked
        2)16:01:33.716341 pic driver does process the linkstate change
        3)16:01:33.716540 chassis manager send the flap to the kernel
        4)16:01:33.736    pfe received from the kernel to destroy route
                          
          (This is already the point where local repair happen on FRR)
        
        5)16:01:33.752958 The direct route will be moved into reject mode
        6)16:01:33.763124 all is finished.
        
        
        All the rest is with sending the sending the trap to mib2d which
        in turns get it to snmp daemon which then send the trap to the client
        is really not interrupt driven.
        
        


<-- on the FEB ( time-zones are a bit screwed but thats a minor )

[Oct  6 14:01:33.716 LOG: Info] MAC: Link-down, time 15:01:33.716188, ge_isr
[Oct  6 14:01:33.716 LOG: Info] MAC: Link-down, time 15:01:33.716341, ge_gem_process_linkstate
[Oct  6 14:01:33.716 LOG: Debug] IFD: ge-0/1/0, activated alarm 0x00000001
[Oct  6 14:01:33.716 LOG: Debug] IFD: ge-0/1/0, sent flap msg to RE, Down state

<-- PFE now gets already the information fom the kernel to destroy
<-- the interface information

[Oct  6 14:01:33.736 LOG: Debug] topo_disconnect:[87.0.0.1] from [nh(Local,351)]
[Oct  6 14:01:33.736 LOG: Debug] topo_action:<disconnect> [87.0.0.1]
[Oct  6 14:01:33.736 LOG: Debug] topo_action:Done..
[Oct  6 14:01:33.736 LOG: Debug] topo_disconnect:Done. Status = 0
[Oct  6 14:01:33.736 LOG: Debug] topo_connect:[87.0.0.1] . Connecting to [nh(Reject,14)]
[Oct  6 14:01:33.736 LOG: Debug] topo_action:<connect> [87.0.0.1]
[Oct  6 14:01:33.736 LOG: Debug] topo_action:Done..
[Oct  6 14:01:33.736 LOG: Debug] topo_connect:Done. Status = 0
[Oct  6 14:01:33.736 LOG: Debug] topo_destroy:[87.0.0.0/30](refs = 0)
[Oct  6 14:01:33.737 LOG: Debug] topo_destroy:[87.0.0.0](refs = 0)
[Oct  6 14:01:33.737 LOG: Debug] topo_destroy:[87.0.0.3](refs = 0)

<-- kernel get the interface changes from rpd

Oct  6 16:01:33.747459 task_process_events: recv ready for KRT Ifstate
Oct  6 16:01:33.747642 task_receive_packet: task KRT Ifstate socket 6 length 212
Oct  6 16:01:33.747702 KRT Ifstate: recv len 212 v63 seq 149 CHANGE ifdev devindex 132 flags 0xb 0xc001 gen 15
Oct  6 16:01:33.747882 EVENT <UpDown> ge-0/1/0.0 index 72 <Broadcast Multicast> address #0 0.5.85.dc.44.1f
Oct  6 16:01:33.750953 enqueue ifdest down -- kqp 0x871d000: op add queue low-add
Oct  6 16:01:33.751009 task_timer_reset: reset KRT_Route Install (Timeout)
Oct  6 16:01:33.751054 task_job_create_background: create prio 5 job Route Install for task KRT
Oct  6 16:01:33.751089 krt_q_schedule reset sleep, go for immediate job
Oct  6 16:01:33.751371 EVENT <UpDown> index 132 <Broadcast Multicast> address #0 0.5.85.dc.44.1f
Oct  6 16:01:33.751974 rt_flash_update_callback: flash KRT (inet.0) start
Oct  6 16:01:33.752059 enqueue krt_flash_rth -- kqp 0x871d050: op add queue high-add 87.0.0.1/32 -> {}
Oct  6 16:01:33.752120 task_timer_reset: reset KRT_Route Install (Timeout)
Oct  6 16:01:33.752163 task_job_delete: delete background job Route Install for task KRT
Oct  6 16:01:33.752209 task_job_create_background: create prio 3 job Route Install for task KRT
Oct  6 16:01:33.752243 krt_q_schedule reset sleep, go for immediate job
Oct  6 16:01:33.752291 rt_flash_update_callback: flash KRT (inet.0) done
Oct  6 16:01:33.752823 background dispatch running job Route Install for task KRT
Oct  6 16:01:33.752863 krt_q_run starting...

<-- the request to move the local route into reject mode

Oct  6 16:01:33.752958 KRT Request: send len 88 v63 seq 0 ADD route/user af 2 table 0 addr 87.0.0.1 nhop-type reject filtidx 0
Oct  6 16:01:33.760498 dequeue complete -- kqp 0x871d050: op add queue high-add 87.0.0.1/32 -> {}
Oct  6 16:01:33.760609 task_alloc: allocated task block for KRT IF Change priority 60
Oct  6 16:01:33.760765 task_get_socket: domain AF_Route  type SOCK_SEQPACKET  protocol 0  socket 33
Oct  6 16:01:33.760842 task_set_option_internal: task KRT IF Change socket 33 option RecvBuffer(0) value 524288
Oct  6 16:01:33.760893 task_set_option_internal: task KRT IF Change socket 33 option SendBuffer(1) value 524288
Oct  6 16:01:33.760944 task_set_option_internal: task KRT IF Change socket 33 option NonBlocking(8) value 1
Oct  6 16:01:33.760987 task_set_socket: task KRT IF Change socket 33
Oct  6 16:01:33.761049 task_create: KRT IF Change  socket 33
Oct  6 16:01:33.761125 KRT IF Change: send len 44 v59 seq 0 GET ifdest devindex 132 index 72  af 2 prefix 87.0.0.0 up
Oct  6 16:01:33.761219 task_receive_packet: task KRT IF Change socket 33 length 56
Oct  6 16:01:33.761287 KRT IF Change: recv len 56 v59 seq 0 GET ifdest devindex 132 index 72 ge-0/1/0 af 2 prefix 87.0.0.0 up
Oct  6 16:01:33.761346 KRT IF Change: send len 44 v59 seq 0 CHANGE ifdest devindex 132 index 72  af 2 prefix 87.0.0.0 down
Oct  6 16:01:33.762573 krt_if_change_state: processed 1 prefix(es) for 1 subunits of ifd 132 ifl 72
Oct  6 16:01:33.762665 task_delete: deleting task KRT IF Change
Oct  6 16:01:33.762706 task_close: close socket 33 task KRT IF Change
Oct  6 16:01:33.762817 task_reset_socket: task KRT IF Change socket 33
Oct  6 16:01:33.762899 task_job_delete_task: deleting all jobs for task KRT IF Change
Oct  6 16:01:33.762941 task_job_deleted_task: no jobs found for task KRT IF Change


Oct  6 16:01:33.763124 dequeue complete -- kqp 0x871d000: op add queue low-add
Oct  6 16:01:33.763192 krt_q_run end success 2 routes
Oct  6 16:01:33.763243 task_job_delete: delete background job Route Install for task KRT
Oct  6 16:01:33.764121 task_timer_reset: reset KRT_Route Install (Timeout)
Oct  6 16:01:33.764203 krt_q_schedule cancel sleep
Oct  6 16:01:33.764244 background dispatch completed job Route Install for task KRT
Oct  6 16:01:33.764578 task_process_events: recv ready for KRT Ifstate    

Tuesday, October 4, 2005, 3:37:30 PM, you wrote:
FC>    
FC>    
FC> Josef,
FC>  we measured the loss of light spilling part of the signal and using an
FC>  external photodiode. At the same time we collected the snmp messages sent by
FC>  the router (and also polling with the XML Junoscript) referring to the
FC>  operative status (up/down) of the GE interface.  Results show that M10
FC>  (Junos 7.1) checks the operative status of the GE interface once a second.
FC>  The measured delay has been confirmed evaluating the performance of the mpls
FC>  recovery solutions.
FC>  I can send you a more detailed document if you want.
FC>  Thank you for your effort
FC>     Filippo
FC>  
FC>  
FC>  
FC>  
FC>  
FC>  ----- Original Message -----
FC>  From: "Josef Buchsteiner" <josefb at juniper.net>
FC>  To: "Filippo Cugini" <filippo.cugini at cnit.it>
FC>  Cc: <juniper-nsp at puck.nether.net>
FC>  Sent: Tuesday, October 04, 2005 2:37 PM
FC>  Subject: Re: [j-nsp] Failure detection on Ge interfaces
FC>  
FC>  
 >>    Filippo,
 >>
 >>             [ last question I promise]
 >>
 >>             Can   you   illustrate   to  me  how  you  performed  the
 >>             check/measurement that the link down is not interrupt
 >>             driven?
 >>
 >>             The reason I'm interested in this one is that we recently
 >>             fixed a problem on the 1 or 2port GE SFP PIC or the IQ GE
 >>             PIC where the link down event was done in polling cycles
 >>             which has been fixed in PR62682 and is available in
 >>             versions 7.3R3 7.2R3 7.3R2 7.1R4 and higher.
 >>
 >>
 >>
 >>            thanks
 >>            josef
 >>
 >> Monday, October 3, 2005, 5:25:28 PM, you wrote:
 >> FC>
 >> FC>
 >> FC> Josef,
 >> FC>  we have PE-1GE-LX-B cards (1 port Gigabit Ethernet PIC, LX Optics
 >> (1310nm,
 >> FC>  single mode fiber) with PIC ejector)
 >> FC>  Thanx
 >> FC>  Filippo
 >> FC>
 >> FC>
 >> FC>
 >> FC>  ----- Original Message -----
 >> FC>  From: "Josef Buchsteiner" <josefb at juniper.net>
 >> FC>  To: "Filippo Cugini" <filippo.cugini at cnit.it>
 >> FC>  Cc: <juniper-nsp at puck.nether.net>
 >> FC>  Sent: Monday, October 03, 2005 4:57 PM
 >> FC>  Subject: Re: [j-nsp] Failure detection on Ge interfaces
 >> FC>
 >> FC>
 >> >> Filippo,
 >> >> is the a PIC with SFP connectors ?
 >> >> thanks
 >> >> Josef
 >> >>
 >> >> Monday, October 3, 2005, 2:38:24 PM, you wrote:
 >> >>
 >> >> FC>
 >> >> FC>
 >> >> FC> Hi all,
 >> >> FC> we have M10 with GE LX optical interfaces running ospf and mpls
 >> >> (junos 7.1).
 >> >> FC> We evaluated the M10 behavior in case of failure (fiber cut)
 >> >> FC> We have noticed that routers check the operative status of the GE
 >> >> interface
 >> >> FC> once a second, thus determining a significant failure detection
 >> >> time
 >> >> (from
 >> >> FC> 0,1 to 1 sec).
 >> >> FC> Is it possible to force the router to check the status more
 >> >> frequently?
 >> >> FC> Thanx in advance
 >> >> FC> Filippo
 >> FC>
 >> FC>
 >> FC>
 >>
 >>
FC>   
FC>   
FC>    
 


More information about the juniper-nsp mailing list