[j-nsp] M160/JunOS 7.6R1.10: ae0 fails to install L2 descriptor

Sat Dec 30 11:26:36 EST 2006

Friday, December 29, 2006, 7:53:55 PM, you wrote:
pb>    
pb>    
pb> Hello Josef,
pb>  
pb>  thank you very much for your detailed reply.
pb>  
 >>pb>  I have a couple of questions:
 >>pb>  a) Is it normal to have 32k L2 Descriptors for 8.2k Next-Hop Entries?
 >>
 >>      Yes, since this is Ethernet, where the Layer 2 header is
 >>      large, and you most likely have all three links on one FPC,
 >>      i.e. three times the resources.
pb>  
pb>  I had one PIC on fpc0 and one PIC on fpc1 (both non-E FPC1s). I then
pb>  added a second PIC to fpc1.
pb>  I'll insert a spare FPC1 and retry with all links spread out.
pb>  
 >>pb>  b) Is there a way to increase the number of available L2 Descriptors?
 >>pb>  (How many L2 Descriptors does an SFM-16 support?)
 >>
 >>      This has nothing to do with the SFM. An Enhanced FPC will have
 >>      about 160K of space... it's all about memory.
pb>  
pb>  Oh okay, thank you. Is there a linear correlation between L2 Descriptors
pb>  and Next-Hop Entries?

     Yes, if the L2 data size is the same.

pb>  Given that I move the third (and maybe someday fourth) link to separate
pb>  non-E FPCs, and the usage on other interfaces stays about the same, can I
pb>  calculate that we'll be able to support about 50k / (32k / 8.2k) = 12.8k
pb>  Next-Hops over that ae interface before we have to move the individual
pb>  ae links over to E-FPCs?
pb>
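Peter's back-of-the-envelope estimate can be reproduced as below. The 50k usable descriptors, 32k in use and 8.2k next-hops are the rounded figures from his message, not measured values, and the sketch assumes the descriptors-per-next-hop ratio stays constant.

```python
# Rounded figures taken from the question above (not measured values).
usable_descriptors = 50_000   # approx. L2 descriptors on a non-E FPC
used_descriptors = 32_000     # L2 descriptors observed in use
next_hops = 8_200             # next-hop entries observed

# Average descriptors consumed per next-hop (about 3.9).
per_next_hop = used_descriptors / next_hops

# Next-hops that would fit if that ratio stays constant (about 12.8k).
max_next_hops = usable_descriptors / per_next_hop
print(f"{per_next_hop:.2f} descriptors/next-hop -> {max_next_hops:,.0f} next-hops")
```

The ratio of roughly 3.9 is close to the 4 chunks per VLAN next-hop given in the reply.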

     You need a next-hop on Ethernet every time you want to deliver
     to a destination. If the L2 portion is different, a different
     next-hop is created. We also consume a next-hop called a resolve
     next-hop for each Ethernet segment; however, this does not consume
     L2 Descriptor space. So if you have 3000 VLANs, you would also have
     3000 resolve next-hops. Since Ethernet is point-to-multipoint, a
     next-hop is created:

           o  a unicast next-hop for every ARP entry
           o  a separate next-hop for each multicast group, since the
              Ethernet destination address is mapped to the mcast group
           o  Junos also treats an MPLS label as a next-hop, so a
              different label generates a different next-hop
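The rules above can be illustrated with a toy model. It assumes, purely for illustration and not as a description of Junos internals, that a next-hop is identified by the L2 rewrite it needs (segment, destination MAC, label stack), so identical L2 data shares one next-hop, while resolve next-hops are counted per segment and consume no L2 descriptors. The `L2Rewrite` type and `count_next_hops` helper are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical model: a next-hop is keyed by the L2 rewrite it needs.
@dataclass(frozen=True)
class L2Rewrite:
    vlan: int            # Ethernet segment / VLAN the packet leaves on
    dest_mac: str        # ARP-resolved unicast MAC or mapped multicast MAC
    labels: tuple = ()   # MPLS label stack, if any

def count_next_hops(rewrites, vlans):
    """Return (descriptor-consuming next-hops, resolve next-hops)."""
    unique = set(rewrites)      # identical L2 data shares one next-hop
    resolve = len(set(vlans))   # one resolve next-hop per segment, no L2 descriptors
    return len(unique), resolve

rewrites = [
    L2Rewrite(100, "00:11:22:33:44:55"),             # ARP entry
    L2Rewrite(100, "00:11:22:33:44:55"),             # same peer again: no new next-hop
    L2Rewrite(100, "01:00:5e:01:02:03"),             # multicast group
    L2Rewrite(100, "00:11:22:33:44:55", (299776,)),  # same peer, different label
    L2Rewrite(200, "00:11:22:33:44:66"),             # ARP entry on another VLAN
]
print(count_next_hops(rewrites, vlans=[100, 200]))   # (4, 2)
```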

     For Ethernet you need 3 chunks of L2 Descriptor, and for VLANs
     you need 4 chunks (all measured in words). The reason we need 4
     for VLANs is simply that the L2 data portion is bigger due to the
     VLAN header. If you also have MPLS labels, you would need to
     calculate with 5 chunks.
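A minimal sketch of the descriptor math for a mix of next-hop types follows. It assumes the 5-chunk case refers to a VLAN next-hop that also carries an MPLS label, and it leaves out resolve next-hops because they use no L2 descriptor space. The traffic mix is hypothetical.

```python
# Chunks (words) of L2 descriptor per next-hop, as given in the reply.
# Assumption: the 5-chunk case is a VLAN next-hop that also carries an MPLS label.
CHUNKS = {"ethernet": 3, "vlan": 4, "vlan+mpls": 5}

def descriptors_needed(counts):
    """counts maps next-hop type -> number of next-hops of that type."""
    return sum(CHUNKS[kind] * n for kind, n in counts.items())

# Hypothetical mix: 8,000 VLAN ARP next-hops plus 200 with MPLS labels.
print(descriptors_needed({"vlan": 8_000, "vlan+mpls": 200}))  # 33000
```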

     This is how you can easily do the math:

     FPC                      52252 L2 Descriptors / 4 = 13063 VLAN ARP
                              entries (aka next-hops) per FPC. For
                              non-VLAN it would be 16750 entries.

     E-FPC                    162891 L2 Descriptors / 4 = 40722 VLAN ARP
                              entries.

     Enhanced Plus FPC or     362571 L2 Descriptors. There is an upper
     M10i/M7i                 boundary of 61183 next-hops per FPC.
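The per-FPC capacities above can be recomputed as follows, using the pool sizes and the 61183 next-hop ceiling quoted in the reply. Straight division gives 17417 untagged next-hops on a plain FPC, somewhat above the 16750 quoted, so treat results from this simple model as upper bounds rather than guaranteed figures.

```python
# L2 descriptor pool per FPC type and the next-hop ceiling, as quoted above.
FPC_DESCRIPTORS = {"FPC": 52_252, "E-FPC": 162_891, "E+ FPC / M7i / M10i": 362_571}
NEXT_HOP_CEILING = {"E+ FPC / M7i / M10i": 61_183}  # upper bound stated for E+ only

def max_next_hops(fpc, chunks_per_nh):
    """Whole next-hops that fit, capped by any per-FPC next-hop limit."""
    fit = FPC_DESCRIPTORS[fpc] // chunks_per_nh
    return min(fit, NEXT_HOP_CEILING.get(fpc, fit))

for fpc in FPC_DESCRIPTORS:
    print(f"{fpc:22} vlan={max_next_hops(fpc, 4):6} untagged={max_next_hops(fpc, 3):6}")
```

On the Enhanced Plus FPC the descriptor pool would allow about 90k VLAN next-hops, so the 61183 ceiling is the binding limit there.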

 >>pb>  c) Is there a way to make the router fail with less impact to the
 >>pb>  network (for example, simply shutting down the new interface
 >>pb>  automatically instead of refusing to update the next-hop table until
 >>pb>  the interface is taken down and all SFMs are restarted manually)?
 >>
 >>      There would never have been a need to restart any SFM. All you
 >>      would have needed to do is deactivate the aggregate interface
 >>      and enable it again without the third member link.
pb>  
pb>  I tried removing the one new link, but I still got
pb>  Dec 29 03:58:12  ham-cr2-re1 /kernel: ae_link_op: link ge-1/3/0.2 (lidx=2) detached from bundle ae0.2
pb>  Dec 29 03:59:36  ham-cr2-re1 /kernel: RT_PFE: NH IPC op 31 (CHANGE AGGREGATE NEXTHOP) failed, err 5 (Invalid)
pb>  
pb>  in the logfiles. This is when I decided to restart the SFMs.

     The upper layer still thinks the next-hop is installed and asks
     the PFE to remove this entry; however, the PFE complains that it
     does not have such an entry, so the removal also produces an
     error entry.
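The state mismatch described above can be sketched as a toy model: the upper layer records every next-hop it requested, while the PFE only holds the ones that fit, so a later change or removal of a never-installed entry is rejected. The `Pfe` class, its capacity and the next-hop names are hypothetical; the "err 5 (Invalid)" string merely mimics the log line quoted earlier.

```python
class Pfe:
    """Toy PFE next-hop table; installs fail once descriptor space runs out."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.table = set()

    def install(self, nh):
        if len(self.table) >= self.capacity:
            return "err: out of L2 descriptors"
        self.table.add(nh)
        return "ok"

    def change(self, nh):
        # Changing or removing an entry the PFE never installed is rejected.
        return "ok" if nh in self.table else "err 5 (Invalid)"

pfe = Pfe(capacity=1)
upper_layer = set()             # what the routing process believes is installed
for nh in ("ae0.2-nh-a", "ae0.2-nh-b"):
    pfe.install(nh)             # the second install fails in hardware ...
    upper_layer.add(nh)         # ... but the upper layer records it anyway

print([pfe.change(nh) for nh in sorted(upper_layer)])  # ['ok', 'err 5 (Invalid)']
```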

pb>  Taking the entire aggregate interface down amounts to more impact to the
pb>  network imho (at least in our setup). With SFMs restarting I've got a
pb>  couple of seconds of minor packet loss, while a deactivated ae0 would
pb>  mean x bouncing BGP sessions and traffic stopping completely for a short
pb>  amount of time.

     That's OK... it is not always straightforward what the best thing
     to do is, given the dynamics of the network.

pb>  
 >>      In fact, you only know that you have run out of resources once
 >>      you try to program them in hardware, and the only option then is
 >>      to refuse. There is no really good way to know upfront. I believe
 >>      the RSMON feature is able to monitor such resources and, if
 >>      configured, will send an alarm once you reach a certain
 >>      threshold, so you know that you are approaching the limits.
pb>  
pb>  That makes sense.
pb>  I had actually looked at the nhdb stats beforehand but didn't consider
pb>  20k free entries to be of concern.
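A generic threshold check of the kind discussed above is sketched below. It is a hypothetical helper, not RSMON's configuration or interface, and the figures loosely follow this thread: about 32k descriptors used out of the 52252 on a plain FPC, which leaves roughly 20k free but is already about 61% utilization.

```python
def check_headroom(used, total, threshold=0.8):
    """Return an alarm string when utilization reaches the threshold, else None."""
    utilization = used / total
    if utilization >= threshold:
        return f"ALARM: L2 descriptors {utilization:.0%} used ({total - used} free)"
    return None

# Figures loosely based on this thread: ~32k used of a 52252-descriptor pool.
print(check_headroom(32_000, 52_252))         # None at the default 80% threshold
print(check_headroom(32_000, 52_252, 0.6))    # ALARM: L2 descriptors 61% used (20252 free)
```

A percentage-based threshold makes the point visible that 20k free entries can still be a thin margin when one more member link multiplies the usage.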

     Aggregated links can consume quite a lot of resources, as you
     have noticed, but you can now do the math on how safe it is to
     add another link to the bundle.
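The check for adding a member link can be sketched as below. It rests on an assumption drawn from the earlier reply ("all three links on one FPC, i.e. three times the resources"): each member link on a given FPC needs its own copy of the bundle's next-hops in that FPC's descriptor pool. The helper name and the traffic figures are hypothetical.

```python
# Assumption from the thread: every member link on an FPC replicates the
# bundle's next-hops in that FPC's L2 descriptor pool.
def fits_after_adding_link(next_hops, chunks_per_nh, links_on_fpc, other_usage, pool):
    """Return (fits, descriptors needed) if one more member link lands on this FPC."""
    needed = next_hops * chunks_per_nh * (links_on_fpc + 1) + other_usage
    return needed <= pool, needed

# Hypothetical: 4,000 VLAN next-hops over ae0 on a non-E FPC (52252 descriptors).
print(fits_after_adding_link(4_000, 4, 1, other_usage=10_000, pool=52_252))  # (True, 42000)
print(fits_after_adding_link(4_000, 4, 2, other_usage=10_000, pool=52_252))  # (False, 58000)
```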

     thanks
     Josef

pb>  
pb>  Best Regards, Peter
pb>  