[f-nsp] OSPF and BGP flapping when enabling a certain amount of BGP neighbors

Frank Menzel menzel at sipgate.de
Mon Jun 25 11:29:11 EDT 2018


Disabling ICMP redirects looks very promising: we found metrics for 
them, and the number of redirects during our testing was significant. 
I'll definitely give that a try. Stay tuned, I'll report back.

Thanks for replying!

On 06/22/2018 07:50 PM, Eldon Koyle wrote:
> I'll second Dennis.  Disabling icmp redirects is extremely important if 
> you have multiple addresses on a single interface.
> 
> If you have a lot of routes, you may need to change your system-max 
> values.  Run 'show default values' and look for ip-route and ip-cache 
> values (and ipv6- equivalents).  The defaults are usually quite low 
> (290k routes on our CER2024F's, this needs to fit your entire FIB).  
> Change with 'system-max <parameter> <value>', write mem, then reload.  
> On the MLX, you also have to worry about cam partitioning profiles.  The 
> CER2024F may be able to handle 1.5M routes in the BGP RIB, but it has a 
> HW max of 524288 in the FIB.
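
Inline note on the limits above: a rough Python sketch of the arithmetic, assuming only the figures quoted in this thread (roughly 290k as the default ip-route value Eldon mentions, 524288 as the hardware FIB maximum, and the 410845 routing table entries from the original post quoted further down). The names are made up for illustration and nothing is read from a device; 'show default values' on the box gives the real limits.

```python
# Figures quoted in this thread; assumptions, not values read from a device.
ROUTING_TABLE_ENTRIES = 410_845  # "IP Routing Table - 410845 entries"
LIMITS = {
    "default ip-route (approx. 290k, as quoted)": 290_000,
    "hardware FIB maximum (as quoted)": 524_288,
}


def headroom(entries, limit):
    """Return the free slots under `limit`; negative means it does not fit."""
    return limit - entries


def report(entries, limits):
    """Build one human-readable line per limit."""
    lines = []
    for name, limit in limits.items():
        free = headroom(entries, limit)
        status = "fits" if free >= 0 else "DOES NOT FIT"
        lines.append(f"{name}: limit={limit} free={free} ({status})")
    return lines


if __name__ == "__main__":
    for line in report(ROUTING_TABLE_ENTRIES, LIMITS):
        print(line)
```

If the default really is around 290k, the table from the original post would not fit under it but would fit under the hardware maximum, which is why the configured system-max value is worth checking.
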
> 
> I have also seen a lot of lp-cpu usage caused by multicast traffic, 
> especially with older code.
> 
> If you see high lp cpu again in the future, you can run 'dm pstat' a few 
> times to try to get an idea of what kind of traffic you are receiving.  
> The first run is typically a throwaway, as it shows counts since the 
> last run.  It gives per-PP stats, but I think the CERs only have one PP 
> anyway.  If you are feeling brave, you can use 'rconsole' to connect to 
> the LP and play with 'debug packet capture' (captures/displays packets 
> that are hitting the lp cpu), but beware... I have had devices 
> unexpectedly reboot playing with that.  Always specify a limit.
> 
> -- 
> Eldon
> 
> On Fri, Jun 22, 2018 at 10:06 AM, Dennis op de Weegh <info at bitency.nl 
> <mailto:info at bitency.nl>> wrote:
> 
>     Can you post your config?
> 
>     LP load looks high.
>     Try to disable icmp redirect in config:
> 
>     no ip icmp redirect
> 
>     It's a Brocade thing...
> 
> 
> 
>     Kind regards/Met vriendelijke groet,
> 
>     Dennis op de Weegh
> 
> 
> 
>     Bitency
>     Willem van Oranjestraat 9
>     4931NJ Geertruidenberg
> 
>     KvK number: 20144338
>     VAT number: NL213538519B01
> 
>     W: www.bitency.nl <http://www.bitency.nl>
>     E: info at bitency.nl <mailto:info at bitency.nl>
>     T: +31 (0)162 714066
> 
> 
>     -----Original message-----
>     From: foundry-nsp <foundry-nsp-bounces at puck.nether.net
>     <mailto:foundry-nsp-bounces at puck.nether.net>> On behalf of Frank Menzel
>     Sent: Friday, 22 June 2018 17:57
>     To: foundry-nsp at puck.nether.net <mailto:foundry-nsp at puck.nether.net>
>     Subject: [f-nsp] OSPF and BGP flapping when enabling a certain
>     amount of BGP neighbors
> 
>     Hi,
> 
>     One of our CER2024F routers started to behave strangely for no
>     apparent reason; we had not applied any changes beforehand.
> 
>     A while ago the device showed up in our monitoring with flapping
>     OSPF sessions caused by malformed packets, and BGP sessions with
>     expired hold timers. The device became unresponsive, so we disabled
>     most BGP sessions except the one to our transit partner and 4
>     iBGP sessions. This brought the device back to an operational state.
> 
>     We received an identical replacement device from our vendor and
>     applied a configuration backup of the old one, but it behaved
>     just like the old device once we brought all sessions into service.
> 
>     To get an idea of how many sessions it takes to cause issues, we
>     carefully brought sessions of small networks into service one by
>     one while observing CPU, memory usage and the number of routes
>     installed. No issues occurred, so we brought two big sessions into
>     service (DE-CIX route servers); again, nothing remarkable happened.
>     Encouraged by that, we brought 10 sessions into service
>     simultaneously and the OSPF flapping started, so we disabled them
>     and the device was able to cope with its workload again.
>     To make sure we did not exceed the capabilities of the device, we
>     brought those sessions into service one by one with a delay of 10
>     seconds. This did *not* cause OSPF flaps or BGP connections to
>     restart, so we decided to bring the last 10 remaining sessions into
>     service at once again, which almost immediately caused OSPF flaps
>     and BGP sessions to restart.
>     We therefore stopped all sessions we had brought into service
>     before, except the transit partner and 4 iBGP sessions, but the
>     flapping continued. The only way to get the CER back to an
>     operational state was to reload it with most of the BGP sessions
>     disabled by default.
> 
>     However, we were able to pull some information from the device
>     during the last round of flapping. We did not see a significant
>     change in memory usage, but the load increased dramatically:
> 
>     SSH@CER(config-bgp)#sho cpu-utilization
> 
>     00:09:57 GMT+01 Fri Jun 22 2018
> 
>     ... Usage average for all tasks in the last 1 seconds  ...
>     ==========================================================
>     Name                    us/sec          %
> 
>     idle                    0               0
>     con                     35              0
>     mon                     190             0
>     flash                   44              0
>     dbg                     39              0
>     boot                    70              0
>     main                    0               0
>     itc                     0               0
>     tmr                     4358            0
>     ip_rx                   26720           2
>     scp                     54              0
>     lpagent                 357             0
>     console                 324             0
>     vlan                    0               0
>     mac_mgr                 199             0
>     mrp                     241             0
>     vsrp                    0               0
>     erp                     239             0
>     mxrp                    127             0
>     snms                    0               0
>     rtm                     638             0
>     rtm6                    301             0
>     ip_tx                   11100           1
>     rip                     0               0
>     l2vpn                   0               0
>     mpls                    0               0
>     nht                     0               0
>     mpls_glue               0               0
>     pcep                    0               0
>     bgp                     212773          21
>     bgp_io                  240             0
>     ospf                    1005            0
>     ospf_r_calc             1193            0
>     isis                    260             0
>     isis_spf                0               0
>     mcast                   460             0
>     msdp                    23              0
>     vrrp                    0               0
>     ripng                   0               0
>     ospf6                   667             0
>     ospf6_rt                0               0
>     mcast6                  557             0
>     vrrp6                   0               0
>     bfd                     20              0
>     ipsec                   57              0
>     l4                      0               0
>     stp                     0               0
>     gvrp_mgr                0               0
>     snmp                    458             0
>     rmon                    25              0
>     web                     1573            0
>     lacp                    4199            0
>     dot1x                   0               0
>     dot1ag                  177             0
>     loop_detect             127             0
>     ccp                     12              0
>     cluster_mgr             131             0
>     hw_access               0               0
>     ntp                     22              0
>     openflow_ofm            15              0
>     openflow_opm            30              0
>     dhcp6                   0               0
>     sysmon                  0               0
>     ospf_msg_task           0               0
>     ssl                     0               0
>     http_client             0               0
>     lp                      723566          76
>     LP-I2C                  35              0
>     ssh_0                   84              0
>     ssh_1                   2140            0
>     ssh_2                   5072            0
>     ssh_3                   43              0
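
Inline note on the output above: a small Python sketch for picking the busiest tasks out of a paste like this one. It assumes the three-column layout shown (task name, us/sec, percent) and uses a handful of rows copied from the output as sample data; the function names are made up for illustration.

```python
# A few rows copied from the output above, without the quote prefix.
SAMPLE = """\
Name                    us/sec          %

idle                    0               0
ip_rx                   26720           2
ip_tx                   11100           1
bgp                     212773          21
ospf                    1005            0
lp                      723566          76
"""


def parse_cpu(text):
    """Return (task, us_per_sec, percent) tuples, skipping non-data lines."""
    tasks = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly three fields, the last two numeric.
        if len(parts) != 3 or not parts[1].isdigit() or not parts[2].isdigit():
            continue
        tasks.append((parts[0], int(parts[1]), int(parts[2])))
    return tasks


def busiest(tasks, min_percent=1):
    """Tasks at or above min_percent, sorted by us/sec, highest first."""
    hot = [t for t in tasks if t[2] >= min_percent]
    return sorted(hot, key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    for name, usec, pct in busiest(parse_cpu(SAMPLE)):
        print(f"{name:<10} {usec:>8} us/sec {pct:>3}%")
```

On the sample rows this lists lp first and bgp second, matching what stands out when reading the table by eye.
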
> 
>     The documentation states the device is able to handle 1.5 million
>     routes, and we did not exceed this limit:
> 
>     SSH@CER(config-bgp)#show ip bgp route sum
>         Total number of BGP routes (NLRIs) Installed     : 1210135
>         Distinct BGP destination networks                : 697652
>         Filtered bgp routes for soft reconfig            : 394895
>         Routes originated by this router                 : 4
>         Routes selected as BEST routes                   : 410535
>         BEST routes not installed in IP forwarding table : 0
>         Unreachable routes (no IGP route for NEXTHOP)    : 0
>         IBGP routes selected as best routes              : 79640
>         EBGP routes selected as best routes              : 330891
> 
> 
>     SSH@CER(config-bgp)#show ip route sum
>     IP Routing Table - 410845 entries
>         8 connected, 11 static, 0 RIP, 294 OSPF, 410532 BGP, 0 ISIS
>         Number of prefixes:
>         /0: 1 /4: 1 /8: 16 /9: 11 /10: 36 /11: 99 /12: 291 /13: 565
>         /14: 1099 /15: 1924 /16: 13355 /17: 7910 /18: 13673 /19: 24926
>         /20: 38033 /21: 44870 /22: 86917 /23: 70274 /24: 106572 /25: 12
>         /26: 11 /27: 25 /28: 21 /29: 21 /30: 67 /32: 115
>     Nexthop Table Entry - 682 entries
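
Inline note on the summary above: a short Python sketch that parses the per-prefix-length counts and checks that they add up to the reported table size. It assumes the "/len: count" layout shown in the output; the variable and function names are made up for illustration.

```python
import re

# Prefix-length counts copied from the 'show ip route sum' output above.
PREFIX_COUNTS_TEXT = """
/0: 1 /4: 1 /8: 16 /9: 11 /10: 36 /11: 99 /12: 291 /13: 565 /14: 1099
/15: 1924 /16: 13355 /17: 7910 /18: 13673 /19: 24926 /20: 38033
/21: 44870 /22: 86917 /23: 70274 /24: 106572 /25: 12 /26: 11 /27: 25
/28: 21 /29: 21 /30: 67 /32: 115
"""

REPORTED_ENTRIES = 410_845  # "IP Routing Table - 410845 entries"


def parse_prefix_counts(text):
    """Map prefix length -> route count; tolerates wrapped lines."""
    pairs = re.findall(r"/(\d+):\s*(\d+)", text)
    return {int(length): int(count) for length, count in pairs}


if __name__ == "__main__":
    counts = parse_prefix_counts(PREFIX_COUNTS_TEXT)
    total = sum(counts.values())
    print(f"sum of per-length counts: {total}")
    print(f"matches reported entries: {total == REPORTED_ENTRIES}")
```

The per-length counts do sum to the 410845 entries reported in the header line, so the paste looks complete.
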
> 
>     Can anybody give me a hint as to what could cause the behaviour
>     described above, or what to investigate to tackle the issue?
> 
> 
>        --
>        Frank Menzel - menzel at sipgate.de <mailto:menzel at sipgate.de>
> 
>        sipgate GmbH - Gladbacher Str. 74 - 40219 Düsseldorf
>        HRB Düsseldorf 39841 - Geschäftsführer: Thilo Salmon, Tim Mois
>        Steuernummer: 106/5724/7147, Umsatzsteuer-ID: DE219349391
> 
>     http://www.sipgate.de - http://www.sipgate.co.uk
>     _______________________________________________
>     foundry-nsp mailing list
>     foundry-nsp at puck.nether.net <mailto:foundry-nsp at puck.nether.net>
>     http://puck.nether.net/mailman/listinfo/foundry-nsp
>     <http://puck.nether.net/mailman/listinfo/foundry-nsp>
> 
> 

-- 
  Frank Menzel - menzel at sipgate.de
  Telefon: +49 (0)211-63 55 55-98
  Telefax: +49 (0)211-63 55 55-22

  sipgate GmbH - Gladbacher Str. 74 - 40219 Düsseldorf
  HRB Düsseldorf 39841 - Geschäftsführer: Thilo Salmon, Tim Mois
  Steuernummer: 106/5724/7147, Umsatzsteuer-ID: DE219349391

http://www.sipgate.de - http://www.sipgate.co.uk

