[f-nsp] OSPF and BGP flapping when enabling a certain amount of BGP neighbors

Thu Jun 28 04:15:14 EDT 2018

Finally, we tried disabling ICMP redirects, and it worked 
like a charm!
I was asked about the metrics I mentioned: we are using Observium for this.
We could not set it in the global context, only per interface:

interface ethernet 1/1
no ip redirect
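Where the stanza has to be repeated on many ports, it can be generated instead of typed by hand. This is a minimal Python sketch, assuming a hypothetical list of port names. It only reproduces the two lines above and is not a vendor tool:

```python
def no_redirect_stanzas(interfaces):
    """Build per-interface CLI stanzas that disable ICMP redirects.

    Mirrors the two-line stanza above. The port names passed in are
    hypothetical and must match the ports on your own device.
    """
    lines = []
    for port in interfaces:
        lines.append(f"interface ethernet {port}")
        lines.append("no ip redirect")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical port list; replace with the ports that carry IP addresses.
    print(no_redirect_stanzas(["1/1", "1/2"]))
```

The output can be pasted into the interface configuration context, one stanza per port.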

Thank you so much!

On 06/25/2018 05:29 PM, Frank Menzel wrote:
> Disabling ICMP redirects looks very promising: we found 
> metrics for it, and the number of redirects during our testing 
> was significant. I'll definitely give that a try. Stay tuned, I'll 
> report back.
> 
> Thanks for replying!
> 
> On 06/22/2018 07:50 PM, Eldon Koyle wrote:
>> I'll second Dennis.  Disabling ICMP redirects is extremely important 
>> if you have multiple addresses on a single interface.
>>
>> If you have a lot of routes, you may need to change your system-max 
>> values.  Run 'show default values' and look for the ip-route and ip-cache 
>> values (and their ipv6- equivalents).  The defaults are usually quite low 
>> (290k routes on our CER2024Fs; this needs to fit your entire FIB). 
>> Change them with 'system-max <parameter> <value>', write mem, then reload. 
>> On the MLX, you also have to worry about CAM partitioning profiles.  
>> The CER2024F may be able to handle 1.5M routes in the BGP RIB, but it 
>> has a HW max of 524288 entries in the FIB.
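The check described here boils down to comparing the installed route count with the lower of the configured system-max value and the hardware FIB maximum. This is a minimal Python sketch; 290000 is a round stand-in for the "290k" default mentioned above, and the real value has to be read from 'show default values' on the device:

```python
# Hardware FIB maximum for the CER2024F as quoted in this thread.
CER2024F_HW_FIB_MAX = 524288


def fib_headroom(installed_routes, system_max, hw_max=CER2024F_HW_FIB_MAX):
    """Return (fits, headroom) for a routing table against the effective FIB limit.

    The effective limit is taken as the lower of the configured system-max
    value and the hardware maximum; headroom is negative when the table
    does not fit.
    """
    limit = min(system_max, hw_max)
    headroom = limit - installed_routes
    return headroom >= 0, headroom
```

With the 410845-entry table from the 'show ip route sum' output in this thread, fib_headroom(410845, 290000) returns (False, -120845), while fib_headroom(410845, 524288) returns (True, 113443).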
>>
>> I have also seen a lot of LP CPU usage caused by multicast traffic, 
>> especially with older code.
>>
>> If you see high LP CPU again in the future, you can run 'dm pstat' a 
>> few times to get an idea of what kind of traffic you are 
>> receiving. The first run is typically a throwaway, as it shows counts 
>> since the last run.  It gives per-PP stats, but I think the CERs only 
>> have one PP anyway.  If you are feeling brave, you can use 'rconsole' 
>> to connect to the LP and play with 'debug packet capture' 
>> (captures/displays packets that are hitting the LP CPU), but beware... 
>> I have had devices unexpectedly reboot while playing with that.  Always 
>> specify a limit.
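Because each 'dm pstat' run reports counts accumulated since the previous run, turning successive readings into rates means dividing each count by the interval between runs and dropping the first reading, whose interval is unknown. This is a minimal Python sketch; it does not parse 'dm pstat' output, so the readings have to be copied in by hand, and the numbers in the example below are made up:

```python
def pstat_rates(samples):
    """Turn successive 'dm pstat' readings into packets-per-second rates.

    Each sample is (seconds_since_start, packet_count), where packet_count
    is what one run reported, i.e. the count accumulated since the previous
    run. The first sample covers an unknown interval, so it is discarded.
    """
    rates = []
    for (prev_t, _), (t, count) in zip(samples, samples[1:]):
        interval = t - prev_t
        if interval <= 0:
            raise ValueError("samples must be in strictly increasing time order")
        rates.append(count / interval)
    return rates
```

For example, pstat_rates([(0, 999999), (10, 5000), (20, 7000)]) returns [500.0, 700.0]; the large first count is ignored.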
>>
>> -- 
>> Eldon
>>
>> On Fri, Jun 22, 2018 at 10:06 AM, Dennis op de Weegh <info at bitency.nl 
>> <mailto:info at bitency.nl>> wrote:
>>
>>     Can you post your config?
>>
>>     LP load looks high.
>>     Try disabling ICMP redirects in the config:
>>
>>     no ip icmp redirect
>>
>>     It's a Brocade thing...
>>
>>
>>
>>     Kind regards,
>>
>>     Dennis op de Weegh
>>
>>
>>
>>     Bitency
>>     Willem van Oranjestraat 9
>>     4931NJ Geertruidenberg
>>
>>     Chamber of Commerce (KvK) number: 20144338
>>     VAT (BTW) number: NL213538519B01
>>
>>     W: www.bitency.nl <http://www.bitency.nl>
>>     E: info at bitency.nl <mailto:info at bitency.nl>
>>     T: +31 (0)162 714066
>>
>>
>>     -----Original Message-----
>>     From: foundry-nsp <foundry-nsp-bounces at puck.nether.net
>>     <mailto:foundry-nsp-bounces at puck.nether.net>> On Behalf Of Frank Menzel
>>     Sent: Friday, June 22, 2018 17:57
>>     To: foundry-nsp at puck.nether.net <mailto:foundry-nsp at puck.nether.net>
>>     Subject: [f-nsp] OSPF and BGP flapping when enabling a certain
>>     amount of BGP neighbors
>>
>>     Hi,
>>
>>     one of our CER2024F routers started behaving strangely for no
>>     apparent reason; we had not applied any changes beforehand.
>>
>>     A while ago the device showed up in our monitoring with flapping
>>     OSPF sessions caused by malformed packets and BGP sessions with
>>     expired hold timers. This made the device unresponsive, so we
>>     disabled most BGP sessions except the one to our transit partner
>>     and 4 iBGP sessions. This brought the device back to an operational
>>     state.
>>
>>     As a replacement, our vendor sent us a new, identical device. We
>>     applied a configuration backup from the old device, but the new one
>>     behaved just like the old one once we put all sessions into service.
>>
>>     To get an idea of how many sessions are needed to cause issues, we
>>     carefully put the sessions of small networks into service one by one
>>     while observing CPU, memory usage, and the number of routes
>>     installed. No issues occurred, so we put two big sessions into service
>>     (the DE-CIX route servers); again, nothing remarkable happened.
>>     Encouraged by that, we put 10 sessions into service simultaneously,
>>     and the OSPF flapping started, so we disabled them and the device was
>>     able to cope with its workload again.
>>     To make sure we were not exceeding the capabilities of the device, we
>>     put those sessions into service one by one with a delay of 10 seconds.
>>     This did *not* cause OSPF flaps or BGP connections to restart, so we
>>     decided to put the last 10 remaining sessions into service all at once
>>     again, which almost immediately caused OSPF flaps and BGP session
>>     restarts.
>>     We then shut down all the sessions we had put into service before,
>>     except the transit partner and 4 iBGP sessions, but the flapping
>>     continued. The only way to get the CER back to an operational state
>>     was to reload it with most of the BGP sessions disabled by default.
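The one-session-at-a-time approach that avoided the flaps can be scripted. This is a minimal Python sketch; `enable` is a hypothetical callback standing in for whatever brings up one neighbor on the router (for example CLI pushed over SSH), and the 10-second default simply mirrors the delay used here:

```python
import time


def enable_staggered(neighbors, enable, delay_s=10.0, sleep=time.sleep):
    """Bring BGP sessions up one at a time, pausing between them.

    `enable` is a hypothetical callback that brings up a single neighbor;
    it is not part of any vendor API. `sleep` can be replaced for testing.
    """
    for index, neighbor in enumerate(neighbors):
        if index > 0:
            sleep(delay_s)  # pause before bringing up the next session
        enable(neighbor)


if __name__ == "__main__":
    # Hypothetical neighbor addresses from the documentation range 192.0.2.0/24.
    enable_staggered(["192.0.2.1", "192.0.2.2"],
                     enable=lambda n: print("enabling", n),
                     delay_s=0.1)
```

Injecting `sleep` keeps the pacing logic testable without actually waiting.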
>>
>>     However, we were able to pull some information from the device
>>     during the last flapping episode. We did not see a significant change
>>     in memory usage, but the load increased dramatically:
>>
>>     SSH@CER(config-bgp)#sho cpu-utilization
>>
>>     00:09:57 GMT+01 Fri Jun 22 2018
>>
>>     ... Usage average for all tasks in the last 1 seconds  ...
>>     ==========================================================
>>     Name                    us/sec          %
>>
>>     idle                    0               0
>>     con                     35              0
>>     mon                     190             0
>>     flash                   44              0
>>     dbg                     39              0
>>     boot                    70              0
>>     main                    0               0
>>     itc                     0               0
>>     tmr                     4358            0
>>     ip_rx                   26720           2
>>     scp                     54              0
>>     lpagent                 357             0
>>     console                 324             0
>>     vlan                    0               0
>>     mac_mgr                 199             0
>>     mrp                     241             0
>>     vsrp                    0               0
>>     erp                     239             0
>>     mxrp                    127             0
>>     snms                    0               0
>>     rtm                     638             0
>>     rtm6                    301             0
>>     ip_tx                   11100           1
>>     rip                     0               0
>>     l2vpn                   0               0
>>     mpls                    0               0
>>     nht                     0               0
>>     mpls_glue               0               0
>>     pcep                    0               0
>>     bgp                     212773          21
>>     bgp_io                  240             0
>>     ospf                    1005            0
>>     ospf_r_calc             1193            0
>>     isis                    260             0
>>     isis_spf                0               0
>>     mcast                   460             0
>>     msdp                    23              0
>>     vrrp                    0               0
>>     ripng                   0               0
>>     ospf6                   667             0
>>     ospf6_rt                0               0
>>     mcast6                  557             0
>>     vrrp6                   0               0
>>     bfd                     20              0
>>     ipsec                   57              0
>>     l4                      0               0
>>     stp                     0               0
>>     gvrp_mgr                0               0
>>     snmp                    458             0
>>     rmon                    25              0
>>     web                     1573            0
>>     lacp                    4199            0
>>     dot1x                   0               0
>>     dot1ag                  177             0
>>     loop_detect             127             0
>>     ccp                     12              0
>>     cluster_mgr             131             0
>>     hw_access               0               0
>>     ntp                     22              0
>>     openflow_ofm            15              0
>>     openflow_opm            30              0
>>     dhcp6                   0               0
>>     sysmon                  0               0
>>     ospf_msg_task           0               0
>>     ssl                     0               0
>>     http_client             0               0
>>     lp                      723566          76
>>     LP-I2C                  35              0
>>     ssh_0                   84              0
>>     ssh_1                   2140            0
>>     ssh_2                   5072            0
>>     ssh_3                   43              0
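To pull the busiest tasks out of output like this, a small parser is enough. This is a minimal Python sketch, assuming the three-column "Name us/sec %" layout shown above; other code versions may print it differently:

```python
def busiest_tasks(output, top=3):
    """Return the `top` tasks with the highest us/sec from 'show cpu-utilization' text.

    Assumes one task per line in the form '<name> <us/sec> <percent>', as in
    the output above; headers, separators, and other lines are skipped.
    """
    tasks = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1].isdigit() and fields[2].isdigit():
            tasks.append((fields[0], int(fields[1]), int(fields[2])))
    tasks.sort(key=lambda task: task[1], reverse=True)
    return tasks[:top]
```

Fed the table above, it returns lp, bgp and ip_rx as the top three tasks.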
>>
>>     The documentation states the device is able to handle 1.5 million
>>     routes, and we did not exceed this limit:
>>
>>     SSH@CER(config-bgp)#show ip bgp route sum
>>         Total number of BGP routes (NLRIs) Installed     : 1210135
>>         Distinct BGP destination networks                : 697652
>>         Filtered bgp routes for soft reconfig            : 394895
>>         Routes originated by this router                 : 4
>>         Routes selected as BEST routes                   : 410535
>>         BEST routes not installed in IP forwarding table : 0
>>         Unreachable routes (no IGP route for NEXTHOP)    : 0
>>         IBGP routes selected as best routes              : 79640
>>         EBGP routes selected as best routes              : 330891
>>
>>
>>     SSH@CER(config-bgp)#show ip route sum
>>     IP Routing Table - 410845 entries
>>         8 connected, 11 static, 0 RIP, 294 OSPF, 410532 BGP, 0 ISIS
>>         Number of prefixes:
>>         /0: 1 /4: 1 /8: 16 /9: 11 /10: 36 /11: 99 /12: 291 /13: 565
>>         /14: 1099 /15: 1924 /16: 13355 /17: 7910 /18: 13673 /19: 24926
>>         /20: 38033 /21: 44870 /22: 86917 /23: 70274 /24: 106572
>>         /25: 12 /26: 11 /27: 25 /28: 21 /29: 21 /30: 67 /32: 115
>>     Nexthop Table Entry - 682 entries
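As a consistency check, the per-prefix-length counts above should add up to the table size, as should the per-protocol counts. This is a minimal Python sketch using the figures from this output:

```python
import re

# Prefix-length histogram copied from the 'show ip route sum' output above.
HISTOGRAM = """
/0: 1 /4: 1 /8: 16 /9: 11 /10: 36 /11: 99 /12: 291 /13: 565
/14: 1099 /15: 1924 /16: 13355 /17: 7910 /18: 13673 /19: 24926
/20: 38033 /21: 44870 /22: 86917 /23: 70274 /24: 106572
/25: 12 /26: 11 /27: 25 /28: 21 /29: 21 /30: 67 /32: 115
"""


def prefix_counts(text):
    """Map prefix length -> number of routes from '/len: count' tokens."""
    return {int(length): int(count)
            for length, count in re.findall(r"/(\d+):\s*(\d+)", text)}


if __name__ == "__main__":
    counts = prefix_counts(HISTOGRAM)
    # connected, static, RIP, OSPF, BGP, ISIS
    by_protocol = 8 + 11 + 0 + 294 + 410532 + 0
    print(sum(counts.values()), by_protocol)  # prints: 410845 410845
```

Both totals match the 410845 entries reported in the table header.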
>>
>>     Can anybody give me a hint as to what could cause the behaviour
>>     described above, or what to investigate to tackle the issue?
>>
>>
>>        --
>>        Frank Menzel - menzel at sipgate.de <mailto:menzel at sipgate.de>
>>
>>        sipgate GmbH - Gladbacher Str. 74 - 40219 Düsseldorf
>>        HRB Düsseldorf 39841 - Managing Directors: Thilo Salmon, Tim Mois
>>        Tax number: 106/5724/7147, VAT ID: DE219349391
>>
>>     http://www.sipgate.de - http://www.sipgate.co.uk
>>     _______________________________________________
>>     foundry-nsp mailing list
>>     foundry-nsp at puck.nether.net <mailto:foundry-nsp at puck.nether.net>
>>     http://puck.nether.net/mailman/listinfo/foundry-nsp
>>     <http://puck.nether.net/mailman/listinfo/foundry-nsp>
>>
>>
> 

-- 
  Frank Menzel - menzel at sipgate.de
  Phone: +49 (0)211-63 55 55-98
  Fax: +49 (0)211-63 55 55-22

  sipgate GmbH - Gladbacher Str. 74 - 40219 Düsseldorf
  HRB Düsseldorf 39841 - Managing Directors: Thilo Salmon, Tim Mois
  Tax number: 106/5724/7147, VAT ID: DE219349391

http://www.sipgate.de - http://www.sipgate.co.uk