[f-nsp] OSPF and BGP flapping when enabling a certain amount of BGP neighbors
Frank Menzel
menzel at sipgate.de
Thu Jun 28 04:15:14 EDT 2018
We finally tried disabling ICMP redirects and it worked like a charm!
I was asked about the metrics I mentioned: we are using Observium for this.
We could not disable redirects in the global context, only at the
interface level:
interface ethernet 1/1
no ip redirect
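Because the command has to be applied per interface, the stanza can be generated for a whole list of ports. The following is only a sketch: the port list is hypothetical, the function name is made up for illustration, and it assumes the two-line syntax shown above applies unchanged to every port.

```python
def redirect_disable_stanzas(ports):
    """Return config lines that disable ICMP redirects on each port.

    `ports` is a hypothetical list of slot/port strings such as "1/1".
    The two commands are the ones quoted in this message; nothing else
    about the device configuration is assumed.
    """
    lines = []
    for port in ports:
        lines.append(f"interface ethernet {port}")
        lines.append("no ip redirect")
    return lines


if __name__ == "__main__":
    # Hypothetical ports; replace with the interfaces that carry IP addresses.
    for line in redirect_disable_stanzas(["1/1", "1/2"]):
        print(line)
```

The output is meant to be reviewed and pasted in configuration mode by hand, not pushed to the device automatically.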
Thank you so much!
On 06/25/2018 05:29 PM, Frank Menzel wrote:
> Disabling ICMP redirects looks very promising: we found metrics for
> them, and the number of redirects during our testing was significant.
> I'll definitely give that a try. Stay tuned, I'll report back.
>
> Thanks for replying!
>
> On 06/22/2018 07:50 PM, Eldon Koyle wrote:
>> I'll second Dennis. Disabling ICMP redirects is extremely important
>> if you have multiple addresses on a single interface.
>>
>> If you have a lot of routes, you may need to change your system-max
>> values. Run 'show default values' and look for ip-route and ip-cache
>> values (and ipv6- equivalents). The defaults are usually quite low
>> (290k routes on our CER2024Fs; this needs to fit your entire FIB).
>> Change with 'system-max <parameter> <value>', write mem, then reload.
>> On the MLX, you also have to worry about cam partitioning profiles.
>> The CER2024F may be able to handle 1.5M routes in the BGP RIB, but it
>> has a HW max of 524288 in the FIB.
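
As a rough sanity check of the numbers in this thread, the sketch below compares a routing table size against a FIB limit. The 524288 figure is the hardware maximum quoted just above, and 410845 is the table size from the 'show ip route sum' output further down. The function name and the 90% warning threshold are arbitrary assumptions for illustration.

```python
def fib_headroom(installed, fib_max, warn_ratio=0.9):
    """Return (free_entries, used_fraction, warn) for a route table.

    `installed` is the number of routes in the forwarding table,
    `fib_max` the limit it has to fit into. `warn_ratio` is an
    arbitrary threshold chosen for this sketch, not a vendor value.
    """
    if fib_max <= 0:
        raise ValueError("fib_max must be positive")
    free_entries = fib_max - installed
    used_fraction = installed / fib_max
    return free_entries, used_fraction, used_fraction >= warn_ratio


if __name__ == "__main__":
    free, used, warn = fib_headroom(installed=410845, fib_max=524288)
    print(f"free={free} used={used:.1%} warn={warn}")
```

With these figures the table stays below the hardware maximum, with 113443 entries free. Whether the configured system-max values allow that many routes still has to be checked on the device itself.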
>>
>> I have also seen a lot of lp-cpu usage caused by multicast traffic,
>> especially with older code.
>>
>> If you see high lp cpu again in the future, you can run 'dm pstat' a
>> few times to try to get an idea of what kind of traffic you are
>> receiving. The first run is typically a throwaway, as it shows counts
>> since the last run. It gives per-PP stats, but I think the CERs only
>> have one PP anyway. If you are feeling brave, you can use 'rconsole'
>> to connect to the LP and play with 'debug packet capture'
>> (captures/displays packets that are hitting the lp cpu), but beware...
>> I have had devices unexpectedly reboot playing with that. Always
>> specify a limit.
>>
>> --
>> Eldon
>>
>> On Fri, Jun 22, 2018 at 10:06 AM, Dennis op de Weegh <info at bitency.nl> wrote:
>>
>> Can you post your config?
>>
>> LP load looks high.
>> Try to disable icmp redirect in config:
>>
>> no ip icmp redirect
>>
>> It's a Brocade thing...
>>
>>
>>
>> Kind regards,
>>
>> Dennis op de Weegh
>>
>>
>>
>> Bitency
>> Willem van Oranjestraat 9
>> 4931NJ Geertruidenberg
>>
>> Chamber of Commerce (KvK) number: 20144338
>> VAT number: NL213538519B01
>>
>> W: www.bitency.nl
>> E: info at bitency.nl
>> T: +31 (0)162 714066
>>
>>
>> -----Original message-----
>> From: foundry-nsp <foundry-nsp-bounces at puck.nether.net> On behalf of Frank Menzel
>> Sent: Friday, 22 June 2018 17:57
>> To: foundry-nsp at puck.nether.net
>> Subject: [f-nsp] OSPF and BGP flapping when enabling a certain
>> amount of BGP neighbors
>>
>> Hi,
>>
>> one of our CER2024F routers started to behave strangely for no
>> apparent reason; we had not applied any changes beforehand.
>>
>> A while ago the device showed up in our monitoring with flapping
>> OSPF sessions caused by malformed packets and BGP sessions with
>> expired hold timers. The device became unresponsive, so we disabled
>> most BGP sessions except the one to our transit partner and 4
>> iBGP sessions. This brought the device back to an operational state.
>>
>> As a replacement we received a new, identical device from our vendor
>> and applied a configuration backup of the former device, but it
>> behaved just like the old one once we brought all sessions into service.
>>
>> To get an idea of how many sessions are needed to cause issues, we
>> carefully brought sessions of small networks into service one by one
>> while observing CPU, memory usage and the number of routes
>> installed. No issues occurred, so we brought two big sessions into
>> service (DECIX route servers); again, nothing remarkable happened.
>> Encouraged by that, we brought 10 sessions into service simultaneously
>> and the OSPF flapping started, so we disabled them and the device was
>> able to cope with its workload again.
>> To make sure we didn't exceed the capabilities of the device, we
>> brought those sessions into service one by one with a delay of 10
>> seconds. This did *not* cause OSPF flaps or BGP connections to
>> restart, so we decided to bring the last 10 remaining sessions into
>> service at once again, which almost immediately caused OSPF flaps and
>> BGP sessions to restart.
>> We therefore stopped all the sessions we had brought into service,
>> except the transit partner and 4 iBGP sessions, but the flapping
>> continued. The only way to get the CER back to an operational state
>> was to reload it with most of the BGP sessions disabled by default.
>>
>> However, we were able to pull some information from the device
>> during the last flapping. We didn't see a significant change in
>> memory usage, but the load increased dramatically:
>>
>> SSH@CER(config-bgp)#sho cpu-utilization
>>
>> 00:09:57 GMT+01 Fri Jun 22 2018
>>
>> ... Usage average for all tasks in the last 1 seconds ...
>> ==========================================================
>> Name us/sec %
>>
>> idle 0 0
>> con 35 0
>> mon 190 0
>> flash 44 0
>> dbg 39 0
>> boot 70 0
>> main 0 0
>> itc 0 0
>> tmr 4358 0
>> ip_rx 26720 2
>> scp 54 0
>> lpagent 357 0
>> console 324 0
>> vlan 0 0
>> mac_mgr 199 0
>> mrp 241 0
>> vsrp 0 0
>> erp 239 0
>> mxrp 127 0
>> snms 0 0
>> rtm 638 0
>> rtm6 301 0
>> ip_tx 11100 1
>> rip 0 0
>> l2vpn 0 0
>> mpls 0 0
>> nht 0 0
>> mpls_glue 0 0
>> pcep 0 0
>> bgp 212773 21
>> bgp_io 240 0
>> ospf 1005 0
>> ospf_r_calc 1193 0
>> isis 260 0
>> isis_spf 0 0
>> mcast 460 0
>> msdp 23 0
>> vrrp 0 0
>> ripng 0 0
>> ospf6 667 0
>> ospf6_rt 0 0
>> mcast6 557 0
>> vrrp6 0 0
>> bfd 20 0
>> ipsec 57 0
>> l4 0 0
>> stp 0 0
>> gvrp_mgr 0 0
>> snmp 458 0
>> rmon 25 0
>> web 1573 0
>> lacp 4199 0
>> dot1x 0 0
>> dot1ag 177 0
>> loop_detect 127 0
>> ccp 12 0
>> cluster_mgr 131 0
>> hw_access 0 0
>> ntp 22 0
>> openflow_ofm 15 0
>> openflow_opm 30 0
>> dhcp6 0 0
>> sysmon 0 0
>> ospf_msg_task 0 0
>> ssl 0 0
>> http_client 0 0
>> lp 723566 76
>> LP-I2C 35 0
>> ssh_0 84 0
>> ssh_1 2140 0
>> ssh_2 5072 0
>> ssh_3 43 0
>>
>> The documentation states the device is able to handle 1.5 million
>> routes and we didn't get above this limit:
>>
>> SSH@CER(config-bgp)#show ip bgp route sum
>> Total number of BGP routes (NLRIs) Installed : 1210135
>> Distinct BGP destination networks : 697652
>> Filtered bgp routes for soft reconfig : 394895
>> Routes originated by this router : 4
>> Routes selected as BEST routes : 410535
>> BEST routes not installed in IP forwarding table : 0
>> Unreachable routes (no IGP route for NEXTHOP) : 0
>> IBGP routes selected as best routes : 79640
>> EBGP routes selected as best routes : 330891
>>
>>
>> SSH@CER(config-bgp)#show ip route sum
>> IP Routing Table - 410845 entries
>> 8 connected, 11 static, 0 RIP, 294 OSPF, 410532 BGP, 0 ISIS
>> Number of prefixes:
>> /0: 1 /4: 1 /8: 16 /9: 11 /10: 36 /11: 99 /12: 291 /13: 565
>> /14: 1099 /15: 1924 /16: 13355 /17: 7910 /18: 13673 /19: 24926
>> /20: 38033 /21: 44870 /22: 86917 /23: 70274 /24: 106572 /25: 12
>> /26: 11 /27: 25 /28: 21 /29: 21 /30: 67 /32: 115
>> Nexthop Table Entry - 682 entries
>>
>> Can anybody give me a hint as to what could cause the behaviour
>> described above, or what to investigate to tackle this issue?
>>
>>
>> --
>> Frank Menzel - menzel at sipgate.de
>>
>> sipgate GmbH - Gladbacher Str. 74 - 40219 Düsseldorf
>> HRB Düsseldorf 39841 - Geschäftsführer: Thilo Salmon, Tim Mois
>> Steuernummer: 106/5724/7147, Umsatzsteuer-ID: DE219349391
>>
>> http://www.sipgate.de - http://www.sipgate.co.uk
>> _______________________________________________
>> foundry-nsp mailing list
>> foundry-nsp at puck.nether.net
>> http://puck.nether.net/mailman/listinfo/foundry-nsp
>>
>>
>
--
Frank Menzel - menzel at sipgate.de
Telefon: +49 (0)211-63 55 55-98
Telefax: +49 (0)211-63 55 55-22
sipgate GmbH - Gladbacher Str. 74 - 40219 Düsseldorf
HRB Düsseldorf 39841 - Geschäftsführer: Thilo Salmon, Tim Mois
Steuernummer: 106/5724/7147, Umsatzsteuer-ID: DE219349391
http://www.sipgate.de - http://www.sipgate.co.uk