[f-nsp] OSPF and BGP flapping when enabling a certain amount of BGP neighbors
Frank Menzel
menzel at sipgate.de
Mon Jun 25 11:29:11 EDT 2018
Disabling the ICMP redirects looks absolutely promising, we found
metrics for that and the amount of redirects during our time of testing
was significant. I'll definitely give that a try. Stay tuned, I'll
report back.
Thanks for replying!
On 06/22/2018 07:50 PM, Eldon Koyle wrote:
> I'll second Dennis. Disabling icmp redirects is extremely important if
> you have multiple addresses on a single interface.
>
> If you have a lot of routes, you may need to change your system-max
> values. Run 'show default values' and look for ip-route and ip-cache
> values (and ipv6- equivalents). The defaults are usually quite low
> (290k routes on our CER2024F's, this needs to fit your entire FIB).
> Change with 'system-max <parameter> <value>', write mem, then reload.
> On the MLX, you also have to worry about cam partitioning profiles. The
> CER2024F may be able to handle 1.5M routes in the BGP RIB, but it has a
> HW max of 524288 in the FIB.
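As a rough sanity check on the numbers in this thread, the sketch below compares the 410845 routing table entries from Frank's 'show ip route sum' output with the two limits Eldon quotes. It assumes the 290k default also applies to Frank's unit, which the thread does not confirm; the constant and function names are illustrative only.

```python
# Figures taken from this thread. Whether the 290k default applies to
# Frank's router is an assumption (Eldon reports it for his own CER2024Fs).
DEFAULT_IP_ROUTE_MAX = 290_000   # default ip-route system-max quoted by Eldon
HW_FIB_MAX = 524_288             # hardware FIB maximum quoted by Eldon
INSTALLED_ROUTES = 410_845       # "IP Routing Table - 410845 entries"


def headroom(installed: int, limit: int) -> int:
    """Return free entries; a negative value means the limit is exceeded."""
    return limit - installed


if __name__ == "__main__":
    limits = (
        ("default ip-route system-max", DEFAULT_IP_ROUTE_MAX),
        ("hardware FIB maximum", HW_FIB_MAX),
    )
    for label, limit in limits:
        free = headroom(INSTALLED_ROUTES, limit)
        state = "OVER" if free < 0 else "ok"
        print(f"{label}: limit={limit} free={free} [{state}]")
```

If the configured ip-route value were still at 290k, the table would not fit, so checking 'show default values' as suggested seems worthwhile.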
>
> I have also seen a lot of lp-cpu usage caused by multicast traffic,
> especially with older code.
>
> If you see high lp cpu again in the future, you can run 'dm pstat' a few
> times to try to get an idea of what kind of traffic you are receiving.
> The first run is typically a throwaway, as it shows counts since the
> last run. It gives per-PP stats, but I think the CERs only have one PP
> anyway. If you are feeling brave, you can use 'rconsole' to connect to
> the LP and play with 'debug packet capture' (captures/displays packets
> that are hitting the lp cpu), but beware... I have had devices
> unexpectedly reboot playing with that. Always specify a limit.
>
> --
> Eldon
>
> On Fri, Jun 22, 2018 at 10:06 AM, Dennis op de Weegh <info at bitency.nl> wrote:
>
> Can you post your config?
>
> LP load looks high.
> Try to disable icmp redirect in config:
>
> no ip icmp redirect
>
> It's a Brocade thing...
>
>
>
> Kind regards/Met vriendelijke groet,
>
> Dennis op de Weegh
>
>
>
> Bitency
> Willem van Oranjestraat 9
> 4931NJ Geertruidenberg
>
> Kvk nummer: 20144338
> BTW nummer: NL213538519B01
>
> W: www.bitency.nl
> E: info at bitency.nl
> T: +31 (0)162 714066
>
>
> -----Original message-----
> From: foundry-nsp <foundry-nsp-bounces at puck.nether.net> On behalf of Frank Menzel
> Sent: Friday, 22 June 2018 17:57
> To: foundry-nsp at puck.nether.net
> Subject: [f-nsp] OSPF and BGP flapping when enabling a certain
> amount of BGP neighbors
>
> Hi,
>
> one of our CER2024F routers started to behave strangely for no
> apparent reason; we had not applied any changes beforehand.
>
> A while ago the device showed up in our monitoring with flapping
> OSPF sessions caused by malformed packets and BGP sessions with
> expired hold-timers. This made the device unresponsive, so we
> disabled most BGP sessions except the one to our transit partner
> and 4 iBGP sessions. This brought the device back to an operational
> state.
>
> In exchange we received a new, identical device from our vendor and
> applied a configuration backup of the former device, but it behaved
> just like the old one once we took all sessions into service.
>
> To get an idea of how many sessions are needed to cause issues, we
> carefully took sessions of small networks into service one by one
> while observing CPU, memory usage and the number of routes
> installed. No issues occurred, so we took two big sessions into
> service (DE-CIX route servers); again, nothing remarkable happened.
> Encouraged by that, we took 10 sessions into service simultaneously
> and the OSPF flapping started, so we disabled them and the device
> was able to cope with its workload again.
> To make sure we didn't exceed the capabilities of the device, we
> took those sessions into service one by one with a delay of 10
> seconds. This did *not* cause OSPF flaps or BGP connections to
> restart, so we decided to take the last 10 remaining sessions into
> service at once again, which almost immediately caused OSPF flaps
> and BGP sessions to restart.
> We therefore stopped all sessions we had taken into service before,
> except the transit partner and 4 iBGP sessions, but the flapping
> continued. The only way to get the CER back to an operational state
> was reloading it with most of the BGP sessions disabled by default.
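The one-by-one procedure with a 10 second gap, which did not trigger flaps, could be scripted. What follows is a minimal sketch, assuming a hypothetical `enable` callback that does whatever is needed to bring up a single neighbor (for example through an SSH session to the router). Nothing here is a Brocade API, and the neighbor addresses are documentation-range placeholders.

```python
import time
from typing import Callable, Iterable, List


def enable_staggered(neighbors: Iterable[str],
                     enable: Callable[[str], None],
                     delay_s: float = 10.0,
                     sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Enable neighbors one at a time, pausing delay_s between each pair.

    `enable` is a hypothetical callback supplied by the caller; `sleep`
    is injectable so the pacing can be tested without really waiting.
    """
    enabled: List[str] = []
    for index, neighbor in enumerate(neighbors):
        if index > 0:
            sleep(delay_s)  # no pause before the very first neighbor
        enable(neighbor)
        enabled.append(neighbor)
    return enabled


if __name__ == "__main__":
    # Dry run: print instead of touching a router, and do not really wait.
    enable_staggered(
        ["192.0.2.1", "192.0.2.2", "192.0.2.3"],
        enable=lambda n: print(f"would enable neighbor {n}"),
        sleep=lambda s: print(f"would wait {s:g} s"),
    )
```

Keeping the delay as a parameter makes it easy to experiment with longer gaps if 10 seconds turns out to be too aggressive for the remaining sessions.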
>
> However, we were able to pull some information from the device
> during the last flapping. We didn't see a significant change in
> memory usage, but the load increased dramatically:
>
> SSH@CER(config-bgp)#sho cpu-utilization
>
> 00:09:57 GMT+01 Fri Jun 22 2018
>
> ... Usage average for all tasks in the last 1 seconds ...
> ==========================================================
> Name us/sec %
>
> idle 0 0
> con 35 0
> mon 190 0
> flash 44 0
> dbg 39 0
> boot 70 0
> main 0 0
> itc 0 0
> tmr 4358 0
> ip_rx 26720 2
> scp 54 0
> lpagent 357 0
> console 324 0
> vlan 0 0
> mac_mgr 199 0
> mrp 241 0
> vsrp 0 0
> erp 239 0
> mxrp 127 0
> snms 0 0
> rtm 638 0
> rtm6 301 0
> ip_tx 11100 1
> rip 0 0
> l2vpn 0 0
> mpls 0 0
> nht 0 0
> mpls_glue 0 0
> pcep 0 0
> bgp 212773 21
> bgp_io 240 0
> ospf 1005 0
> ospf_r_calc 1193 0
> isis 260 0
> isis_spf 0 0
> mcast 460 0
> msdp 23 0
> vrrp 0 0
> ripng 0 0
> ospf6 667 0
> ospf6_rt 0 0
> mcast6 557 0
> vrrp6 0 0
> bfd 20 0
> ipsec 57 0
> l4 0 0
> stp 0 0
> gvrp_mgr 0 0
> snmp 458 0
> rmon 25 0
> web 1573 0
> lacp 4199 0
> dot1x 0 0
> dot1ag 177 0
> loop_detect 127 0
> ccp 12 0
> cluster_mgr 131 0
> hw_access 0 0
> ntp 22 0
> openflow_ofm 15 0
> openflow_opm 30 0
> dhcp6 0 0
> sysmon 0 0
> ospf_msg_task 0 0
> ssl 0 0
> http_client 0 0
> lp 723566 76
> LP-I2C 35 0
> ssh_0 84 0
> ssh_1 2140 0
> ssh_2 5072 0
> ssh_3 43 0
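To pick the busy tasks out of a long 'show cpu-utilization' dump like this one, a small parser can help. The sketch assumes every data row has the three whitespace-separated fields seen here (task name, us/sec, percent); the sample is an excerpt of rows copied from the output above, and the function names are made up for illustration.

```python
from typing import List, Tuple

# Excerpt of rows copied from the output in this thread.
SAMPLE = """\
Name us/sec %
ip_rx 26720 2
ip_tx 11100 1
bgp 212773 21
ospf 1005 0
lp 723566 76
"""

Row = Tuple[str, int, int]  # (task name, us/sec, percent)


def parse_cpu(text: str) -> List[Row]:
    """Parse 'name us/sec %' rows, skipping headers and other lines."""
    rows: List[Row] = []
    for line in text.splitlines():
        parts = line.lstrip("> ").split()  # tolerate email quote prefixes
        if len(parts) != 3:
            continue
        name, us, pct = parts
        if us.isdigit() and pct.isdigit():
            rows.append((name, int(us), int(pct)))
    return rows


def busiest(rows: List[Row], min_pct: int = 1) -> List[Row]:
    """Return tasks at or above min_pct, highest percentage first."""
    hot = [row for row in rows if row[2] >= min_pct]
    return sorted(hot, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for name, us, pct in busiest(parse_cpu(SAMPLE)):
        print(f"{name:8s} {us:>8d} us/sec {pct:>3d}%")
```

On the figures above this puts lp (76%) and bgp (21%) far ahead of everything else, which fits the high LP load Dennis pointed out.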
>
> The documentation states the device is able to handle 1.5 million
> routes, and we didn't exceed this limit:
>
> SSH@CER(config-bgp)#show ip bgp route sum
> Total number of BGP routes (NLRIs) Installed : 1210135
> Distinct BGP destination networks : 697652
> Filtered bgp routes for soft reconfig : 394895
> Routes originated by this router : 4
> Routes selected as BEST routes : 410535
> BEST routes not installed in IP forwarding table : 0
> Unreachable routes (no IGP route for NEXTHOP) : 0
> IBGP routes selected as best routes : 79640
> EBGP routes selected as best routes : 330891
>
>
> SSH@CER(config-bgp)#show ip route sum
> IP Routing Table - 410845 entries
> 8 connected, 11 static, 0 RIP, 294 OSPF, 410532 BGP, 0 ISIS
> Number of prefixes:
> /0: 1 /4: 1 /8: 16 /9: 11 /10: 36 /11: 99 /12: 291 /13: 565 /14: 1099
> /15: 1924 /16: 13355 /17: 7910 /18: 13673 /19: 24926 /20: 38033
> /21: 44870 /22: 86917 /23: 70274 /24: 106572 /25: 12 /26: 11 /27: 25
> /28: 21 /29: 21 /30: 67 /32: 115
> Nexthop Table Entry - 682 entries
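The summary figures can be cross-checked with a little arithmetic. The sketch below only re-adds numbers copied from the two outputs above and compares the BGP route count with the 1.5 million figure from the documentation mentioned in the post. Reading that figure as a limit on installed BGP routes (NLRIs) is an assumption, and all names are illustrative.

```python
# All numbers are copied from the 'show ip bgp route sum' and
# 'show ip route sum' outputs in this thread.
BGP_INSTALLED = 1_210_135      # Total number of BGP routes (NLRIs) Installed
DOCUMENTED_LIMIT = 1_500_000   # "1.5 million routes" per the post (assumed RIB limit)
TOTAL_ENTRIES = 410_845        # IP Routing Table entries

BY_PROTOCOL = {"connected": 8, "static": 11, "RIP": 0,
               "OSPF": 294, "BGP": 410_532, "ISIS": 0}

BY_PREFIX_LEN = {
    0: 1, 4: 1, 8: 16, 9: 11, 10: 36, 11: 99, 12: 291, 13: 565, 14: 1099,
    15: 1924, 16: 13355, 17: 7910, 18: 13673, 19: 24926, 20: 38033,
    21: 44870, 22: 86917, 23: 70274, 24: 106572, 25: 12, 26: 11, 27: 25,
    28: 21, 29: 21, 30: 67, 32: 115,
}


def consistency() -> dict:
    """Return named boolean checks over the quoted figures."""
    return {
        "protocol counts add up": sum(BY_PROTOCOL.values()) == TOTAL_ENTRIES,
        "prefix counts add up": sum(BY_PREFIX_LEN.values()) == TOTAL_ENTRIES,
        "BGP routes below documented limit": BGP_INSTALLED < DOCUMENTED_LIMIT,
    }


if __name__ == "__main__":
    for name, ok in consistency().items():
        print(f"{name}: {'yes' if ok else 'NO'}")
    print(f"BGP route usage: {BGP_INSTALLED / DOCUMENTED_LIMIT:.1%}")
```

Both sums match the reported total of 410845 entries, so the outputs are internally consistent and the route counts alone do not point to an exceeded limit.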
>
> Can anybody give me a hint as to what could cause the behaviour
> described above, or what to investigate to tackle the issue?
>
>
> --
> Frank Menzel - menzel at sipgate.de
>
> sipgate GmbH - Gladbacher Str. 74 - 40219 Düsseldorf
> HRB Düsseldorf 39841 - Geschäftsführer: Thilo Salmon, Tim Mois
> Steuernummer: 106/5724/7147, Umsatzsteuer-ID: DE219349391
>
> http://www.sipgate.de - http://www.sipgate.co.uk
> _______________________________________________
> foundry-nsp mailing list
> foundry-nsp at puck.nether.net
> http://puck.nether.net/mailman/listinfo/foundry-nsp
>
>
--
Frank Menzel - menzel at sipgate.de
Telefon: +49 (0)211-63 55 55-98
Telefax: +49 (0)211-63 55 55-22
sipgate GmbH - Gladbacher Str. 74 - 40219 Düsseldorf
HRB Düsseldorf 39841 - Geschäftsführer: Thilo Salmon, Tim Mois
Steuernummer: 106/5724/7147, Umsatzsteuer-ID: DE219349391
http://www.sipgate.de - http://www.sipgate.co.uk