[f-nsp] OSPF and BGP flapping when enabling a certain number of BGP neighbors
Frank Menzel
menzel at sipgate.de
Fri Jun 22 11:56:47 EDT 2018
Hi,
one of our CER2024F routers started behaving strangely for no apparent
reason; we had not applied any changes beforehand.
A while ago the device showed up in our monitoring with flapping OSPF
sessions caused by malformed packets and BGP sessions with expired
hold timers. This made the device unresponsive, so we disabled most
BGP sessions except the one to our transit partner and 4 iBGP
sessions. This brought the device back to an operational state.
As a replacement we received a new, identical device from our vendor and
applied a configuration backup of the old device, but it behaves just
like the old one once we put all sessions into service.
To get an idea of how many sessions are needed to cause issues, we
carefully put sessions to small networks into service one by one while
observing CPU and memory usage and the number of installed routes. No
issues occurred, so we put two big sessions (the DE-CIX route servers)
into service; again, nothing remarkable happened.
Encouraged by that, we put 10 sessions into service simultaneously and
the OSPF flapping started, so we disabled them again and the device was
able to cope with its workload.
To make sure we did not exceed the capabilities of the device, we then
put those sessions into service one by one with a delay of 10 seconds
between them. This did *not* cause OSPF flaps or BGP session restarts,
so we decided to put the last 10 remaining sessions into service at once
again, which almost immediately caused OSPF flaps and BGP session restarts.
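The two rollout strategies above differ only in batch size and pacing. As a minimal sketch (the `activation_schedule` helper and the neighbor addresses are hypothetical, and the actual per-neighbor enable step is deliberately left out because it depends on the CLI or automation tool in use), the schedule for "one by one, 10 seconds apart" versus "10 at once" could be generated like this:

```python
from typing import Iterator, List, Sequence, Tuple


def activation_schedule(
    neighbors: Sequence[str], batch_size: int, delay_s: float
) -> Iterator[Tuple[float, List[str]]]:
    """Yield (start offset in seconds, neighbors to enable) pairs.

    batch_size=1, delay_s=10 models the paced rollout that did not trigger
    flaps; batch_size=len(neighbors) models enabling everything at once.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for i in range(0, len(neighbors), batch_size):
        # Each batch starts delay_s seconds after the previous one.
        yield (i // batch_size) * delay_s, list(neighbors[i : i + batch_size])


# Hypothetical neighbor addresses from the documentation range (RFC 5737).
peers = [f"192.0.2.{n}" for n in range(1, 11)]

for offset, batch in activation_schedule(peers, batch_size=1, delay_s=10):
    print(f"t+{offset:>4.0f}s enable {', '.join(batch)}")
```

With `batch_size=1, delay_s=10` the ten neighbors are spread over 90 seconds; with `batch_size=10` they all start at offset 0, which corresponds to the case that triggered the flaps.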
We therefore shut down all the sessions we had put into service before,
except the transit partner and the 4 iBGP sessions, but the flapping
continued. The only way to get the CER back to an operational state was
to reload it with most of the BGP sessions disabled by default.
However, we were able to gather some information from the device during
the last flapping episode. We didn't see a significant change in memory
usage, but the CPU load increased dramatically:
SSH@CER(config-bgp)#sho cpu-utilization
00:09:57 GMT+01 Fri Jun 22 2018
... Usage average for all tasks in the last 1 seconds ...
==========================================================
Name            us/sec   %
idle                 0   0
con                 35   0
mon                190   0
flash               44   0
dbg                 39   0
boot                70   0
main                 0   0
itc                  0   0
tmr               4358   0
ip_rx            26720   2
scp                 54   0
lpagent            357   0
console            324   0
vlan                 0   0
mac_mgr            199   0
mrp                241   0
vsrp                 0   0
erp                239   0
mxrp               127   0
snms                 0   0
rtm                638   0
rtm6               301   0
ip_tx            11100   1
rip                  0   0
l2vpn                0   0
mpls                 0   0
nht                  0   0
mpls_glue            0   0
pcep                 0   0
bgp             212773  21
bgp_io             240   0
ospf              1005   0
ospf_r_calc       1193   0
isis               260   0
isis_spf             0   0
mcast              460   0
msdp                23   0
vrrp                 0   0
ripng                0   0
ospf6              667   0
ospf6_rt             0   0
mcast6             557   0
vrrp6                0   0
bfd                 20   0
ipsec               57   0
l4                   0   0
stp                  0   0
gvrp_mgr             0   0
snmp               458   0
rmon                25   0
web               1573   0
lacp              4199   0
dot1x                0   0
dot1ag             177   0
loop_detect        127   0
ccp                 12   0
cluster_mgr        131   0
hw_access            0   0
ntp                 22   0
openflow_ofm        15   0
openflow_opm        30   0
dhcp6                0   0
sysmon               0   0
ospf_msg_task        0   0
ssl                  0   0
http_client          0   0
lp              723566  76
LP-I2C              35   0
ssh_0               84   0
ssh_1             2140   0
ssh_2             5072   0
ssh_3               43   0
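To see at a glance which tasks dominate a sample like the one above, the output can be ranked by microseconds per second. A minimal sketch, assuming the three-column `Name us/sec %` layout shown (the `top_tasks` helper is hypothetical, and only a subset of the rows is reproduced in `sample`):

```python
from typing import List, Tuple


def top_tasks(output: str, limit: int = 3) -> List[Tuple[str, int, int]]:
    """Return the busiest tasks as (name, us_per_sec, percent), busiest first."""
    rows = []
    for line in output.splitlines():
        parts = line.split()
        # Data rows have exactly three fields: name, us/sec, percent.
        # The header ("us/sec" is not numeric) and banner lines are skipped.
        if len(parts) == 3 and parts[1].isdigit() and parts[2].isdigit():
            rows.append((parts[0], int(parts[1]), int(parts[2])))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]


# Subset of the "sho cpu-utilization" output above.
sample = """\
Name us/sec %
idle 0 0
tmr 4358 0
ip_rx 26720 2
ip_tx 11100 1
bgp 212773 21
lp 723566 76
"""

for name, us, pct in top_tasks(sample):
    print(f"{name:<6} {us:>7} us/s {pct:>3}%")
```

On this sample it ranks lp (76%) and bgp (21%) first, so nearly all of the one-second average is spent in those two tasks.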
The documentation states that the device can handle 1.5 million routes,
and we did not exceed this limit:
SSH@CER(config-bgp)#show ip bgp route sum
Total number of BGP routes (NLRIs) Installed : 1210135
Distinct BGP destination networks : 697652
Filtered bgp routes for soft reconfig : 394895
Routes originated by this router : 4
Routes selected as BEST routes : 410535
BEST routes not installed in IP forwarding table : 0
Unreachable routes (no IGP route for NEXTHOP) : 0
IBGP routes selected as best routes : 79640
EBGP routes selected as best routes : 330891
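For reference, the headroom against the documented figure and the composition of the best routes work out as follows. This sketch only redoes the arithmetic on the numbers above; which counter the 1.5 million figure actually applies to (installed NLRIs, FIB entries, or something else) is an assumption, not something stated in the post:

```python
# Figures copied from the "show ip bgp route sum" output above.
installed_nlri = 1_210_135
filtered_soft_reconfig = 394_895
best = 410_535
ibgp_best = 79_640
ebgp_best = 330_891
originated = 4

# The 1.5 million figure quoted from the documentation (what it counts is assumed).
documented_limit = 1_500_000

print(f"installed NLRIs: {installed_nlri / documented_limit:.1%} of the limit")
# Assumption: whether filtered soft-reconfiguration copies count against the
# same limit is not stated; this line only shows the combined total.
print(f"installed + filtered: {installed_nlri + filtered_soft_reconfig:,}")
print(f"iBGP + eBGP + originated best: {ibgp_best + ebgp_best + originated:,} (reported {best:,})")
```

The installed NLRIs sit at about 80.7% of 1.5 million; adding the 394,895 routes kept for soft reconfiguration gives 1,605,030, and the iBGP and eBGP best routes plus the 4 locally originated routes add up exactly to the 410,535 reported best routes.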
SSH@CER(config-bgp)#show ip route sum
IP Routing Table - 410845 entries
8 connected, 11 static, 0 RIP, 294 OSPF, 410532 BGP, 0 ISIS
Number of prefixes:
/0: 1 /4: 1 /8: 16 /9: 11 /10: 36 /11: 99 /12: 291 /13: 565 /14: 1099
/15: 1924 /16: 13355 /17: 7910 /18: 13673 /19: 24926 /20: 38033
/21: 44870 /22: 86917 /23: 70274 /24: 106572
/25: 12 /26: 11 /27: 25 /28: 21 /29: 21 /30: 67 /32: 115
Nexthop Table Entry - 682 entries
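As a consistency check on the prefix-length breakdown (copied from the output above), the per-length counts can be summed and compared with both the total table size and the per-protocol line. A small sketch using only the standard library:

```python
import re

# Prefix-length distribution copied from the "show ip route sum" output above.
distribution = """
/0: 1 /4: 1 /8: 16 /9: 11 /10: 36 /11: 99 /12: 291 /13: 565 /14: 1099
/15: 1924 /16: 13355 /17: 7910 /18: 13673 /19: 24926 /20: 38033
/21: 44870 /22: 86917 /23: 70274 /24: 106572
/25: 12 /26: 11 /27: 25 /28: 21 /29: 21 /30: 67 /32: 115
"""

# Map prefix length -> number of prefixes.
counts = {int(length): int(n) for length, n in re.findall(r"/(\d+):\s*(\d+)", distribution)}

# Per-protocol line from the same output.
by_source = {"connected": 8, "static": 11, "RIP": 0, "OSPF": 294, "BGP": 410_532, "ISIS": 0}

print(sum(counts.values()), sum(by_source.values()))  # both equal 410845
```

Both sums come to 410,845, matching the `IP Routing Table - 410845 entries` line.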
Can anybody give me a hint as to what could cause the behaviour described
above, or what to investigate to tackle this issue?
--
Frank Menzel - menzel at sipgate.de
sipgate GmbH - Gladbacher Str. 74 - 40219 Düsseldorf
HRB Düsseldorf 39841 - Geschäftsführer: Thilo Salmon, Tim Mois
Steuernummer: 106/5724/7147, Umsatzsteuer-ID: DE219349391
http://www.sipgate.de - http://www.sipgate.co.uk