[f-nsp] OSPF and BGP flapping when enabling a certain amount of BGP neighbors

Frank Menzel menzel at sipgate.de
Fri Jun 22 11:56:47 EDT 2018


Hi,

  one of our CER2024F routers started to behave strangely for no apparent 
reason; we had not applied any changes beforehand:

A while ago the device showed up in our monitoring with flapping OSPF 
sessions caused by malformed packets and BGP sessions with expired 
hold timers. This made the device unresponsive, so we disabled most BGP 
sessions, leaving only the one to our transit partner and 4 iBGP 
sessions. This brought the device back to an operational state.

In exchange we received a new, identical device from our vendor and 
applied a configuration backup of the former device, but it behaved just 
like the old one once we took all sessions into service.

To get an idea of how many sessions it takes to cause issues, we 
carefully took sessions of small networks into service one by one while 
observing CPU, memory usage and the number of routes installed. No 
issues occurred, so we took two big sessions into service (DE-CIX route 
servers); again, nothing remarkable happened.
Encouraged by that, we took 10 sessions into service simultaneously and 
the OSPF flapping started, so we disabled them and the device was able 
to cope with its workload again.
To make sure we didn't exceed the capabilities of the device, we took 
those sessions into service one by one with a delay of 10 seconds 
between them. This did *not* cause OSPF flaps or BGP connections to 
restart, so we decided to take the last 10 remaining sessions into 
service all at once again, which almost immediately caused OSPF flaps 
and BGP sessions to restart.
We therefore stopped all sessions we had taken into service before, 
except the transit partner and 4 iBGP sessions, but the flapping 
continued. The only way to get the CER back to an operational state was 
to reload it with most of the BGP sessions disabled by default.

However, we were able to pull some information from the device during 
the last flapping. We didn't see a significant change in memory usage, 
but the CPU load increased dramatically:

SSH@CER(config-bgp)#sho cpu-utilization

00:09:57 GMT+01 Fri Jun 22 2018

... Usage average for all tasks in the last 1 seconds  ...
==========================================================
Name                    us/sec          %

idle                    0               0
con                     35              0
mon                     190             0
flash                   44              0
dbg                     39              0
boot                    70              0
main                    0               0
itc                     0               0
tmr                     4358            0
ip_rx                   26720           2
scp                     54              0
lpagent                 357             0
console                 324             0
vlan                    0               0
mac_mgr                 199             0
mrp                     241             0
vsrp                    0               0
erp                     239             0
mxrp                    127             0
snms                    0               0
rtm                     638             0
rtm6                    301             0
ip_tx                   11100           1
rip                     0               0
l2vpn                   0               0
mpls                    0               0
nht                     0               0
mpls_glue               0               0
pcep                    0               0
bgp                     212773          21
bgp_io                  240             0
ospf                    1005            0
ospf_r_calc             1193            0
isis                    260             0
isis_spf                0               0
mcast                   460             0
msdp                    23              0
vrrp                    0               0
ripng                   0               0
ospf6                   667             0
ospf6_rt                0               0
mcast6                  557             0
vrrp6                   0               0
bfd                     20              0
ipsec                   57              0
l4                      0               0
stp                     0               0
gvrp_mgr                0               0
snmp                    458             0
rmon                    25              0
web                     1573            0
lacp                    4199            0
dot1x                   0               0
dot1ag                  177             0
loop_detect             127             0
ccp                     12              0
cluster_mgr             131             0
hw_access               0               0
ntp                     22              0
openflow_ofm            15              0
openflow_opm            30              0
dhcp6                   0               0
sysmon                  0               0
ospf_msg_task           0               0
ssl                     0               0
http_client             0               0
lp                      723566          76
LP-I2C                  35              0
ssh_0                   84              0
ssh_1                   2140            0
ssh_2                   5072            0
ssh_3                   43              0
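
For readers who want to pick the busiest tasks out of a table like the one above, here is a minimal Python sketch. It assumes the three-column layout shown (task name, us/sec, %); the `SAMPLE` text is only an excerpt of the rows above, and the helper names `parse_tasks` and `top_tasks` are made up for this illustration, they are not part of any vendor tool.

```python
import re

# Excerpt of the task table above (name, us/sec, %); not the full output.
SAMPLE = """\
Name                    us/sec          %

idle                    0               0
tmr                     4358            0
ip_rx                   26720           2
ip_tx                   11100           1
bgp                     212773          21
lacp                    4199            0
lp                      723566          76
ssh_2                   5072            0
"""

# A task row is: one word, then two unsigned integers.
# The column header and the timestamp line do not match this pattern.
ROW = re.compile(r"^(\S+)\s+(\d+)\s+(\d+)\s*$")


def parse_tasks(text):
    """Return (name, usec_per_sec, percent) for every task row in text."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return rows


def top_tasks(text, n=3):
    """Return the n busiest tasks, sorted by us/sec in descending order."""
    return sorted(parse_tasks(text), key=lambda r: r[1], reverse=True)[:n]


if __name__ == "__main__":
    for name, usec, pct in top_tasks(SAMPLE):
        print(f"{name:8} {usec:>8} us/sec {pct:>3} %")
```

Applied to the excerpt, this ranks lp first, then bgp, then ip_rx, and it also shows that the idle task got no CPU time at all in the sampled second.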

The documentation states the device is able to handle 1.5 million routes, 
and we did not exceed this limit:

SSH@CER(config-bgp)#show ip bgp route sum
   Total number of BGP routes (NLRIs) Installed     : 1210135
   Distinct BGP destination networks                : 697652
   Filtered bgp routes for soft reconfig            : 394895
   Routes originated by this router                 : 4
   Routes selected as BEST routes                   : 410535
   BEST routes not installed in IP forwarding table : 0
   Unreachable routes (no IGP route for NEXTHOP)    : 0
   IBGP routes selected as best routes              : 79640
   EBGP routes selected as best routes              : 330891


SSH@CER(config-bgp)#show ip route sum
IP Routing Table - 410845 entries
   8 connected, 11 static, 0 RIP, 294 OSPF, 410532 BGP, 0 ISIS
   Number of prefixes:
   /0: 1 /4: 1 /8: 16 /9: 11 /10: 36 /11: 99 /12: 291 /13: 565 /14: 1099 
/15: 1924 /16: 13355 /17: 7910 /18: 13673 /19: 24926 /20: 38033 /21: 
44870 /22: 86917 /23: 70274 /24: 106572 /25: 12 /26: 11 /27: 25 /28: 21 
/29: 21 /30: 67 /32: 115
Nexthop Table Entry - 682 entries
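
As a quick cross-check of the counters above, the following Python sketch recomputes the totals. All numbers are copied from the two outputs; the variable names are made up for this illustration, and comparing the installed NLRI count against the documented 1.5 million figure is an assumption about which counter that limit applies to.

```python
# Figures copied from "show ip bgp route sum" above.
BGP_NLRIS_INSTALLED = 1_210_135
BEST_ROUTES = 410_535
IBGP_BEST = 79_640
EBGP_BEST = 330_891
ORIGINATED = 4

# Documented capacity as quoted in the post (1.5 million routes).
# Assumption: the limit is compared against the installed NLRI count.
DOCUMENTED_LIMIT = 1_500_000

# Figures copied from "show ip route sum" above.
ROUTE_TABLE_ENTRIES = 410_845
BY_PROTOCOL = {"connected": 8, "static": 11, "RIP": 0,
               "OSPF": 294, "BGP": 410_532, "ISIS": 0}
PREFIX_COUNTS = {
    0: 1, 4: 1, 8: 16, 9: 11, 10: 36, 11: 99, 12: 291, 13: 565,
    14: 1099, 15: 1924, 16: 13355, 17: 7910, 18: 13673, 19: 24926,
    20: 38033, 21: 44870, 22: 86917, 23: 70274, 24: 106572, 25: 12,
    26: 11, 27: 25, 28: 21, 29: 21, 30: 67, 32: 115,
}


def headroom(installed, limit):
    """Return how many more routes fit below the limit."""
    return limit - installed


if __name__ == "__main__":
    print("headroom below documented limit:",
          headroom(BGP_NLRIS_INSTALLED, DOCUMENTED_LIMIT))
    print("per-protocol sum matches table size:",
          sum(BY_PROTOCOL.values()) == ROUTE_TABLE_ENTRIES)
    print("per-prefix-length sum matches table size:",
          sum(PREFIX_COUNTS.values()) == ROUTE_TABLE_ENTRIES)
    print("iBGP + eBGP + originated equals best routes:",
          IBGP_BEST + EBGP_BEST + ORIGINATED == BEST_ROUTES)
```

The counters agree with each other, so at least by these numbers the route counts look consistent and stay below the documented figure.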

Can anybody give me a hint as to what could cause the behaviour 
described above, or what to investigate to tackle this issue?


  --
  Frank Menzel - menzel at sipgate.de

  sipgate GmbH - Gladbacher Str. 74 - 40219 Düsseldorf
  HRB Düsseldorf 39841 - Geschäftsführer: Thilo Salmon, Tim Mois
  Steuernummer: 106/5724/7147, Umsatzsteuer-ID: DE219349391

http://www.sipgate.de - http://www.sipgate.co.uk

