[j-nsp] MX104 with full BGP table problems
Brad Fleming
bdflemin at gmail.com
Fri May 16 14:00:05 EDT 2014
We’ve been working with a handful of MX104s on the bench in preparation for putting them into a live network. We started pushing a full BGP table into the device and stumbled across some CPU utilization problems.
We tried pushing a full table into the box three different ways:
1) via an eBGP session
2) via a route-reflected iBGP session
3) via a full mesh of iBGP sessions (11 other routers)
In situation #1: RE CPU was slightly elevated but remained ~60% idle, and 1-minute load averages were around 0.3.
In situation #2: RE CPU is highly elevated. We maintain actual p-t-p /30s for our next-hops (I know, not best practice for many networks) which results in a total of about 50-65 next-hops network-wide.
In situation #3: RE CPU is saturated at all times. In this case we configured the mesh sessions to advertise routes with “next-hop-self” so the number of next-hops is reduced to 11 total.
It appears that rpd is the process actually killing the CPU; it is nearly always running at 75+% and in a “RUN” state. If we enable task accounting, it shows “Resolve tree 2” as the task consuming by far the most CPU time (see below). There’s plenty of RAM remaining, we’re not using any swap space, and we’ve not exceeded the number of routes licensed for the system; we paid for the full 1 million+ route scaling. The logs are full of messages about lost communication with the backup RE; however, if we disable all the BGP sessions, that issue goes away completely (for days on end).
Has anyone else tried shoving a full BGP table into one of these routers yet? Have you noticed anything similar?
I’ve opened a JTAC case for the issue, but I’m wondering if anyone with more experience in multi-RE setups has seen something similar. Thanks in advance for any thoughts, suggestions, or insights.
Incoming command output dump…
netadm@test-MX104> show chassis routing-engine
Routing Engine status:
Slot 0:
Current state Master
Election priority Master (default)
Temperature 39 degrees C / 102 degrees F
CPU temperature 42 degrees C / 107 degrees F
DRAM 3968 MB (4096 MB installed)
Memory utilization 32 percent
CPU utilization:
User 87 percent
Background 0 percent
Kernel 11 percent
Interrupt 2 percent
Idle 0 percent
Model RE-MX-104
Serial ID CACH2444
Start time 2009-12-31 18:05:43 CST
Uptime 21 hours, 31 minutes, 32 seconds
Last reboot reason 0x200:normal shutdown
Load averages: 1 minute 5 minute 15 minute
1.06 1.12 1.23
Routing Engine status:
Slot 1:
Current state Backup
Election priority Backup (default)
Temperature 37 degrees C / 98 degrees F
CPU temperature 38 degrees C / 100 degrees F
DRAM 3968 MB (4096 MB installed)
Memory utilization 30 percent
CPU utilization:
User 62 percent
Background 0 percent
Kernel 15 percent
Interrupt 24 percent
Idle 0 percent
Model RE-MX-104
Serial ID CACD1529
Start time 2010-03-18 05:16:34 CDT
Uptime 21 hours, 45 minutes, 26 seconds
Last reboot reason 0x200:normal shutdown
Load averages: 1 minute 5 minute 15 minute
1.22 1.19 1.20
netadm@test-MX104> show system processes extensive
last pid: 20303; load averages: 1.18, 1.14, 1.22 up 0+21:33:35 03:03:41
127 processes: 8 running, 99 sleeping, 20 waiting
Mem: 796M Active, 96M Inact, 308M Wired, 270M Cache, 112M Buf, 2399M Free
Swap: 1025M Total, 1025M Free
PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
3217 root 1 132 0 485M 432M RUN 120:56 72.85% rpd
netadm@test-MX104> show task accounting
Task accounting is enabled.
Task                       Started  User Time  System Time  Longest Run
Scheduler                    32294      0.924        0.148        0.000
Memory                          26      0.001        0.000        0.000
RT                            5876      0.947        0.162        0.003
hakr                             6      0.000        0.000        0.000
OSPF I/O./var/run/ppmd_co      117      0.002        0.000        0.000
BGP rsync                      192      0.007        0.001        0.000
BGP_RT_Background               78      0.001        0.000        0.000
BGP_Listen.0.0.0.0+179        2696      1.101        0.218        0.009
PIM I/O./var/run/ppmd_con      117      0.003        0.000        0.000
OSPF                           629      0.005        0.000        0.000
BGP Standby Cache Task          26      0.000        0.000        0.000
BFD I/O./var/run/bfdd_con      117      0.003        0.000        0.000
BGP_2495_2495.164.113.199     1947      0.072        0.012        0.000
BGP_2495_2495.164.113.199     1566      0.056        0.010        0.000
BGP_2495_2495.164.113.199     1388      0.053        0.008        0.000
Resolve tree 3                1421     24.523       13.270        0.102
Resolve tree 2               14019  16:33.079       20.983        0.101
Mirror Task.128.0.0.6+584      464      0.018        0.004        0.000
KRT                           1074      0.157        0.159        0.004
Redirect                         9      0.000        0.000        0.000
MGMT_Listen./var/run/rpd_       54      0.009        0.005        0.000
SNMP Subagent./var/run/sn      258      0.052        0.052        0.001
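For anyone wanting to compare task accounting dumps taken at different times, here is a rough Python sketch that ranks the tasks in the output above by combined user + system time. It is only an illustration, not a Juniper tool: the function names (parse_cpu_time, rank_tasks) are made up for this example, and it assumes the time columns are either plain seconds ("0.924") or colon-separated like "16:33.079" (read here as minutes:seconds).

```python
import re

# One data row: task name, started count, user time, system time, longest run.
# The task name may itself contain spaces and digits, so it is matched lazily
# and the four trailing numeric columns are anchored to the end of the line.
ROW = re.compile(
    r"^(?P<task>.+?)\s+(?P<started>\d+)\s+(?P<user>[\d:.]+)"
    r"\s+(?P<system>[\d:.]+)\s+(?P<longest>[\d:.]+)$"
)


def parse_cpu_time(text):
    """Convert '0.924' or '16:33.079' to seconds.

    Assumption: colon-separated fields are base-60 (minutes:seconds,
    or hours:minutes:seconds).
    """
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def rank_tasks(output):
    """Return (task, user + system seconds) pairs, busiest first."""
    rows = []
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # header, banner, or blank line
        total = parse_cpu_time(match["user"]) + parse_cpu_time(match["system"])
        rows.append((match["task"], total))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# A few rows copied from the output above.
SAMPLE = """\
Task accounting is enabled.
Task Started User Time System Time Longest Run
Scheduler 32294 0.924 0.148 0.000
BGP_Listen.0.0.0.0+179 2696 1.101 0.218 0.009
Resolve tree 3 1421 24.523 13.270 0.102
Resolve tree 2 14019 16:33.079 20.983 0.101
KRT 1074 0.157 0.159 0.004
"""

if __name__ == "__main__":
    for task, total in rank_tasks(SAMPLE):
        print(f"{total:10.3f}  {task}")
```

On the rows above, "Resolve tree 2" sorts to the top by a wide margin, with "Resolve tree 3" a distant second, which matches what we see by eye.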