[j-nsp] MX104 with full BGP table problems
Tyler Christiansen
tyler.christiansen at adap.tv
Fri May 16 14:58:04 EDT 2014
I don't have experience with the MX104s but do with the rest of the line
(MX80 to MX2010 [excluding the MX104, of course]). The MX80 isn't dual RE,
but IIRC the CPUs are in the same family on the MX80 and MX104--the MX104 is
just 500 or 600 MHz faster. And the MX80 kind of chokes when receiving a
full feed (even just one at a time can easily push the RE CPU to ~40% during
the initial feed consumption). ;)
The MX80 and MX104 are sold as edge BGP routers pretty much only because
they have enough memory to do it...not because it's a good idea.
It's pretty odd for the backup RE to show that much CPU utilization (based
on experience with the other dual-RE MX devices). Some, yes, but not the
~100% utilization you show there. I would buy 100% utilization on the
master during initial feed consumption. After you have some stability in
the network, though, the CPU should be back down to ~5-15% (depending on
what you have going on).
How aggressive are your BGP timers? You may want to consider BFD for fast
failure detection instead of cranking down the BGP keepalive/hold timers.
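
A minimal sketch of what that might look like (the group name, peer
address, and intervals below are made up; tune them to your environment):

    protocols {
        bgp {
            /* hypothetical iBGP group and peer address */
            group ibgp-core {
                neighbor 192.0.2.1 {
                    bfd-liveness-detection {
                        /* 300 ms tx/rx; session down after 3 missed packets */
                        minimum-interval 300;
                        multiplier 3;
                    }
                }
            }
        }
    }

That gets you sub-second failure detection without making rpd churn
through keepalive processing.
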
Are you doing just plain IPv4 BGP, or are you using MP-BGP (multiprotocol
BGP) extensions? Additional MP-BGP address families can inflate the size of
the BGP tables and make the router do more work.
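
If you want to check what a given session actually negotiated, something
like this works (generic prompt, placeholder peer address):

    user@router> show bgp neighbor 192.0.2.1 | match NLRI

which shows the NLRI advertised by each side and the NLRI in use for the
session.
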
In all scenarios, you should probably have loopback IPs in the IGP and set
the next-hop to the loopback IPs for iBGP sessions. I'm not sure why you
have /30 P2P links as the next-hops, as they're potentially unstable (even
if they're stable now, they can easily become unstable once in production).
I assume that since you mentioned you know it's not recommended, you're
going to be changing that.
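
The usual fix is an export policy on the iBGP group; a rough sketch, with
placeholder policy and group names:

    policy-options {
        policy-statement nhs {
            term ebgp-learned {
                from protocol bgp;
                then {
                    next-hop self;
                }
            }
        }
    }
    protocols {
        bgp {
            /* hypothetical iBGP group */
            group ibgp-core {
                export nhs;
            }
        }
    }
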
In scenario #2, how many RRs does the MX104 peer with? And are they
sending full routes or full routes + more?
Finally, in scenario #3, if you're trying to do a full mesh with 11 other
peers, the MX104 will choke if they're all trying to load full tables.
There are about 500,000 routes in the global table, so 11 full feeds means
you're trying to load roughly 5,500,000 paths into a box with a 1.8 GHz CPU
and 4 GB of RAM.
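
You can sanity-check what the box is actually holding with:

    user@router> show route summary

which reports destinations and total routes per table; with 11 full feeds
you'd expect inet.0 to show roughly 500,000 destinations but something on
the order of 5,500,000 routes.
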
Regardless, I would think that the MX104 should be perfectly capable of
scaling to at least five or six full feeds. I would suspect either a bug
in the software or very aggressive timers.
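
If you want to rule the timers out, the negotiated values are visible
per-session (placeholder peer address again):

    user@router> show bgp neighbor 192.0.2.1 | match Holdtime

The default hold time is 90 seconds; anything dramatically lower would
point at aggressive timers.
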
On Fri, May 16, 2014 at 11:00 AM, Brad Fleming <bdflemin at gmail.com> wrote:
> We’ve been working with a handful of MX104s on the bench in preparation for
> putting them into a live network. We started pushing a full BGP table into
> the device and stumbled across some CPU utilization problems.
>
> We tried pushing a full table into the box three different ways:
> 1) via an eBGP session
> 2) via a reflected session on an iBGP session
> 3) via a full mesh of iBGP sessions (11 other routers)
>
> In situation #1: RE CPU was slightly elevated but remained ~60% idle and
> 1min load averages were around 0.3.
>
> In situation #2: RE CPU is highly elevated. We maintain actual p-t-p /30s
> for our next-hops (I know, not best practice for many networks) which
> results in a total of about 50-65 next-hops network-wide.
>
> In situation #3: RE CPU is saturated at all times. In this case we
> configured the mesh sessions to advertise routes with “next-hop-self” so
> the number of next-hops is reduced to 11 total.
>
> It appears that rpd is the process actually killing the CPU; it's nearly
> always running at 75+% and in a “RUN” state. If we enable task accounting,
> it shows “Resolve tree 2” as the task consuming tons of CPU time. (see
> below)
> There’s plenty of RAM remaining, we’re not using any swap space, and we’ve
> not exceeded the number of routes licensed for the system; we paid for the
> full 1 million+ route scaling. Logs are full of lost-communication messages
> for the backup RE; however, if we disable all the BGP sessions, that issue
> goes away completely (for days on end).
>
> Has anyone else tried shoving a full BGP table into one of these routers
> yet? Have you noticed anything similar?
>
> I’ve opened a JTAC case for the issue but I’m wondering if anyone with
> more experience in multi-RE setups has seen similar. Thanks in advance for
> any thoughts, suggestions, or insights.
>
>
> Incoming command output dump….
>
> netadm@test-MX104> show chassis routing-engine
> Routing Engine status:
>   Slot 0:
>     Current state                  Master
>     Election priority              Master (default)
>     Temperature                 39 degrees C / 102 degrees F
>     CPU temperature             42 degrees C / 107 degrees F
>     DRAM                      3968 MB (4096 MB installed)
>     Memory utilization          32 percent
>     CPU utilization:
>       User                      87 percent
>       Background                 0 percent
>       Kernel                    11 percent
>       Interrupt                  2 percent
>       Idle                       0 percent
>     Model                          RE-MX-104
>     Serial ID                      CACH2444
>     Start time                     2009-12-31 18:05:43 CST
>     Uptime                         21 hours, 31 minutes, 32 seconds
>     Last reboot reason             0x200:normal shutdown
>     Load averages:                 1 minute   5 minute  15 minute
>                                        1.06       1.12       1.23
> Routing Engine status:
>   Slot 1:
>     Current state                  Backup
>     Election priority              Backup (default)
>     Temperature                 37 degrees C / 98 degrees F
>     CPU temperature             38 degrees C / 100 degrees F
>     DRAM                      3968 MB (4096 MB installed)
>     Memory utilization          30 percent
>     CPU utilization:
>       User                      62 percent
>       Background                 0 percent
>       Kernel                    15 percent
>       Interrupt                 24 percent
>       Idle                       0 percent
>     Model                          RE-MX-104
>     Serial ID                      CACD1529
>     Start time                     2010-03-18 05:16:34 CDT
>     Uptime                         21 hours, 45 minutes, 26 seconds
>     Last reboot reason             0x200:normal shutdown
>     Load averages:                 1 minute   5 minute  15 minute
>                                        1.22       1.19       1.20
>
> netadm@test-MX104> show system processes extensive
> last pid: 20303;  load averages: 1.18, 1.14, 1.22  up 0+21:33:35    03:03:41
> 127 processes: 8 running, 99 sleeping, 20 waiting
> Mem: 796M Active, 96M Inact, 308M Wired, 270M Cache, 112M Buf, 2399M Free
> Swap: 1025M Total, 1025M Free
>
>   PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
>  3217 root        1 132    0   485M   432M RUN    120:56 72.85% rpd
>
> netadm@test-MX104> show task accounting
> Task accounting is enabled.
>
> Task                       Started    User Time  System Time  Longest Run
> Scheduler                    32294        0.924        0.148        0.000
> Memory                          26        0.001        0.000        0.000
> RT                            5876        0.947        0.162        0.003
> hakr                             6        0.000        0.000        0.000
> OSPF I/O./var/run/ppmd_co      117        0.002        0.000        0.000
> BGP rsync                      192        0.007        0.001        0.000
> BGP_RT_Background               78        0.001        0.000        0.000
> BGP_Listen.0.0.0.0+179        2696        1.101        0.218        0.009
> PIM I/O./var/run/ppmd_con      117        0.003        0.000        0.000
> OSPF                           629        0.005        0.000        0.000
> BGP Standby Cache Task          26        0.000        0.000        0.000
> BFD I/O./var/run/bfdd_con      117        0.003        0.000        0.000
> BGP_2495_2495.164.113.199     1947        0.072        0.012        0.000
> BGP_2495_2495.164.113.199     1566        0.056        0.010        0.000
> BGP_2495_2495.164.113.199     1388        0.053        0.008        0.000
> Resolve tree 3                1421       24.523       13.270        0.102
> Resolve tree 2               14019    16:33.079       20.983        0.101
> Mirror Task.128.0.0.6+584      464        0.018        0.004        0.000
> KRT                           1074        0.157        0.159        0.004
> Redirect                         9        0.000        0.000        0.000
> MGMT_Listen./var/run/rpd_       54        0.009        0.005        0.000
> SNMP Subagent./var/run/sn      258        0.052        0.052        0.001
> _______________________________________________
> juniper-nsp mailing list
> juniper-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp
>