[j-nsp] MX104 with full BGP table problems
Tyler Christiansen
tyler.christiansen at adap.tv
Fri May 16 14:58:04 EDT 2014
I don't have experience with the MX104s but do with the rest of the line
(MX80 to MX2010 [excluding the MX104, of course]). The MX80 isn't dual RE,
but IIRC the CPUs are in the same family on the MX80 and MX104--the MX104 is
just 500 or 600 MHz faster. And the MX80 kind of chokes when receiving a
full feed (even just one at a time can easily push the RE CPU to ~40% during
the initial feed consumption). ;)
The MX80 and MX104 are sold as edge BGP routers pretty much only because
they have enough memory to do it...not because it's a good idea.
It's pretty odd for the backup RE to show that much CPU utilization (based
on experience with the other dual-RE MX devices). Some, yes, but not the
~100% utilization you show there. I would buy 100% utilization on the
master during initial feed consumption. After you have some stability in
the network, though, the CPU should be back down to ~5-15% (depending on
what you have going on).
How aggressive are your BGP timers? You may want to consider BFD for fast
failure detection instead of cranking down the BGP keepalive/hold timers.
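
A minimal sketch of what that might look like (the group name, peer
address, and intervals below are made up; tune them to your environment):

    protocols {
        bgp {
            /* hypothetical iBGP group and peer address */
            group ibgp-core {
                neighbor 192.0.2.1 {
                    bfd-liveness-detection {
                        /* 300 ms tx/rx; session down after 3 missed packets */
                        minimum-interval 300;
                        multiplier 3;
                    }
                }
            }
        }
    }

That gets you sub-second failure detection without making rpd churn
through keepalive processing.
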
Are you doing just plain IPv4 BGP, or are you using MP-BGP (multiprotocol
BGP) extensions? Additional MP-BGP address families can inflate the size of
the BGP tables and make the router do more work.
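
If you want to check what a given session actually negotiated, something
like this works (generic prompt, placeholder peer address):

    user@router> show bgp neighbor 192.0.2.1 | match NLRI

which shows the NLRI advertised by each side and the NLRI in use for the
session.
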
In all scenarios, you should probably have loopback IPs in the IGP and set
the next-hop to the loopback IPs for iBGP sessions. I'm not sure why you
have /30 P2P links as the next-hops, as they're potentially unstable (even
if they're stable now, they can easily become unstable once in production).
I assume that since you mentioned you know it's not recommended, you're
going to be changing that.
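
The usual fix is an export policy on the iBGP group; a rough sketch, with
placeholder policy and group names:

    policy-options {
        policy-statement nhs {
            term ebgp-learned {
                from protocol bgp;
                then {
                    next-hop self;
                }
            }
        }
    }
    protocols {
        bgp {
            /* hypothetical iBGP group */
            group ibgp-core {
                export nhs;
            }
        }
    }
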
In scenario #2, how many RRs does the MX104 peer with? And are they
sending full routes or full routes + more?
Finally, in scenario #3, if you're trying to do a full mesh with 11 other
peers, the MX104 will choke if they're all trying to load full tables.
There are about 500,000 routes in the global table, so 11 full feeds means
you're trying to load roughly 5,500,000 paths into a box with a 1.8 GHz CPU
and 4 GB of RAM.
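
You can sanity-check what the box is actually holding with:

    user@router> show route summary

which reports destinations and total routes per table; with 11 full feeds
you'd expect inet.0 to show roughly 500,000 destinations but something on
the order of 5,500,000 routes.
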
Regardless, I would think that the MX104 should be perfectly capable of
scaling to at least five or six full feeds. I would suspect either a bug
in the software or very aggressive timers.
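
If you want to rule the timers out, the negotiated values are visible
per-session (placeholder peer address again):

    user@router> show bgp neighbor 192.0.2.1 | match Holdtime

The default hold time is 90 seconds; anything dramatically lower would
point at aggressive timers.
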
On Fri, May 16, 2014 at 11:00 AM, Brad Fleming <bdflemin at gmail.com> wrote:
> We’ve been working with a handful of MX104s on the bench in preparation for
> putting them into a live network. We started pushing a full BGP table into
> the device and stumbled across some CPU utilization problems.
>
> We tried pushing a full table into the box three different ways:
> 1) via an eBGP session
> 2) via a reflected session on an iBGP session
> 3) via a full mesh of iBGP sessions (11 other routers)
>
> In situation #1: RE CPU was slightly elevated but remained ~60% idle and
> 1min load averages were around 0.3.
>
> In situation #2: RE CPU is highly elevated. We maintain actual p-t-p /30s
> for our next-hops (I know, not best practice for many networks) which
> results in a total of about 50-65 next-hops network-wide.
>
> In situation #3: RE CPU is saturated at all times. In this case we
> configured the mesh sessions to advertise routes with “next-hop-self” so
> the number of next-hops is reduced to 11 total.
>
> It appears that rpd is the process actually killing the CPU; it's nearly
> always running at 75+% and in a “RUN” state. If we enable task accounting,
> it shows “Resolve tree 2” as the task consuming tons of CPU time. (see
> below)
> There’s plenty of RAM remaining, we’re not using any swap space, and we’ve
> not exceeded the number of routes licensed for the system; we paid for the
> full 1 million+ route scaling. Logs are full of lost-communication messages
> for the backup RE; however, if we disable all the BGP sessions, that issue
> goes away completely (for days on end).
>
> Has anyone else tried shoving a full BGP table into one of these routers
> yet? Have you noticed anything similar?
>
> I’ve opened a JTAC case for the issue but I’m wondering if anyone with
> more experience in multi-RE setups has seen similar. Thanks in advance for
> any thoughts, suggestions, or insights.
>
>
> Incoming command output dump….
>
> netadm@test-MX104> show chassis routing-engine
> Routing Engine status:
>   Slot 0:
>     Current state                  Master
>     Election priority              Master (default)
>     Temperature                 39 degrees C / 102 degrees F
>     CPU temperature             42 degrees C / 107 degrees F
>     DRAM                      3968 MB (4096 MB installed)
>     Memory utilization          32 percent
>     CPU utilization:
>       User                      87 percent
>       Background                 0 percent
>       Kernel                    11 percent
>       Interrupt                  2 percent
>       Idle                       0 percent
>     Model                          RE-MX-104
>     Serial ID                      CACH2444
>     Start time                     2009-12-31 18:05:43 CST
>     Uptime                         21 hours, 31 minutes, 32 seconds
>     Last reboot reason             0x200:normal shutdown
>     Load averages:                 1 minute   5 minute  15 minute
>                                        1.06       1.12       1.23
> Routing Engine status:
>   Slot 1:
>     Current state                  Backup
>     Election priority              Backup (default)
>     Temperature                 37 degrees C / 98 degrees F
>     CPU temperature             38 degrees C / 100 degrees F
>     DRAM                      3968 MB (4096 MB installed)
>     Memory utilization          30 percent
>     CPU utilization:
>       User                      62 percent
>       Background                 0 percent
>       Kernel                    15 percent
>       Interrupt                 24 percent
>       Idle                       0 percent
>     Model                          RE-MX-104
>     Serial ID                      CACD1529
>     Start time                     2010-03-18 05:16:34 CDT
>     Uptime                         21 hours, 45 minutes, 26 seconds
>     Last reboot reason             0x200:normal shutdown
>     Load averages:                 1 minute   5 minute  15 minute
>                                        1.22       1.19       1.20
>
> netadm@test-MX104> show system processes extensive
> last pid: 20303;  load averages: 1.18, 1.14, 1.22  up 0+21:33:35    03:03:41
> 127 processes: 8 running, 99 sleeping, 20 waiting
> Mem: 796M Active, 96M Inact, 308M Wired, 270M Cache, 112M Buf, 2399M Free
> Swap: 1025M Total, 1025M Free
>
>   PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
>  3217 root        1 132    0   485M   432M RUN    120:56 72.85% rpd
>
> netadm@test-MX104> show task accounting
> Task accounting is enabled.
>
> Task                       Started    User Time  System Time  Longest Run
> Scheduler                    32294        0.924        0.148        0.000
> Memory                          26        0.001        0.000        0.000
> RT                            5876        0.947        0.162        0.003
> hakr                             6        0.000        0.000        0.000
> OSPF I/O./var/run/ppmd_co      117        0.002        0.000        0.000
> BGP rsync                      192        0.007        0.001        0.000
> BGP_RT_Background               78        0.001        0.000        0.000
> BGP_Listen.0.0.0.0+179        2696        1.101        0.218        0.009
> PIM I/O./var/run/ppmd_con      117        0.003        0.000        0.000
> OSPF                           629        0.005        0.000        0.000
> BGP Standby Cache Task          26        0.000        0.000        0.000
> BFD I/O./var/run/bfdd_con      117        0.003        0.000        0.000
> BGP_2495_2495.164.113.199     1947        0.072        0.012        0.000
> BGP_2495_2495.164.113.199     1566        0.056        0.010        0.000
> BGP_2495_2495.164.113.199     1388        0.053        0.008        0.000
> Resolve tree 3                1421       24.523       13.270        0.102
> Resolve tree 2               14019    16:33.079       20.983        0.101
> Mirror Task.128.0.0.6+584      464        0.018        0.004        0.000
> KRT                           1074        0.157        0.159        0.004
> Redirect                         9        0.000        0.000        0.000
> MGMT_Listen./var/run/rpd_       54        0.009        0.005        0.000
> SNMP Subagent./var/run/sn      258        0.052        0.052        0.001
> _______________________________________________
> juniper-nsp mailing list
> juniper-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp
>