[j-nsp] MX104 with full BGP table problems
Brad Fleming
bdflemin at gmail.com
Fri May 16 15:28:01 EDT 2014
Thanks for the response; answers inline...
On May 16, 2014, at 1:58 PM, Tyler Christiansen <tyler.christiansen at adap.tv> wrote:
> I don't have experience with the MX104s but do with the rest of the line (MX80 to MX2010 [excluding MX104, of course]). The MX80 isn't dual RE, but the CPUs are the same family between the MX80 and MX104 IIRC--the MX104 is just 500 or 600 MHz faster. And the MX80 kind of chokes when receiving a full feed (even just one feed at a time can easily send it up to ~40% CPU during the initial feed consumption). ;)
>
> The MX80 and MX104 being sold as edge BGP routers is pretty much only because they have enough memory to do it...not because it's a good idea.
>
> It's pretty odd for the backup RE to show significant CPU utilization (based on experience with the other dual-RE MX devices). Some, yes, but not ~100% utilization as you show there. I would buy 100% utilization during initial feed consumption on the master. After you have some stability in the network, though, the CPU should be back down to ~5-15% (depending on what you have going on).
I agree; we’ve run a few M10i routers and never had this issue, but that’s a totally different platform on a much older version of Junos, so I generally discounted the comparison. These are the first multi-RE boxes we’ve had running any Junos newer than 10.0. Thanks for pointing it out; it’s something I missed in my previous email. As the previous output shows, the 15-minute load averages for each RE are ~1.20, so the load remains elevated. I just confirmed that the 15-minute load average after about 2 hours of “sitting” remains ~1.22.
>
> How aggressive are your BGP timers? You may want to consider BFD instead of BGP timers for aggressive keepalives.
BGP timers are default; however, we’ve tried relaxing them with no change in behavior.
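If we do go the BFD route Tyler suggests, the Junos knob is bfd-liveness-detection under the BGP neighbor (or group). A minimal sketch; the group name, peer address, and timer values are illustrative, not our actual config:

    [edit protocols bgp group ibgp-test]
    neighbor 192.0.2.1 {
        bfd-liveness-detection {
            minimum-interval 300;  # 300 ms tx/rx, so failure detection moves to BFD
            multiplier 3;          # declare the session down after 3 missed packets
        }
    }

That would let us keep the BGP hold-time at its relaxed default while still getting fast failure detection.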
>
> Are you doing just plain IPv4 BGP, or are you utilizing MBGP extensions? MBGP extensions can inflate the size of the BGP tables and make the router do more work.
We’ve tried both with no difference in performance. The example outputs in my original message were with MBGP extensions enabled but doing only IPv4 unicast on the session produces the same result.
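For anyone reproducing this: restricting the session to plain IPv4 unicast is just the usual family statement (group name illustrative):

    [edit protocols bgp group ibgp-test]
    family inet {
        unicast;  # IPv4 unicast only; no other NLRI negotiated on the session
    }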
>
> In all scenarios, you really should probably have loopback IPs in the IGP and have the nexthop set to the loopback IPs for iBGP sessions. I'm not sure why you have /30 P2P links as the next-hops as they're potentially unstable (even if they're not now, they can easily become unstable once in production). I assume that since you mentioned you know it's not recommended, you're going to be changing that.
This is a bit of a legacy issue within our network. We’ve operated for nearly 12 years carrying the actual PtP /30s in our IGP and retaining them as next-hops in BGP advertisements. It is something we plan to resolve with the deployment of this gear (as well as several new MX960s that were part of the same PO).
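The fix will be the standard Junos idiom: loopbacks in the IGP plus a next-hop-self export policy on the iBGP sessions. Roughly, with illustrative policy and group names:

    [edit policy-options]
    policy-statement nhs {
        term ibgp {
            then {
                next-hop self;  # rewrite the advertised next-hop to the local address
            }
        }
    }
    [edit protocols bgp group ibgp-test]
    export nhs;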
>
> In scenario #2, how many RRs does the MX104 peer with? And are they sending full routes or full routes + more?
The box was only peering with a single RR. The RR was only sending the standard, full table (~496K routes), no VPN, no mcast, etc.
>
> Finally, in scenario #3, if you're trying to do a full mesh with 11 other peers, the MX104 will choke if they're all trying to load full tables. There are about 500,000 routes in the global table, so you're trying to load 5,500,000 routes into a box with a 1.8 GHz CPU and 4 GB of RAM.
In scenario #3, the total number of routes entering the RE was ~867K, with ~496K active.
>
> Regardless, I would think that the MX104 should be perfectly capable of scaling to at least five or six full feeds. I would suspect either a bug in the software or very aggressive timers.
>
> On Fri, May 16, 2014 at 11:00 AM, Brad Fleming <bdflemin at gmail.com> wrote:
> We’ve been working with a handful of MX104s on the bench in preparation for putting them into a live network. We started pushing a full BGP table into the device and stumbled across some CPU utilization problems.
>
> We tried pushing a full table into the box three different ways:
> 1) via an eBGP session
> 2) via a reflected session on an iBGP session
> 3) via a full mesh of iBGP sessions (11 other routers)
>
> In situation #1: RE CPU was slightly elevated but remained ~60% idle and 1min load averages were around 0.3.
>
> In situation #2: RE CPU is highly elevated. We maintain actual p-t-p /30s for our next-hops (I know, not best practice for many networks), which results in a total of about 50-65 next-hops network-wide.
>
> In situation #3: RE CPU is saturated at all times. In this case we configured the mesh sessions to advertise routes with “next-hop-self” so the number of next-hops is reduced to 11 total.
>
> It appears that rpd is the process actually killing the CPU; it is nearly always running at 75+% and in a “RUN” state. If we enable task accounting, it shows “Resolve tree 2” as the task consuming tons of CPU time (see below). There’s plenty of RAM remaining, we’re not using any swap space, and we’ve not exceeded the number of routes licensed for the system; we paid for the full 1-million+ route scaling. Logs are full of messages about lost communication with the backup RE; however, if we disable all the BGP sessions, that issue goes away completely (for days on end).
>
> Has anyone else tried shoving a full BGP table into one of these routers yet? Have you noticed anything similar?
>
> I’ve opened a JTAC case for the issue but I’m wondering if anyone with more experience in multi-RE setups has seen similar. Thanks in advance for any thoughts, suggestions, or insights.
>
>
> Incoming command output dump….
>
> netadm at test-MX104> show chassis routing-engine
> Routing Engine status:
> Slot 0:
> Current state Master
> Election priority Master (default)
> Temperature 39 degrees C / 102 degrees F
> CPU temperature 42 degrees C / 107 degrees F
> DRAM 3968 MB (4096 MB installed)
> Memory utilization 32 percent
> CPU utilization:
> User 87 percent
> Background 0 percent
> Kernel 11 percent
> Interrupt 2 percent
> Idle 0 percent
> Model RE-MX-104
> Serial ID CACH2444
> Start time 2009-12-31 18:05:43 CST
> Uptime 21 hours, 31 minutes, 32 seconds
> Last reboot reason 0x200:normal shutdown
> Load averages: 1 minute 5 minute 15 minute
> 1.06 1.12 1.23
> Routing Engine status:
> Slot 1:
> Current state Backup
> Election priority Backup (default)
> Temperature 37 degrees C / 98 degrees F
> CPU temperature 38 degrees C / 100 degrees F
> DRAM 3968 MB (4096 MB installed)
> Memory utilization 30 percent
> CPU utilization:
> User 62 percent
> Background 0 percent
> Kernel 15 percent
> Interrupt 24 percent
> Idle 0 percent
> Model RE-MX-104
> Serial ID CACD1529
> Start time 2010-03-18 05:16:34 CDT
> Uptime 21 hours, 45 minutes, 26 seconds
> Last reboot reason 0x200:normal shutdown
> Load averages: 1 minute 5 minute 15 minute
> 1.22 1.19 1.20
>
> netadm at test-MX104> show system processes extensive
> last pid: 20303; load averages: 1.18, 1.14, 1.22 up 0+21:33:35 03:03:41
> 127 processes: 8 running, 99 sleeping, 20 waiting
> Mem: 796M Active, 96M Inact, 308M Wired, 270M Cache, 112M Buf, 2399M Free
> Swap: 1025M Total, 1025M Free
> PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
> 3217 root 1 132 0 485M 432M RUN 120:56 72.85% rpd
>
> netadm at test-MX104> show task accounting
> Task accounting is enabled.
>
> Task Started User Time System Time Longest Run
> Scheduler 32294 0.924 0.148 0.000
> Memory 26 0.001 0.000 0.000
> RT 5876 0.947 0.162 0.003
> hakr 6 0.000 0.000 0.000
> OSPF I/O./var/run/ppmd_co 117 0.002 0.000 0.000
> BGP rsync 192 0.007 0.001 0.000
> BGP_RT_Background 78 0.001 0.000 0.000
> BGP_Listen.0.0.0.0+179 2696 1.101 0.218 0.009
> PIM I/O./var/run/ppmd_con 117 0.003 0.000 0.000
> OSPF 629 0.005 0.000 0.000
> BGP Standby Cache Task 26 0.000 0.000 0.000
> BFD I/O./var/run/bfdd_con 117 0.003 0.000 0.000
> BGP_2495_2495.164.113.199 1947 0.072 0.012 0.000
> BGP_2495_2495.164.113.199 1566 0.056 0.010 0.000
> BGP_2495_2495.164.113.199 1388 0.053 0.008 0.000
> Resolve tree 3 1421 24.523 13.270 0.102
> Resolve tree 2 14019 16:33.079 20.983 0.101
> Mirror Task.128.0.0.6+584 464 0.018 0.004 0.000
> KRT 1074 0.157 0.159 0.004
> Redirect 9 0.000 0.000 0.000
> MGMT_Listen./var/run/rpd_ 54 0.009 0.005 0.000
> SNMP Subagent./var/run/sn 258 0.052 0.052 0.001