[j-nsp] 512MB ought to be enough for anybody

Tue Mar 9 16:32:27 EST 2010

OK, so I'm currently beating my head against the impenetrable wall of
anti-clue that is JTAC (yes I know what you're asking, when am I not? 
:P), and I've apparently reached a point of impasse where I need to 
solicit some external assistance to help get the point across.

The other day we discovered a neat little issue on the EX8200 (present
in all available code): rpd sets a hard-coded resource limit on itself
(not even in the usual places, like the login.conf class settings that
you can hack around) that caps the size of its data segment at 512MB.
When it tries to exceed that limit, rpd dumps core like so:

Process (55002,rpd) attempted to exceed RLIMIT_DATA: attempted 524412 KB Max 524288 KB 
pid 55002 (rpd), uid 0: exited on signal 6 (core dumped)
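The two numbers in that log line are worth decoding. A minimal Python
sketch, assuming the log reports sizes in KiB (1024 bytes), which is
consistent with 524288 being exactly 512 x 1024; the helper name
parse_rlimit_line is made up for illustration:

```python
import re

# The RLIMIT_DATA log line quoted above, copied verbatim.
LOG = (
    "Process (55002,rpd) attempted to exceed RLIMIT_DATA: "
    "attempted 524412 KB Max 524288 KB"
)

def parse_rlimit_line(line: str) -> tuple[int, int]:
    """Hypothetical helper: return (attempted_kb, max_kb) from the log line."""
    m = re.search(r"attempted (\d+) KB Max (\d+) KB", line)
    if m is None:
        raise ValueError("not an RLIMIT_DATA line")
    return int(m.group(1)), int(m.group(2))

attempted_kb, max_kb = parse_rlimit_line(LOG)
limit_mib = max_kb / 1024           # assumes KB in the log means KiB
overshoot_kb = attempted_kb - max_kb

print(limit_mib)     # 512.0
print(overshoot_kb)  # 124
```

In other words, rpd died asking for about 124KB more than a flat
512MiB ceiling.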

Now, while sane and rational people might see this as a pretty big
problem, Juniper actually believes that this is working as designed and
a perfectly good thing. Here is the response I got back from Advanced
JTAC:

> As per my communication with the engineering, the current limitation  
> of the memory allocation for "RPD" process is sufficient enough handle 
> 500k+ routes in EX switch, so theoretically we should not see any 
> memory usage issue here. But, there could be other issues such memory 
> leak etc. which can cause process to hog more memory. It is important 
> to analyze core dump of "rpd" so that we can look into root cause of 
> the issue. 

I of course tried to explain the concept of multiple paths learned from
multiple neighbors in the RIB vs the routes exported to the FIB, and
that my 512MB of rpd utilization was perfectly normal considering the
number of BGP paths we had (which for us is actually pretty darn small,
most of our MX960 routers are doing closer to 1GB in rpd):

Groups: 15 Peers: 14 Down peers: 1
Table          Tot Paths  Act Paths Suppressed    History Damp State    Pending
inet.0            933817     332257          0          0          0          0
inetflow.0            50         25          0          0          0          0
inet6.0             4774       2545          0          0          0          0
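To put the RIB-vs-FIB point in numbers, here is a rough Python sketch
using only the counts from the table above. The per-path figure is a
crude upper bound: it assumes (hypothetically) that the entire 512MB cap
went to BGP path state, whereas rpd holds plenty of other structures, so
the real budget per path is smaller.

```python
# (total paths, active paths) per table, copied from the output above.
tables = {
    "inet.0": (933817, 332257),
    "inetflow.0": (50, 25),
    "inet6.0": (4774, 2545),
}
LIMIT_BYTES = 512 * 1024 * 1024  # the 512MB RLIMIT_DATA cap

total_paths = sum(tot for tot, _ in tables.values())
total_active = sum(act for _, act in tables.values())

# Paths held in the RIB for every route that actually reaches the FIB.
inet_tot, inet_act = tables["inet.0"]
paths_per_active = round(inet_tot / inet_act, 2)

# Upper bound on bytes available per path if the cap held nothing else.
bytes_per_path = LIMIT_BYTES // total_paths

print(total_paths, total_active)  # 938641 334827
print(paths_per_active)           # 2.81
print(bytes_per_path)             # 571
```

On these numbers inet.0 carries roughly 2.8 paths per active prefix, so
a memory budget sized around the ~330k routes that reach the FIB ignores
most of the paths rpd actually has to store.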

But they flatly refuse to believe that this is normal and that a 512MB
cap is very, very broken, and they continue to try to "find the source
of the memory leak". That I'm still having this argument with them, and
that EX engineering doesn't understand that 512MB can't hold that many
paths, frankly boggles the mind.

I sort of understand why they think they need to cap rpd's memory usage.
One of the problems with the EX platform is that it doesn't ship with
any real storage on the RE; for example, this 8208 RE has only 2GB
TOTAL, with very little free space:

Filesystem       Size    Used   Avail Capacity  Mounted on
/dev/da0s1a      366M    123M    214M    37%    /
/dev/da0s1f      244M     20M    205M     9%    /var
/dev/da0s3d      630M    612K    579M     0%    /var/tmp
/dev/da0s3e      111M    1.8M    100M     2%    /config

How they plan to write 2GB dumps to disk when the kernel panics is
beyond me; this available space (after I removed EVERYTHING possible)
wasn't even enough for me to untar the rpd core dump and gdb it locally.
The other consequence of having no real storage is having no swap, so
when the router does run out of memory, things are going to go south in
a hurry. That said, at the point where rpd crashes there is still almost
1GB of RAM free, so clearly 512MB is far too low a limit for practical
use. The problem itself is bad enough, but the bigger problem here is
that these guys really don't seem to understand why this is a bad thing.
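The storage squeeze is easy to quantify from the df output. A quick
Python sketch, assuming the Size/Avail figures are in the units df
printed them (M) and taking the 2GB dump size from the text above; it
treats each partition as an independent place a dump could land:

```python
# (size, avail) per mount point, copied from the df output above.
partitions = {
    "/": (366, 214),
    "/var": (244, 205),
    "/var/tmp": (630, 579),
    "/config": (111, 100),
}
DUMP_MB = 2 * 1024  # assumption: a full kernel dump is about 2GB

total_size = sum(size for size, _ in partitions.values())
total_avail = sum(avail for _, avail in partitions.values())

# Partition with the most free space, i.e. the best place to write a dump.
best_mount, (_, best_avail) = max(partitions.items(), key=lambda kv: kv[1][1])

print(total_size, total_avail)  # 1351 1098
print(best_mount, best_avail)   # /var/tmp 579
print(DUMP_MB <= best_avail)    # False: no single partition can hold it
print(DUMP_MB <= total_avail)   # False: not even all free space combined
```

Even pooling every free megabyte would not fit a 2GB dump, let alone
leave room to unpack and debug one on the box.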

So, can somebody at Juniper please go break the glass on the emergency 
cluebat, go find the EX guys, and beat them upside the head with it 
until they get detached retinas? Pretty please? :)

-- 
Richard A Steenbergen <ras at e-gerbil.net>       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)