[j-nsp] 512MB ought to be enough for anybody

Truman Boyes truman at suspicious.org
Wed Mar 10 03:23:52 EST 2010


Hi Richard, you bring up some good points. I will chat with some EX
people about the rpd memory limit on the EX. It doesn't seem to be
necessary, but there may be some design considerations behind the
static value.

Truman


On 10/03/2010, at 8:32 AM, Richard A Steenbergen <ras at e-gerbil.net> wrote:

> Ok so, I'm currently beating my head against the impenetrable wall of
> anti-clue that is JTAC (yes I know what you're asking, when am I not?
> :P), and I've apparently reached a point of impasse where I need to
> solicit some external assistance to help get the point across.
>
> The other day we discovered a neat little issue on the EX8200 (all
> available code), there is a hard coded resource limit being set by RPD
> (not even in the usual places like login.conf class settings that you
> can hack around) that limits the size of the data segment to 512MB. When
> you try to exceed that limit, rpd coredumps like so:
>
> Process (55002,rpd) attempted to exceed RLIMIT_DATA: attempted 524412 KB Max 524288 KB
> pid 55002 (rpd), uid 0: exited on signal 6 (core dumped)
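
As a rough sketch of what that log line says, the Python below parses it and works out how far past the cap rpd tried to go. The regular expression is a hypothetical pattern written for this one message, not a documented Junos log grammar, and the final `resource.getrlimit` call only reports the limit of whatever Unix process runs the script, not anything on the EX.

```python
import re
import resource  # Unix-only standard library module

# Log line copied from the quoted message, rejoined onto one line.
LOG_LINE = ("Process (55002,rpd) attempted to exceed RLIMIT_DATA: "
            "attempted 524412 KB Max 524288 KB")

# Hypothetical pattern for this single message format (an assumption).
PATTERN = re.compile(
    r"Process \((?P<pid>\d+),(?P<name>\w+)\) attempted to exceed "
    r"(?P<limit>RLIMIT_\w+): attempted (?P<attempted>\d+) KB Max (?P<max>\d+) KB"
)


def parse_rlimit_log(line):
    """Extract pid, process name, limit name and sizes (in KB) from the line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("unrecognized log line")
    return {
        "pid": int(m["pid"]),
        "name": m["name"],
        "limit": m["limit"],
        "attempted_kb": int(m["attempted"]),
        "max_kb": int(m["max"]),
    }


event = parse_rlimit_log(LOG_LINE)
overage_kb = event["attempted_kb"] - event["max_kb"]  # how far past the cap
max_mib = event["max_kb"] / 1024                      # cap expressed in MiB

print(f"{event['name']} cap: {max_mib:.0f} MiB, exceeded by {overage_kb} KB")

# For comparison only: the data-segment limit of the process running this script.
soft, hard = resource.getrlimit(resource.RLIMIT_DATA)
print("this process RLIMIT_DATA (soft, hard):", soft, hard)
```

In other words, 524288 KB is exactly 512 MiB, and the request that aborted rpd was only 124 KB over it.
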
>
> Now, while sane and rational people might see this as a pretty big
> problem, Juniper actually believes that this is working as designed and
> a perfectly good thing. Here is the response I got back from Advanced
> JTAC:
>
>> As per my communication with the engineering, the current limitation
>> of the memory allocation for "RPD" process is sufficient enough handle
>> 500k+ routes in EX switch, so theoretically we should not see any
>> memory usage issue here. But, there could be other issues such memory
>> leak etc. which can cause process to hog more memory. It is important
>> to analyze core dump of "rpd" so that we can look into root cause of
>> the issue.
>
> I of course tried to explain the concept of multiple paths learned from
> multiple neighbors in the RIB vs the routes exported to the FIB, and
> that my 512MB of rpd utilization was perfectly normal considering the
> number of BGP paths we had (which for us is actually pretty darn small,
> most of our MX960 routers are doing closer to 1GB in rpd):
>
> Groups: 15 Peers: 14 Down peers: 1
> Table          Tot Paths  Act Paths Suppressed    History Damp State    Pending
> inet.0            933817     332257          0          0          0          0
> inetflow.0            50         25          0          0          0          0
> inet6.0             4774       2545          0          0          0          0
> But they've flatly refused to believe me that this is normal and that a
> 512MB cap is very very broken, and continue to try and "find the source
> of the memory leak". That I'm still having this argument with them, and
> that EX engineering doesn't understand 512MB doesn't support that many
> paths, frankly boggles the mind.
>
> I sort of understand why they think they need to cap the memory usage of
> rpd. One of the problems with the EX platform is that they don't ship
> any real storage on the RE, for example this 8208 RE has only 2GB TOTAL,
> with very little free space:
>
> Filesystem       Size    Used   Avail Capacity  Mounted on
> /dev/da0s1a      366M    123M    214M    37%    /
> /dev/da0s1f      244M     20M    205M     9%    /var
> /dev/da0s3d      630M    612K    579M     0%    /var/tmp
> /dev/da0s3e      111M    1.8M    100M     2%    /config
>
> How they plan to handle writing 2GB dumps to disk when the kernel panics
> is beyond me, this available space (after I removed EVERYTHING possible)
> wasn't even enough for me to untar the rpd coredump and gdb it locally.
> But the other consequence to no real storage is no swap, so when the
> router does run out of memory things are going to go south in a hurry.
> That said, at the point rpd is crashing there is almost 1GB of ram left
> in the free state, so clearly 512MB is far too low of a limit for
> practical use. The problem itself is bad enough, but the bigger problem
> here is that these guys really don't seem to understand why this is a
> bad thing.
>
> So, can somebody at Juniper please go break the glass on the emergency
> cluebat, go find the EX guys, and beat them upside the head with it
> until they get detached retinas? Pretty please? :)
>
> -- 
> Richard A Steenbergen <ras at e-gerbil.net>       http://www.e-gerbil.net/ras
> GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
> _______________________________________________
> juniper-nsp mailing list juniper-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp
>

