[j-nsp] MX80 Sampling - High CPU
Graham Brown
juniper-nsp at grahambrown.info
Tue Sep 23 16:28:52 EDT 2014
12.3R8 and 13.3R4 are due out anytime now with the fixes in place. I think
there are many people waiting for these two releases...
Cheers,
Graham Brown
Twitter - @mountainrescuer <https://twitter.com/#!/mountainrescuer>
LinkedIn <http://www.linkedin.com/in/grahamcbrown>
On 24 September 2014 03:18, Justin M. Streiner <streiner at cluebyfour.org>
wrote:
> Sounds like you are running into bugs PR963060 or PR671136.
>
> This is supposed to be fixed in 12.3R8 which is supposed to be released
> very soon.
>
> We ran into this behavior on a pair of MX480s and had to disable sampling
> for the time being.
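
For anyone needing the same stopgap, here is a minimal sketch of switching sampling off without deleting the stanza. It assumes the configuration lives under forwarding-options sampling as in the original post. The user@router prompt and the 5-minute rollback window are illustrative assumptions, and lines starting with # are annotations, not commands.

```
[edit]
# Assumption: sampling is configured under forwarding-options sampling.
user@router# deactivate forwarding-options sampling
# Assumption: a 5-minute automatic rollback window suits your change process.
user@router# commit confirmed 5
# Confirm once the box is reachable and behaving.
user@router# commit
```

Running "activate forwarding-options sampling" followed by a commit should restore the stanza after an upgrade to a fixed release. With inline jflow, the chassis tfeb sampling-instance reference may also need to be deactivated for the commit check to pass; that part is untested here.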
>
> jms
>
>
> On Tue, 23 Sep 2014, Ritz Rojas wrote:
>
>> We have a few MX80s (MX80-48T) that we're looking to deploy in certain
>> applications where they'll be taking full Internet tables (v4 and v6). We
>> also have a need to gather flow data on our routers, and have noticed an
>> interesting trend in the lab.
>>
>> We are not using an MS-MIC currently.
>>
>> This test box is running 12.3R7.7 at the moment, but we've seen this same
>> thing in 11.4 too.
>>
>> When set up with full Internet routes and sampling is enabled, each time a
>> commit is made for any change at all, RPD and sampled take turns grinding
>> the CPU up to 100%, for up to 5-10 minutes or more post-commit, and we see
>> changes to BGP policy sometimes stall and take a decent amount of time (on
>> the order of several minutes or more) to actually take effect.
>>
>> First RPD will climb up to almost 100% CPU utilization, chew it for a few
>> minutes, then it'll go down and sampled will climb up to almost 100% for
>> its turn of a couple of minutes and chew a bit. Then sampled goes back down and
>> RPD takes back over to 100% for a few more minutes. Eventually it all
>> finally calms back down and normalizes back to expected levels.
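
A rough way to watch this alternation after a commit, sketched with standard operational-mode commands. The user@router prompt is illustrative, and the match pattern assumes the process names appear as rpd and sampled in the output; lines starting with # are annotations, not commands.

```
# Per-process CPU; repeat every few seconds after the commit.
user@router> show system processes extensive | match "PID|rpd|sampled"
# Overall Routing Engine CPU and memory utilization.
user@router> show chassis routing-engine
```

Comparing the two outputs with sampling active and deactivated should show whether the post-commit spike follows the pattern described above.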
>>
>> Turn off sampling, and any CPU spikes post-commit are only on the order of
>> seconds, not minutes, and any policy changes take effect pretty much
>> immediately.
>>
>> We've seen this regardless of how flow is configured; we've configured flow
>> with a "simple" config, as well as inline jflow, pretty much with the same
>> results. We're curious if anyone's had any of these same problems with
>> jflow killing the CPU on MX80s (yeah, I know these PPC boxes are pretty
>> weak sisters), and if there's any fix beyond the usual "Doctor, it hurts
>> when I do this, what should I do?". "Don't do that!".
>>
>> It's a nice feature, shame that using it seems to come with this heavy a
>> price.
>>
>> As an aside, we also see a bit of a slowdown in the RIB/FIB
>> learning/purging on BGP session turnup/reset, which we're well aware is a
>> known issue with sampling enabled, so I won't be shocked if this is just
>> "how it is". I'd love to be wrong.
>>
>> Here's our sampling config, quick and dirty, regular and inline jflow, in
>> case we're missing something.
>>
>> "Normal" Sampling:
>>
>> router> show configuration forwarding-options
>> sampling {
>>     input {
>>         rate 8192;
>>         run-length 0;
>>         max-packets-per-second 20000;
>>     }
>>     family inet {
>>         output {
>>             flow-server x.x.x.x {
>>                 port xxxxx;
>>                 version 5;
>>             }
>>         }
>>     }
>> }
>>
>> router> show configuration interfaces xe-0/0/0
>> unit xxx {
>>     vlan-id xxx;
>>     family inet {
>>         sampling {
>>             input;
>>             output;
>>         }
>>     }
>> }
>>
>>
>> Inline Jflow Sampling:
>>
>> router> show configuration forwarding-options
>> sampling {
>>     instance {
>>         BLAH-INSTANCE {
>>             input {
>>                 rate 5000;
>>             }
>>             family inet {
>>                 output {
>>                     flow-server x.x.x.x {
>>                         port xxxx;
>>                         autonomous-system-type origin;
>>                         no-local-dump;
>>                         version-ipfix {
>>                             template {
>>                                 BLAH-TEMPLATE;
>>                             }
>>                         }
>>                     }
>>                     inline-jflow {
>>                         source-address x.x.x.x;
>>                     }
>>                 }
>>             }
>>         }
>>     }
>> }
>>
>> router> show configuration chassis
>> tfeb {
>>     slot 0 {
>>         sampling-instance BLAH-INSTANCE;
>>     }
>> }
>>
>>
>> router> show configuration services
>> flow-monitoring {
>>     version-ipfix {
>>         template BLAH-TEMPLATE {
>>             flow-active-timeout 10;
>>             flow-inactive-timeout 10;
>>             template-refresh-rate {
>>                 packets 10000;
>>                 seconds 10;
>>             }
>>             option-refresh-rate {
>>                 packets 10000;
>>                 seconds 10;
>>             }
>>             ipv4-template;
>>         }
>>     }
>> }
>>
>>
>> router> show configuration interfaces xe-0/0/0
>> unit xxx {
>>     vlan-id xxx;
>>     family inet {
>>         sampling {
>>             input;
>>             output;
>>         }
>>     }
>> }
>> _______________________________________________
>> juniper-nsp mailing list juniper-nsp at puck.nether.net
>> https://puck.nether.net/mailman/listinfo/juniper-nsp