[j-nsp] MX80 Sampling - High CPU

Graham Brown juniper-nsp at grahambrown.info
Tue Sep 23 16:28:52 EDT 2014


12.3R8 and 13.3R4 are due out any time now with the fixes in place. I think
there are many people waiting for these two releases...

Cheers,

Graham Brown
Twitter - @mountainrescuer <https://twitter.com/#!/mountainrescuer>
LinkedIn <http://www.linkedin.com/in/grahamcbrown>

On 24 September 2014 03:18, Justin M. Streiner <streiner at cluebyfour.org>
wrote:

> Sounds like you are running into bugs PR963060 or PR671136.
>
> This is supposed to be fixed in 12.3R8, which is due to be released very
> soon.
>
> We ran into this behavior on a pair of MX480s and had to disable sampling
> for the time being.
>
> jms
>
>
> On Tue, 23 Sep 2014, Ritz Rojas wrote:
>
>> We have a few MX80s (MX80-48T) that we're looking to deploy in certain
>> applications where they'll be taking full Internet tables (v4 and v6).  We
>> also have a need to gather flow data on our routers, and have noticed an
>> interesting trend in the lab.
>>
>> We are not using an MS-MIC currently.
>>
>> This test box is running 12.3R7.7 at the moment, but we've seen this same
>> thing in 11.4 too.
>>
>> With full Internet routes loaded and sampling enabled, every commit, for
>> any change at all, sends RPD and sampled into taking turns pinning the CPU
>> at 100% for 5-10 minutes or more.  BGP policy changes sometimes stall as
>> well, taking several minutes or more to actually take effect.
>>
>> First RPD climbs to almost 100% CPU utilization and chews on it for a few
>> minutes.  Then it drops and sampled climbs to almost 100% for its own
>> couple of minutes.  Then sampled goes back down and RPD takes over at 100%
>> for a few more minutes.  Eventually it all calms down and returns to
>> expected levels.
>>
>> Turn off sampling, and any CPU spikes post-commit are only on the order of
>> seconds, not minutes, and any policy changes take effect pretty much
>> immediately.
>>
>> We've seen this regardless of how flow is configured; we've tried both a
>> "simple" sampling config and inline jflow, with pretty much the same
>> results.  We're curious if anyone's had these same problems with jflow
>> killing the CPU on MX80s (yeah, I know these PPC boxes are pretty weak
>> sisters), and if there's any fix beyond the usual "Doctor, it hurts when
>> I do this, what should I do?" "Don't do that!"
>>
>> It's a nice feature, shame that using it seems to come with this heavy a
>> price.
>>
>> As an aside, we also see a bit of a slowdown in the RIB/FIB
>> learning/purging on BGP session turnup/reset, which we're well aware is a
>> known issue with sampling enabled, so I won't be shocked if this is just
>> "how it is".  I'd love to be wrong.
>>
>> Here's our sampling config, quick and dirty, regular and inline jflow, in
>> case we're missing something.
>>
>> "Normal" Sampling:
>>
>> router> show configuration forwarding-options
>> sampling {
>>    input {
>>        rate 8192;
>>        run-length 0;
>>        max-packets-per-second 20000;
>>    }
>>    family inet {
>>        output {
>>            flow-server x.x.x.x {
>>                port xxxxx;
>>                version 5;
>>            }
>>        }
>>    }
>> }
>>
>> router> show configuration interfaces xe-0/0/0
>> unit xxx {
>>    vlan-id xxx;
>>    family inet {
>>        sampling {
>>            input;
>>            output;
>>        }
>>    }
>> }
>>
>>
>> Inline Jflow Sampling:
>>
>> router> show configuration forwarding-options
>> sampling {
>>    instance {
>>        BLAH-INSTANCE {
>>            input {
>>                rate 5000;
>>            }
>>            family inet {
>>                output {
>>                    flow-server x.x.x.x {
>>                        port xxxx;
>>                        autonomous-system-type origin;
>>                        no-local-dump;
>>                        version-ipfix {
>>                            template {
>>                                BLAH-TEMPLATE;
>>                            }
>>                        }
>>                    }
>>                    inline-jflow {
>>                        source-address x.x.x.x;
>>                    }
>>                }
>>            }
>>        }
>>    }
>> }
>>
>> router> show configuration chassis
>> tfeb {
>>    slot 0 {
>>        sampling-instance BLAH-INSTANCE;
>>    }
>> }
>>
>>
>> router> show configuration services
>> flow-monitoring {
>>    version-ipfix {
>>        template BLAH-TEMPLATE {
>>            flow-active-timeout 10;
>>            flow-inactive-timeout 10;
>>            template-refresh-rate {
>>                packets 10000;
>>                seconds 10;
>>            }
>>            option-refresh-rate {
>>                packets 10000;
>>                seconds 10;
>>            }
>>            ipv4-template;
>>        }
>>    }
>> }
>>
>>
>> router> show configuration interfaces xe-0/0/0
>> unit xxx {
>>    vlan-id xxx;
>>    family inet {
>>        sampling {
>>            input;
>>            output;
>>        }
>>    }
>> }
>> _______________________________________________
>> juniper-nsp mailing list juniper-nsp at puck.nether.net
>> https://puck.nether.net/mailman/listinfo/juniper-nsp
>>
>
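
For anyone wanting to put numbers on the post-commit behaviour described in
the quoted message (RPD and sampled alternating near 100% CPU), here is a
minimal sketch in Python. It assumes you have already captured the text of
"show system processes extensive" at regular intervals by whatever means you
prefer, and that the output has the usual top-style layout with a percentage
column and the command name last. The sample text, the function names, and
the 90% threshold are all illustrative assumptions, not anything taken from
the routers in this thread.

```python
# Hypothetical sample of top-style output; the values are made up.
SAMPLE = """\
  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
 1374 root        1 132    0   828M   712M RUN     12:34 96.58% rpd
 1402 root        1  96    0   112M    64M select   3:10  1.27% sampled
"""


def parse_cpu(top_output, names=("rpd", "sampled")):
    """Return {process_name: cpu_percent} for the processes of interest."""
    usage = {}
    for line in top_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        command = fields[-1]
        if command not in names:
            continue
        # Take the last field that looks like a percentage (the CPU column).
        percents = [f for f in fields[:-1] if f.endswith("%")]
        if not percents:
            continue
        try:
            usage[command] = float(percents[-1].rstrip("%"))
        except ValueError:
            continue
    return usage


def busy_seconds(samples, name, threshold=90.0):
    """Sum the length of each polling interval that started with `name`
    at or above `threshold` percent CPU.

    `samples` is a list of (timestamp_in_seconds, usage_dict) tuples in
    chronological order.
    """
    total = 0.0
    for (t0, u0), (t1, _) in zip(samples, samples[1:]):
        if u0.get(name, 0.0) >= threshold:
            total += t1 - t0
    return total


if __name__ == "__main__":
    print(parse_cpu(SAMPLE))
```

Feeding busy_seconds() the parsed samples from a commit made with sampling
enabled and one made with it disabled should make the "minutes versus
seconds" difference easy to compare across releases.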

