[j-nsp] MX80 Sampling - High CPU
Scott Granados
scott at granados-llc.net
Wed Sep 24 09:54:41 EDT 2014
+1 here, definitely awaiting these releases.
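
In the meantime, a rough sketch of the stopgap jms describes below (turning sampling off until the fixed releases ship). This assumes sampling lives under forwarding-options as in Ritz's configs below; the prompt and the match pattern are only illustrative, so adjust for your own box:

    router> show system processes extensive | match "rpd|sampled"
    router> configure
    router# deactivate forwarding-options sampling
    router# commit

Re-running the first command after a later commit should show whether rpd and sampled still trade places at the top. "activate forwarding-options sampling" plus a commit brings the config back once you are on fixed code. For the inline jflow variant I would expect the chassis tfeb sampling-instance reference to need the same treatment, but I have not verified that.
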
On Sep 23, 2014, at 4:28 PM, Graham Brown <juniper-nsp at grahambrown.info> wrote:
> 12.3R8 and 13.3R4 are due out anytime now with the fixes in place. I think
> there are many people waiting for these two releases...
>
> Cheers,
>
> Graham Brown
> Twitter - @mountainrescuer <https://twitter.com/#!/mountainrescuer>
> LinkedIn <http://www.linkedin.com/in/grahamcbrown>
>
> On 24 September 2014 03:18, Justin M. Streiner <streiner at cluebyfour.org>
> wrote:
>
>> Sounds like you are running into bugs PR963060 or PR671136.
>>
>> This is supposed to be fixed in 12.3R8 which is supposed to be released
>> very soon.
>>
>> We ran into this behavior on a pair of MX480s and had to disable sampling
>> for the time being.
>>
>> jms
>>
>>
>> On Tue, 23 Sep 2014, Ritz Rojas wrote:
>>
>>> We have a few MX80s (MX80-48T) that we're looking to deploy in certain
>>> applications where they'll be taking full Internet tables (v4 and v6). We
>>> also have a need to gather flow data on our routers, and have noticed an
>>> interesting trend in the lab.
>>>
>>> We are not using an MS-MIC currently.
>>>
>>> This test box is running 12.3R7.7 at the moment, but we've seen this same
>>> thing in 11.4 too.
>>>
>>> When set up with full Internet routes and sampling is enabled, each time a
>>> commit is made for any change at all, RPD and sampled take turns grinding
>>> the CPU up to 100%, for up to 5-10 minutes or more post-commit, and we see
>>> changes to BGP policy sometimes stall and take a decent amount of time (on
>>> the order of several minutes or more) to actually take effect.
>>>
>>> First RPD will climb up to almost 100% CPU utilization, chew it for a few
>>> minutes, then it'll go down and sampled will climb up to almost 100% for
>>> its couple-minute turn and chew a bit. Then sampled goes back down and
>>> RPD takes back over to 100% for a few more minutes. Eventually it all
>>> finally calms back down and normalizes back to expected levels.
>>>
>>> Turn off sampling, and any CPU spikes post-commit are only on the order of
>>> seconds, not minutes, and any policy changes take effect pretty much
>>> immediately.
>>>
>>> We've seen this regardless of how flow is configured; we've configured flow
>>> with a "simple" config, as well as inline jflow, pretty much with the same
>>> results. We're curious if anyone's had any of these same problems with
>>> jflow killing the CPU on MX80s (yeah, I know these PPC boxes are pretty
>>> weak sisters), and if there's any fix beyond the usual "Doctor, it hurts
>>> when I do this, what should I do?". "Don't do that!".
>>>
>>> It's a nice feature, shame that using it seems to come with this heavy a
>>> price.
>>>
>>> As an aside, we also see a bit of a slowdown in the RIB/FIB
>>> learning/purging on BGP session turnup/reset, which we're well aware is a
>>> known issue with sampling enabled, so I won't be shocked if this is just
>>> "how it is". I'd love to be wrong.
>>>
>>> Here's our sampling config, quick and dirty, regular and inline jflow, in
>>> case we're missing something.
>>>
>>> "Normal" Sampling:
>>>
>>> router> show configuration forwarding-options
>>> sampling {
>>>     input {
>>>         rate 8192;
>>>         run-length 0;
>>>         max-packets-per-second 20000;
>>>     }
>>>     family inet {
>>>         output {
>>>             flow-server x.x.x.x {
>>>                 port xxxxx;
>>>                 version 5;
>>>             }
>>>         }
>>>     }
>>> }
>>>
>>> router> show configuration interfaces xe-0/0/0
>>> unit xxx {
>>>     vlan-id xxx;
>>>     family inet {
>>>         sampling {
>>>             input;
>>>             output;
>>>         }
>>>     }
>>> }
>>>
>>>
>>> Inline Jflow Sampling:
>>>
>>> router> show configuration forwarding-options
>>> sampling {
>>>     instance {
>>>         BLAH-INSTANCE {
>>>             input {
>>>                 rate 5000;
>>>             }
>>>             family inet {
>>>                 output {
>>>                     flow-server x.x.x.x {
>>>                         port xxxx;
>>>                         autonomous-system-type origin;
>>>                         no-local-dump;
>>>                         version-ipfix {
>>>                             template {
>>>                                 BLAH-TEMPLATE;
>>>                             }
>>>                         }
>>>                     }
>>>                     inline-jflow {
>>>                         source-address x.x.x.x;
>>>                     }
>>>                 }
>>>             }
>>>         }
>>>     }
>>> }
>>>
>>> router> show configuration chassis
>>> tfeb {
>>>     slot 0 {
>>>         sampling-instance BLAH-INSTANCE;
>>>     }
>>> }
>>>
>>>
>>> router> show configuration services
>>> flow-monitoring {
>>>     version-ipfix {
>>>         template BLAH-TEMPLATE {
>>>             flow-active-timeout 10;
>>>             flow-inactive-timeout 10;
>>>             template-refresh-rate {
>>>                 packets 10000;
>>>                 seconds 10;
>>>             }
>>>             option-refresh-rate {
>>>                 packets 10000;
>>>                 seconds 10;
>>>             }
>>>             ipv4-template;
>>>         }
>>>     }
>>> }
>>>
>>>
>>> router> show configuration interfaces xe-0/0/0
>>> unit xxx {
>>>     vlan-id xxx;
>>>     family inet {
>>>         sampling {
>>>             input;
>>>             output;
>>>         }
>>>     }
>>> }
>>> _______________________________________________
>>> juniper-nsp mailing list juniper-nsp at puck.nether.net
>>> https://puck.nether.net/mailman/listinfo/juniper-nsp