[j-nsp] MX80 Sampling - High CPU

Scott Granados scott at granados-llc.net
Wed Sep 24 09:54:41 EDT 2014


+1 here, definitely awaiting these releases.

On Sep 23, 2014, at 4:28 PM, Graham Brown <juniper-nsp at grahambrown.info> wrote:

> 12.3R8 and 13.3R4 are due out anytime now with the fixes in place. I think
> there are many people waiting for these two releases...
> 
> Cheers,
> 
> Graham Brown
> Twitter - @mountainrescuer <https://twitter.com/#!/mountainrescuer>
> LinkedIn <http://www.linkedin.com/in/grahamcbrown>
> 
> On 24 September 2014 03:18, Justin M. Streiner <streiner at cluebyfour.org>
> wrote:
> 
>> Sounds like you are running into bugs PR963060 or PR671136.
>> 
>> This is supposed to be fixed in 12.3R8, which should be released very
>> soon.
>> 
>> We ran into this behavior on a pair of MX480s and had to disable sampling
>> for the time being.
>> 
>> jms
>> 
>> 
>> On Tue, 23 Sep 2014, Ritz Rojas wrote:
>> 
>>> We have a few MX80s (MX80-48T) that we're looking to deploy in certain
>>> applications where they'll be taking full Internet tables (v4 and v6).  We
>>> also have a need to gather flow data on our routers, and have noticed an
>>> interesting trend in the lab.
>>> 
>>> We are not using an MS-MIC currently.
>>> 
>>> This test box is running 12.3R7.7 at the moment, but we've seen this same
>>> thing in 11.4 too.
>>> 
>>> When set up with full Internet routes and sampling is enabled, each time a
>>> commit is made for any change at all, RPD and sampled take turns grinding
>>> the CPU up to 100%, for up to 5-10 minutes or more post-commit, and we see
>>> changes to BGP policy sometimes stall and take a decent amount of time (on
>>> the order of several minutes or more) to actually take effect.
>>> 
>>> First RPD climbs to almost 100% CPU utilization and chews on it for a few
>>> minutes, then it goes down and sampled climbs to almost 100% for its
>>> couple-minute turn.  Then sampled goes back down and RPD takes back over at
>>> 100% for a few more minutes.  Eventually it all calms down and returns to
>>> expected levels.
>>> 
>>> Turn off sampling, and any CPU spikes post-commit are only on the order of
>>> seconds, not minutes, and any policy changes take effect pretty much
>>> immediately.
>>> 
>>> We've seen this regardless of how flow is configured; we've configured flow
>>> with a "simple" config, as well as inline jflow, pretty much with the same
>>> results.  We're curious if anyone's had any of these same problems with
>>> jflow killing the CPU on MX80s (yeah, I know these PPC boxes are pretty
>>> weak sisters), and if there's any fix beyond the usual "Doctor, it hurts
>>> when I do this, what should I do?".  "Don't do that!".
>>> 
>>> It's a nice feature, shame that using it seems to come with this heavy a
>>> price.
>>> 
>>> As an aside, we also see a bit of a slowdown in the RIB/FIB
>>> learning/purging on BGP session turnup/reset, which we're well aware is a
>>> known issue with sampling enabled, so I won't be shocked if this is just
>>> "how it is".  I'd love to be wrong.
>>> 
>>> Here's our sampling config, quick and dirty, regular and inline jflow, in
>>> case we're missing something.
>>> 
>>> "Normal" Sampling:
>>> 
>>> router> show configuration forwarding-options
>>> sampling {
>>>   input {
>>>       rate 8192;
>>>       run-length 0;
>>>       max-packets-per-second 20000;
>>>   }
>>>   family inet {
>>>       output {
>>>           flow-server x.x.x.x {
>>>               port xxxxx;
>>>               version 5;
>>>           }
>>>       }
>>>   }
>>> }
>>> 
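To put the "normal" sampling parameters in perspective, here is a rough Python sketch of the sampled packet rate they imply. It assumes `rate 8192` means one packet in 8192 is sampled and that `max-packets-per-second` simply caps the sampled rate, with `run-length 0` as configured. The function names and the 1,000,000 packets/s offered load are hypothetical, not measurements from this lab.

```python
def sampled_pps(offered_pps: float, rate: int, max_pps: int) -> float:
    """Estimated sampled packets/s for 1-in-`rate` sampling, capped at `max_pps`.

    Assumes run-length 0 (one packet taken per trigger), as configured above.
    """
    if rate <= 0:
        raise ValueError("rate must be a positive integer")
    return min(offered_pps / rate, max_pps)


def cap_threshold_pps(rate: int, max_pps: int) -> int:
    """Offered load (packets/s) above which max-packets-per-second starts dropping samples."""
    return rate * max_pps


if __name__ == "__main__":
    # Hypothetical offered load of 1,000,000 packets/s on the sampled interfaces.
    print(sampled_pps(1_000_000, rate=8192, max_pps=20000))  # 122.0703125
    print(cap_threshold_pps(8192, 20000))                    # 163840000
```

At 1-in-8192 the 20000 pps cap only comes into play above roughly 164 Mpps of offered load, so at this rate the cap is unlikely to be the limiting factor.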
>>> router> show configuration interfaces xe-0/0/0
>>> unit xxx {
>>>   vlan-id xxx;
>>>   family inet {
>>>       sampling {
>>>           input;
>>>           output;
>>>       }
>>>   }
>>> }
>>> 
>>> 
>>> Inline Jflow Sampling:
>>> 
>>> router> show configuration forwarding-options
>>> sampling {
>>>   instance {
>>>       BLAH-INSTANCE {
>>>           input {
>>>               rate 5000;
>>>           }
>>>           family inet {
>>>               output {
>>>                   flow-server x.x.x.x {
>>>                       port xxxx;
>>>                       autonomous-system-type origin;
>>>                       no-local-dump;
>>>                       version-ipfix {
>>>                           template {
>>>                               BLAH-TEMPLATE;
>>>                           }
>>>                       }
>>>                   }
>>>                   inline-jflow {
>>>                       source-address x.x.x.x;
>>>                   }
>>>               }
>>>           }
>>>       }
>>>   }
>>> }
>>> 
>>> router> show configuration chassis
>>> tfeb {
>>>   slot 0 {
>>>       sampling-instance BLAH-INSTANCE;
>>>   }
>>> }
>>> 
>>> 
>>> router> show configuration services
>>> flow-monitoring {
>>>   version-ipfix {
>>>       template BLAH-TEMPLATE {
>>>           flow-active-timeout 10;
>>>           flow-inactive-timeout 10;
>>>           template-refresh-rate {
>>>               packets 10000;
>>>               seconds 10;
>>>           }
>>>           option-refresh-rate {
>>>               packets 10000;
>>>               seconds 10;
>>>           }
>>>           ipv4-template;
>>>       }
>>>   }
>>> }
>>> 
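The IPFIX template above sets both flow timeouts and the template refresh interval to 10 seconds. As a rough illustration of what those knobs control, here is a toy Python flow cache using generic NetFlow/IPFIX aging rules: a record is exported once its flow has been idle for the inactive timeout or has been open for the active timeout. This is an assumed model, not Juniper's inline-jflow code, and the `FlowCache` and `template_due` names are hypothetical. The refresh check assumes a template is resent as soon as either the packet or the seconds threshold is reached.

```python
from dataclasses import dataclass, field


@dataclass
class FlowEntry:
    first_seen: float  # start of the current record, in seconds
    last_seen: float   # time of the most recent sampled packet
    packets: int = 0


@dataclass
class FlowCache:
    # Hypothetical toy model mirroring flow-active-timeout 10 / flow-inactive-timeout 10.
    active_timeout: float = 10.0
    inactive_timeout: float = 10.0
    flows: dict = field(default_factory=dict)

    def observe(self, key: str, now: float) -> None:
        """Account one sampled packet for flow `key` at time `now`."""
        entry = self.flows.get(key)
        if entry is None:
            entry = FlowEntry(first_seen=now, last_seen=now)
            self.flows[key] = entry
        entry.last_seen = now
        entry.packets += 1

    def expire(self, now: float) -> list:
        """Export and remove records that hit either timeout; returns (key, packets)."""
        exported = []
        for key, entry in list(self.flows.items()):
            idle = now - entry.last_seen >= self.inactive_timeout
            too_long = now - entry.first_seen >= self.active_timeout
            if idle or too_long:
                exported.append((key, entry.packets))
                # A flow that is still active simply opens a new record on its next packet.
                del self.flows[key]
        return exported


def template_due(packets_since: int, seconds_since: float,
                 packets: int = 10000, seconds: float = 10.0) -> bool:
    """Assumed rule: resend the template when either refresh threshold is reached."""
    return packets_since >= packets or seconds_since >= seconds


if __name__ == "__main__":
    cache = FlowCache()
    cache.observe("A", 0.0)
    cache.observe("A", 9.0)
    cache.observe("B", 5.0)
    print(cache.expire(10.0))  # "A" has been open 10 s, so it is exported
    print(cache.expire(15.0))  # "B" has now been idle 10 s
    print(template_due(packets_since=500, seconds_since=10.0))
```

With 10-second timeouts, long-lived flows are re-exported roughly every 10 seconds, which sets the steady export load on the collector side.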
>>> 
>>> router> show configuration interfaces xe-0/0/0
>>> unit xxx {
>>>   vlan-id xxx;
>>>   family inet {
>>>       sampling {
>>>           input;
>>>           output;
>>>       }
>>>   }
>>> }
>>> _______________________________________________
>>> juniper-nsp mailing list juniper-nsp at puck.nether.net
>>> https://puck.nether.net/mailman/listinfo/juniper-nsp