[j-nsp] MX80 Sampling - High CPU

Justin M. Streiner streiner at cluebyfour.org
Tue Sep 23 11:18:25 EDT 2014


Sounds like you are running into bugs PR963060 or PR671136.

This is supposed to be fixed in 12.3R8, which is expected to be released
very soon.

We ran into this behavior on a pair of MX480s and had to disable sampling 
for the time being.

jms

On Tue, 23 Sep 2014, Ritz Rojas wrote:

> We have a few MX80s (MX80-48T) that we're looking to deploy in certain
> applications where they'll be taking full Internet tables (v4 and v6).  We
> also have a need to gather flow data on our routers, and have noticed an
> interesting trend in the lab.
>
> We are not using an MS-MIC currently.
>
> This test box is running 12.3R7.7 at the moment, but we've seen this same
> thing in 11.4 too.
>
> With full Internet routes loaded and sampling enabled, every commit, for
> any change at all, is followed by RPD and sampled taking turns grinding
> the CPU up to 100% for 5-10 minutes or more.  BGP policy changes sometimes
> stall as well, taking a decent amount of time (on the order of several
> minutes or more) to actually take effect.
>
> First RPD climbs to almost 100% CPU utilization and chews on it for a few
> minutes, then it drops off and sampled climbs to almost 100% for its turn
> of a couple of minutes.  Then sampled goes back down and RPD takes back
> over at 100% for a few more minutes.  Eventually it all calms down and
> returns to expected levels.
>
> Turn off sampling, and any CPU spikes post-commit are only on the order of
> seconds, not minutes, and any policy changes take effect pretty much
> immediately.
>
> We've seen this regardless of how flow export is configured; we've tried
> both a "simple" config and inline jflow, with pretty much the same
> results.  We're curious whether anyone has had the same problem with jflow
> killing the CPU on MX80s (yeah, I know these PPC boxes are pretty weak
> sisters), and whether there's any fix beyond the usual "Doctor, it hurts
> when I do this, what should I do?" "Don't do that!"
>
> It's a nice feature, shame that using it seems to come with this heavy a
> price.
>
> As an aside, we also see a bit of a slowdown in the RIB/FIB
> learning/purging on BGP session turnup/reset, which we're well aware is a
> known issue with sampling enabled, so I won't be shocked if this is just
> "how it is".  I'd love to be wrong.
>
> Here are our sampling configs, quick and dirty, for both regular and
> inline jflow, in case we're missing something.
>
> "Normal" Sampling:
>
> router> show configuration forwarding-options
> sampling {
>    input {
>        rate 8192;
>        run-length 0;
>        max-packets-per-second 20000;
>    }
>    family inet {
>        output {
>            flow-server x.x.x.x {
>                port xxxxx;
>                version 5;
>            }
>        }
>    }
> }
>
> router> show configuration interfaces xe-0/0/0
> unit xxx {
>    vlan-id xxx;
>    family inet {
>        sampling {
>            input;
>            output;
>        }
>    }
> }
>
>
> Inline Jflow Sampling:
>
> router> show configuration forwarding-options
> sampling {
>    instance {
>        BLAH-INSTANCE {
>            input {
>                rate 5000;
>            }
>            family inet {
>                output {
>                    flow-server x.x.x.x {
>                        port xxxx;
>                        autonomous-system-type origin;
>                        no-local-dump;
>                        version-ipfix {
>                            template {
>                                BLAH-TEMPLATE;
>                            }
>                        }
>                    }
>                    inline-jflow {
>                        source-address x.x.x.x;
>                    }
>                }
>            }
>        }
>    }
> }
>
> router> show configuration chassis
> tfeb {
>    slot 0 {
>        sampling-instance BLAH-INSTANCE;
>    }
> }
>
>
> router> show configuration services
> flow-monitoring {
>    version-ipfix {
>        template BLAH-TEMPLATE {
>            flow-active-timeout 10;
>            flow-inactive-timeout 10;
>            template-refresh-rate {
>                packets 10000;
>                seconds 10;
>            }
>            option-refresh-rate {
>                packets 10000;
>                seconds 10;
>            }
>            ipv4-template;
>        }
>    }
> }
>
>
> router> show configuration interfaces xe-0/0/0
> unit xxx {
>    vlan-id xxx;
>    family inet {
>        sampling {
>            input;
>            output;
>        }
>    }
> }
> _______________________________________________
> juniper-nsp mailing list juniper-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp
>
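For reference, the `rate` and `max-packets-per-second` values in the configs above can be turned into an expected sampled-packet rate. The Python sketch assumes that `rate N` samples roughly 1 in N packets and that `max-packets-per-second`, where configured, caps the number of sampled packets per second. The 10 Gb/s load and 500-byte average packet size are hypothetical illustration values, not figures from this thread, and the helper names are made up for the example. It only models traffic volume; it says nothing about the post-commit RPD/sampled interaction described above.

```python
# Estimate how many packets per second 1-in-N sampling hands to the
# sampling process. Assumptions (not from the thread): "rate N" samples
# about 1 in N packets, and max-packets-per-second, where configured,
# caps the sampled packet rate.
from typing import Optional


def bits_to_pps(bits_per_second: float, avg_packet_bytes: int) -> float:
    """Convert a traffic rate in bits/s to packets/s for an average packet size."""
    return bits_per_second / (avg_packet_bytes * 8)


def sampled_pps(line_pps: float, rate: int, max_pps: Optional[int] = None) -> float:
    """Expected sampled packets/s for 1-in-`rate` sampling, optionally capped."""
    if rate <= 0:
        raise ValueError("rate must be a positive integer")
    pps = line_pps / rate
    return pps if max_pps is None else min(pps, max_pps)


if __name__ == "__main__":
    # Hypothetical combined input + output load on the sampled interfaces.
    line_pps = bits_to_pps(10e9, 500)  # 10 Gb/s at 500 bytes/packet
    configs = (
        ("normal sampling", 8192, 20000),  # rate 8192, max-packets-per-second 20000
        ("inline jflow", 5000, None),      # rate 5000, no cap in the posted config
    )
    for label, rate, cap in configs:
        pps = sampled_pps(line_pps, rate, cap)
        print(f"{label}: 1 in {rate} -> {pps:.1f} sampled packets/s")
```

Under these assumptions the cap only matters once line_pps / rate exceeds it; with 1 in 8192 sampling, 20000 sampled packets/s corresponds to roughly 164 million packets/s of combined traffic.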

