[j-nsp] auto b/w mpls best practice -- cpu spikes

Saku Ytti saku at ytti.fi
Thu Sep 13 13:11:07 EDT 2018


RPD was a single-threaded application until very recently, so your RSVP
would compete for access to a single core with every other task. Perhaps
not a huge deal if you are a BGP-free MPLS core, but if you are not, then
you're going to see a massive improvement by running a later JunOS with
multithreaded RPD. There are only a very few threads, and one of them
happens to be RSVP. It was added in the initial release of multithreaded
RPD, because that is where practical deployments will most benefit
from multicore.

You can talk to your account team about what code improvements have
come after RSVP was multithreaded. There are orders-of-magnitude
convergence benefits in real customer networks from changing nothing but
the JunOS release.
OP's release supports neither LSP self-ping nor adaptive teardown,
both of which are needed for make-before-break to actually work, and not
just hope it works.

Juniper has also made fundamental changes to how they develop and
release, and has gone back to a single-branch model, which you can
start capitalising on after 17.3, IIRC. You can ask your account
team to substantiate quality improvements with data, such as how many
bugs were found in releases over time at a specific point in the release
cycle.

Generally, the strategy when you need new software should be:

- pick the latest long-term-support release to test
- if it fails your test, go back to step 1 with latest-1
- if it passes your test, run it; move to a newer rebuild if you hit
  bugs, and restart the process if you need new features
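
The selection loop above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions: pick_release and
passes_lab_test are hypothetical names, the release strings are made-up
placeholders rather than recommendations, and the test itself stands in
for whatever lab qualification you actually run.

```python
from typing import Callable, Optional, Sequence


def pick_release(
    candidates: Sequence[str],
    passes_lab_test: Callable[[str], bool],
) -> Optional[str]:
    """Return the newest candidate that passes testing, or None.

    `candidates` is assumed to be ordered newest first, so walking the
    list is the "latest, then latest-1, ..." fallback from the steps above.
    """
    for release in candidates:
        if passes_lab_test(release):
            return release
    return None  # nothing qualified; revisit the candidate list


if __name__ == "__main__":
    # Hypothetical long-term-support releases, newest first.
    lts = ["18.2R3", "17.4R2", "17.3R3"]
    # Pretend the newest one failed in the lab.
    known_bad = {"18.2R3"}
    print(pick_release(lts, lambda r: r not in known_bad))  # prints 17.4R2
```

Moving to a newer rebuild of the chosen release, or restarting for new
features, is the same call with an updated candidate list.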

'New software is bad, old software is good' is an adage that is not
data-driven. We also need to understand what the vendor is doing: how
they are developing, how they are testing, how they are releasing and,
when they change something, in which release the changes will appear
and what the data tells us about the success of their efforts.
In my mind all major vendors have significantly improved their story
in the past few years. That won't say anything meaningful about success
in any specific deployment, but I buy the vendors' story and I believe
that on average new releases are more successful today than they were,
say, just 3 years ago.


On Thu, 13 Sep 2018 at 18:12, Tom Beecher <beecher at beecher.cc> wrote:
>
> There's no one magic knob that fixes CPU spikes in an MPLS environment. They're all different. What I change to optimize mine might knock your network over in 5 minutes. You need to determine what is triggering the churn before you can reasonably optimize it. Take a look at the logs and see what is causing the path changes that cause CPU spikes, and work from there.
>
> Having pre-signaled secondary paths will generally always be a good idea, although with those try to use the sync-active-path-bandwidth command too to prevent stale secondary RSVP reservations. Make-before-break is almost universally a good idea too.
>
> On code, personally I wouldn't ever go latest and greatest. It usually means you just find the latest and greatest bugs. :) I go with the newest stable version that doesn't have bugs that screw me, and upgrade only when there's a feature I need, an optimization I want, or for security reasons.
>
> Find your churn causes, work from there.
>
> On Thu, Sep 13, 2018 at 7:40 AM Saku Ytti <saku at ytti.fi> wrote:
>>
>> I think 16.1 was first.
>>
>> ps Haux|grep rpd should show multiple rpd lines.
>>
>> Also
>> ytti at r41.labxtx01.us.bb> show task io |match {
>>  KRT IO task                          0       0       0       0       0         {krtio-th}
>>  krtio-th                             0       0       0       0       0         {krtio-th}
>>  krt ioth solic client                0       0     869       0       0         {krtio-th}
>>  KRT IO                               0       0       0       0       0         {krtio-th}
>>  bgpio-0-th                           0       0       0       0       0         {bgpio-0-th}
>>  rsvp-io                              0       0       0       0       0         {rsvp-io}
>>  jtrace_jthr_task                     0       0       0       0       0         {TraceThread}
>>
>> I'd just go latest and greatest.
>> On Thu, 13 Sep 2018 at 12:13, tim tiriche <tim.tiriche at gmail.com> wrote:
>> >
>> > No issues with convergence or suboptimal paths.  The noc is constantly seeing high cpu alerts and that was concerning.  Is this normal in other networks?
>> >
>> > Running 14.1R7.4 with mx480/240 mix.
>> > I usually follow the code listed here: https://kb.juniper.net/InfoCenter/index?page=content&id=KB21476
>> >
>> > In which code version did these optimizations happen?
>> >
>> >
>> > On Wed, Sep 12, 2018 at 2:11 AM Saku Ytti <saku at ytti.fi> wrote:
>> >>
>> >> Hey Tim,
>> >>
>> >> I'd optimise for customer experience, not CPU utilisation. Do you have
>> >> issues with convergence time, suboptimal paths?
>> >>
>> >> Which JunOS are you running? There are quite good reasons to jump to a
>> >> recent JunOS for RSVP, as you can get RSVP its own core, and you can
>> >> get make-before-break LSP reoptimisation which actually works, being
>> >> event-driven rather than timer-based (like what you have, which causes
>> >> LSP blackholing if LSP convergence lasts longer than the timers).
>> >>
>> >>
>> >>
>> >>
>> >> On Wed, 12 Sep 2018 at 08:05, tim tiriche <tim.tiriche at gmail.com> wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > Attached is my MPLS Auto B/w Configuration and I see frequent path changes
>> >> > and cpu spikes.  I have a small network and wanted to know if there are any
>> >> > optimizations/best practices I could follow to reduce the churn.
>> >> >
>> >> > protocols {
>> >> >     mpls {
>> >> >         statistics {
>> >> >             file mpls.statistics size 1m files 10;
>> >> >             interval 300;
>> >> >             auto-bandwidth;
>> >> >         }
>> >> >         log-updown {
>> >> >             syslog;
>> >> >             trap;
>> >> >             trap-path-down;
>> >> >             trap-path-up;
>> >> >         }
>> >> >         traffic-engineering mpls-forwarding;
>> >> >
>> >> >         rsvp-error-hold-time 25;
>> >> >         smart-optimize-timer 180;
>> >> >         ipv6-tunneling;
>> >> >         optimize-timer 3600;
>> >> >         label-switched-path <*> {
>> >> >             retry-timer 600;
>> >> >             random;
>> >> >             node-link-protection;
>> >> >             adaptive;
>> >> >             auto-bandwidth {
>> >> >                 adjust-interval 7200;
>> >> >                 adjust-threshold 20;
>> >> >                 minimum-bandwidth 1m;
>> >> >                 maximum-bandwidth 9g;
>> >> >                 adjust-threshold-overflow-limit 2;
>> >> >                 adjust-threshold-underflow-limit 4;
>> >> >             }
>> >> >             primary <*> {
>> >> >                 priority 5 5;
>> >> >             }
>> >> >         }
>> >> > _______________________________________________
>> >> > juniper-nsp mailing list juniper-nsp at puck.nether.net
>> >> > https://puck.nether.net/mailman/listinfo/juniper-nsp
>> >>
>> >>
>> >>
>> >> --
>> >>   ++ytti
>>
>>
>>
>> --
>>   ++ytti



-- 
  ++ytti

