[j-nsp] Auto-bandwidth Accuracy

Phil Bedard philxor at gmail.com
Tue May 25 16:04:50 EDT 2010


Do you have the details on PR 457767? It doesn't seem to show up in the system.  I tried to duplicate 438157 and could never successfully do so.

Thanks, 
Phil  

On May 25, 2010, at 3:23 PM, Olson, Martin wrote:

> Yeah, I found the same behavior.  Sometimes the Max AvgBW would go up by 5X-7X the value it should have been, which would lead to really high reservations after the next adjust-interval.  I opened case 2009-0610-0697 about the issue, and after a while they traced the problem to PRs 438157 and 457767.  The first code with the fix for both PRs is 9.6R2/9.5R3/9.4R4/9.3R5.  They told us that if we disabled the adjust-threshold-overflow-limit in the meantime, that would alleviate the problem until we could upgrade code.
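> 
> For anyone hitting this before the fixed code, the interim workaround is
> just removing that statement from the affected LSPs, something along the
> lines of (the LSP name here is only a placeholder):
> 
>   [edit protocols mpls label-switched-path EXAMPLE-LSP auto-bandwidth]
>   user@router# delete adjust-threshold-overflow-limit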
> 
> -MO
> 
> 
> -----Original Message-----
> From: Danny Vernals [mailto:danny.vernals at gmail.com] 
> Sent: Tuesday, May 25, 2010 5:18 AM
> To: Richard A Steenbergen
> Cc: juniper-nsp at puck.nether.net
> Subject: Re: [j-nsp] Auto-bandwidth Accuracy
> 
> On Sun, May 23, 2010 at 7:52 AM, Richard A Steenbergen <ras at e-gerbil.net> wrote:
>> Recently I've been noticing some really odd auto-bandwidth behavior on
>> several different routers, and I'm wondering if anybody knows if this is
>> a known bug or if I'm doing something really wrong in my autobw config.
>> 
>> Specifically, I'm seeing many cases where the rsvp reservations on an
>> interface are vastly higher than the actual traffic going over it. I
>> started comparing the autobw measured bandwidth value vs the rsvp resv bandwidth
>> across my LSPs (with an op script :P), and noticed that a large number
>> of LSPs that were ingress on Juniper routers were consistently reserving
>> more bandwidth than they were actually passing.
>> 
>> To troubleshoot this further, I picked one LSP at random and followed it
>> through the course of an entire adjust-interval. I also watched it in
>> "monitor label-switched-path", and followed the bandwidth recorded for
>> it in the mpls stats file. The mpls stats file pretty consistently
>> recorded a bandwidth of around 900Mbps. Some samples were up to 1G, some
>> were down around 800Mbps, but nothing was significantly outside this
>> range:
>> 
>> xxx.xxxx-xxx.xxxx-BRONZE-1     20442770 pkt    21800398308 Byte  91864 pps 97826023 Bps Util 43.47%
>> xxx.xxxx-xxx.xxxx-BRONZE-1     25748678 pkt    27500224526 Byte  89930 pps 96607224 Bps Util 42.93%
>> xxx.xxxx-xxx.xxxx-BRONZE-1     31309754 pkt    33516047564 Byte  95880 pps 103721086 Bps Util 46.09%
>> xxx.xxxx-xxx.xxxx-BRONZE-1     36934965 pkt    39389728013 Byte  90729 pps 94736781 Bps Util 42.10%
>> xxx.xxxx-xxx.xxxx-BRONZE-1     41323164 pkt    44001156442 Byte  86043 pps 90420165 Bps Util 40.18%
>> xxx.xxxx-xxx.xxxx-BRONZE-1     46229207 pkt    49166295068 Byte  84586 pps 89054114 Bps Util 39.58%
>> xxx.xxxx-xxx.xxxx-BRONZE-1     51764861 pkt    55023074603 Byte  92260 pps 97612992 Bps Util 43.38%
>> xxx.xxxx-xxx.xxxx-BRONZE-1     57091315 pkt    60691783494 Byte  90278 pps 96079811 Bps Util 42.70%
>> xxx.xxxx-xxx.xxxx-BRONZE-1     62138489 pkt    66009079194 Byte  90128 pps 94951708 Bps Util 42.20%
>> xxx.xxxx-xxx.xxxx-BRONZE-1     67697838 pkt    72030553645 Byte  92655 pps 100357907 Bps Util 44.60%
>> xxx.xxxx-xxx.xxxx-BRONZE-1     73083250 pkt    77870203449 Byte  89756 pps 97327496 Bps Util 43.25%
>> xxx.xxxx-xxx.xxxx-BRONZE-1     78530642 pkt    83799427998 Byte  90789 pps 98820409 Bps Util 43.91%
>> xxx.xxxx-xxx.xxxx-BRONZE-1     84166327 pkt    89767404007 Byte  85389 pps 90423878 Bps Util 40.18%
>> xxx.xxxx-xxx.xxxx-BRONZE-1     89990750 pkt    96052103366 Byte  85653 pps 92422049 Bps Util 41.07%
>> xxx.xxxx-xxx.xxxx-BRONZE-1     94808838 pkt   101299936674 Byte  87601 pps 95415151 Bps Util 42.40%
>> xxx.xxxx-xxx.xxxx-BRONZE-1    100044983 pkt   106918990604 Byte  83113 pps 89191332 Bps Util 39.64%
>> xxx.xxxx-xxx.xxxx-BRONZE-1    104706036 pkt   111928263183 Byte  86315 pps 92764307 Bps Util 41.22%
>> xxx.xxxx-xxx.xxxx-BRONZE-1    109664547 pkt   117256403183 Byte  81287 pps 87346557 Bps Util 38.82%
>> xxx.xxxx-xxx.xxxx-BRONZE-1    115001230 pkt   123065374817 Byte  84709 pps 92205898 Bps Util 40.98%
>> xxx.xxxx-xxx.xxxx-BRONZE-1    120197917 pkt   128761293505 Byte  85191 pps 93375716 Bps Util 41.50%
>> xxx.xxxx-xxx.xxxx-BRONZE-1    124790487 pkt   133783111501 Byte  79182 pps 86583068 Bps Util 38.48%
>> xxx.xxxx-xxx.xxxx-BRONZE-1    129450091 pkt   138908431043 Byte  84720 pps 93187628 Bps Util 41.41%
>> xxx.xxxx-xxx.xxxx-BRONZE-1    134048794 pkt   143940227806 Byte  82119 pps 89853513 Bps Util 39.93%
>> xxx.xxxx-xxx.xxxx-BRONZE-1    138900130 pkt   149257983679 Byte  80855 pps 88629264 Bps Util 39.39%
>> xxx.xxxx-xxx.xxxx-BRONZE-1    143665805 pkt   154447812210 Byte  79427 pps 86497142 Bps Util 38.44%
>> xxx.xxxx-xxx.xxxx-BRONZE-1    148501587 pkt   159667032930 Byte  80596 pps 86987012 Bps Util 38.66%
>> xxx.xxxx-xxx.xxxx-BRONZE-1    153971586 pkt   165650360517 Byte  78142 pps 85476108 Bps Util 37.99%
>> 
>> Next, I watched the output of "show mpls lsp name BLAH detail", looking
>> at the autobw measured amount (Max AvgBW) and the reserved bandwidth.
>> I'm using a stats interval of 60 seconds, an adjust-interval of 900
>> seconds, and in this instance no overflow samples occurred. After the
>> previous adjust-interval completes the measured bw is reset to 0, and
>> then starts updating again after the first 60 sec stats interval is up.
>> For around the first 700 seconds the Max AvgBW was pretty close to what
>> one would expect (around 900Mbps), then it jumped to ~1.6Gbps for no
>> reason that I can determine. The stats file for this LSP (above) never
>> showed anything above 1.0G, and a monitor of the lsp never showed any
>> sample that ever got anywhere near that high (let alone enough to make an
>> entire 60 sec sample interval report that high). At the end of the 900
>> seconds, the 1.6G value is what was signaled to RSVP, and the cycle
>> repeated itself. I watched it for several more cycles, and saw the same
>> behavior happening over and over again, with measured values of 1.8G
>> plus, while the stats file continued to show an average of around
>> 800-900Mbps and no sample that ever went above 1G.
>> 
> 
> I've seen something similar on 9.5R2, although I didn't pay it much
> heed at the time as I was investigating other issues.  My guess (and
> it is definitely a guess) is that there is an internal data structure
> which accumulates the LSP usage and is divided by the sampling
> interval, then written to the statistics file when each sampling
> interval expires.  If something (an rpd scheduling issue, CPU at
> 100%?) prevents that value from being written at the end of a
> sampling interval, the sample gets recorded with a default value of
> 0, but the data structure keeps the usage from that interval and
> continues accumulating.  When the next sampling interval expires, the
> combined value is divided by only 1 x sampling interval, which gives
> an average bps value roughly double what it should be.
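> 
> To put rough numbers on that guess: at ~900Mbps an LSP moves roughly
> 900e6 / 8 * 60 = ~6.75GB per 60 second sampling interval.  If one
> write is skipped and two intervals' worth (~13.5GB) later gets divided
> by a single 60 second interval, the reported average works out to
> ~1.8Gbps, roughly double the real rate and in the same ballpark as the
> 1.6-1.8G jumps described above.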
> 
> I'll keep an eye out and report back if I see this behaviour again.
> 
> 
>> This particular router is running 9.4R3, but I've seen similar behavior
>> on some other 9.5R4 routers as well. This really seems like some kind of
>> bug, but honestly I'd sooner slit my wrists with a rusty PIC than try to
>> explain the above to JTAC (besides, they would probably just ask me for
>> 50 irrelevant log files and then do nothing for the next 6 months like all
>> of my other cases :P). I'm wondering if this is some kind of known
>> issue, or if there is some reason why this config wouldn't work well.
>> 
>> The stats interval of 60 seconds is because I snmp poll and graph the
>> mplsLspOctets every 60 seconds, and snmp is updated based on the stats
>> interval. Any value other than 60 secs makes the graphs jitter wildly.
>> But in the JUNOS documentation for auto-bandwidth, there is the
>> following warning:
>> 
>> http://www.juniper.net/techpubs/en_US/junos9.5/information-products/topic-collections/config-guide-mpls-applications/mpls-configuring-automatic-bandwidth-allocation-for-lsps.html
>> 
>> Note: To prevent unnecessary resignaling of LSPs, it is best to
>> configure an MPLS automatic bandwidth statistics interval of no more
>> than one third the corresponding LSP adjustment interval. For example,
>> if you configure a value of 30 seconds for the interval statement at the
>> [edit protocols mpls statistics] hierarchy level, you should configure a
>> value of no more than 90 seconds for the adjust-interval statement at
>> the [edit protocols mpls label-switched-path label-switched-path-name
>> auto-bandwidth] hierarchy level.
>> 
>> I could never figure this one out, and personally I always thought it
>> was some kind of documentation error. What possible reason could there
>> be for not having an adjust-interval of more than 3x the statistics
>> value? I'm running 900 sec adjust-intervals with 300 sec overflow
>> detection (the lowest value you can configure) to try and reduce RSVP
>> resignaling load on the network. Every time an LSP resignals, it tears
>> down the bypass LSPs as well, and at one point (prior to 9.4 I think) it
>> took over 50 seconds before JUNOS would even try to start resignaling
>> the bypass LSPs. There were some optimizations made to make it kick off
>> the bypass LSP resignal within ~15 secs instead of ~50 secs, but we're
>> still trying to keep it from resignaling excessively.
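>> 
>> For reference, the relevant pieces of config look something like the
>> following (the LSP name and threshold values are illustrative
>> placeholders rather than the exact production settings):
>> 
>>   protocols {
>>       mpls {
>>           statistics {
>>               file auto-bw.stats size 5m;
>>               /* 60 sec samples, matching the snmp polling interval */
>>               interval 60;
>>               auto-bandwidth;
>>           }
>>           label-switched-path EXAMPLE-LSP {
>>               to 192.0.2.1;
>>               auto-bandwidth {
>>                   /* readjust at most once every 900 seconds */
>>                   adjust-interval 900;
>>                   /* only readjust on a change of 10 percent or more */
>>                   adjust-threshold 10;
>>                   /* consecutive overflow samples needed to trigger
>>                      an early adjustment */
>>                   adjust-threshold-overflow-limit 5;
>>               }
>>           }
>>       }
>>   }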
>> 
> 
> I've never seen this advice before, but I've certainly seen networks
> operate fine with an adjust-interval much greater than 3x the
> statistics interval.
> 
>> I'll gladly accept any clue anyone can offer on this one. :)
>> 
>> --
>> Richard A Steenbergen <ras at e-gerbil.net>       http://www.e-gerbil.net/ras
>> GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
>> 
> 
> 
> 
> _______________________________________________
> juniper-nsp mailing list juniper-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp



