[j-nsp] Auto-bandwidth Accuracy

Richard A Steenbergen ras at e-gerbil.net
Tue May 25 21:41:16 EDT 2010


On Tue, May 25, 2010 at 03:23:51PM -0400, Olson, Martin wrote:
> Yeah, I found the same behavior.  Sometimes the Max AvgBW would go up
> by 5X-7X the value it should've, which would lead to really high
> reservations after the next adjust-interval.  I opened case
> 2009-0610-0697 about the issue, and after a while they traced the
> problem to PRs 438157 and 457767.  The first code with the fix for
> both PRs is 9.6R2/9.5R3/9.4R4/9.3R5.  They told us that if we disabled
> the adjust-threshold-overflow-limit in the meantime, that would
> alleviate the problem until we upgrade code.

Hrmmm interesting. That would definitely explain a few things, since the
router I noticed that issue on was running 9.4R3. The funny thing is, I
think I'm actually seeing a different bug on 9.5R4 too. I'm
watching one particular interface which is actually doing about 4.5Gbps,
but the RSVP reservations keep jumping between ~4.5G (accurate), ~6.5G,
and ~8.5G.  The LSPs themselves are all staying put, it's just the
reservations that are changing.

The two biggest LSPs going over this interface are two parallel LSPs
ingressing on this router, and going to the same destination (so they
should be roughly balancing). The actual traffic passing through each of
the LSPs is ~1.8G, but when autobw runs it sees a value of 3.6G for LSP
1, and 1.8G for LSP 2. The next time autobw runs, it sees 1.8G for LSP 1
and 3.6G for LSP 2. Sometimes it will see 3.6G for both, but mostly it
flips back and forth between double-counting and counting correctly. But
if you watch the ~2 sec updates in monitor label-switched-path, 1.8G is
definitely the accurate number and it doesn't actually swing traffic one
direction or the other. The MPLS stats file shows:

xxx.xxxx-xxx.xxxx-BRONZE-1    104486155 pkt   105218354455 Byte 205197 pps 212583416 Bps Util 46.00%
xxx.xxxx-xxx.xxxx-BRONZE-2    105659898 pkt   107686862770 Byte 209566 pps 217065966 Bps Util 45.73%
xxx.xxxx-xxx.xxxx-BRONZE-1     21935227 pkt    22685800299 Byte
xxx.xxxx-xxx.xxxx-BRONZE-2     21461156 pkt    22692594499 Byte
xxx.xxxx-xxx.xxxx-BRONZE-1     52299821 pkt    54299830347 Byte 206561 pps 215061428 Bps Util 106.20%
xxx.xxxx-xxx.xxxx-BRONZE-2     53080366 pkt    56207186696 Byte 215096 pps 227990423 Bps Util 108.82%
xxx.xxxx-xxx.xxxx-BRONZE-1     86602166 pkt    90675115910 Byte 221305 pps 234679261 Bps Util 115.88%
xxx.xxxx-xxx.xxxx-BRONZE-2     86986993 pkt    92238787605 Byte 218752 pps 232461941 Bps Util 110.95%
xxx.xxxx-xxx.xxxx-BRONZE-1    117133847 pkt   122194831966 Byte 204910 pps 211541718 Bps Util 104.46%
xxx.xxxx-xxx.xxxx-BRONZE-2    117353905 pkt   123860169099 Byte 203804 pps 212224036 Bps Util 101.29%
xxx.xxxx-xxx.xxxx-BRONZE-1     21913950 pkt    22743057380 Byte
xxx.xxxx-xxx.xxxx-BRONZE-2     20881955 pkt    21600868242 Byte
xxx.xxxx-xxx.xxxx-BRONZE-1     55061824 pkt    57134656425 Byte 211132 pps 219054770 Bps Util 48.40%
xxx.xxxx-xxx.xxxx-BRONZE-2     55272067 pkt    57108577385 Byte 217658 pps 224732336 Bps Util 93.56%
xxx.xxxx-xxx.xxxx-BRONZE-1     86342525 pkt    88887335246 Byte 211356 pps 214545127 Bps Util 47.40%
xxx.xxxx-xxx.xxxx-BRONZE-2     86516244 pkt    88738005437 Byte 212545 pps 215166177 Bps Util 89.58%
xxx.xxxx-xxx.xxxx-BRONZE-1    118927914 pkt   122308515694 Byte 217235 pps 222807869 Bps Util 49.23%
xxx.xxxx-xxx.xxxx-BRONZE-2    118667459 pkt   121951530715 Byte 214341 pps 221423501 Bps Util 92.18%
xxx.xxxx-xxx.xxxx-BRONZE-1     22959765 pkt    23199542774 Byte
xxx.xxxx-xxx.xxxx-BRONZE-2    149388166 pkt   153557687831 Byte 213338 pps 219487202 Bps Util 91.38%
xxx.xxxx-xxx.xxxx-BRONZE-1     56033558 pkt    56192853447 Byte 209327 pps 208818421 Bps Util 91.08%
xxx.xxxx-xxx.xxxx-BRONZE-2    183035335 pkt   187926595586 Byte 212956 pps 217524732 Bps Util 90.56%
xxx.xxxx-xxx.xxxx-BRONZE-1     87924814 pkt    87973385065 Byte 216947 pps 216194092 Bps Util 94.30%
xxx.xxxx-xxx.xxxx-BRONZE-2    214599264 pkt   219857751942 Byte 214720 pps 217218750 Bps Util 90.43%
xxx.xxxx-xxx.xxxx-BRONZE-1    121190549 pkt   121835543642 Byte 220302 pps 224252705 Bps Util 97.81%
xxx.xxxx-xxx.xxxx-BRONZE-2    248148738 pkt   254423414558 Byte 222181 pps 228911672 Bps Util 95.30%
xxx.xxxx-xxx.xxxx-BRONZE-1    152810797 pkt   154153805842 Byte 219585 pps 224432376 Bps Util 97.89%
xxx.xxxx-xxx.xxxx-BRONZE-2     22968975 pkt    23490382846 Byte
xxx.xxxx-xxx.xxxx-BRONZE-1    184734907 pkt   186441578904 Byte 207299 pps 209660864 Bps Util 91.45%
xxx.xxxx-xxx.xxxx-BRONZE-2     55927141 pkt    57401286923 Byte 214014 pps 220200675 Bps Util 50.62%
xxx.xxxx-xxx.xxxx-BRONZE-1    217200366 pkt   219434061321 Byte 213588 pps 217055805 Bps Util 94.67%
xxx.xxxx-xxx.xxxx-BRONZE-2     88453546 pkt    90549599056 Byte 212590 pps 216655634 Bps Util 49.81%
xxx.xxxx-xxx.xxxx-BRONZE-1    249255138 pkt   251947784814 Byte 215132 pps 218212909 Bps Util 95.18%
xxx.xxxx-xxx.xxxx-BRONZE-2    120676066 pkt   123570161138 Byte 217719 pps 223111905 Bps Util 51.29%
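
As a rough sanity check on the output above, the Bps column can be
converted to Gbps. This is only a sketch: it assumes Bps means bytes per
second, and the dictionary keys are shortened labels for the anonymized
LSP names, with a few rates copied by hand from the first rows.

```python
# Rates copied from the Bps column of the first few rows above.
# Assumption: "Bps" is bytes per second, so multiply by 8 to get bits.
SAMPLES = {
    "BRONZE-1": [212583416, 215061428, 234679261],
    "BRONZE-2": [217065966, 227990423, 232461941],
}

def bytes_per_sec_to_gbps(byte_rate):
    """Convert a bytes-per-second rate to gigabits per second."""
    return byte_rate * 8 / 1e9

for lsp, rates in SAMPLES.items():
    # Every sample should land near the ~1.8G seen in monitor label-switched-path.
    print(lsp, [round(bytes_per_sec_to_gbps(r), 2) for r in rates])
```

Under that assumption each sample works out to somewhere between roughly
1.7 and 1.9 Gbps per LSP, regardless of what the Util column says.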

Note that the bandwidth stays consistent across both LSPs, but you
can see that one will be measuring pretty close to 100%, the other will
be measuring pretty close to 50%, and then it switches. This matches
precisely with the behavior I'm seeing. I'm not sure what is causing
the lines where there is no pps/Bps/% data.
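
The flip-flop can also be pulled out of the stats file mechanically. The
sketch below parses a few rows copied from the output above and derives
two things per row: the bandwidth the Util figure implies, and whether the
byte counter went backwards since that LSP's previous sample. The
interpretation of Util as "measured rate as a percentage of the LSP's
current bandwidth" is an assumption, as is Bps being bytes per second; the
regex and the function name analyze() are made up for this example.

```python
import re

# A few rows copied verbatim from the stats output above.
STATS = """\
xxx.xxxx-xxx.xxxx-BRONZE-1    104486155 pkt   105218354455 Byte 205197 pps 212583416 Bps Util 46.00%
xxx.xxxx-xxx.xxxx-BRONZE-2    105659898 pkt   107686862770 Byte 209566 pps 217065966 Bps Util 45.73%
xxx.xxxx-xxx.xxxx-BRONZE-1     21935227 pkt    22685800299 Byte
xxx.xxxx-xxx.xxxx-BRONZE-2     21461156 pkt    22692594499 Byte
xxx.xxxx-xxx.xxxx-BRONZE-1     52299821 pkt    54299830347 Byte 206561 pps 215061428 Bps Util 106.20%
xxx.xxxx-xxx.xxxx-BRONZE-2     53080366 pkt    56207186696 Byte 215096 pps 227990423 Bps Util 108.82%
"""

# The rate fields (pps, Bps, Util) are optional because some rows lack them.
LINE_RE = re.compile(
    r"^(?P<lsp>\S+)\s+(?P<pkts>\d+) pkt\s+(?P<bytes>\d+) Byte"
    r"(?:\s+(?P<pps>\d+) pps\s+(?P<Bps>\d+) Bps Util (?P<util>[\d.]+)%)?\s*$"
)

def analyze(text):
    """Return (lsp, counter_went_backwards, implied_bandwidth_gbps) per row."""
    last_bytes = {}
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        lsp, total = m["lsp"], int(m["bytes"])
        # True when the cumulative byte count is lower than the previous sample.
        went_back = lsp in last_bytes and total < last_bytes[lsp]
        last_bytes[lsp] = total
        implied = None
        if m["util"] is not None:
            # Assumption: Util = measured rate / current LSP bandwidth.
            implied = int(m["Bps"]) * 8 / (float(m["util"]) / 100) / 1e9
        rows.append((lsp, went_back, implied))
    return rows

for lsp, went_back, implied in analyze(STATS):
    shown = "n/a" if implied is None else f"{implied:.2f}G"
    print(f"{lsp}  counter_went_backwards={went_back}  implied_bw={shown}")
```

On the full output above, every line that lacks pps/Bps/% data is also a
line where the byte counter for that LSP is lower than in its previous
sample, and the Util level for an LSP only changes across such a line.
That doesn't say what triggers it, only that the two events line up.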

Sigh... Why do I get the distinct feeling Juniper has hired the guy who
wrote the counters in IOS?

-- 
Richard A Steenbergen <ras at e-gerbil.net>       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

