[c-nsp] strange RTT increase in ASR1006
Tassos Chatzithomaoglou
achatz at forthnetgroup.gr
Wed Feb 20 11:54:15 EST 2013
For everyone interested, the origin of the problem is NBAR (probably something changed
dramatically in the latest releases).
With it enabled on the core-facing interfaces we were hitting a limit of 8-12 Gbps (!) on the
ESP40 with 2x10G SPAs in each direction. After removing it completely, the RTT issue seems
to be solved.
Now we're waiting for TAC to recreate the issue in their lab, come out with a
you're-reaching-platform-limits conclusion ...and then sue Miercom for creating fake
reports (http://www.miercom.com/2009/04/cisco-asr-1002-1004-and-1006-series-routers/). :-P
Many thanks to Pete for pointing out the "show plat hard slot x plim buffer settings"
command. Without it we wouldn't have found the full-buffer issue (near-full buffer => RTT
increases).
Unfortunately (but not unexpectedly) TAC was focusing on various other, mostly unrelated,
things.
Without nbar
------------
router#sh platform hardware slot 1 serdes statistics | i Flow|NTP
Time source is NTP, 20:48:42.545 EET Wed Feb 20 2013
Qstat count: 0 Flow ctrl count: 183465285890
router#sh platform hardware slot 1 serdes statistics | i Flow|NTP
Time source is NTP, 20:48:47.107 EET Wed Feb 20 2013
Qstat count: 0 Flow ctrl count: 183465285891
router#show platform hardware qfp active data utilization
...
Processing: Load (pct) 25 25 25 56
router#show plat hard slot 1 plim buffer settings
Interface 1/0/0
RX Low
Buffer Size 28901376 Bytes
Drop Threshold Low 28891200 Bytes Hi 28891200 Bytes
Fill Status Curr/Max 2048 Bytes / 0 Bytes <===
With nbar
---------
router#sh platform hardware slot 1 serdes statistics | i Flow|NTP
Time source is NTP, 20:36:41.586 EET Wed Feb 20 2013
Qstat count: 0 Flow ctrl count: 183191857498
router#sh platform hardware slot 1 serdes statistics | i Flow|NTP
Time source is NTP, 20:36:53.828 EET Wed Feb 20 2013
Qstat count: 0 Flow ctrl count: 183196841558
router#show platform hardware qfp active data utilization
...
Processing: Load (pct) 63 63 63 64
router#show plat hard slot 1 plim buffer settings
...
Interface 1/0/0
RX Low
Buffer Size 28901376 Bytes
Drop Threshold Low 28891200 Bytes Hi 28891200 Bytes
Fill Status Curr/Max 25407488 Bytes / 0 Bytes <===
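The two captures above can be reduced to numbers. A near-full ~25 MB PLIM RX buffer draining onto a 10G link implies roughly 20 ms of queueing delay at that hop, consistent with the observed RTT jump; the serdes flow-control counter tells the same story (one event in ~5 s without NBAR versus ~5 million in ~12 s with it). A back-of-the-envelope sketch, with figures copied from the outputs above and a 10 Gbps drain rate assumed:

```python
# Back-of-the-envelope numbers from the two captures above.
# Assumption: the RX buffer drains at the TenGigE line rate (10 Gbps).

LINK_BPS = 10e9

def buffer_delay_ms(fill_bytes, link_bps=LINK_BPS):
    """Milliseconds needed to drain fill_bytes at link_bps."""
    return fill_bytes * 8 / link_bps * 1e3

# "Fill Status Curr" from "show plat hard slot 1 plim buffer settings":
print(f"without nbar: {buffer_delay_ms(2048):.4f} ms")      # effectively empty
print(f"with nbar:    {buffer_delay_ms(25407488):.1f} ms")  # ~20 ms of queueing

# Serdes flow-control counter deltas over the capture intervals (~4.6 s
# and ~12.2 s between the timestamped snapshots):
fc_without = (183465285891 - 183465285890) / 4.6   # well under 1 event/s
fc_with = (183196841558 - 183191857498) / 12.2     # ~400,000 events/s
print(f"flow-ctrl rate with nbar: {fc_with:,.0f} events/s")
```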
--
Tassos
Tassos Chatzithomaoglou wrote on 2/2/2013 13:33:
> I have the following setup on an ASR1006 (RP2/ESP40/SIP40) and I'm trying to find out whether
> the following behavior is expected.
>
>
> +-----+
> Te1/0/0 (4G/1G) ---| |---Te1/1/0 (2G/8G)
> | |
> Te1/2/0 (4G/1G) ---| |
> +-----+
>
>
> When the output of Te1/1/0 goes above 8G, RTT for packets flowing from Te1/0/0 to Te1/2/0
> increases by 50-100ms.
>
> The same happens in the following scenario on another ASR1006 (RP2/ESP20/SIP10); when the
> output of Te1/1/0 goes above 6G, RTT for packets flowing from Te1/0/0 to the router's loopback
> increases by 50-100ms (the RP is at ~30% all the time).
>
> +-----+
> Te1/0/0 (6G/1G) ---| |---Te1/1/0 (1G/6G)
> | |
> | |
> +-----+
>
>
> Most of the time, the RTT increase is accompanied by packet loss.
>
> This reminds me of HOL blocking, but I had the impression that applied mostly to
> switches with small buffers.
> At the same time, the ESP sends thousands of flow-control signals to the SIP, indicating
> that it can't cope with this traffic rate.
>
> ASR1006#show platform hardware slot 1 serdes statistics
> From Slot F0-Link A
> Pkts High: 1687702827 Low: 391241384970 Bad: 0 Dropped: 0
> Bytes High: 326483900940 Low: 291059939598187 Bad: 0 Dropped: 0
> Pkts Looped: 0 Error: 0
> Bytes Looped 0
> Qstat count: 0 Flow ctrl count: 40306518521 <===
>
> From Slot F1-Link A
> Pkts High: 0 Low: 0 Bad: 0 Dropped: 0
> Bytes High: 0 Low: 0 Bad: 0 Dropped: 0
> Pkts Looped: 0 Error: 0
> Bytes Looped 0
> Qstat count: 0 Flow ctrl count: 80093
>
> -after 1 sec-
>
> ASR1006#show platform hardware slot 1 serdes statistics
> From Slot F0-Link A
> Pkts High: 1687721691 Low: 391244370772 Bad: 0 Dropped: 0
> Bytes High: 326487553571 Low: 291062458884384 Bad: 0 Dropped: 0
> Pkts Looped: 0 Error: 0
> Bytes Looped 0
> Qstat count: 0 Flow ctrl count: 40307432319 <===
>
> From Slot F1-Link A
> Pkts High: 0 Low: 0 Bad: 0 Dropped: 0
> Bytes High: 0 Low: 0 Bad: 0 Dropped: 0
> Pkts Looped: 0 Error: 0
> Bytes Looped 0
> Qstat count: 0 Flow ctrl count: 80094
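Those counter deltas quantify the backpressure: over the stated one-second interval, the F0-Link A flow-control count climbs by nearly a million while F1-Link A moves by one. A minimal sketch of the arithmetic (counters copied from the two snapshots above):

```python
# Flow-control event rate from the two serdes snapshots above,
# which were taken one second apart ("after 1 sec").

INTERVAL_S = 1.0

f0_before, f0_after = 40306518521, 40307432319  # From Slot F0-Link A
f1_before, f1_after = 80093, 80094              # From Slot F1-Link A

f0_rate = (f0_after - f0_before) / INTERVAL_S
f1_rate = (f1_after - f1_before) / INTERVAL_S

print(f"F0-Link A: {f0_rate:,.0f} flow-ctrl events/s")  # ESP throttling the SIP
print(f"F1-Link A: {f1_rate:,.0f} flow-ctrl events/s")
```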
>
>
>
> ASR1006#show platform hardware slot 1 plim status internal
> FCM Status
> XON/XOFF 0x0000000000000003
> ECC Status
> Data Path Config
> MaxBurst1 256, MaxBurst2 128, DataMaxT 32768
> Cal Length RX 0x0002, TX 0x0002
> Repetitions RX 0x0010, TX 0x0010
> Data Path Status
> RX in sync, TX in sync
> Spi4 Channel 0, Rx Channel Status Full, Tx Channel Status Hungry <===
> Spi4 Channel 1, Rx Channel Status Starving, Tx Channel Status Starving
> RX Pkts 391387121421 Bytes 285048619167994
> TX Pkts 393073127218 Bytes 291507927293959
> Hypertransport Status
> RX Pkts 0 Bytes 0
> TX Pkts 0 Bytes 0
>
>
>
> TAC is talking about microbursts (how unusual), and although I can't measure 10G traffic
> per millisecond, the QFP's 5-second data doesn't agree with them.
>
> ASR1006#show platform hardware qfp active data utilization
> CPP 0: Subdev 0 5 secs 1 min 5 min 60 min
> Input: Priority (pps) 884 840 838 835
> (bps) 783344 750704 746360 725664
> Non-Priority (pps) 1291098 1282015 1261073 1265900
> (bps) 7544814904 7465944936 7322679592 7345058240
> Total (pps) 1291982 1282855 1261911 1266735
> (bps) 7545598248 7466695640 7323425952 7345783904
> Output: Priority (pps) 9065 9191 9184 8897
> (bps) 11485520 11659776 11730880 11357888
> Non-Priority (pps) 1281141 1271903 1251199 1256512
> (bps) 7560289312 7481701984 7338684976 7360568224
> Total (pps) 1290206 1281094 1260383 1265409
> (bps) 7571774832 7493361760 7350415856 7371926112
> Processing: Load (pct) 59 59 59 59
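One reason the microburst theory is hard to refute with these counters: a millisecond-scale burst is invisible in a 5-second average. As an illustration (the numbers below are assumptions for the sketch, not measurements from this router), a single 100 ms line-rate burst on top of a ~7.5 Gbps baseline raises the 5 s mean by only ~50 Mbps:

```python
# How a microburst disappears into a 5-second average.
# All numbers below are illustrative assumptions, not router measurements.

WINDOW_S = 5.0          # QFP utilization averaging window
baseline_bps = 7.5e9    # steady load, close to the QFP totals above
line_rate_bps = 10e9    # burst peak (interface line rate)
burst_s = 0.1           # a single 100 ms burst within the window

avg_bps = baseline_bps + (line_rate_bps - baseline_bps) * burst_s / WINDOW_S
print(f"5 s average including the burst: {avg_bps/1e9:.2f} Gbps")
# ~7.55 Gbps: within normal variation, even though the link briefly saturated
```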
>
>
>
> At the same time, different IOS releases give different results (15.2(4)S2 is far worse than
> 15.1(3)S2), and I'm starting to believe that the ASR1006 is another hyped product scheduled to go down...
>
>
>
>