[c-nsp] strange RTT increase in ASR1006

Tassos Chatzithomaoglou achatz at forthnetgroup.gr
Wed Feb 20 11:54:15 EST 2013


For everyone interested, the origin of the problem is NBAR (probably something changed
dramatically in the latest releases).
With it enabled on the core-facing interfaces, we were hitting a limit of 8-12 Gbps (!) on the
ESP40 with 2x10G SPAs in each direction. After removing it completely, the RTT issue seems
to be solved.

Now we're waiting for TAC to recreate the issue in their lab, come out with a
you're-reaching-platform-limits conclusion... and then sue Miercom for creating fake
reports (http://www.miercom.com/2009/04/cisco-asr-1002-1004-and-1006-series-routers/). :-P

Many thanks to Pete for pointing out the "show plat hard slot x plim buffer settings"
command. Without it we wouldn't have found the full-buffer issue (nearly full buffer => RTT
increases).
Unfortunately (but not unexpectedly), TAC was focusing on various other, mostly unrelated,
things.
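A rough illustration of why a nearly full buffer shows up as extra RTT: a packet entering a FIFO has to wait for the whole backlog ahead of it to drain. This sketch is not from the thread. It assumes a simple FIFO, takes the 25407488-byte fill level reported for Interface 1/0/0 with NBAR enabled, and tries a few assumed drain rates, because the rate at which the ESP actually accepted traffic from the SIP wasn't measured. The constant and function names are hypothetical.

```python
# Sketch: extra delay seen by a packet that joins a FIFO already holding
# `fill_bytes`, assuming the FIFO drains at a constant `drain_bps`.
# The drain rates tried below are ASSUMPTIONS; the real rate at which the
# ESP accepted traffic from the SIP was not measured in the thread.

FILL_BYTES_WITH_NBAR = 25_407_488  # "Fill Status Curr" for Interface 1/0/0, RX Low, NBAR on


def queuing_delay_ms(fill_bytes: int, drain_bps: float) -> float:
    """Milliseconds needed to drain `fill_bytes` at `drain_bps` bits per second."""
    return fill_bytes * 8 / drain_bps * 1000


for gbps in (10, 4, 2):  # hypothetical drain rates, not measurements
    delay = queuing_delay_ms(FILL_BYTES_WITH_NBAR, gbps * 1e9)
    print(f"{gbps:>2} Gbps drain: {delay:6.1f} ms of queuing delay")
```

With these numbers the backlog alone adds about 20 ms at a 10 Gbps drain rate, and about 51 ms and 102 ms at assumed 4 and 2 Gbps drain rates, the same order of magnitude as the 50-100ms increase in the original report. This only shows the figures are consistent; it does not establish the actual drain rate.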


Without nbar
------------
router#sh platform hardware slot 1 serdes statistics | i Flow|NTP
Time source is NTP, 20:48:42.545 EET Wed Feb 20 2013
  Qstat count: 0          Flow ctrl count: 183465285890

router#sh platform hardware slot 1 serdes statistics | i Flow|NTP
Time source is NTP, 20:48:47.107 EET Wed Feb 20 2013
  Qstat count: 0          Flow ctrl count: 183465285891


router#show platform hardware qfp active data utilization
...
Processing: Load (pct)          25          25          25          56


router#show plat hard slot 1 plim buffer settings
Interface 1/0/0
  RX Low
    Buffer Size 28901376 Bytes
    Drop Threshold Low 28891200 Bytes Hi 28891200 Bytes
    Fill Status Curr/Max 2048 Bytes / 0 Bytes                <===



With nbar
---------
router#sh platform hardware slot 1 serdes statistics | i Flow|NTP
Time source is NTP, 20:36:41.586 EET Wed Feb 20 2013
  Qstat count: 0          Flow ctrl count: 183191857498
 
router#sh platform hardware slot 1 serdes statistics | i Flow|NTP
Time source is NTP, 20:36:53.828 EET Wed Feb 20 2013
  Qstat count: 0          Flow ctrl count: 183196841558
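Each pair of serdes samples can be turned into a flow-control rate by dividing the counter difference by the time between the NTP timestamps. A minimal Python sketch; the helper name `flow_ctrl_rate` is hypothetical, and it assumes both samples in a pair were taken on the same day (true here), so only the time of day is parsed.

```python
from datetime import datetime

# (NTP timestamp, "Flow ctrl count") pairs copied from the serdes outputs above.
SAMPLES = {
    "without NBAR": [("20:48:42.545", 183_465_285_890), ("20:48:47.107", 183_465_285_891)],
    "with NBAR": [("20:36:41.586", 183_191_857_498), ("20:36:53.828", 183_196_841_558)],
}


def flow_ctrl_rate(samples: list[tuple[str, int]]) -> float:
    """Average flow-control events per second between the first and last sample.

    Assumes all timestamps fall on the same day, so only the time of day is parsed.
    """
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    fmt = "%H:%M:%S.%f"
    elapsed = (datetime.strptime(t1, fmt) - datetime.strptime(t0, fmt)).total_seconds()
    return (c1 - c0) / elapsed


for label, samples in SAMPLES.items():
    print(f"{label:>12}: {flow_ctrl_rate(samples):12,.1f} flow-control events/s")
```

Without NBAR the counter moves by one in about 4.6 s (roughly 0.2 events/s); with NBAR it climbs by almost 5 million in about 12.2 s, or roughly 407,000 events/s.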

  
router#show platform hardware qfp active data utilization
...
Processing: Load (pct)          63          63          63          64


router#show plat hard slot 1 plim buffer settings
...
Interface 1/0/0
  RX Low
    Buffer Size 28901376 Bytes
    Drop Threshold Low 28891200 Bytes Hi 28891200 Bytes
    Fill Status Curr/Max 25407488 Bytes / 0 Bytes            <===
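The two buffer outputs can be summarized the same way. This sketch computes how full the RX Low buffer on Interface 1/0/0 is and how many bytes remain before the drop threshold. It uses only the "Curr" value, since "Max" reads 0 in both outputs. The constant and function names are hypothetical.

```python
# RX Low buffer figures for Interface 1/0/0 from the two "plim buffer settings" outputs.
BUFFER_SIZE = 28_901_376     # "Buffer Size"
DROP_THRESHOLD = 28_891_200  # "Drop Threshold" (Low and Hi are identical here)
FILL_CURR = {"without NBAR": 2_048, "with NBAR": 25_407_488}  # "Fill Status Curr"


def buffer_report(fill: int, size: int = BUFFER_SIZE, drop: int = DROP_THRESHOLD) -> tuple[float, int]:
    """Return (percent of the buffer in use, bytes left before the drop threshold)."""
    return 100 * fill / size, drop - fill


for label, fill in FILL_CURR.items():
    pct, headroom = buffer_report(fill)
    print(f"{label:>12}: {pct:6.2f}% full, {headroom:,} bytes before the drop threshold")
```

With NBAR the buffer is about 87.9% full, leaving 3,483,712 bytes before the drop threshold; without NBAR it is practically empty.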

--
Tassos

Tassos Chatzithomaoglou wrote on 2/2/2013 13:33:
> I have the following setup on an ASR1006 (RP2/ESP40/SIP40), and I'm trying to find out if
> the following behavior is expected.
>
>
>                    +-----+
> Te1/0/0 (4G/1G) ---|     |---Te1/1/0 (2G/8G)
>                    |     |
> Te1/2/0 (4G/1G) ---|     |
>                    +-----+
>
>
> When the output of Te1/1/0 goes above 8G, RTT for packets flowing from Te1/0/0 to Te1/2/0
> increases by 50-100ms.
>
> The same happens in the following scenario on another ASR1006 (RP2/ESP20/SIP10): when the
> output of Te1/1/0 goes above 6G, RTT for packets flowing from Te1/0/0 to the router's loopback
> increases by 50-100ms (the RP is at ~30% all the time).
>
>                    +-----+
> Te1/0/0 (6G/1G) ---|     |---Te1/1/0 (1G/6G)
>                    |     |
>                    |     |
>                    +-----+
>
>
> Most of the time, the RTT increase is followed by packet loss.
>
> This reminds me of HOL blocking, but I had the impression this applied mostly to
> switches with small buffers.
> At the same time, the ESP sends thousands of flow-control signals to the SIP, indicating that
> it can't cope with this traffic rate.
>
> ASR1006#show platform hardware slot 1 serdes statistics
> From Slot F0-Link A
>   Pkts  High: 1687702827 Low: 391241384970 Bad: 0          Dropped: 0
>   Bytes High: 326483900940 Low: 291059939598187 Bad: 0          Dropped: 0
>   Pkts  Looped: 0          Error: 0
>   Bytes Looped 0
>   Qstat count: 0          Flow ctrl count: 40306518521        <===
>
> From Slot F1-Link A
>   Pkts  High: 0          Low: 0          Bad: 0          Dropped: 0
>   Bytes High: 0          Low: 0          Bad: 0          Dropped: 0
>   Pkts  Looped: 0          Error: 0
>   Bytes Looped 0
>   Qstat count: 0          Flow ctrl count: 80093
>
> -after 1 sec-
>
> ASR1006#show platform hardware slot 1 serdes statistics
> From Slot F0-Link A
>   Pkts  High: 1687721691 Low: 391244370772 Bad: 0          Dropped: 0
>   Bytes High: 326487553571 Low: 291062458884384 Bad: 0          Dropped: 0
>   Pkts  Looped: 0          Error: 0
>   Bytes Looped 0
>   Qstat count: 0          Flow ctrl count: 40307432319        <===
>
> From Slot F1-Link A
>   Pkts  High: 0          Low: 0          Bad: 0          Dropped: 0
>   Bytes High: 0          Low: 0          Bad: 0          Dropped: 0
>   Pkts  Looped: 0          Error: 0
>   Bytes Looped 0
>   Qstat count: 0          Flow ctrl count: 80094
>
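The second output was taken "after 1 sec", but the exact interval isn't shown, so per-second rates derived from these counters would only be approximate. The sketch below therefore reports raw counter deltas for F0-Link A plus two ratios that don't depend on the interval. The names are hypothetical; the numbers are copied from the outputs above.

```python
# F0-Link A counters from the two "serdes statistics" outputs (about 1 s apart).
BEFORE = {
    "pkts_high": 1_687_702_827,
    "pkts_low": 391_241_384_970,
    "bytes_high": 326_483_900_940,
    "bytes_low": 291_059_939_598_187,
    "flow_ctrl": 40_306_518_521,
}
AFTER = {
    "pkts_high": 1_687_721_691,
    "pkts_low": 391_244_370_772,
    "bytes_high": 326_487_553_571,
    "bytes_low": 291_062_458_884_384,
    "flow_ctrl": 40_307_432_319,
}


def counter_deltas(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Per-counter increase between two snapshots of the same counters."""
    return {name: after[name] - before[name] for name in before}


d = counter_deltas(BEFORE, AFTER)
for name, value in d.items():
    print(f"{name:>10}: {value:>13,}")

# Ratios between counters do not depend on the (approximate) sampling interval.
print(f"flow-control events per low-priority packet: {d['flow_ctrl'] / d['pkts_low']:.3f}")
print(f"average low-priority packet size: {d['bytes_low'] / d['pkts_low']:.1f} bytes")
```

Between the two outputs the F0-Link A flow-control counter advanced by 913,798 while 2,985,802 low-priority packets passed, i.e. about 0.31 flow-control events per low-priority packet.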
>
>
> ASR1006#show platform hardware slot 1 plim status internal
> FCM Status
>   XON/XOFF 0x0000000000000003
> ECC Status
> Data Path Config
>   MaxBurst1 256, MaxBurst2 128, DataMaxT 32768
>   Cal Length RX 0x0002, TX 0x0002
>   Repetitions RX 0x0010, TX 0x0010
> Data Path Status
>   RX in sync, TX in sync
>   Spi4 Channel 0, Rx Channel Status Full, Tx Channel Status Hungry	       <===
>   Spi4 Channel 1, Rx Channel Status Starving, Tx Channel Status Starving
>   RX Pkts 391387121421 Bytes 285048619167994
>   TX Pkts 393073127218 Bytes 291507927293959
>   Hypertransport Status
>   RX Pkts 0           Bytes 0
>   TX Pkts 0           Bytes 0
>
>
>
> TAC is talking about microbursts (how unusual), and although I can't measure 10G traffic
> per ms, QFP's 5-sec data doesn't agree with them.
>
> ASR1006#show platform hardware qfp active data utilization
>   CPP 0: Subdev 0           5 secs       1 min       5 min      60 min
> Input:  Priority (pps)         884         840         838         835
>                  (bps)      783344      750704      746360      725664
>     Non-Priority (pps)     1291098     1282015     1261073     1265900
>                  (bps)  7544814904  7465944936  7322679592  7345058240
>            Total (pps)     1291982     1282855     1261911     1266735
>                  (bps)  7545598248  7466695640  7323425952  7345783904
> Output: Priority (pps)        9065        9191        9184        8897
>                  (bps)    11485520    11659776    11730880    11357888
>     Non-Priority (pps)     1281141     1271903     1251199     1256512
>                  (bps)  7560289312  7481701984  7338684976  7360568224
>            Total (pps)     1290206     1281094     1260383     1265409
>                  (bps)  7571774832  7493361760  7350415856  7371926112
> Processing: Load (pct)          59          59          59          59
>
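The QFP table can also be reduced to an average packet size per interval and to the spread between the interval averages. A minimal sketch using the input "Total" rows (names are hypothetical). One limitation applies: these are averages over 5 seconds or longer, so they show the load is steady at that granularity but cannot by themselves confirm or rule out millisecond-scale microbursts.

```python
# Input "Total" rows from "show platform hardware qfp active data utilization".
INTERVALS = ("5 secs", "1 min", "5 min", "60 min")
INPUT_PPS = (1_291_982, 1_282_855, 1_261_911, 1_266_735)
INPUT_BPS = (7_545_598_248, 7_466_695_640, 7_323_425_952, 7_345_783_904)


def avg_packet_bytes(bps: float, pps: float) -> float:
    """Average packet size in bytes implied by a bit rate and a packet rate."""
    return bps / pps / 8


for name, pps, bps in zip(INTERVALS, INPUT_PPS, INPUT_BPS):
    print(f"{name:>6}: {bps / 1e9:5.2f} Gbps, average packet {avg_packet_bytes(bps, pps):6.1f} bytes")

# Relative difference between the highest and lowest interval average.
spread = max(INPUT_BPS) / min(INPUT_BPS) - 1
print(f"spread between interval averages: {spread:.1%}")
```

The input averages range from about 7.32 to 7.55 Gbps (a spread of about 3%), with an average packet size of roughly 725-730 bytes.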
>
>
> At the same time, different IOS releases give different results (15.2(4)S2 is far worse than
> 15.1(3)S2), and I'm starting to believe that the ASR1006 is another hype scheduled to go down...
>
>
>
>


