[c-nsp] 6500 - SUP720 - IOS - traffic problem

Gabriel Mateiciuc mgabi at ase.ro
Sat Jan 5 15:03:21 EST 2008


For the record … I seem to be “hit” by some bugs:

        CSCsl70634 Bug Details

Headline:   67xx EC tx/rx traffic dependency resulting in low throughput
Product:    IOS
Feature:    OTHERS
Severity:   1
Status:     Resolved
First Found-in Version:   12.2(18)SXF12
First Fixed-in Version:   8.7(0.22)BUB19, 8.7(0.22)SRC4, 12.2(18.12.5)SXF

Symptom : Port Channel experiences overruns.

Condition : Seen on 67xx cards.

Trigger : When a port receives 6+ Gbps of ip2tag traffic.

Frequency : Found internally. No service requests.

Root Cause : Flow control asserted by the fabric interface ASIC.

Impact : Impacts traffic.

Workaround : None.

Issue verification : None.

 

And:

CSCeh08451 Bug Details

Headline:   Excessive overruns and lbusDrops due to heavy flow control over fabric
Product:    IOS
Feature:    OTHERS
Severity:   1
Status:     Resolved
First Found-in Version:   12.2(17d)SXB02
First Fixed-in Version:   12.2(18)SXE, 12.2(18)SXD05, 12.2(17d)SXB08

Symptoms

A Sup720 system running in flow-through mode (though it may not be limited
to this mode) can get into a constant flow-control situation under certain
traffic profiles, which reduces the throughput of the system.

Workaround

A command has been added to reserve ASIC buffers in the line card to
improve the throughput of the system:

[no] fabric buffer-reserve [high | low | medium | value]

high   - 0x5050
medium - 0x4040
low    - 0x3030
value  - any 16-bit value from 0x0 to 0x5050
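
For reference, a minimal sketch of applying the workaround from global
configuration mode. The command syntax is as quoted in the bug notes above;
availability depends on the IOS train, so verify it on your image first:

conf t
 fabric buffer-reserve high    ! or: medium | low | a hex value up to 0x5050
end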

 

 

 

From: cheddar cheese [mailto:cheddar3 at gmail.com] 
Sent: 5 January 2008 21:34
To: Gabriel Mateiciuc
Subject: Re: [c-nsp] 6500 - SUP720 - IOS - traffic problem

 

probably best to open a case and upload all relevant info.

good luck,

-jay

On Jan 5, 2008 12:58 PM, Gabriel Mateiciuc <mgabi at ase.ro> wrote: 

Hello jay,

 

First of all, thanks for your patience in reading and explaining all this. Unfortunately, I was already aware of the facts you've laid out here.

Normally I would say the same things you've explained here, but - and there is a but - there are some further empirical observations:

We've had previous experience with a 6500-SUP2 (no fabric) that could hit 80-90% bus utilization without packet loss/drops.

About a month ago we moved from 12.2(18)SXF3 to 12.2(18)SXF12. In the process we noticed the packet loss that occurs at peak hours, so at first we blamed the IOS and started digging for solutions.

Analyzing the trends revealed that the single difference is the bus utilization, which rose from 50-60% to 70-80%.

Compared to 6 months ago:

Then: 3.4-3.5 Gbit/s on each of the 4 backbone links (port-channels of 4 x Gigabit each) - fewer clients - bus 50-60% - IOS SXF3

Now: congestion loss on the backbone links at 2.5-3 Gbit/s at peak hours - more clients connected to the classic cards - bus 70-80% - IOS SXF12

I've run some tests, like moving 2 of the 4 links of one port-channel onto the supervisor - that seemed to solve the problem for that port-channel.

Putting in another fabric-enabled card and moving some of the links there would solve the problem, so I'm sure the bus is not hitting its limit. Then again, that doesn't answer why the 6724 seems ineffective.

 

So, getting to the actual problem: previous experience with port-channels, load-balancing algorithms, high traffic, configuration options not recommended unless advised by TAC, undocumented IOS bugs ... I think the answer lies somewhere among these.

PS: I've read the caveats for the IOS we're running now ... and there seems to be no connection to the problems we're having.

 

From: cheddar cheese [mailto:cheddar3 at gmail.com] 
Sent: 5 January 2008 20:03
To: Gabriel Mateiciuc
Subject: Re: [c-nsp] 6500 - SUP720 - IOS - traffic problem

 

Hello Gabriel,

since you have a combo of fabric and non-fabric modules, the system switching mode is "truncated".  in this mode, non-fabric cards like the 63xx modules put entire frames on the bus while fabric cards (like your 67xx modules) put only the headers.  for traffic going from a non-fabric card to a fabric card, the non-fabric card forwards the frame via the bus to the Supervisor's PFC, and the PFC then switches it through the fabric to the fabric-enabled card.

The maximum centralized switch performance in truncated mode is 15 Mpps.  It doesn't look like you're hitting this limit but it does look like the bus is busy.  are the 63xx modules heavily utilized?  is the traffic mostly large frames?  

i think replacing all or some of the 63xx modules with fabric-enabled modules (like the 6748) should help reduce the bus utilization.  also, if you replace all of them, the system can operate in "compact" mode, which increases the maximum centralized switching capacity to 30 Mpps.  if you add DFCs to the fabric-enabled cards (67xx), then port-to-port traffic within those cards doesn't touch the bus, and the total switching capacity also scales by 48 Mpps per DFC.
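
(purely as arithmetic on the figures above, and assuming ideal traffic
distribution: with the classic cards replaced and DFCs fitted to the three
67xx slots, the ceiling would be roughly 30 Mpps centralized + 3 x 48 Mpps
distributed = 174 Mpps aggregate - a rough upper bound, not a guaranteed
rate.)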

6500 Architecture White paper
http://www.cisco.com/en/US/products/hw/switches/ps708/products_white_paper0900aecd80673385.shtml 

Cisco TAC might be able to help further.

-jay



On Jan 5, 2008 8:14 AM, Gabriel Mateiciuc <mgabi at ase.ro> wrote:

Hello everyone,

Here's the environment I'm talking about:

#sh platform hardware capacity
System Resources
 PFC operating mode: PFC3BXL
 Supervisor redundancy mode: administratively sso, operationally sso
 Switching resources: Module   Part number      Series      CEF mode
                      1        WS-X6348-RJ-45   classic     CEF
                      2        WS-X6348-RJ-45   classic     CEF
                      3        WS-X6748-GE-TX   CEF720      CEF
                      4        WS-X6724-SFP     CEF720      CEF
                      5        WS-SUP720-3BXL   supervisor  CEF
                      6        WS-X6704-10GE    CEF720      CEF
                      7        WS-X6348-RJ-45   classic     CEF
                      8        WS-X6348-RJ-45   classic     CEF
                      9        WS-X6348-RJ-45   classic     CEF

CPU Resources
 CPU utilization: Module     5 seconds   1 minute   5 minutes
                  3           0% /  0%         0%          0%
                  4           0% /  0%         0%          0%
                  5  RP      32% / 11%        11%         11%
                  5  SP      14% /  1%         9%          9%
                  6           0% /  0%         0%          0%
 Processor memory: Module   Bytes:      Total        Used   %Used
                   3                219661760    94927184     43%
                   4                219661760    94488840     43%
                   5  RP            927935472   132545832     14%
                   5  SP            912623676   218933576     24%
                   6                219661760    94944424     43%
 I/O memory: Module   Bytes:      Total        Used   %Used
             5  RP             67108864    11891816     18%
             5  SP             67108864    11891760     18%

EOBC Resources
 Module        Packets/sec   Total packets   Dropped packets
 3      Rx:              7       280576601                 3
        Tx:              1        24002677                 0
 4      Rx:              7       280574860                 3
        Tx:              3        15260689                 0
 5  RP  Rx:             72       141474821              4066
        Tx:             59       109863281                 0
 5  SP  Rx:             11        41664038              4697
        Tx:             20        64613234                 0
 6      Rx:              8       280576597                 2
        Tx:              2         8779278                 0

VLAN Resources
 VLANs: 4094 total, 149 VTP, 240 extended, 14 internal, 3691 free

L2 Forwarding Resources
          MAC Table usage: Module  Collisions  Total   Used   %Used
                           5                0  65536   2604      4%

            VPN CAM usage:                     Total   Used   %Used
                                                 512      0      0%
L3 Forwarding Resources
            FIB TCAM usage:                     Total   Used   %Used
                 72 bits (IPv4, MPLS, EoM)     524288    5558     1%
                144 bits (IP mcast, IPv6)      262144       5     1%

                    detail:  Protocol     Used   %Used
                             IPv4         5558      1%
                             MPLS            0      0%
                             EoM             0      0%
                             IPv6            2      1%
                             IPv4 mcast      3      1%
                             IPv6 mcast      0      0%

           Adjacency usage:    Total   Used   %Used
                             1048576    635      1%

    Forwarding engine load:
                    Module       pps   peak-pps   peak-time
                    5        7865738    8282714   22:21:27 UTC+2 Fri Jan 4 2008

CPU Rate Limiters Resources
            Rate limiters:   Total   Used   Reserved   %Used
                   Layer 3       9      4          1     44%
                   Layer 2       4      2          2     50%

ACL/QoS TCAM Resources
 Key: ACLent - ACL TCAM entries, ACLmsk - ACL TCAM masks, AND - ANDOR,
      QoSent - QoS TCAM entries, QoSmsk - QoS TCAM masks, OR - ORAND,
      Lbl-in - ingress label, Lbl-eg - egress label, LOUsrc - LOU source,
      LOUdst - LOU destination, ADJ - ACL adjacency

 Module ACLent ACLmsk QoSent QoSmsk Lbl-in Lbl-eg LOUsrc LOUdst  AND  OR  ADJ
 5          1%     2%     1%     1%     1%     1%     0%     3%   0%  0%   1%

QoS Policer Resources
 Aggregate policers: Module   Total   Used   %Used
                     5         1024      1      1%
 Microflow policer configurations: Module   Total   Used   %Used
                                   5           64      1      1%

Switch Fabric Resources
 Bus utilization: current: 71%, peak was 81% at 22:53:20 UTC+2 Fri Jan 4 2008
 Fabric utilization:     Ingress                    Egress
   Module  Chanl  Speed  rate  peak                 rate  peak
   3       0        20G   35%   48% @20:38 27Dec07   26%   36% @20:44 04Jan08
   3       1        20G   40%   48% @23:00 04Jan08   34%   43% @22:21 03Jan08
   4       0        20G   43%   55% @15:57 03Jan08   48%   63% @20:33 27Dec07
   5       0        20G   13%   18% @21:42 02Jan08    9%   17% @22:52 04Jan08
   6       0        20G    0%    1% @01:30 25Dec07    0%    2% @11:27 30Dec07
   6       1        20G   33%   48% @20:26 27Dec07   45%   54% @22:36 03Jan08
 Switching mode: Module   Switching mode
                 3        truncated
                 4        truncated
                 5        flow through
                 6        truncated

Interface Resources
 Interface drops:
   Module   Total drops:   Tx            Rx      Highest drop port:  Tx   Rx
   1                     7353          2166                           1   38
   2                 24609502        144685                          14   40
   3                    42130    8135613761                           7    2
   4                   160468   49040038842                          17    6
   5                  1354908        184496                           1    2
   6                    12027        286149                           1    1
   7                 29461165        218697                          33   37
   8                  2033449           282                          10   10
   9                 24030508        408094                          36   29

 Interface buffer sizes:
   Module   Bytes:   Tx buffer   Rx buffer
   1                    112640        6144
   2                    112640        6144
   3                   1221120      152000
   4                   1221120      152000
   6                  14622592     1914304
   7                    112640        6144
   8                    112640        6144
   9                    112640        6144


And for those having enough patience to read the details, here's the
question/problem:

On the 4th linecard (6724-SFP) we have links grouped in etherchannels
(4 x Gigabit backbone links), keeping most of the etherchannels with their
ports grouped on the same ASIC/linecard. The load-balancing used is
src-dst-ip (see the hash-test sketch after the config excerpt below).
Looking at the figures above, I guess anyone would say there are plenty of
resources left, yet our graphs/interface summary show that somewhere between
40-50% fabric utilization, both ingress and egress, there is a problem with
the forwarding performance (also visible in the high IQD counters):

* GigabitEthernet4/1   0  3938121308   0    56  557290000  100095  620339000   94591  0
* GigabitEthernet4/2   0  3909192601   0   304  562387000   94364  602164000   93503  0
* GigabitEthernet4/3   0  3909817998   0  1113  561663000   94280  847735000  113865  0
* GigabitEthernet4/4   0  3939072687   0    53  557529000   95337  643992000   95015  0
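
For per-port confirmation of the overruns described in the bugs above, the
standard interface counters can be filtered; a minimal sketch, using one of
the interfaces from the summary (the regex after "include" is illustrative):

sh interfaces GigabitEthernet4/3 | include overrun|drops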

Now, other (possibly) relevant information from the config:

ip cef event-log traceback-depth 0
ip cef table consistency-check error-message
ip cef table consistency-check auto-repair
ip cef load-sharing algorithm original
mls ip cef load-sharing simple
fabric switching-mode allow truncated
fabric buffer-reserve queue
fabric buffer-reserve low   (that one seemed to help a lot - over a 10% boost in performance)
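
(As an aside: when uneven hashing across the four members is suspected, the
SP can report which member link a given flow would hash to. A hedged example
- the port-channel number and IP addresses below are placeholders, and the
exact syntax may vary by IOS train:)

remote command switch test etherchannel load-balance interface port-channel 1 ip 10.0.0.1 10.0.0.2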

Did anyone hit similar problems with low performance on fabric-enabled
linecards? Any recommended configuration/IOS version?

Cheers,


Gabriel Mateiciuc
Academia de Studii Economice 
Networks Department
Infrastructure Team - infrastructura at ase.ro




_______________________________________________
cisco-nsp mailing list   cisco-nsp at puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/

 

 


