[c-nsp] 6500 - SUP720 - IOS - traffic problem
Gabriel Mateiciuc
mgabi at ase.ro
Sat Jan 5 13:58:40 EST 2008
Hello Jay,
First of all, thanks for your patience in reading and explaining all this.
Unfortunately, I was already aware of the facts you've laid out here.
Normally I would say the same things you've explained, but ... and there is
a but ... there are some more empirical observations:
We've had previous experience with a 6500 with a SUP2 (no fabric) that could
hit 80-90% bus utilization without packet loss/drops.
About a month ago we upgraded from 12.2(18)SXF3 to 12.2(18)SXF12. In the
process we noticed packet loss occurring at peak hours, so at first we blamed
the IOS and started digging for solutions. Analyzing the trends revealed that
the only difference is the bus utilization, which rose from 50-60% to 70-80%.
Compared to 6 months ago:
Then: 3.4-3.5 Gbit/s on each of the 4 backbone links (port-channels of 4 Gbit
each) - fewer clients - bus at 50-60% - IOS SXF3
Now: congestion loss on the backbone links at 2.5-3 Gbit/s at peak hours -
more clients connected to the classic cards - bus at 70-80% - IOS SXF12
I've made some tests, like moving 2 of the 4 links of one port-channel onto
the supervisor ports - that seemed to solve the problem for that port-channel.
Putting in another fabric-enabled card and moving some of the links there
would also solve the problem, so I'm sure the bus is not hitting its limit.
Then again, that doesn't answer why the 6724 seems ineffective.
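(For the record, here is roughly what I'm watching while shuffling links
around - standard exec commands on native IOS, nothing exotic; the exact
output format varies a bit per release:)

  ! bus / fabric load checks used while moving links between cards
  show catalyst6000 traffic-meter           ! shared-bus utilization meter, with peak values
  show platform hardware capacity fabric    ! bus + per-channel fabric utilization (same section as pasted below)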
So, getting to the actual problem: previous experience with port-channels,
load-balancing algorithms, high traffic, configuration options that are not
recommended unless advised by TAC, undocumented IOS bugs ... I think the
answer lies somewhere among these.
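For what it's worth, this is roughly how I've been looking at the hashing
side of it (SXF-era syntax from memory - the hash-result form may not exist
on every release, and the port-channel number and addresses below are only
examples):

  ! port-channel hashing checks
  show etherchannel load-balance            ! confirm src-dst-ip is the active hash
  show etherchannel summary                 ! bundle / member state per port-channel
  show etherchannel load-balance hash-result interface port-channel 1 ip 10.0.0.1 10.0.1.1
                                            ! which member link a given src/dst pair hashes to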
PS: I've read the caveats for the IOS we're using now ... and none of them
seem related to the problems we're having.
From: cheddar cheese [mailto:cheddar3 at gmail.com]
Sent: 5 January 2008 20:03
To: Gabriel Mateiciuc
Subject: Re: [c-nsp] 6500 - SUP720 - IOS - traffic problem
Hello Gabriel,
Since you have a combination of fabric and non-fabric modules, the system
switching mode is "truncated". In this mode non-fabric cards like the 63xx
modules put entire frames on the bus, while fabric cards (like your 67xx
modules) put only the headers. A non-fabric card forwards a frame via the
bus to the Supervisor's PFC, and the PFC then switches it through the fabric
to the fabric-enabled card (for traffic going from non-fabric to fabric
cards). The maximum centralized switching performance in truncated mode is
15 Mpps. It doesn't look like you're hitting this limit, but it does look
like the bus is busy. Are the 63xx modules heavily utilized? Is the traffic
mostly large frames?
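You can double-check how the box classifies each module with something like
the following (from memory, so verify the keywords on your release):

  ! global and per-module switching mode: flow-through / truncated / compact
  show fabric switching-mode
  ! per-channel fabric sync and error counters, to rule out a fabric fault
  show fabric status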
I think replacing all or some of the 63xx modules with fabric-enabled modules
(like the 6748) should help reduce the bus utilization. Also, if you replace
all of them, the system can operate in "compact" mode, which increases the
maximum centralized switching capacity to 30 Mpps. If you add DFCs to the
fabric-enabled cards (67xx), then port-to-port traffic within those cards
doesn't touch the bus at all, and the total switching capacity also scales
by 48 Mpps per DFC.
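Whether a 67xx already has a DFC fitted shows up as a sub-module in
"show module"; the lines below are only illustrative (slot numbers and status
made up for the example):

  show module
  !  Mod  Sub-Module                   Model              Status
  !    3  Centralized Forwarding Card  WS-F6700-CFC       Ok    <- CFC: headers still cross the bus
  !    4  Distributed Forwarding Card  WS-F6700-DFC3BXL   Ok    <- DFC: forwarding decided on the card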
6500 Architecture White Paper:
http://www.cisco.com/en/US/products/hw/switches/ps708/products_white_paper0900aecd80673385.shtml
Cisco TAC might be able to help further.
-jay
On Jan 5, 2008 8:14 AM, Gabriel Mateiciuc <mgabi at ase.ro> wrote:
Hello everyone,
Here's the environment I'm talking about:
#sh platform hardware capacity
System Resources
PFC operating mode: PFC3BXL
Supervisor redundancy mode: administratively sso, operationally sso
  Switching resources: Module  Part number       Series       CEF mode
                            1  WS-X6348-RJ-45    classic      CEF
                            2  WS-X6348-RJ-45    classic      CEF
                            3  WS-X6748-GE-TX    CEF720       CEF
                            4  WS-X6724-SFP      CEF720       CEF
                            5  WS-SUP720-3BXL    supervisor   CEF
                            6  WS-X6704-10GE     CEF720       CEF
                            7  WS-X6348-RJ-45    classic      CEF
                            8  WS-X6348-RJ-45    classic      CEF
                            9  WS-X6348-RJ-45    classic      CEF
CPU Resources
  CPU utilization: Module    5 seconds     1 minute    5 minutes
                        3     0% /  0%           0%           0%
                        4     0% /  0%           0%           0%
                        5 RP  32% / 11%          11%          11%
                        5 SP  14% /  1%           9%           9%
                        6     0% /  0%           0%           0%
  Processor memory: Module   Bytes: Total        Used    %Used
                         3       219661760    94927184     43%
                         4       219661760    94488840     43%
                         5 RP    927935472   132545832     14%
                         5 SP    912623676   218933576     24%
                         6       219661760    94944424     43%
  I/O memory: Module   Bytes: Total        Used    %Used
                  5 RP      67108864    11891816     18%
                  5 SP      67108864    11891760     18%
EOBC Resources
  Module     Packets/sec    Total packets    Dropped packets
       3     Rx:       7        280576601                  3
             Tx:       1         24002677                  0
       4     Rx:       7        280574860                  3
             Tx:       3         15260689                  0
       5 RP  Rx:      72        141474821               4066
             Tx:      59        109863281                  0
       5 SP  Rx:      11         41664038               4697
             Tx:      20         64613234                  0
       6     Rx:       8        280576597                  2
             Tx:       2          8779278                  0
VLAN Resources
VLANs: 4094 total, 149 VTP, 240 extended, 14 internal, 3691 free
L2 Forwarding Resources
  MAC Table usage: Module  Collisions    Total     Used    %Used
                        5           0    65536     2604       4%
  VPN CAM usage:                         Total     Used    %Used
                                           512        0       0%
L3 Forwarding Resources
  FIB TCAM usage:                     Total       Used    %Used
       72 bits (IPv4, MPLS, EoM)     524288       5558       1%
      144 bits (IP mcast, IPv6)      262144          5       1%
    detail:      Protocol             Used    %Used
                 IPv4                 5558       1%
                 MPLS                    0       0%
                 EoM                     0       0%
                 IPv6                    2       1%
                 IPv4 mcast              3       1%
                 IPv6 mcast              0       0%
  Adjacency usage:                    Total       Used    %Used
                                    1048576        635       1%
  Forwarding engine load:
    Module       pps    peak-pps                        peak-time
         5   7865738     8282714     22:21:27 UTC+2 Fri Jan 4 2008
CPU Rate Limiters Resources
  Rate limiters:   Total    Used    Reserved    %Used
       Layer 3         9       4           1      44%
       Layer 2         4       2           2      50%
ACL/QoS TCAM Resources
  Key: ACLent - ACL TCAM entries, ACLmsk - ACL TCAM masks, AND - ANDOR,
       QoSent - QoS TCAM entries, QoSmsk - QoS TCAM masks, OR - ORAND,
       Lbl-in - ingress label, Lbl-eg - egress label, LOUsrc - LOU source,
       LOUdst - LOU destination, ADJ - ACL adjacency

  Module ACLent ACLmsk QoSent QoSmsk Lbl-in Lbl-eg LOUsrc LOUdst  AND   OR  ADJ
       5     1%     2%     1%     1%     1%     1%     0%     3%   0%   0%   1%
QoS Policer Resources
  Aggregate policers:               Module    Total    Used    %Used
                                         5     1024       1       1%
  Microflow policer configurations: Module    Total    Used    %Used
                                         5       64       1       1%
Switch Fabric Resources
  Bus utilization: current: 71%, peak was 81% at 22:53:20 UTC+2 Fri Jan 4 2008
  Fabric utilization:     Ingress                       Egress
    Module  Chanl  Speed  rate  peak                    rate  peak
         3      0    20G   35%   48% @20:38 27Dec07      26%   36% @20:44 04Jan08
         3      1    20G   40%   48% @23:00 04Jan08      34%   43% @22:21 03Jan08
         4      0    20G   43%   55% @15:57 03Jan08      48%   63% @20:33 27Dec07
         5      0    20G   13%   18% @21:42 02Jan08       9%   17% @22:52 04Jan08
         6      0    20G    0%    1% @01:30 25Dec07       0%    2% @11:27 30Dec07
         6      1    20G   33%   48% @20:26 27Dec07      45%   54% @22:36 03Jan08
  Switching mode: Module    Switching mode
                       3    truncated
                       4    truncated
                       5    flow through
                       6    truncated
Interface Resources
  Interface drops:
    Module   Total drops: Tx            Rx    Highest drop port: Tx   Rx
         1                   7353        2166                     1   38
         2               24609502      144685                    14   40
         3                  42130  8135613761                     7    2
         4                 160468 49040038842                    17    6
         5                1354908      184496                     1    2
         6                  12027      286149                     1    1
         7               29461165      218697                    33   37
         8                2033449         282                    10   10
         9               24030508      408094                    36   29
  Interface buffer sizes:
    Module   Bytes: Tx buffer    Rx buffer
         1             112640         6144
         2             112640         6144
         3            1221120       152000
         4            1221120       152000
         6           14622592      1914304
         7             112640         6144
         8             112640         6144
         9             112640         6144
And for those with enough patience to read through the details, here's the
question/problem:
On the 4th linecard (6724-SFP) we have links grouped in etherchannels
(4 x Gigabit backbone links), taking care to keep most of the etherchannels
with their ports grouped on the same ASIC/linecard. The load-balancing used
is src-dst-ip. Looking at the figures above, I guess anyone would say there
are plenty of resources left, yet our graphs/interface summary show us that
somewhere between 40-50% fabric utilization, both ingress and egress, there
is a problem with the forwarding performance (also visible in the high IQD
counters):
* GigabitEthernet4/1   0  3938121308   0    56   557290000  100095   620339000   94591   0
* GigabitEthernet4/2   0  3909192601   0   304   562387000   94364   602164000   93503   0
* GigabitEthernet4/3   0  3909817998   0  1113   561663000   94280   847735000  113865   0
* GigabitEthernet4/4   0  3939072687   0    53   557529000   95337   643992000   95015   0
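(Those lines are from "show interfaces summary" - the columns being IHQ, IQD,
OHQ, OQD, RXBS, RXPS, TXBS, TXPS, TRTL - and it's the second counter, IQD,
that keeps growing.) In case it helps anyone compare, the per-port drill-down
would be something like this (standard native-IOS commands, Gi4/1 just as the
example port):

  ! per-queue and hardware counters for one of the affected ports
  show queueing interface gigabitEthernet 4/1   ! per-queue drop counters on the port
  show counters interface gigabitEthernet 4/1   ! ASIC/MAC level counters, incl. overruns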
Now, other (possibly) relevant information from the config:
ip cef event-log traceback-depth 0
ip cef table consistency-check error-message
ip cef table consistency-check auto-repair
ip cef load-sharing algorithm original
mls ip cef load-sharing simple
fabric switching-mode allow truncated
fabric buffer-reserve queue
fabric buffer-reserve low  <- that one seemed to help a lot (over a 10% boost in performance)
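For completeness, a related check on the load-sharing side (the exact-route
form should be available on a SUP720, but double-check it on SXF12; the
addresses below are only examples):

  ! which CEF path/adjacency a given src/dst pair resolves to with the current hash setup
  show mls cef exact-route 10.0.0.1 10.0.1.1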
Has anyone hit similar problems with low performance on fabric-enabled
linecards? Any recommended configuration/IOS version?
Cheers,
Gabriel Mateiciuc
Academia de Studii Economice
Networks Department
Infrastructure Team - infrastructura at ase.ro
_______________________________________________
cisco-nsp mailing list -- cisco-nsp at puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/