[c-nsp] 6500 - SUP720 - IOS - traffic problem
Gabriel Mateiciuc
mgabi at ase.ro
Sat Jan 5 13:58:40 EST 2008
Hello Jay,
First of all, thanks for your patience in reading and explaining all this.
Unfortunately, I was already aware of the facts you've laid out here.
Normally I would say the same things you've explained, but ... and there is
a but ... there are some more empirical observations:
We've had previous experience with a 6500 with a SUP2 (no fabric) that could
hit 80-90% bus utilization without packet loss/drops.
About a month ago we upgraded from 12.2(18)SXF3 to 12.2(18)SXF12. In the
process we noticed packet loss occurring at peak hours, so at first we blamed
the IOS and started digging for solutions. Analyzing the trends revealed that
the only difference is the bus utilization, which rose from 50-60% to 70-80%.
Compared to 6 months ago:
Then: 3.4-3.5 Gbit/s on each of the 4 backbone links (port-channels of 4 Gbit
each) - fewer clients - bus at 50-60% - IOS SXF3
Now: congestion loss on the backbone links at 2.5-3 Gbit/s at peak hours -
more clients connected to the classic cards - bus at 70-80% - IOS SXF12
I've made some tests, like moving 2 of the 4 links of one port-channel onto
the supervisor ports - that seemed to solve the problem for that port-channel.
Putting in another fabric-enabled card and moving some of the links there
would also solve the problem, so I'm sure the bus is not hitting its limit.
Then again, that doesn't answer why the 6724 seems ineffective.
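(For the record, here is roughly what I'm watching while shuffling links
around - standard exec commands on native IOS, nothing exotic; the exact
output format varies a bit per release:)

  ! bus / fabric load checks used while moving links between cards
  show catalyst6000 traffic-meter           ! shared-bus utilization meter, with peak values
  show platform hardware capacity fabric    ! bus + per-channel fabric utilization (same section as pasted below)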
So, getting to the actual problem: previous experience with port-channels,
load-balancing algorithms, high traffic, configuration options that are not
recommended unless advised by TAC, undocumented IOS bugs ... I think the
answer lies somewhere among these.
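For what it's worth, this is roughly how I've been looking at the hashing
side of it (SXF-era syntax from memory - the hash-result form may not exist
on every release, and the port-channel number and addresses below are only
examples):

  ! port-channel hashing checks
  show etherchannel load-balance            ! confirm src-dst-ip is the active hash
  show etherchannel summary                 ! bundle / member state per port-channel
  show etherchannel load-balance hash-result interface port-channel 1 ip 10.0.0.1 10.0.1.1
                                            ! which member link a given src/dst pair hashes to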
PS: I've read the caveats for the IOS we're using now ... and none of them
seem related to the problems we're having.
From: cheddar cheese [mailto:cheddar3 at gmail.com]
Sent: 5 January 2008 20:03
To: Gabriel Mateiciuc
Subject: Re: [c-nsp] 6500 - SUP720 - IOS - traffic problem
Hello Gabriel,
Since you have a combination of fabric and non-fabric modules, the system
switching mode is "truncated". In this mode non-fabric cards like the 63xx
modules put entire frames on the bus, while fabric cards (like your 67xx
modules) put only the headers. A non-fabric card forwards a frame via the
bus to the Supervisor's PFC, and the PFC then switches it through the fabric
to the fabric-enabled card (for traffic going from non-fabric to fabric
cards). The maximum centralized switching performance in truncated mode is
15 Mpps. It doesn't look like you're hitting this limit, but it does look
like the bus is busy. Are the 63xx modules heavily utilized? Is the traffic
mostly large frames?
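You can double-check how the box classifies each module with something like
the following (from memory, so verify the keywords on your release):

  ! global and per-module switching mode: flow-through / truncated / compact
  show fabric switching-mode
  ! per-channel fabric sync and error counters, to rule out a fabric fault
  show fabric status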
I think replacing all or some of the 63xx modules with fabric-enabled modules
(like the 6748) should help reduce the bus utilization. Also, if you replace
all of them, the system can operate in "compact" mode, which increases the
maximum centralized switching capacity to 30 Mpps. If you add DFCs to the
fabric-enabled cards (67xx), then port-to-port traffic within those cards
doesn't touch the bus at all, and the total switching capacity also scales
by 48 Mpps per DFC.
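Whether a 67xx already has a DFC fitted shows up as a sub-module in
"show module"; the lines below are only illustrative (slot numbers and status
made up for the example):

  show module
  !  Mod  Sub-Module                   Model              Status
  !    3  Centralized Forwarding Card  WS-F6700-CFC       Ok    <- CFC: headers still cross the bus
  !    4  Distributed Forwarding Card  WS-F6700-DFC3BXL   Ok    <- DFC: forwarding decided on the card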
6500 Architecture White Paper:
http://www.cisco.com/en/US/products/hw/switches/ps708/products_white_paper0900aecd80673385.shtml
Cisco TAC might be able to help further.
-jay
On Jan 5, 2008 8:14 AM, Gabriel Mateiciuc <mgabi at ase.ro> wrote:
Hello everyone,
Here's the environment I'm talking about:
#sh platform hardware capacity
System Resources
PFC operating mode: PFC3BXL
Supervisor redundancy mode: administratively sso, operationally sso
  Switching resources: Module  Part number       Series       CEF mode
                            1  WS-X6348-RJ-45    classic      CEF
                            2  WS-X6348-RJ-45    classic      CEF
                            3  WS-X6748-GE-TX    CEF720       CEF
                            4  WS-X6724-SFP      CEF720       CEF
                            5  WS-SUP720-3BXL    supervisor   CEF
                            6  WS-X6704-10GE     CEF720       CEF
                            7  WS-X6348-RJ-45    classic      CEF
                            8  WS-X6348-RJ-45    classic      CEF
                            9  WS-X6348-RJ-45    classic      CEF
CPU Resources
  CPU utilization: Module    5 seconds     1 minute    5 minutes
                        3     0% /  0%           0%           0%
                        4     0% /  0%           0%           0%
                        5 RP  32% / 11%          11%          11%
                        5 SP  14% /  1%           9%           9%
                        6     0% /  0%           0%           0%
  Processor memory: Module   Bytes: Total        Used    %Used
                         3       219661760    94927184     43%
                         4       219661760    94488840     43%
                         5 RP    927935472   132545832     14%
                         5 SP    912623676   218933576     24%
                         6       219661760    94944424     43%
  I/O memory: Module   Bytes: Total        Used    %Used
                  5 RP      67108864    11891816     18%
                  5 SP      67108864    11891760     18%
EOBC Resources
  Module     Packets/sec    Total packets    Dropped packets
       3     Rx:       7        280576601                  3
             Tx:       1         24002677                  0
       4     Rx:       7        280574860                  3
             Tx:       3         15260689                  0
       5 RP  Rx:      72        141474821               4066
             Tx:      59        109863281                  0
       5 SP  Rx:      11         41664038               4697
             Tx:      20         64613234                  0
       6     Rx:       8        280576597                  2
             Tx:       2          8779278                  0
VLAN Resources
VLANs: 4094 total, 149 VTP, 240 extended, 14 internal, 3691 free
L2 Forwarding Resources
  MAC Table usage: Module  Collisions    Total     Used    %Used
                        5           0    65536     2604       4%
  VPN CAM usage:                         Total     Used    %Used
                                           512        0       0%
L3 Forwarding Resources
  FIB TCAM usage:                     Total       Used    %Used
       72 bits (IPv4, MPLS, EoM)     524288       5558       1%
      144 bits (IP mcast, IPv6)      262144          5       1%
    detail:      Protocol             Used    %Used
                 IPv4                 5558       1%
                 MPLS                    0       0%
                 EoM                     0       0%
                 IPv6                    2       1%
                 IPv4 mcast              3       1%
                 IPv6 mcast              0       0%
  Adjacency usage:                    Total       Used    %Used
                                    1048576        635       1%
  Forwarding engine load:
    Module       pps    peak-pps                        peak-time
         5   7865738     8282714     22:21:27 UTC+2 Fri Jan 4 2008
CPU Rate Limiters Resources
  Rate limiters:   Total    Used    Reserved    %Used
       Layer 3         9       4           1      44%
       Layer 2         4       2           2      50%
ACL/QoS TCAM Resources
  Key: ACLent - ACL TCAM entries, ACLmsk - ACL TCAM masks, AND - ANDOR,
       QoSent - QoS TCAM entries, QoSmsk - QoS TCAM masks, OR - ORAND,
       Lbl-in - ingress label, Lbl-eg - egress label, LOUsrc - LOU source,
       LOUdst - LOU destination, ADJ - ACL adjacency

  Module ACLent ACLmsk QoSent QoSmsk Lbl-in Lbl-eg LOUsrc LOUdst  AND   OR  ADJ
       5     1%     2%     1%     1%     1%     1%     0%     3%   0%   0%   1%
QoS Policer Resources
  Aggregate policers:               Module    Total    Used    %Used
                                         5     1024       1       1%
  Microflow policer configurations: Module    Total    Used    %Used
                                         5       64       1       1%
Switch Fabric Resources
  Bus utilization: current: 71%, peak was 81% at 22:53:20 UTC+2 Fri Jan 4 2008
  Fabric utilization:     Ingress                       Egress
    Module  Chanl  Speed  rate  peak                    rate  peak
         3      0    20G   35%   48% @20:38 27Dec07      26%   36% @20:44 04Jan08
         3      1    20G   40%   48% @23:00 04Jan08      34%   43% @22:21 03Jan08
         4      0    20G   43%   55% @15:57 03Jan08      48%   63% @20:33 27Dec07
         5      0    20G   13%   18% @21:42 02Jan08       9%   17% @22:52 04Jan08
         6      0    20G    0%    1% @01:30 25Dec07       0%    2% @11:27 30Dec07
         6      1    20G   33%   48% @20:26 27Dec07      45%   54% @22:36 03Jan08
  Switching mode: Module    Switching mode
                       3    truncated
                       4    truncated
                       5    flow through
                       6    truncated
Interface Resources
  Interface drops:
    Module   Total drops: Tx            Rx    Highest drop port: Tx   Rx
         1                   7353        2166                     1   38
         2               24609502      144685                    14   40
         3                  42130  8135613761                     7    2
         4                 160468 49040038842                    17    6
         5                1354908      184496                     1    2
         6                  12027      286149                     1    1
         7               29461165      218697                    33   37
         8                2033449         282                    10   10
         9               24030508      408094                    36   29
  Interface buffer sizes:
    Module   Bytes: Tx buffer    Rx buffer
         1             112640         6144
         2             112640         6144
         3            1221120       152000
         4            1221120       152000
         6           14622592      1914304
         7             112640         6144
         8             112640         6144
         9             112640         6144
And for those with enough patience to read through the details, here's the
question/problem:
On the 4th linecard (6724-SFP) we have links grouped in etherchannels
(4 x Gigabit backbone links), taking care to keep most of the etherchannels
with their ports grouped on the same ASIC/linecard. The load-balancing used
is src-dst-ip. Looking at the figures above, I guess anyone would say there
are plenty of resources left, yet our graphs/interface summary show us that
somewhere between 40-50% fabric utilization, both ingress and egress, there
is a problem with the forwarding performance (also visible in the high IQD
counters):
* GigabitEthernet4/1   0  3938121308   0    56   557290000  100095   620339000   94591   0
* GigabitEthernet4/2   0  3909192601   0   304   562387000   94364   602164000   93503   0
* GigabitEthernet4/3   0  3909817998   0  1113   561663000   94280   847735000  113865   0
* GigabitEthernet4/4   0  3939072687   0    53   557529000   95337   643992000   95015   0
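(Those lines are from "show interfaces summary" - the columns being IHQ, IQD,
OHQ, OQD, RXBS, RXPS, TXBS, TXPS, TRTL - and it's the second counter, IQD,
that keeps growing.) In case it helps anyone compare, the per-port drill-down
would be something like this (standard native-IOS commands, Gi4/1 just as the
example port):

  ! per-queue and hardware counters for one of the affected ports
  show queueing interface gigabitEthernet 4/1   ! per-queue drop counters on the port
  show counters interface gigabitEthernet 4/1   ! ASIC/MAC level counters, incl. overruns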
Now, other (possibly) relevant information from the config:
ip cef event-log traceback-depth 0
ip cef table consistency-check error-message
ip cef table consistency-check auto-repair
ip cef load-sharing algorithm original
mls ip cef load-sharing simple
fabric switching-mode allow truncated
fabric buffer-reserve queue
fabric buffer-reserve low  <- that one seemed to help a lot (over a 10% boost in performance)
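For completeness, a related check on the load-sharing side (the exact-route
form should be available on a SUP720, but double-check it on SXF12; the
addresses below are only examples):

  ! which CEF path/adjacency a given src/dst pair resolves to with the current hash setup
  show mls cef exact-route 10.0.0.1 10.0.1.1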
Has anyone hit similar problems with low performance on fabric-enabled
linecards? Any recommended configuration/IOS version?
Cheers,
Gabriel Mateiciuc
Academia de Studii Economice
Networks Department
Infrastructure Team - infrastructura at ase.ro
_______________________________________________
cisco-nsp mailing list -- cisco-nsp at puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/