[c-nsp] 6500 - SUP720 - IOS - traffic problem
Gabriel Mateiciuc
mgabi at ase.ro
Sat Jan 5 15:03:21 EST 2008
For the record … I seem to be “hit” by some bugs:
CSCsl70634 Bug Details
Headline: 67xx EC tx/rx traffic dependency resulting in low throughput
Product: IOS
Feature: OTHERS
Severity: 1
Status: Resolved
First Found-in Version: 12.2(18)SXF12
First Fixed-in Version: 8.7(0.22)BUB19, 8.7(0.22)SRC4, 12.2(18.12.5)SXF
Release Notes
Symptom: Port-channel experiences overruns.
Condition: Seen on 67xx cards.
Trigger: When a port receives 6+ Gbps of ip2tag traffic.
Frequency: Found internally. No service requests.
Root Cause: This is caused by flow control asserted by the fabric interface ASIC.
Impact: Impacts traffic.
Workaround: None.
Issue verification: None.
And:
CSCeh08451 Bug Details
Headline: Excessive Overruns and lbusDrops due to heavy flow control over fabric
Product: IOS
Feature: OTHERS
Severity: 1
Status: Resolved
First Found-in Version: 12.2(17d)SXB02
First Fixed-in Version: 12.2(18)SXE, 12.2(18)SXD05, 12.2(17d)SXB08
Release Notes
Symptoms
A Sup720 system running in flow-through mode (though it may not be limited to this
mode) may get into a constant flow-controlling situation under certain traffic
profiles, which reduces the throughput of the system.
Workaround
A command has been added to reserve ASIC buffers on the line card to improve
the throughput of the system:
[no] fabric buffer-reserve [high | low | medium | value]
  high   - 0x5050
  medium - 0x4040
  low    - 0x3030
  value  - any 16-bit value from 0x0 to 0x5050
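For reference, a minimal sketch of applying and checking this workaround (hedged: keyword availability depends on the exact IOS train, so verify on your own system first):

! Hedged sketch - applying the buffer-reserve workaround described above
conf t
 fabric buffer-reserve low          ! reserve 0x3030 of line-card fabric ASIC buffers
end
! Confirm it took effect (roll back with: no fabric buffer-reserve)
show running-config | include fabric buffer-reserve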
From: cheddar cheese [mailto:cheddar3 at gmail.com]
Sent: 5 January 2008 21:34
To: Gabriel Mateiciuc
Subject: Re: [c-nsp] 6500 - SUP720 - IOS - traffic problem
probably best to open a case and upload all relevant info.
good luck,
-jay
On Jan 5, 2008 12:58 PM, Gabriel Mateiciuc <mgabi at ase.ro> wrote:
Hello jay,
First of all, thanks for your patience in reading and explaining all this. Unfortunately, I was already aware of the facts you've laid out here.
Normally I would say the same things you've explained here, but … and there is a but … there are some more empirical observations:
We've had previous experience with a 6500-SUP2 (no fabric) that could hit 80-90% bus utilization without packet loss/drops.
About a month ago we were running 12.2(18)SXF3 and upgraded to 12.2(18)SXF12. In the process we noticed the packet loss that occurs at peak hours, so at first we blamed the IOS and started digging for solutions.
Analyzing the trends revealed that the single difference is the bus utilization, which rose from 50-60% to 70-80%.
Compared to 6 months ago:
Then: 3.4-3.5 Gbit/s on each of the 4 backbone links (port-channels of 4 x 1 Gbit each) – fewer clients – bus at 50-60% – IOS SXF3
Now: congestion loss on the backbone links at 2.5-3 Gbit/s at peak hours – more clients connected to the classic cards – bus at 70-80% – IOS SXF12
I've made some tests, like moving 2 of the 4 links of one port-channel onto the supervisor – that seemed to solve the problem for that port-channel.
Putting in another fabric-enabled card and moving some of the links there would solve the problem, so I'm sure the bus is not hitting its limit. Then again, that doesn't answer why the 6724 seems ineffective.
So, getting to the actual problem: previous experience with port-channels, load-balancing algorithms, high traffic, configuration options not recommended unless advised by TAC, undocumented IOS bugs … I think the answer lies somewhere amongst these.
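For anyone wanting to double-check the hash distribution across the bundle members, a rough, hedged sketch (the exact syntax of the test command varies between releases):

! Which hash algorithm is in use for EtherChannel load balancing
show etherchannel load-balance
! Member links and their state for a given bundle
show etherchannel 1 summary
! Per-port counters, to see whether one member carries most of the load
show interfaces counters
! Some trains also offer a hash-result test, roughly:
!   test etherchannel load-balance interface port-channel 1 ip <src> <dst>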
PS: I've read the caveats for the IOS we're using now … and there seems to be no link with the problems we're having.
From: cheddar cheese [mailto:cheddar3 at gmail.com]
Sent: 5 January 2008 20:03
To: Gabriel Mateiciuc
Subject: Re: [c-nsp] 6500 - SUP720 - IOS - traffic problem
Hello Gabriel,
Since you have a combo of fabric and non-fabric modules, the system switching mode is "truncated". In this mode, non-fabric cards like the 63xx modules put entire frames on the bus, while fabric cards (like your 67xx modules) put just the headers. A non-fabric card forwards a frame via the bus to the Supervisor's PFC, and the PFC then switches it through the fabric to the fabric-enabled card (for traffic going from non-fabric to fabric cards).
The maximum centralized switching performance in truncated mode is 15 Mpps. It doesn't look like you're hitting this limit, but it does look like the bus is busy. Are the 63xx modules heavily utilized? Is the traffic mostly large frames?
I think replacing all or some of the 63xx modules with fabric-enabled modules (like the 6748) should help reduce the bus utilization. Also, if you replace all of them, the system can operate in "compact" mode, which increases the maximum centralized switching capacity to 30 Mpps. If you add DFCs to the fabric-enabled cards (67xx), then port-to-port traffic within those cards doesn't touch the bus, and the total switching capacity also scales by 48 Mpps per DFC.
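A quick, hedged sketch of the commands for confirming what mode the chassis is actually running in and how loaded the shared bus is (output formats differ a bit between releases):

! Per-module fabric connection and switching mode (bus, truncated, compact, dCEF)
show fabric switching-mode
! Per-channel fabric utilization for every fabric-attached module
show fabric utilization all
! Shared bus / backplane traffic meter
show catalyst6000 traffic-meter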
6500 Architecture White paper
http://www.cisco.com/en/US/products/hw/switches/ps708/products_white_paper0900aecd80673385.shtml
Cisco TAC might be able to help further.
-jay
On Jan 5, 2008 8:14 AM, Gabriel Mateiciuc <mgabi at ase.ro> wrote:
Hello everyone,
Here's the environment I'm talking about:
#sh platform hardware capacity
System Resources
PFC operating mode: PFC3BXL
Supervisor redundancy mode: administratively sso, operationally sso
Switching resources: Module  Part number      Series       CEF mode
                          1  WS-X6348-RJ-45   classic      CEF
                          2  WS-X6348-RJ-45   classic      CEF
                          3  WS-X6748-GE-TX   CEF720       CEF
                          4  WS-X6724-SFP     CEF720       CEF
                          5  WS-SUP720-3BXL   supervisor   CEF
                          6  WS-X6704-10GE    CEF720       CEF
                          7  WS-X6348-RJ-45   classic      CEF
                          8  WS-X6348-RJ-45   classic      CEF
                          9  WS-X6348-RJ-45   classic      CEF
CPU Resources
CPU utilization:  Module   5 seconds   1 minute   5 minutes
                       3     0% / 0%         0%          0%
                       4     0% / 0%         0%          0%
                    5 RP   32% / 11%        11%         11%
                    5 SP    14% / 1%         9%          9%
                       6     0% / 0%         0%          0%
Processor memory: Module   Bytes: Total        Used   %Used
                       3      219661760    94927184     43%
                       4      219661760    94488840     43%
                    5 RP      927935472   132545832     14%
                    5 SP      912623676   218933576     24%
                       6      219661760    94944424     43%
I/O memory:       Module   Bytes: Total        Used   %Used
                    5 RP       67108864    11891816     18%
                    5 SP       67108864    11891760     18%
EOBC Resources
Module       Packets/sec   Total packets   Dropped packets
     3   Rx:           7       280576601                 3
         Tx:           1        24002677                 0
     4   Rx:           7       280574860                 3
         Tx:           3        15260689                 0
  5 RP   Rx:          72       141474821              4066
         Tx:          59       109863281                 0
  5 SP   Rx:          11        41664038              4697
         Tx:          20        64613234                 0
     6   Rx:           8       280576597                 2
         Tx:           2         8779278                 0
VLAN Resources
VLANs: 4094 total, 149 VTP, 240 extended, 14 internal, 3691 free
L2 Forwarding Resources
MAC Table usage:  Module   Collisions    Total    Used   %Used
                       5            0    65536    2604      4%
VPN CAM usage:                           Total    Used   %Used
                                           512       0      0%
L3 Forwarding Resources
FIB TCAM usage:                           Total    Used   %Used
  72 bits (IPv4, MPLS, EoM)              524288    5558      1%
  144 bits (IP mcast, IPv6)              262144       5      1%
  detail:    Protocol       Used   %Used
             IPv4           5558      1%
             MPLS              0      0%
             EoM               0      0%
             IPv6              2      1%
             IPv4 mcast        3      1%
             IPv6 mcast        0      0%
Adjacency usage:                          Total    Used   %Used
                                        1048576     635      1%
Forwarding engine load:
  Module       pps   peak-pps   peak-time
       5   7865738    8282714   22:21:27 UTC+2 Fri Jan 4 2008
CPU Rate Limiters Resources
Rate limiters:  Total   Used   Reserved   %Used
      Layer 3       9      4          1     44%
      Layer 2       4      2          2     50%
ACL/QoS TCAM Resources
Key: ACLent - ACL TCAM entries, ACLmsk - ACL TCAM masks, AND - ANDOR,
QoSent - QoS TCAM entries, QOSmsk - QoS TCAM masks, OR - ORAND,
Lbl-in - ingress label, Lbl-eg - egress label, LOUsrc - LOU source,
LOUdst - LOU destination, ADJ - ACL adjacency
Module  ACLent  ACLmsk  QoSent  QoSmsk  Lbl-in  Lbl-eg  LOUsrc  LOUdst  AND  OR  ADJ
     5      1%      2%      1%      1%      1%      1%      0%      3%   0%  0%   1%
QoS Policer Resources
Aggregate policers:               Module   Total   Used   %Used
                                       5    1024      1      1%
Microflow policer configurations: Module   Total   Used   %Used
                                       5      64      1      1%
Switch Fabric Resources
Bus utilization: current: 71%, peak was 81% at 22:53:20 UTC+2 Fri Jan 4 2008
Fabric utilization:          Ingress                      Egress
  Module  Chanl  Speed   rate  peak                  rate  peak
       3      0    20G    35%  48% @20:38 27Dec07     26%  36% @20:44 04Jan08
       3      1    20G    40%  48% @23:00 04Jan08     34%  43% @22:21 03Jan08
       4      0    20G    43%  55% @15:57 03Jan08     48%  63% @20:33 27Dec07
       5      0    20G    13%  18% @21:42 02Jan08      9%  17% @22:52 04Jan08
       6      0    20G     0%   1% @01:30 25Dec07      0%   2% @11:27 30Dec07
       6      1    20G    33%  48% @20:26 27Dec07     45%  54% @22:36 03Jan08
Switching mode:  Module  Switching mode
                      3  truncated
                      4  truncated
                      5  flow through
                      6  truncated
Interface Resources
Interface drops:
  Module  Total drops:  Tx          Rx             Highest drop port:  Tx   Rx
       1                7353        2166                                 1   38
       2                24609502    144685                              14   40
       3                42130       8135613761                           7    2
       4                160468      49040038842                         17    6
       5                1354908     184496                               1    2
       6                12027       286149                               1    1
       7                29461165    218697                              33   37
       8                2033449     282                                 10   10
       9                24030508    408094                              36   29
Interface buffer sizes:
  Module  Bytes:  Tx buffer    Rx buffer
       1             112640         6144
       2             112640         6144
       3            1221120       152000
       4            1221120       152000
       6           14622592      1914304
       7             112640         6144
       8             112640         6144
       9             112640         6144
And for those having enough patience to read the details, here's the
question/problem:
On the 4th linecard (6724-SFP) we have links grouped in etherchannels
(4 x Gigabit backbone links), taking care to keep most of the
etherchannels with their ports grouped on the same ASIC/linecard. The
load-balancing used is src-dst-ip. Looking at the figures above, I guess
anyone would say there are plenty of resources left, yet our graphs/interface
summary show us that somewhere between 40-50% fabric utilization, both ingress
and egress, there is a problem with the forwarding performance (also seen
in the high IQD counters):
  Interface            IHQ         IQD  OHQ   OQD       RXBS    RXPS       TXBS    TXPS  TRTL
* GigabitEthernet4/1     0  3938121308    0    56  557290000  100095  620339000   94591     0
* GigabitEthernet4/2     0  3909192601    0   304  562387000   94364  602164000   93503     0
* GigabitEthernet4/3     0  3909817998    0  1113  561663000   94280  847735000  113865     0
* GigabitEthernet4/4     0  3939072687    0    53  557529000   95337  643992000   95015     0
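For anyone wanting to cross-check these counters on their own box, a hedged sketch of where the drops show up (counter names vary slightly by module type):

! Input queue drops / overruns on the affected 6724 members
show interfaces GigabitEthernet4/1 | include input queue|overrun
! Per-port error counters for the same port
show interfaces GigabitEthernet4/1 counters errors
! Fabric-level error counters for the channel module 4 sits on
show fabric errors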
Now, other (possibly) relevant information from the config:
ip cef event-log traceback-depth 0
ip cef table consistency-check error-message
ip cef table consistency-check auto-repair
ip cef load-sharing algorithm original
mls ip cef load-sharing simple
fabric switching-mode allow truncated
fabric buffer-reserve queue
fabric buffer-reserve low  - that seemed to help a lot (over a 10% boost in performance)
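On the load-sharing side, a hedged way to see which member/adjacency a given src/dst pair actually resolves to in hardware on the PFC3, purely as an illustration (the addresses are placeholders):

! Hardware exact-route lookup on the PFC3 - shows the egress interface a flow hashes to
show mls cef exact-route <source-ip> <destination-ip>
! Software CEF view of the same flow, for comparison
show ip cef exact-route <source-ip> <destination-ip>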
Did anyone hit similar problems with low performance on fabric-enabled
linecards? Any recommended configuration/IOS version?
Cheers,
Gabriel Mateiciuc
Academia de Studii Economice
Departamentul Reţele
Echipa Infrastructura - infrastructura at ase.ro
_______________________________________________
cisco-nsp mailing list cisco-nsp at puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/