[c-nsp] Cat 6500 SUP720 environment problems..

Tim Stevenson tstevens at cisco.com
Thu May 10 11:48:40 EDT 2007


Some comments inline:

At 02:18 PM 5/10/2007 +0100, John R contended:
>   Hello all
>
>We have some questions in relation to our environment. We basically have a
>pair of 6509 chassis with Sup720-3Bs connecting to lots (over 300) of Cisco
>3020 blade switches, with each 3020 attached to both 6509s; there are no
>DFCs on the linecards.
>
>The 6500s have 8 x Gig-E connections as a port-channel between them.
>
>The environment runs unicast and multicast, but traffic levels are not really
>high. We have some questions relating to the output below; any comments would
>be most welcome.
>
>
>         6500 --- 8 gig-e portchannel --- 6500
>            \                             /
>             \                           /
>              \    300+ 3020 blades     /
>
>
>
>Both Cat6509s are running 12.2(18)SXF5 -
>ipservicesk9-mz.122-18.SXF5.bin
>
>CAT6KSUP720-3B#sh cat
>   chassis MAC addresses: 1024 addresses from 0018.7433.3400 to 0018.7433.37ff
>   traffic meter =   1%   Last cleared at 13:22:27 GMT Thu Nov 9 2006
>            peak =  96%        reached at 01:12:36 BST Thu May 10 2007
>   switching-clock: clock switchover and system reset is allowed
>
>Q - Is this peak only for the shared bus ?

Yes. Often this is indicative of a broadcast storm.
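
If you want to chase that, a rough sketch of where I'd look (interface
names are placeholders, and check that your linecards support traffic
storm control in your SXF release):

  ! backplane/bus meter and per-port ucast/mcast/bcast counters
  show catalyst6000 traffic-meter
  show interfaces GigabitEthernet1/1 counters

  ! optionally cap broadcast on edge-facing ports (level is a placeholder)
  interface GigabitEthernet1/1
   storm-control broadcast level 1.00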


>######################################################################################
>
>CAT6KSUP720-3B#sh pla ha cap for
>L2 Forwarding Resources
>            MAC Table usage:   Module  Collisions  Total       Used    %Used
>                               5                0  65536       2905       4%
>
>              VPN CAM usage:                       Total       Used    %Used
>                                                     512          0       0%
>L3 Forwarding Resources
>              FIB TCAM usage:                     Total        Used    %Used
>                   72 bits (IPv4, MPLS, EoM)     196608        4232       2%
>                  144 bits (IP mcast, IPv6)       32768        1483       5%
>
>                      detail:      Protocol                    Used    %Used
>                                   IPv4                        4232       2%
>                                   MPLS                           0       0%
>                                   EoM                            0       0%
>
>                                   IPv6                           2       1%
>                                   IPv4 mcast                  1481       5%
>                                   IPv6 mcast                     0       0%
>
>             Adjacency usage:                     Total        Used    %Used
>                                                1048576        4194       1%
>
>      Forwarding engine load:
>                      Module       pps   peak-pps   peak-time
>                      5         616391    9068315   15:29:21 GMT Mon Dec 18 2006
>
>Q - Is the peak-pps the largest peak seen by the PFC

Yes.


>Q - If it is, is this not well short of the 30mpps that the box should be
>able to support

Yes, 9M is less than 30M ;)

I see the pps peak and the bus peak occurred at different times. I am
trying to think of a reason that the bus could reach 96% in compact
mode with only 9 Mpps, and I can't think of one... It could have to do
with the polling intervals of the respective watermarks, i.e., maybe the
bus utilization was high for a very short time and the peak was missed
by the forwarding engine polling. Guessing.


>######################################################################################
>
>CAT6KSUP720-3B#sh ibc brief
>Interface information:
>         Interface IBC0/0(idb 0x51E4F010)
>         Hardware is Mistral IBC (revision 5)
>         5 minute rx rate 134000 bits/sec, 60 packets/sec
>         5 minute tx rate 76000 bits/sec, 48 packets/sec
>         801981457 packets input, 158150852481 bytes
>         571784929 broadcasts received
>         615169009 packets output, 150564832578 bytes
>         65392127 broadcasts sent
>         1 Inband input packet drops
>         0 Bridge Packet loopback drops
>         50002482 Packets CEF Switched, 118971932 Packets Fast Switched
>         0 Packets SLB Switched, 0 Packets CWAN Switched
>         IBC resets   = 1; last at 14:25:38.107 gmt Sat Oct 28 2006
>MISTRAL ERROR COUNTERS
>         System address timeouts  = 0     BUS errors     = 0
>         IBC Address timeouts     = 0 (addr 0x0)
>         Page CRC errors          = 0     IBL CRC errors = 0
>         ECC Correctable errors   = 0
>         Packets with padding removed (0/0/0)   = 0
>         Packets expanded (0/0)   = 0
>         Packets attempted tail end expansion > 1 page and were dropped = 0
>         IP packets dropped with frag offset of 1 = 0
>         1696 packets (aggregate) dropped on throttled interfaces
>         Hazard Illegal packet length     = 0     Illegal Offset       = 0
>         Hazard Packet underflow          = 0     Packet Overflow      = 0
>         IBL fill hang count              = 0     Unencapsed packets   = 0
>         LBIC RXQ Drop pkt count = 0            LBIC drop pkt count  = 0
>         LBIC Drop pkt stick     = 0
>
>The CEF counter is not incrementing in this instance, whereas the fast-switched
>counter is. Our understanding is that the IBC is the bus between the SP and
>the RP?

sh ibc shows you the inband interface leading to the RP CPU. It is
not a bus; it is a dedicated 1G full-duplex interface connected to the
switch fabric. The SP CPU has its own inband channel. Maybe you are
thinking of the EOBC, which is the back-end bus that interconnects the
CPUs on the sup and all the linecards.
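
If you want to see the SP's side of that, you can run the same command
on the switch processor (from memory, via remote command):

  remote command switch show ibc brief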

>Q - Why do we see so many fast-switched packets
>Q - Should the CEF counter not increment

Just a theory: the only time you'll see traffic on the IBC is if the
traffic is punted. Traffic that can be CEF switched would not typically
hit the inband, because the hardware should handle it, so the traffic
that does hit the inband typically requires fast switching or process
switching.
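
To get a feel for what's being punted and whether it hurts, a quick
non-authoritative check:

  ! interrupt-level CPU (the second figure in X%/Y%) roughly tracks punted traffic
  show processes cpu sorted

  ! aggregate IP counters seen by the RP (no-route, options, TTL, etc)
  show ip traffic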




>######################################################################################
>
>CAT6KSUP720-3B#sh ip mroute count
>IP Multicast Statistics
>730 routes using 681034 bytes of memory
>21 groups, 33.76 average sources per group
>
>Q - The above is the average mcast count for the box; this doesn't seem high
>to us?

No, it is not high. Assuming it's what you expect based on your environment.


>Q - With lots of multicast boundary commands configured, can this add to load?

To which load? It should not affect the mroute count (well actually, 
if anything it will decrease it).
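
Just for clarity on that - a boundary only filters groups on the
interface, so if anything it trims mroute state. A minimal sketch (ACL
number and group range are placeholders):

  access-list 10 deny   239.192.0.0 0.0.255.255
  access-list 10 permit any
  !
  interface GigabitEthernet1/1
   ip multicast boundary 10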


>######################################################################################
>
>CAT6KSUP720-3B#      sh mod
>Mod Ports Card Type                              Model              Serial No.
>--- ----- -------------------------------------- ------------------ -----------
>  1   48  CEF720 48 port 1000mb SFP              WS-X6748-SFP       SAL1025XXXX
>  2   48  CEF720 48 port 1000mb SFP              WS-X6748-SFP       SAL1026XXXX
>  3   48  CEF720 48 port 1000mb SFP              WS-X6748-SFP       SAL1026XXXX
>  4   48  CEF720 48 port 1000mb SFP              WS-X6748-SFP       SAL1026XXXX
>  5    2  Supervisor Engine 720 (Active)         WS-SUP720-3B       SAL1028XXXX
>  6   48  CEF720 48 port 1000mb SFP              WS-X6748-SFP       SAL1025XXXX
>  7   48  CEF720 48 port 1000mb SFP              WS-X6748-SFP       SAL1026XXXX
>  8   48  CEF720 48 port 1000mb SFP              WS-X6748-SFP       SAL1026XXXX
>  9   48  CEF720 48 port 1000mb SFP              WS-X6748-SFP       SAL1025XXXX
>
>Q. - Every port on the switch is configured as per the config below; will
>this cause problems?
>I.e. - Is RMON on every port advisable?

I can't recall seeing other customer configs doing this. I am no RMON
expert. It will probably increase CPU load managing all the stats.
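
If you want to quantify it, a rough check is to see whether any
RMON-related process shows up near the top and how big the collections
have become:

  show processes cpu sorted
  show rmon statistics
  show rmon history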


>  rmon collection stats 4 owner "root at mgmtstation [1161348691907]"
>  rmon collection history 4 owner "root at mgmtstation [1161348775440]" buckets 50
>
>#####################################################################################
>
>mls aging long 64
>mls aging normal 32
>
>Q. - Should the above settings be changed to the default times for long and
>normal flows?

I wouldn't unless you are experiencing high CPU.
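
If you do want to go back, the no forms restore the defaults; I'd look
at the current netflow aging and table state first. A sketch:

  ! current aging timers and netflow table state
  show mls netflow aging

  ! restore default long/normal aging
  configure terminal
   no mls aging long
   no mls aging normal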


>#####################################################################################
>
>Q. - We see large numbers of output drops, but little traffic on the
>port-channel connecting the two 6Ks together - does this match the following
>bug?
>
>http://www.cisco.com/cgi-bin/Support/Bugtool/onebug.pl?bugid=CSCdv86024


This bug should not be present in 12.2SX code. Whether the channel is
utilized or not depends more on the traffic flow than anything else.
Output discards represent oversubscription of the egress buffer, either
due to many-to-one traffic patterns or due to flooding (e.g., unknown
unicast, broadcast, etc). This brings us back to that 96% bus utilization
that could have been a broadcast storm - a large number of output discards
on all or many ports is another artifact of a storm.
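
To narrow that down, a starting point (interface names are placeholders)
is to see which queues on the congested ports are taking the drops and
whether flooded traffic lines up with them:

  ! per-queue transmit drops on a congested port
  show queueing interface GigabitEthernet1/1

  ! ucast vs mcast vs bcast breakdown on the same port
  show interfaces GigabitEthernet1/1 counters

  ! and the bus meter again, to catch another storm in the act
  show catalyst6000 traffic-meter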

HTH,
Tim


>_______________________________________________
>cisco-nsp mailing list  cisco-nsp at puck.nether.net
>https://puck.nether.net/mailman/listinfo/cisco-nsp
>archive at http://puck.nether.net/pipermail/cisco-nsp/



Tim Stevenson, tstevens at cisco.com
Routing & Switching CCIE #5561
Technical Marketing Engineer, Data Center BU
Cisco Systems, http://www.cisco.com
IP Phone: 408-526-6759
********************************************************
The contents of this message may be *Cisco Confidential*
and are intended for the specified recipients only.

