[nsp] Catalyst6509 GE interface hang without any indication

Todd, Douglas M. DTODD at PARTNERS.ORG
Wed May 26 11:04:28 EDT 2004


 
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Joe:

1) There is always a possibility of a DoS against a box, but most DoS attacks
would cause an interface to stop accepting traffic at L3 (wedged interface,
CPU at 100%, etc.).

2) We have seen problems that took months to figure out, where 15/1 would
reset on us for no apparent reason. We ran a trace for months and were able to
correlate traffic to the resets. The cause was thousands of corrupted packets
coming from a few AppleTalk machines to other AppleTalk machines, causing 15/1
to lose communication with the Sup.
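
The correlation step in 2) can be sketched roughly as below. This is only an
illustration, assuming the trace has been exported as (timestamp, source)
records. The function name, the five-minute window, and the host names are all
made up for the example. Only the reset time comes from the log in this thread.

```python
from collections import Counter
from datetime import datetime, timedelta

def sources_before_resets(packets, resets, window=timedelta(minutes=5)):
    """Count packets per source seen in the window leading up to each reset.

    packets: list of (timestamp, source) tuples exported from a capture.
    resets:  list of reset timestamps (e.g. from the module reset history).
    """
    counts = Counter()
    for reset in resets:
        start = reset - window
        for ts, src in packets:
            if start <= ts < reset:
                counts[src] += 1
    return counts

# Reset time taken from the Module 15 log in this thread.
resets = [datetime(2004, 4, 29, 11, 39, 59)]

# Hypothetical capture records, invented for illustration only.
packets = [
    (datetime(2004, 4, 29, 11, 36, 10), "atalk-host-a"),
    (datetime(2004, 4, 29, 11, 37, 30), "atalk-host-a"),
    (datetime(2004, 4, 29, 11, 38, 5), "atalk-host-b"),
    (datetime(2004, 4, 29, 9, 0, 0), "atalk-host-c"),
]

suspects = sources_before_resets(packets, resets)
print(suspects.most_common())
```

Sources that keep showing up just before several independent resets are the
ones worth a closer look.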

3) A few questions to ask you:
	a) When the box hangs, does the sup2 hang or is it just 15/1?
	b) On a 6500 sup2/pfc2 the only packets that hit the msfc are process
	   switched packets.
		i: Thus if you span 15/1 you will see all the packets that are
		   hitting the msfc and are thus process switched. Process
		   switched traffic can include:
			- icmp to a local interface
			- IPX/Appletalk (fast switched)
			- Multicast
			- Invalid traffic
			  Ex: we had double dot1q tagged traffic, where a packet
			  sent to the msfc had a dot1q header on it (15/1 is an
			  "ISL trunk"). This extra header was causing traffic to
			  be process switched.
			- I am assuming there is much more traffic. However,
			  spanning 15/1 will give you a view of what you are
			  process switching.

	c) You should not have any vlans "flap" per se. Possible causes:
		1) no active ports in the vlan - (user segment)
		2) You are losing scp between the sup and the msfc (bad
		   sup/msfc!)
			-----	Snip-----
			> 01. 4/29/2004,11:37:38: InbandKeepAliveFailure:Module 1 not responding over inband
			> 02. 4/29/2004,11:37:38: InbandKeepAlive:Module 2 inband rate: rx=0 pps, tx=0 pps
			> 03. 4/29/2004,11:38:03: ProcessStatusPing:Module 1 not responding over SCP
			> 04. 4/29/2004,11:38:03: ProcessStatusPing:Module 1 not responding...

		The following commands will help you diagnose this problem:
			MSFC:
			sh scp status
			sh scp accounting
			sh scp count

		3) on the sup:
		To test scp:
				test scp 15 <packet size> <number of packets>
				You should not drop or lose any packets. This is
				NON-intrusive. However, if this is the problem
				with your switch you may hang the box.

	switch> (enable) test scp 15 1400 3000 
Pinging Module 15, Length 64, Count 3
0: PASSED
1: PASSED
2: PASSED
			
		
			sh inband
			sh scp mod
			sh scp stat
			sh scp failcnt
		4) See if you have any pinnacle errors (you might have a bad
			asic on that one Gig E). We did have this, and we traced
			it back to the 1st of 4 asics. We would drop traffic
			inbound on 1 port only, when traffic was sent to the
			msfc and back to the same vlan (multinetting).
			- show asicreg 1/4 pinnacle err
			- show asicreg 1/3 pinnacle err

	d: If we are guessing a DoS, you can take a glance at your buffers and
		see if you are taking hits at the time of the freeze. You would
		also see a lot of input drops and input buffer issues.
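
To pull the keepalive and SCP failures out of the NVRAM log mentioned in 3c,
something like the following may help. It is only a sketch, assuming the log
was pasted from mail and so carries "> " quote markers and wrapped lines. The
regular expression and the function name are my own, not anything from CatOS.

```python
import re

# An entry starts with a two-digit sequence number, then "date,time:".
ENTRY = re.compile(r"\b(\d{2})\.\s+(\d{1,2}/\d{1,2}/\d{4}),(\d{2}:\d{2}:\d{2}):\s*")

def parse_nvram_log(text):
    """Split a wrapped, mail-quoted NVRAM log into (seq, date, time, message)."""
    # Drop mail quote markers and join the wrapped lines into one string.
    flat = " ".join(line.lstrip("> \t").strip() for line in text.splitlines())
    flat = re.sub(r"\s+", " ", flat)
    marks = list(ENTRY.finditer(flat))
    entries = []
    for i, m in enumerate(marks):
        # The message runs up to the start of the next entry (or end of text).
        end = marks[i + 1].start() if i + 1 < len(marks) else len(flat)
        message = flat[m.end():end].strip()
        entries.append((int(m.group(1)), m.group(2), m.group(3), message))
    return entries

# Sample taken from the log quoted in this thread, wrapping included.
sample = """> 01. 4/29/2004,11:37:38: InbandKeepAliveFailure:Module 1 not
> responding over inband 02. 4/29/2004,11:37:38:
> InbandKeepAlive:Module 2 inband rate: rx=0 pps, tx=0 pps 03.
> 4/29/2004,11:38:03: ProcessStatusPing:Module 1 not responding
> over SCP 04. 4/29/2004,11:38:03: ProcessStatusPing:Module 1
> not responding...
"""

entries = parse_nvram_log(sample)
failures = [e for e in entries if "not responding" in e[3]]
for seq, date, time, message in failures:
    print(seq, date, time, message)
```

The timestamps of the "not responding" entries can then be lined up against
the module reset history.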

4) To note a few things from your email below - 15/1: Is there any correlation
between the Gig to the M160 not sending/receiving traffic and 15/1 freezing? Or
is 15/1 freezing the only cause of traffic not being forwarded? I guess I am
saying the MSFC is the problem and not the gig port between the 6500 and the
M160. Is this valid?

5) Assuming that scp is not the problem I would set up a span and capture
traffic up until the msfc fails.  Then see if you can see what might be causing
it (I have done this many times - Let me know if you need some ideas).
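
One thing to look for in such a capture is the double dot1q tagged traffic
described under 3b. A rough sketch of that check follows, assuming the capture
tool hands you raw Ethernet frames as bytes with no ISL encapsulation left on
them. The function names and the test frames are invented for the example.

```python
import struct

TPID_DOT1Q = 0x8100  # EtherType value that marks an 802.1Q tag

def vlan_tags(frame):
    """Return the VLAN IDs of the 802.1Q tags stacked at the front of a frame."""
    tags = []
    offset = 12  # skip destination and source MAC addresses
    while len(frame) >= offset + 4:
        tpid, tci = struct.unpack_from("!HH", frame, offset)
        if tpid != TPID_DOT1Q:
            break
        tags.append(tci & 0x0FFF)  # low 12 bits of the TCI are the VLAN ID
        offset += 4
    return tags

def is_double_tagged(frame):
    """True if the frame carries two or more stacked 802.1Q tags."""
    return len(vlan_tags(frame)) >= 2

# Hypothetical frames built by hand for illustration.
macs = b"\xff" * 6 + bytes.fromhex("00000c123456")
payload = struct.pack("!H", 0x0800) + b"\x00" * 46
single = macs + struct.pack("!HH", TPID_DOT1Q, 12) + payload
double = (macs + struct.pack("!HH", TPID_DOT1Q, 12)
          + struct.pack("!HH", TPID_DOT1Q, 100) + payload)

print(vlan_tags(single), is_double_tagged(single))
print(vlan_tags(double), is_double_tagged(double))
```

Running every captured frame through is_double_tagged() gives a quick count of
how much of the traffic hitting the msfc carries the extra header.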

> Module 15 Log:
>   Reset Count:   2
>   Reset History: Thu Apr 29 2004, 11:39:59
>                  Mon Sep 8 2003, 00:24:09

Hope this helps.

==DMT>

- ----SIGNATURE-------
Douglas M. Todd, Jr.
Network Engineering
Partners Health Care
Building 149
149 13 Street
Charlestown, MA 02129-200
Tel: 617.726.1403
Email: dtodd at partners.org
- --------------------------------------------------------------------
PGP Finger Print: 9429 CAE3 B2D1 C2E1 DFBC  E7A6 E90A 9BE5 C7B6 47BC
Key available via email.
Verisign S/N: 3ff65cdf58b9dceda004baeed49e16cf
https://digitalid.verisign.com/services/client/index.html 

> -----Original Message-----
> From: Joe Shen [mailto:jshen at christmas.9966.org] 
> Sent: Tuesday, May 25, 2004 9:40 PM
> To: Todd, Douglas M.
> Cc: cisco-nsp at puck.nether.net
> Subject: RE: [nsp] Catalyst6509 GE interface hang without any 
> indication
> 
> Thanks for your help.
> 
> 
> I tried to run "show errdisable detect" on my 6509 box, but 
> it responds that the command is not recognized.
> And if it is an autonegotiation problem, why does it work now? We did 
> not modify the configuration.
> 
> I checked the 6509's log, but I can't find any possible 
> problem (I included it below).
> 
> 
> Is there any possible DoS attack on the Catalyst6509? One of our
> Catalyst6509s hung again yesterday, and it does not respond 
> on either the PFC or the MSFC.
> 
>  
> Thanks a lot for your kind help.
> 
> Best regards
> 
> Joe 
> 
> 
> Ps. Detailed information:
> 
> 
> 
> The load on the GE interface came down to 0 at about 2004.4.30 14:00.
> 
> 
> ///////////////////////////////////////////////////////
> 
> 6509C-SUP-hz> (enable) sh log
> 
> Network Management Processor (STANDBY NMP) Log:
>   Reset count:   2
>   Re-boot History:   Apr 29 2004 11:36:41 0, Sep 08 2003 00:21:01 0
>   Bootrom Checksum Failures:      0   UART Failures:                  0
>   Flash Checksum Failures:        0   Flash Program Failures:         0
>   Power Supply 1 Failures:        0   Power Supply 2 Failures:        0
>   Swapped to CLKA:                0   Swapped to CLKB:                0
>   Swapped to Processor 1:         0   Swapped to Processor 2:         0
>   DRAM Failures:                  0
> 
>   Exceptions:                     0
> 
>   Loaded NMP version:            7.6(1)
>   Software version:              bootflash:cat6000-sup2k9.7-6-1.bin
>   Reload same NMP version count: 5
> 
>   Last software reset by user: 7/21/2003,12:49:51
> 
>   EOBC Exceptions/Hang:            0
> 
> Heap Memory Log:
> Corrupted Block = none
>         
> NVRAM log:
>         
>         
> Network Management Processor (ACTIVE NMP) Log:
>   Reset count:   1
>   Re-boot History:   Sep 08 2003 00:21:25 0
>         
>   Bootrom Checksum Failures:      0   UART Failures:                  0
>   Flash Checksum Failures:        0   Flash Program Failures:         0
>   Power Supply 1 Failures:        0   Power Supply 2 Failures:        0
>   Swapped to CLKA:                0   Swapped to CLKB:                0
>   Swapped to Processor 1:         0   Swapped to Processor 2:         1
>   DRAM Failures:                  0
>         
>   Exceptions:                     0
>         
>   Loaded NMP version:            7.6(1)
>   Software version:              bootflash:cat6000-sup2k9.7-6-1.bin
>   Reload same NMP version count: 4
>         
>   EOBC Exceptions/Hang:            0
>         
> Heap Memory Log:
> Corrupted Block = none
>         
> NVRAM log:
>         
> 01. 4/29/2004,11:37:38: InbandKeepAliveFailure:Module 1 not responding over inband
> 02. 4/29/2004,11:37:38: InbandKeepAlive:Module 2 inband rate: rx=0 pps, tx=0 pps
> 03. 4/29/2004,11:38:03: ProcessStatusPing:Module 1 not responding over SCP
> 04. 4/29/2004,11:38:03: ProcessStatusPing:Module 1 not responding... resetting module
> 05. 4/29/2004,11:38:03: ha_swover_sync_status:static-db:1, dyn-db:1 (0-invalid, 1-valid)
> 06. 4/29/2004,11:38:03: updateRuntimeWithNVRAM:Redundancy switch over: 2
> 07. 4/29/2004,11:38:06: ha_swover_type:Switchover type: 1 (1-HA, 0-Non-HA)
> 08. 4/29/2004,11:38:06: ha_entity_swover_action:action: 1 (1-NOP, 4-CINIT), reason:0
> 09. 4/29/2004,11:38:12: ha_module_swover_action:Module:1, action: 5 (1-5:N,A,P,C,R), reason:6
> 10. 4/29/2004,11:38:12: ha_module_swover_action:Module:2, action: 1 (1-5:N,A,P,C,R), reason:0
> 11. 4/29/2004,11:38:12: ha_module_swover_action:Module:3, action: 4 (1-5:N,A,P,C,R), reason:4
> 12. 4/29/2004,11:38:12: ha_module_swover_action:Module:4, action: 1 (1-5:N,A,P,C,R), reason:0
> 13. 4/29/2004,11:38:12: ha_module_swover_action:Module:5, action: 1 (1-5:N,A,P,C,R), reason:0
> 14. 4/29/2004,11:38:12: ha_module_swover_action:Module:6, action: 2 (1-5:N,A,P,C,R), reason:12
> 15. 4/29/2004,11:38:12: ha_module_swover_action:Module:7, action: 2 (1-5:N,A,P,C,R), reason:12
> 16. 4/29/2004,11:38:12: ha_module_swover_action:Module:8, action: 2 (1-5:N,A,P,C,R), reason:12
> 17. 4/29/2004,11:38:12: ha_module_swover_action:Module:9, action: 2 (1-5:N,A,P,C,R), reason:12
> 18. 4/29/2004,11:38:12: ha_module_swover_action:Module:15, action: 5 (1-5:N,A,P,C,R), reason:6
> 19. 4/29/2004,11:38:12: ha_module_swover_action:Module:16, action: 1 (1-5:N,A,P,C,R), reason:11
> 20. 4/29/2004,11:38:12: ha_swover_time_data:Current system time step:1, hi: 4, lo:-1269897776
> 21. 4/29/2004,11:38:12: ha_swover_time_data:Current system time step:2, hi: 4, lo:-1269896393
> 22. 4/29/2004,11:38:12: ha_swover_time_data:Current system time step:3, hi: 4, lo:-1269896215
> 23. 4/29/2004,11:38:12: ha_swover_time_data:Current system time step:4, hi: 4, lo:-1269890878
>         
> Module 4 Log:
>   Reset Count:   1
>   Reset History: Mon Sep 8 2003, 00:24:14
>                  
>         
> Module 5 Log:
>   Reset Count:   1
>   Reset History: Mon Sep 8 2003, 00:24:14
>                  
>         
> Module 15 Log:
>   Reset Count:   2
>   Reset History: Thu Apr 29 2004, 11:39:59
>                  Mon Sep 8 2003, 00:24:09
>                  
>         
> Module 16 Log:
>   Reset Count:   1
>   Reset History: Mon Sep 8 2003, 00:24:09
>                  
> 6509C-SUP-hz> (enable)
> 
> 
> 6509C-SUP-hz> (enable) sh udld
> UDLD              : disabled
> Message Interval  : 15 seconds
> 
> 
> idc6509C-SUP-hz> (enable) sh port 1/1
> * = Configured MAC Address 
> 
> Port  Name                 Status     Vlan       Duplex Speed Type
> ----- -------------------- ---------- ---------- ------ ----- ------------
>  1/1  to_M160              connected  12           full  1000 1000-LX/LH
> 
> 
> Port  Security Violation Shutdown-Time Age-Time Max-Addr Trap     IfIndex
> ----- -------- --------- ------------- -------- -------- -------- -------
>  1/1  disabled  shutdown             0        0        1 disabled       4
> 
> Port  Num-Addr Secure-Src-Addr     Age-Left Last-Src-Addr     Shutdown/Time-Left
> ----- -------- -----------------   -------- ----------------- ------------------
>  1/1         0                 -          -                 -        -         -
> 
> Port  Flooding on Address Limit
> ----- -------------------------
>  1/1                    Enabled
> 
> Port     Broadcast-Limit Multicast Unicast Total-Drop           Action
> -------- --------------- --------- ------- -------------------- ------------
>  1/1                   -         -       -                    0 drop-packets
> 
> Port  Send FlowControl  Receive FlowControl   RxPause    TxPause
>       admin    oper     admin     oper
> ----- -------- -------- --------- ---------   ---------- ----------
>  1/1  desired  on       off       off         0          0          
>         
> Port  Status     Channel              Admin Ch
>                  Mode                 Group Id
> ----- ---------- -------------------- ----- -----
>  1/1  connected  auto silent             33     0
>         
> Port  Status      ErrDisable Reason    Port ErrDisableTimeout  Action on Timeout
> ----  ----------  -------------------  ----------------------  -----------------
>  1/1  connected                     -  Enable                  No Change
>         
> Port  Align-Err  FCS-Err    Xmit-Err   Rcv-Err    UnderSize
> ----- ---------- ---------- ---------- ---------- ---------
>  1/1           0          0          0          0         0
>         
> Port  Single-Col Multi-Coll Late-Coll  Excess-Col Carri-Sen Runts     Giants
> ----- ---------- ---------- ---------- ---------- --------- --------- ---------
>  1/1           0          0          0          0         0         0         0
>         
> Port  Last-Time-Cleared
> ----- --------------------------
>  1/1  Mon May 24 2004, 11:07:53
>         
> Idle Detection
> --------------
>    --    
> 
> 6509C-SUP-hz> (enable) 
> 
> /////////////////////////////
> 
> The log on the MSFC shows some other VLAN interfaces on the 6509 flapping
> between up and down frequently, but there is no record for the GE interface
> connecting to the M160.
> 
> 
> 
> 
> 
> 
> -----Original Message-----
> From: Todd, Douglas M. [mailto:DTODD at PARTNERS.ORG] 
> Sent: Tuesday, May 25, 2004 11:16 PM
> To: Joe Shen
> Cc: cisco-nsp at puck.nether.net; jshen at christmas.996.org
> Subject: RE: [nsp] Catalyst6509 GE interface hang without any 
> indication
> 
> 
> Just some considerations:
> 
> 
> 1) Do a {sh log}.
> This might help with debugging. Pay attention to the nvram log and the
> Active NMP log. They may help you.
> 2) If I understand the problem correctly, you have a GIG E that hangs and
> does not pass any traffic?
> 	You have the following: M160-GIG port->port GIG 6500, right?
> 3) These devices are not trunking (just to make sure)
> 	a: not channeling
> 	b: not running udld (see 4)
> 	c: Trunking on the port is off
> 
> 4) Are you running udld on the 6500 or on the m160?
> 	Sounds like a udld problem where you are xmit on one side, tx on the
> other. This can keep the port up/up and make it seem like it hangs...
> 	{sh udld}
> 	{sh udld port x/y}
> 	
> 5) Do a {show port counters} and {sh port x/y} on the 6500 and see if you
> find any counters other than the usual mib ones increasing. You don't have
> any align/fcs/xmit/rcv/ drops on the interface?
> 
> 6) On the next reload you could run a complete diag to check for
> hardware issues.
> 
> 7) What is the state of both interfaces when they are hung? Do the
> counters increase? Which ones increase on both sides?
> 
> Just some thoughts...
> 
> ==DMT>
> 
> 

-----BEGIN PGP SIGNATURE-----
Version: PGP 8.0.3

iQA/AwUBQLSyS8TXy2QW1R3hEQJmbgCbBs1B1WGvGeA4vhXgvWHlNDHC9WgAnjIB
y2plH/RAjeHNmfRoBX5xw6x+
=/YFE
-----END PGP SIGNATURE-----

