[nsp] Catalyst6509 GE interface hang without any indication
Todd, Douglas M.
DTODD at PARTNERS.ORG
Wed May 26 11:04:28 EDT 2004
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Joe:
1) A DoS against a box is always a possibility, but most DoS attacks would
cause an interface to stop accepting traffic at L3 (wedged interface,
CPU at 100%, etc.).
2) We have seen problems that took months to figure out, where 15/1 would
reset on us for no apparent reason. We ran a trace for months and were able
to correlate traffic with the resets. The cause was thousands of corrupted
packets sent from a few AppleTalk machines to other AppleTalk machines, which
caused 15/1 to lose communication with the Sup.
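
The kind of correlation described in 2) can be sketched roughly as below. This is only an illustration, not the tooling we used: the function name, the window length, and the idea of feeding it plain timestamps from a capture are all assumptions.

```python
from datetime import datetime, timedelta

def packets_before_resets(packet_times, reset_times, window=timedelta(minutes=5)):
    """For each reset, count the captured packets seen in the window before it.

    packet_times: timestamps of suspect packets taken from a trace (hypothetical input).
    reset_times:  timestamps of module resets, e.g. from the reset history in {sh log}.
    """
    counts = {}
    for reset in reset_times:
        start = reset - window
        # Count packets that arrived in [start, reset).
        counts[reset] = sum(1 for t in packet_times if start <= t < reset)
    return counts
```

A reset that is consistently preceded by a burst of the same suspect traffic is worth a closer look; a single match proves nothing.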
3) A few questions to ask you:
    a) When the box hangs, does the sup2 hang or is it just 15/1?
    b) On a 6500 sup2/pfc2 the only packets that hit the msfc are process
       switched packets.
        i: Thus if you span 15/1 you will see all the packets that are
           hitting the msfc and are thus process switched. Process switched
           traffic can include:
            - icmp to a local interface
            - IPX/Appletalk (fast switched)
            - Multicast
            - Invalid traffic
              Ex: we had double dot1q tagged traffic, where a packet sent
              to the msfc had a dot1q header on it (15/1 is an "ISL
              trunk"). This extra header was causing traffic to be
              process switched.
            - I am assuming there is much more traffic. However, spanning
              15/1 will give you a view of what you are process switching.
    c) You should not have any vlans "flap" per se, unless:
        1) there are no active ports in the vlan (user segment)
        2) you are losing scp between the sup and the msfc (bad
           sup/msfc!)
----- Snip-----
> 01. 4/29/2004,11:37:38: InbandKeepAliveFailure:Module 1 not responding over inband
> 02. 4/29/2004,11:37:38: InbandKeepAlive:Module 2 inband rate: rx=0 pps, tx=0 pps
> 03. 4/29/2004,11:38:03: ProcessStatusPing:Module 1 not responding over SCP
> 04. 4/29/2004,11:38:03: ProcessStatusPing:Module 1 not responding...
The following commands will help you diagnose this problem:
MSFC:
sh scp status
sh scp accounting
sh scp count
3) on the sup:
To test scp:
test scp 15 <packet size> <number of packets>
You should not drop or lose any packets. This test is
normally NON-intrusive. However, if scp is the
problem with your switch, you may hang the box.
switch> (enable) test scp 15 1400 3000
Pinging Module 15, Length 64, Count 3
0: PASSED
1: PASSED
2: PASSED
sh inband
sh scp mod
sh scp stat
sh scp failcnt
4) See if you have any pinnacle errors (you might have a bad asic on
that one Gig E port). We did have this, and we traced it back to the
1st of 4 asics. We would drop traffic inbound on 1 port only when
traffic was sent to the msfc and back to the same vlan
(multinetting).
- show asicreg 1/4 pinnacle err
- show asicreg 1/3 pinnacle err
d: If we are guessing a DoS, you can take a glance at your buffers and
see if you are taking hits at the time of the freeze. You would also
see a lot of input drops and input buffer issues.
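
For the double dot1q case mentioned under 3b, a capture taken from the span can be checked for stacked tags with something like the sketch below. It assumes the captured frame starts at the destination MAC (any ISL encapsulation already stripped by the capture tool) and that the tags use the standard 802.1Q TPID 0x8100; the function name is made up for illustration.

```python
import struct

TPID_DOT1Q = 0x8100  # IEEE 802.1Q tag protocol identifier

def count_dot1q_tags(frame: bytes) -> int:
    """Return how many 802.1Q tags are stacked at the start of an Ethernet frame."""
    offset = 12  # skip destination and source MAC addresses (6 bytes each)
    tags = 0
    # A complete tag is 4 bytes: 2 bytes of TPID followed by 2 bytes of TCI.
    while len(frame) >= offset + 4:
        (ethertype,) = struct.unpack_from("!H", frame, offset)
        if ethertype != TPID_DOT1Q:
            break
        tags += 1
        offset += 4
    return tags
```

Frames for which this returns 2 or more would be candidates for the double-tagged traffic described above.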
4) To note a few things from your email below - 15/1: Is there any correlation
between the Gig port to the M160 not sending/receiving traffic and 15/1
freezing? Or is 15/1 freezing the only cause of traffic not being forwarded?
I guess I am saying the MSFC is the problem and not the gig port between the
6500 and the m160. Is this valid?
5) Assuming that scp is not the problem I would set up a span and capture
traffic up until the msfc fails. Then see if you can see what might be causing
it (I have done this many times - Let me know if you need some ideas).
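
A capture that runs until the msfc fails can get very large. One possible approach, shown here only as a sketch and not as the method we used, is to keep just the most recent frames in memory and dump them when the failure is noticed. The class name and the (timestamp, frame) record shape are assumptions.

```python
from collections import deque

class CaptureRing:
    """Keep only the most recent captured frames, discarding the oldest."""

    def __init__(self, max_packets):
        # A deque with maxlen drops the oldest entry automatically when full.
        self._ring = deque(maxlen=max_packets)

    def add(self, timestamp, frame):
        self._ring.append((timestamp, frame))

    def snapshot(self):
        """Return the retained frames, oldest first, e.g. once the failure is seen."""
        return list(self._ring)
```

The frames returned by snapshot() are the traffic seen just before the hang, which is the part worth reading first.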
> Module 15 Log:
> Reset Count: 2
> Reset History: Thu Apr 29 2004, 11:39:59
> Mon Sep 8 2003, 00:24:09
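
To line the NVRAM log entries up against the reset history, it can help to turn the wrapped log text into one record per entry. The sketch below assumes the "NN. M/D/YYYY,HH:MM:SS: event:detail" layout seen in the log quoted in this thread and tolerates mail quoting and line wrapping; the function name and the dictionary keys are made up for illustration.

```python
import re
from datetime import datetime

# Entry header: sequence number, date, time, e.g. "03. 4/29/2004,11:38:03:"
ENTRY = re.compile(r"(\d{2})\.\s+(\d{1,2}/\d{1,2}/\d{4}),(\d{2}:\d{2}:\d{2}):\s*")

def parse_nvram_log(text):
    """Split wrapped NVRAM log text into a list of entry dictionaries."""
    # Drop mail quote markers and undo the line wrapping.
    lines = [re.sub(r"^\s*>\s?", "", line).strip() for line in text.splitlines()]
    flat = " ".join(line for line in lines if line)
    # With three groups, split() yields [preamble, seq, date, time, message, ...].
    parts = ENTRY.split(flat)
    entries = []
    for i in range(1, len(parts), 4):
        seq, date, time, message = parts[i:i + 4]
        event, _, detail = message.strip().partition(":")
        entries.append({
            "seq": int(seq),
            "when": datetime.strptime(f"{date} {time}", "%m/%d/%Y %H:%M:%S"),
            "event": event,
            "detail": detail.strip(),
        })
    return entries
```

Sorting these records together with the module reset times makes it easier to see which failure was logged first.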
Hope this helps.
==DMT>
- ----SIGNATURE-------
Douglas M. Todd, Jr.
Network Engineering
Partners Health Care
Building 149
149 13 Street
Charlestown, MA 02129-200
Tel: 617.726.1403
Email: dtodd at partners.org
- --------------------------------------------------------------------
PGP Finger Print: 9429 CAE3 B2D1 C2E1 DFBC E7A6 E90A 9BE5 C7B6 47BC
Key available via email.
Verisign S/N: 3ff65cdf58b9dceda004baeed49e16cf
https://digitalid.verisign.com/services/client/index.html
> -----Original Message-----
> From: Joe Shen [mailto:jshen at christmas.9966.org]
> Sent: Tuesday, May 25, 2004 9:40 PM
> To: Todd, Douglas M.
> Cc: cisco-nsp at puck.nether.net
> Subject: RE: [nsp] Catalyst6509 GE interface hang without any
> indication
>
> Thanks for your help.
>
>
> I tried to run "show errdisable detect" on my 6509 box, but
> it responds that the command is not recognized.
> And if it is an autonegotiation problem, why does it work now? We did
> not modify the configuration.
>
> I checked the 6509's log, but I can't find any possible
> problem (I included it below).
>
>
> Is there any possible DoS attack on the Catalyst6509? One of our
> Catalyst6509s hung again yesterday, and it does not respond
> on either the PFC or the MSFC.
>
>
> Thanks a lot for your kind help.
>
> Best regards
>
> Joe
>
>
> Ps. Detailed information:
>
>
>
> The load on the GE interface came down to 0 at about 2004.4.30 14:00.
>
>
> ///////////////////////////////////////////////////////
>
> 6509C-SUP-hz> (enable) sh log
>
> Network Management Processor (STANDBY NMP) Log:
> Reset count: 2
> Re-boot History: Apr 29 2004 11:36:41 0, Sep 08 2003 00:21:01 0
> Bootrom Checksum Failures:   0   UART Failures:            0
> Flash Checksum Failures:     0   Flash Program Failures:   0
> Power Supply 1 Failures:     0   Power Supply 2 Failures:  0
> Swapped to CLKA:             0   Swapped to CLKB:          0
> Swapped to Processor 1:      0   Swapped to Processor 2:   0
> DRAM Failures: 0
>
> Exceptions: 0
>
> Loaded NMP version: 7.6(1)
> Software version: bootflash:cat6000-sup2k9.7-6-1.bin
> Reload same NMP version count: 5
>
> Last software reset by user: 7/21/2003,12:49:51
>
> EOBC Exceptions/Hang: 0
>
> Heap Memory Log:
> Corrupted Block = none
>
> NVRAM log:
>
>
> Network Management Processor (ACTIVE NMP) Log:
> Reset count: 1
> Re-boot History: Sep 08 2003 00:21:25 0
>
> Bootrom Checksum Failures:   0   UART Failures:            0
> Flash Checksum Failures:     0   Flash Program Failures:   0
> Power Supply 1 Failures:     0   Power Supply 2 Failures:  0
> Swapped to CLKA:             0   Swapped to CLKB:          0
> Swapped to Processor 1:      0   Swapped to Processor 2:   1
> DRAM Failures: 0
>
> Exceptions: 0
>
> Loaded NMP version: 7.6(1)
> Software version: bootflash:cat6000-sup2k9.7-6-1.bin
> Reload same NMP version count: 4
>
> EOBC Exceptions/Hang: 0
>
> Heap Memory Log:
> Corrupted Block = none
>
> NVRAM log:
>
> 01. 4/29/2004,11:37:38: InbandKeepAliveFailure:Module 1 not responding over inband
> 02. 4/29/2004,11:37:38: InbandKeepAlive:Module 2 inband rate: rx=0 pps, tx=0 pps
> 03. 4/29/2004,11:38:03: ProcessStatusPing:Module 1 not responding over SCP
> 04. 4/29/2004,11:38:03: ProcessStatusPing:Module 1 not responding... resetting module
> 05. 4/29/2004,11:38:03: ha_swover_sync_status:static-db:1, dyn-db:1 (0-invalid, 1-valid)
> 06. 4/29/2004,11:38:03: updateRuntimeWithNVRAM:Redundancy switch over: 2
> 07. 4/29/2004,11:38:06: ha_swover_type:Switchover type: 1 (1-HA, 0-Non-HA)
> 08. 4/29/2004,11:38:06: ha_entity_swover_action:action: 1 (1-NOP, 4-CINIT), reason:0
> 09. 4/29/2004,11:38:12: ha_module_swover_action:Module:1, action: 5 (1-5:N,A,P,C,R), reason:6
> 10. 4/29/2004,11:38:12: ha_module_swover_action:Module:2, action: 1 (1-5:N,A,P,C,R), reason:0
> 11. 4/29/2004,11:38:12: ha_module_swover_action:Module:3, action: 4 (1-5:N,A,P,C,R), reason:4
> 12. 4/29/2004,11:38:12: ha_module_swover_action:Module:4, action: 1 (1-5:N,A,P,C,R), reason:0
> 13. 4/29/2004,11:38:12: ha_module_swover_action:Module:5, action: 1 (1-5:N,A,P,C,R), reason:0
> 14. 4/29/2004,11:38:12: ha_module_swover_action:Module:6, action: 2 (1-5:N,A,P,C,R), reason:12
> 15. 4/29/2004,11:38:12: ha_module_swover_action:Module:7, action: 2 (1-5:N,A,P,C,R), reason:12
> 16. 4/29/2004,11:38:12: ha_module_swover_action:Module:8, action: 2 (1-5:N,A,P,C,R), reason:12
> 17. 4/29/2004,11:38:12: ha_module_swover_action:Module:9, action: 2 (1-5:N,A,P,C,R), reason:12
> 18. 4/29/2004,11:38:12: ha_module_swover_action:Module:15, action: 5 (1-5:N,A,P,C,R), reason:6
> 19. 4/29/2004,11:38:12: ha_module_swover_action:Module:16, action: 1 (1-5:N,A,P,C,R), reason:11
> 20. 4/29/2004,11:38:12: ha_swover_time_data:Current system time step:1, hi: 4, lo:-1269897776
> 21. 4/29/2004,11:38:12: ha_swover_time_data:Current system time step:2, hi: 4, lo:-1269896393
> 22. 4/29/2004,11:38:12: ha_swover_time_data:Current system time step:3, hi: 4, lo:-1269896215
> 23. 4/29/2004,11:38:12: ha_swover_time_data:Current system time step:4, hi: 4, lo:-1269890878
>
> Module 4 Log:
> Reset Count: 1
> Reset History: Mon Sep 8 2003, 00:24:14
>
>
> Module 5 Log:
> Reset Count: 1
> Reset History: Mon Sep 8 2003, 00:24:14
>
>
> Module 15 Log:
> Reset Count: 2
> Reset History: Thu Apr 29 2004, 11:39:59
> Mon Sep 8 2003, 00:24:09
>
>
> Module 16 Log:
> Reset Count: 1
> Reset History: Mon Sep 8 2003, 00:24:09
>
> 6509C-SUP-hz> (enable)
>
>
> 6509C-SUP-hz> (enable) sh udld
> UDLD : disabled
> Message Interval : 15 seconds
>
>
> idc6509C-SUP-hz> (enable) sh port 1/1
> * = Configured MAC Address
>
> Port  Name                 Status     Vlan       Duplex Speed Type
> ----- -------------------- ---------- ---------- ------ ----- ------------
> 1/1   to_M160              connected  12         full   1000  1000-LX/LH
>
>
> Port  Security Violation Shutdown-Time Age-Time Max-Addr Trap     IfIndex
> ----- -------- --------- ------------- -------- -------- -------- -------
> 1/1   disabled shutdown  0             0        1        disabled 4
>
> Port  Num-Addr Secure-Src-Addr   Age-Left Last-Src-Addr     Shutdown/Time-Left
> ----- -------- ----------------- -------- ----------------- ------------------
> 1/1   0        -                 -        -                 -        -
>
> Port Flooding on Address Limit
> ----- -------------------------
> 1/1 Enabled
>
> Port     Broadcast-Limit Multicast Unicast Total-Drop           Action
> -------- --------------- --------- ------- -------------------- ------------
> 1/1      -               -         -       0                    drop-packets
>
> Port Send FlowControl Receive FlowControl RxPause TxPause
> admin oper admin oper
> ----- -------- -------- --------- --------- ---------- ----------
> 1/1 desired on off off 0 0
>
> Port Status Channel Admin Ch
> Mode Group Id
> ----- ---------- -------------------- ----- -----
> 1/1 connected auto silent 33 0
>
> Port Status     ErrDisable Reason   Port ErrDisableTimeout Action on Timeout
> ---- ---------- ------------------- ---------------------- -----------------
> 1/1  connected  -                   Enable                 No Change
>
> Port Align-Err FCS-Err Xmit-Err Rcv-Err UnderSize
> ----- ---------- ---------- ---------- ---------- ---------
> 1/1 0 0 0 0 0
>
> Port  Single-Col Multi-Coll Late-Coll  Excess-Col Carri-Sen Runts     Giants
> ----- ---------- ---------- ---------- ---------- --------- --------- ---------
> 1/1   0          0          0          0          0         0         0
>
> Port Last-Time-Cleared
> ----- --------------------------
> 1/1 Mon May 24 2004, 11:07:53
>
> Idle Detection
> --------------
> --
>
> 6509C-SUP-hz> (enable)
>
> /////////////////////////////
>
> The log on the MSFC shows some other VLAN interfaces on the 6509 flapping
> between up & down frequently, but there is no record for the GE interface
> connecting to the M160.
>
>
>
>
>
>
> -----Original Message-----
> From: Todd, Douglas M. [mailto:DTODD at PARTNERS.ORG]
> Sent: Tuesday, May 25, 2004 11:16 PM
> To: Joe Shen
> Cc: cisco-nsp at puck.nether.net; jshen at christmas.996.org
> Subject: RE: [nsp] Catalyst6509 GE interface hang without any
> indication
>
>
> Just some considerations:
>
>
> 1) do a {sh log}
> This might help with the debugging situation. Pay attention
> to the nvram log and the Active NMP log. They may help you.
> 2) If I understand the problem correctly, you have a GIG E hang - it
> does not pass any traffic? You have the following:
> M160-GIG port->port GIG 6500, right?
> 3) These devices are not trunking (just to make sure)
> A: not channeling
> b: not running udld (see 3)
> c: Trunking on the port is off
>
> 3) Are you running udld on the 6500 or on the m160?
> Sounds like a udld problem where you are xmit on side tx on the
> other. This can keep the port up/up and make it seem like it hangs...
> {sh udld}
> {sh udld port x/y}
>
> 4) do a {show port counters} {sh port x/y} on the 6500 and see if you
> find any counters other than the usual mib increasing. You don't have
> any align/fcs/xmit/rcv/ drops on the interface?
>
> 5) On the next reload you could run a diag complete to check for
> hardware issues.
>
> 6) What is the state of both interfaces when they are hung? Do the
> counters increase? Which ones increase on both sides?
>
> Just some thoughts...
>
> ==DMT>
>
>
-----BEGIN PGP SIGNATURE-----
Version: PGP 8.0.3
iQA/AwUBQLSyS8TXy2QW1R3hEQJmbgCbBs1B1WGvGeA4vhXgvWHlNDHC9WgAnjIB
y2plH/RAjeHNmfRoBX5xw6x+
=/YFE
-----END PGP SIGNATURE-----