[c-nsp] 7204vxr freeze-up question
Adam Greene
maillist at webjogger.net
Fri Aug 31 10:05:48 EDT 2007
Rodney,
Thanks. I appreciate the follow-up.
The show int was from g2/0 because it was originally freezing up while the
card was in that slot. We moved it to g3/0 and it kept on freezing up. I
took the show controller reading after we had moved it to that slot.
I can consistently trigger this error by performing the load test for about
90 seconds or longer (enough time for traffic to build up to pretty high
levels).
I have not been able to upgrade the router to 12.4(16) yet because I don't
have a large enough flash card for the image. I still think that will be a
good test.
Unfortunately, the router is in production, which limits my ability to
perform testing. However, since the router was recently acquired, I have
gotten a replacement 7204VXR from the vendor and will run some tests on
that to see whether the problem reproduces there.
As soon as I have more results to share, I will.
Best regards,
Adam
----- Original Message -----
From: "Rodney Dunn" <rodunn at cisco.com>
To: "Adam Greene" <maillist at webjogger.net>
Cc: <cisco-nsp at puck.nether.net>
Sent: Friday, August 31, 2007 9:23 AM
Subject: Re: [c-nsp] 7204vxr freeze-up question
> You did a sh controller for 3/0 but your 'sh int' was from
> 2/0.
>
> It's hard to know all those controller counters without going
> and looking at the code for that driver.
>
> But, suffice to say that the interface should never lock
> up and have to be bounced to forward traffic or receive traffic.
> If it does it's a bug.
>
> Now, we also have to make sure the bounce isn't clearing some
> other device rather than this one.
>
> I'd suggest capturing a 'sh controller' before a couple of times
> and then after we think it's "hung". Capture it multiple times
> after.
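
The before/after comparison suggested above can be done mechanically. The following is a minimal sketch, assuming the captures are saved as plain text and that the interesting fields appear as "name=value" pairs, as they do in the 'sh controller' output quoted later in this thread. The function names are made up for illustration, and the "before" values in the usage lines are hypothetical, not taken from the router.

```python
import re

# Matches "name=value" or "NAME =0x..." pairs as they appear in the capture.
# Names that repeat within one capture (e.g. "head=") keep only the last value.
PAIR = re.compile(r"([A-Za-z_][\w>-]*)\s*=\s*(0x[0-9A-Fa-f]+|\d+)")

def parse_counters(text):
    """Return {name: int} for every name=value pair found in the capture."""
    counters = {}
    for name, value in PAIR.findall(text):
        is_hex = value.lower().startswith("0x")
        counters[name] = int(value, 16) if is_hex else int(value)
    return counters

def diff_counters(before, after):
    """Return {name: (old, new)} for fields whose value changed."""
    old, new = parse_counters(before), parse_counters(after)
    return {k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]}

# Usage: the "after" line is from the capture in this thread,
# the "before" line is a hypothetical earlier reading.
before = "rx_stucks_count=1, rdtr_fpd=3"
after = "rx_stucks_count=2, rdtr_fpd=3"
print(diff_counters(before, after))
```

Fields that move between a healthy capture and a "hung" one are the ones worth raising with whoever can read the driver code.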
>
> Was this in a lab?
> Can you trigger it every time?
>
> Rodney
>
>
>
> On Wed, Aug 22, 2007 at 02:43:56PM -0400, Adam Greene wrote:
>> Here's output from a "sh controller" during the outage state:
>>
>> Interface GigabitEthernet3/0(idb 0x6363B6DC)
>> Hardware is WISEMAN 2.1, network connection mode is auto
>> network link is up
>> loopback type is none
>> startup time: 176602 usec
>> GBIC type is 1000BaseSX
>> idb->lc_ip_turbo_fs=0x606372F4, ip_routecache=0x11(dfs=0/mdfs=0), max_mtu=1528
>> fx1000_ds(tx)=0x6363CE6C(0x6363CE6C), registers(tx)=0x3D800000(0x3D800000), curr_intr=0
>> rx cache size=2000, rx cache end=1872, rx_nobuffer=0
>> FX1000 registers:
>> CTRL =0x18180005, STATUS=0x0000000F
>> FCAL =0x00C28001, FCAH =0x00000100, FCT =0x00008808, FCTTV =0x000016E3
>> RCTL =0x00428032, RDBAL0=0x2000B000, RDBAH0=0x00000000, RDLEN0=0x00000800
>> RDH0 =0x00000038, RDT0 =0x00000037, RDTR0 =0x00000000, IMS =0x000002D6
>> TCTL =0x000400FA, TIPG =0x00A0080A, TQC =0x00000000, TDBAL =0x2000C000
>> TDBAH =0x00000000, TDLEN =0x00001000, TDH =0x000000BA, TDT =0x000000BA
>> TXCW =0xC00001A0, RXCW =0xCC0041A0, FCRTL =0x80001200, FCRTH =0x0000AFF0
>> RDFH =0x000014D7, RDFT =0x000014D7, TDFH =0x000003A7, TDFT =0x000003A7
>> RX=normal, enabled TX=normal, enabled
>> Device status=full-duplex, link up, tx clock, rx clock
>> AN status=done(RF:0 , PAUSE:3 ), SYNC'ed, rx idle stream, rx invalid symbols, rx idle char
>> GBIC registers:
>> Register 0x00: 01 07 01 00 00 00 01 00
>> Register 0x08: 00 00 00 01 0D 00 00 00
>> Register 0x10: 32 16 00 00 41 47 49 4C
>> Register 0x18: 45 4E 54 20 20 20 20 20
>> Register 0x20: 20 20 20 20 00 00 00 00
>> Register 0x28: 51 46 42 52 2D 35 36 38
>> Register 0x30: 39 20 20 20 20 20 20 20
>> Register 0x38: 30 30 30 30 00 00 00 58
>> Register 0x40: 00 1A 00 00 30 31 31 30
>> Register 0x48: 31 36 30 38 32 36 34 31
>> Register 0x50: 38 36 34 35 30 31 31 30
>> Register 0x58: 31 36 30 30 00 00 00 D8
>> PartNumber: QFBR-5689
>> PartRev: F
>> SerialNo: 0110160826418645
>> Options: 0
>> Length(9um/50um/62.5um): 000/500/220
>> Date Code: 01101600
>> Gigabit Ethernet Codes: 1
>> PCI configuration registers:
>> bus_no=6, device_no=0
>> DeviceID=0x1000, VendorID=0x8086, Command=0x0116, Status=0x0200
>> Class=0x02/0x00/0x00, Revision=0x03, LatencyTimer=0xFC, CacheLineSize=0x10
>> BaseAddr0=0x49000004, BaseAddr1=0x00000000, MaxLat=0x00, MinGnt=0xFF
>> SubsysDeviceID=0x1000, SubsysVendorID=0x8086
>> Cap_Ptr=0x00000000 Retry/TRDY Timeout=0x00000000
>> PMC=0x00210001 PMCSR=0x00000000
>> Software MAC address filter(hash:length/addr/mask/hits):
>> need_af_check = 0
>> 0x00: 0 ffff.ffff.ffff 0000.0000.0000 0
>> 0xC0: 0 0100.0ccc.cccc 0000.0000.0000 0
>> 0xD0: 0 0007.8420.e854 0000.0000.0000 0
>> FX1000(type=0x98) Internal Statistics:
>> rxring(128)=0x2000B000, shadow=0x6363D310, head=56, rx_buf_size=512
>> txring(256)=0x2000C000, shadow=0x6363D53C, head=186, tail=186
>> tx_int_txdw=0, tx_int_txqe=0, rx_int_rxdmt0=0, rx_int_rxt0=0
>> tx_count=0, txring_full=0, rx_max=0, filtered_pak=0
>> rx_overrun=0, rx_seq=0, reg_read=0, reg_write=0
>> rx_count=128, throttled=1, enabled=1, disabled=1
>> rx_no_enp=0, rx_discard=0, link_reset=0, pci_rev=3
>> tbl_overflow=0, chip_state=2, tx_nonint_done=0, tx_limited=0
>> reset=5(init=0, check=0, restart=4, pci=0), auto_restart=1
>> tx_carrier_loss=1, fatal_tx_err=0, tx_stucks_count=1
>> isl_err=0, wait_for_last_tdt=0, ctrl=18800005, ctrl0=18900005
>> rx_stucks_count=2, rdtr_fpd=3
>> HW addr filter: 0x6363DD68, ISL disabled, Promiscuous mode multicast
>> Entry= 0: Addr=0007.8420.E854
>> Entry= 1: Addr=0000.0000.0000
>> Entry= 2: Addr=0000.0000.0000
>> Entry= 3: Addr=0000.0000.0000
>> Entry= 4: Addr=0000.0000.0000
>> Entry= 5: Addr=0000.0000.0000
>> Entry= 6: Addr=0000.0000.0000
>> Entry= 7: Addr=0000.0000.0000
>> Entry= 8: Addr=0000.0000.0000
>> Entry= 9: Addr=0000.0000.0000
>> Entry=10: Addr=0000.0000.0000
>> Entry=11: Addr=0000.0000.0000
>> Entry=12: Addr=0000.0000.0000
>> Entry=13: Addr=0000.0000.0000
>> Entry=14: Addr=0000.0000.0000
>> Entry=15: Addr=0000.0000.0000
>> FX1000 Statistics (PA3)
>> CRC error 0 Symbol error 0
>> Missed Packets 0 Single Collision 0
>> Excessive Coll 0 Multiple Coll 0
>> Late Coll 0 Collision 0
>> Defer 497 Receive Length 0
>> Sequence Error 0 XON RX 0
>> XON TX 0 XOFF RX 0
>> XOFF TX 0 FC RX Unsupport 0
>> Packet RX (64) 52 Packet RX (127) 289
>> Packet RX (255) 0 Packet RX (511) 5
>> Packet RX (1023) 0 Packet RX (1522) 433425
>> Good Packet RX 949328 Broadcast RX 46180
>> Multicast RX 32953 Good Packet TX 0
>> Good Octets RX.H 0 Good Octets RX.L 657160659
>> Good Octets TX.H 0 Good Octets TX.L 334817282
>> RX No Buff 0 RX Undersize 0
>> RX Fragment 0 RX Oversize 0
>> RX Octets High 0 RX Octets Low 657160659
>> TX Octets High 0 TX Octets Low 334817282
>> TX Packet 237515 RX Packet 433771
>> TX Broadcast 18 TX Multicast 1
>> Packet TX (64) 31 Packet TX (127) 18042
>> Packet TX (255) 20 Packet TX (511) 34
>> Packet TX (1023) 5 Packet TX (1522) 219383
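
A rough reading of the ring registers in the capture above, offered as a sketch rather than a statement about the FX1000 driver. It assumes 16-byte descriptors and the usual Intel-style head/tail convention (hardware owns the descriptors from head up to, but not including, tail). Both are assumptions; the driver source would be the authority. The helper names are made up for illustration.

```python
DESC_BYTES = 16  # assumed descriptor size, not confirmed for this driver

def ring_entries(len_reg):
    """Number of descriptors in a ring whose length register holds len_reg bytes."""
    return len_reg // DESC_BYTES

def hw_owned(head, tail, entries):
    """Descriptors from head up to tail, i.e. still owned by the hardware."""
    return (tail - head) % entries

rx_entries = ring_entries(0x800)                # RDLEN0
tx_entries = ring_entries(0x1000)               # TDLEN
rx_free = hw_owned(0x38, 0x37, rx_entries)      # RDH0, RDT0
tx_pending = hw_owned(0xBA, 0xBA, tx_entries)   # TDH, TDT

print(rx_entries, tx_entries, rx_free, tx_pending)  # prints: 128 256 127 0
```

The ring sizes agree with the rxring(128) and txring(256) lines in the internal statistics, and the head values agree with head=56 and head=186. Under these assumptions the receive ring still has buffers posted and the transmit ring has nothing queued, so the registers alone do not show a ring that ran out of descriptors.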
>>
>>
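
The "GBIC registers" bytes in the capture can be decoded by hand to confirm that the module's ID EEPROM reads back cleanly. This is a minimal sketch assuming the dump follows the standard GBIC serial ID layout (vendor name at bytes 20-35, part number at 40-55, serial number at 68-83, date code at 84-91, checksums at bytes 63 and 95). Those offsets are an assumption about the layout, not something the IOS output states.

```python
# Bytes copied from the "GBIC registers" dump (offsets 0x00-0x5F).
DUMP = """
01 07 01 00 00 00 01 00
00 00 00 01 0D 00 00 00
32 16 00 00 41 47 49 4C
45 4E 54 20 20 20 20 20
20 20 20 20 00 00 00 00
51 46 42 52 2D 35 36 38
39 20 20 20 20 20 20 20
30 30 30 30 00 00 00 58
00 1A 00 00 30 31 31 30
31 36 30 38 32 36 34 31
38 36 34 35 30 31 31 30
31 36 30 30 00 00 00 D8
"""
eeprom = bytes.fromhex("".join(DUMP.split()))

def text(lo, hi):
    """Decode bytes lo..hi-1 as space-padded ASCII."""
    return eeprom[lo:hi].decode("ascii").strip()

vendor = text(20, 36)
part_number = text(40, 56)
serial = text(68, 84)
date_code = text(84, 92)
len_50um_m = eeprom[16] * 10     # assumed units of 10 m
len_62um_m = eeprom[17] * 10     # assumed units of 10 m

# Each checksum is the low byte of the sum of the bytes it covers.
cc_base_ok = (sum(eeprom[0:63]) & 0xFF) == eeprom[63]
cc_ext_ok = (sum(eeprom[64:95]) & 0xFF) == eeprom[95]

print(vendor, part_number, serial, date_code, len_50um_m, len_62um_m)
print(cc_base_ok, cc_ext_ok)
```

The decoded part number, serial number, date code, and 500/220 lengths match the fields IOS prints below the dump, and both checksums verify, which suggests the EEPROM contents themselves are intact. That says nothing about the optical path, only about the ID data.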
>>
>>
>>
>> ----- Original Message -----
>> From: "Adam Greene" <maillist at webjogger.net>
>> To: <cisco-nsp at puck.nether.net>
>> Sent: Wednesday, August 22, 2007 2:14 PM
>> Subject: Re: [c-nsp] 7204vxr freeze-up question
>>
>>
>> > Thanks, Chuck and Rodney.
>> >
>> > CEF is enabled. I'm sending about 95Mbps of 1470-byte UDP packets to the
>> > PA-GE interface, then the dueling gateways test tries to push that traffic
>> > even higher. The router is connected to a radio that can only do 100Mbps,
>> > so there's no chance of traffic exceeding 100Mbps in either direction.
>> >
>> > I'll try 12.4(16) and see what happens.
>> >
>> > Here's the 'show int' during the outage condition. Particularly worrisome
>> > to me are the 18 interface resets, all caused by the test. There are also
>> > a lot of output drops, which is understandable, plus 1 no buffer and
>> > 1 throttle:
>> >
>> > r2#sh int g2/0
>> > GigabitEthernet2/0 is up, line protocol is up
>> > Hardware is WISEMAN, address is 0007.8420.e838 (bia 0007.8420.e838)
>> > Description: *** Wireless Network Mgmt VLAN ***
>> > MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
>> > reliability 255/255, txload 15/255, rxload 24/255
>> > Encapsulation 802.1Q Virtual LAN, Vlan ID 1., loopback not set
>> > Keepalive set (10 sec)
>> > Unknown duplex, Unknown Speed, link type is autonegotiation, media type is SX
>> > output flow-control is on, input flow-control is on
>> > ARP type: ARPA, ARP Timeout 04:00:00
>> > Last input 00:00:00, output 00:00:00, output hang never
>> > Last clearing of "show interface" counters never
>> > Input queue: 11/75/0/0 (size/max/drops/flushes); Total output drops: 1173852
>> > Queueing strategy: fifo
>> > Output queue: 0/40 (size/max)
>> > 30 second input rate 96026000 bits/sec, 7955 packets/sec
>> > 30 second output rate 59502000 bits/sec, 5143 packets/sec
>> > 29510670 packets input, 2249605461 bytes, 1 no buffer
>> > Received 1686266 broadcasts, 0 runts, 0 giants, 1 throttles
>> > 0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
>> > 0 watchdog, 1513394 multicast, 611 pause input
>> > 0 input packets with dribble condition detected
>> > 30538392 packets output, 3381947542 bytes, 0 underruns
>> > 0 output errors, 0 collisions, 18 interface resets
>> > 0 babbles, 0 late collision, 0 deferred
>> > 3 lost carrier, 0 no carrier, 2 pause output
>> > 0 output buffer failures, 0 output buffers swapped out
>> >
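
The load and rate figures in the 'show int' output above can be cross-checked with a little arithmetic. This is a sketch only: it assumes txload/rxload are simply the 30-second rate scaled to 255 against the configured BW, and it truncates the result (whether IOS rounds or truncates cannot be told from this output, and both give the same answer here). The function names are made up for illustration.

```python
BW_BPS = 1_000_000 * 1000   # "BW 1000000 Kbit" expressed in bits/sec

def load_255(rate_bps, bw_bps=BW_BPS):
    """Interface load on the 0-255 scale, truncated to an integer."""
    return rate_bps * 255 // bw_bps

def avg_packet_bytes(rate_bps, pps):
    """Mean packet size implied by a bit rate and a packet rate."""
    return rate_bps / 8 / pps

rxload = load_255(96_026_000)   # 30 second input rate
txload = load_255(59_502_000)   # 30 second output rate
print(rxload, txload)           # prints: 24 15

print(round(avg_packet_bytes(96_026_000, 7955)))   # prints: 1509
print(round(avg_packet_bytes(59_502_000, 5143)))   # prints: 1446
```

The computed loads match the reported rxload 24/255 and txload 15/255, and the average input packet size of roughly 1500 bytes is consistent with a stream of 1470-byte UDP payloads plus headers. Put another way, the interface is stalling at under 10% of its nominal gigabit rate.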
>> >
>> > I don't have a 'show controller' during the outage condition. I'll try to
>> > obtain it.
>> >
>> > I agree, before we swap the router, we should do some more normal tests and
>> > see if the problem persists even then at 80Mbps in / 40Mbps out levels.
>> >
>> > Thanks,
>> > Adam
>> >
>> >
>> >
>> >
>> >
>> > ----- Original Message -----
>> > From: "Rodney Dunn" <rodunn at cisco.com>
>> > To: "Adam Greene" <maillist at webjogger.net>
>> > Cc: <cisco-nsp at puck.nether.net>
>> > Sent: Wednesday, August 22, 2007 12:13 PM
>> > Subject: Re: [c-nsp] 7204vxr freeze-up question
>> >
>> >
>> >> Can you get it in that condition and get a 'sh controller' and
>> >> 'show int'?
>> >>
>> >> It sounds like the ingress rx driver is locking up.
>> >>
>> >> Try the latest 12.4 mainline code (12.4(16)) if you have it in the
>> >> lab and see if it's there too.
>> >>
>> >> Rodney
>> >>
>> >> On Wed, Aug 22, 2007 at 11:45:29AM -0400, Adam Greene wrote:
>> >>> Hmm.
>> >>>
>> >>> Upgraded router to 12.3(23).
>> >>>
>> >>> Even after the upgrade, passing 82.3Mbps in / 42.5Mbps out over the
>> >>> 7204VXR's PA-GE interface (plus 1000BaseSX GBIC) causes the interface to
>> >>> stop passing traffic.
>> >>>
>> >>> Reseating the GBIC does not rectify the issue. However, reseating the PA-GE
>> >>> card does.
>> >>>
>> >>> Tried moving the PA-GE card from slot 2 to slot 3 (different PCI bus) and
>> >>> the problem still occurs.
>> >>>
>> >>> Tried with a different PA-GE card. Problem still occurs.
>> >>>
>> >>> I'll try with another GBIC, but that seems unlikely to resolve the issue.
>> >>>
>> >>> It's sounding like I may need to replace this router. Ugh.
>> >>>
>> >>> If anyone has any bright ideas, they are welcome.
>> >>>
>> >>> Thanks,
>> >>> Adam
>> >>>
>> >>>
>> >>> ----- Original Message -----
>> >>> From: "Adam Greene" <maillist at webjogger.net>
>> >>> To: "Masood Ahmad Shah" <masood at nexlinx.net.pk>;
>> >>> <cisco-nsp at puck.nether.net>
>> >>> Sent: Friday, August 17, 2007 10:22 AM
>> >>> Subject: Re: [c-nsp] 7204vxr freeze-up question
>> >>>
>> >>>
>> >>> > Masood,
>> >>> >
>> >>> > Thanks for the advice. Current IOS is 12.2(13)T16. We'll look into
>> >>> > upgrading it. I'll have to see what will support the NPE300; we're
>> >>> > running very few features, though, so I don't expect to have an issue...
>> >>> >
>> >>> > The GBIC is plugged into a Bridgewave radio; power cycling the radio does
>> >>> > not resolve the issue, only cycling the router does, so I think the issue
>> >>> > is on the router end. But we'll keep in mind the suggestion.
>> >>> >
>> >>> > Thanks again,
>> >>> > Adam
>> >>> >
>> >>> > ----- Original Message -----
>> >>> > From: "Masood Ahmad Shah" <masood at nexlinx.net.pk>
>> >>> > To: "'Adam Greene'" <maillist at webjogger.net>;
>> >>> > <cisco-nsp at puck.nether.net>
>> >>> > Sent: Wednesday, August 15, 2007 9:19 PM
>> >>> > Subject: RE: [c-nsp] 7204vxr freeze-up question
>> >>> >
>> >>> >
>> >>> >> Well, which IOS version are you running?
>> >>> >>
>> >>> >> I know there are some issues with the Intel chipset when it is connected
>> >>> >> to a Cisco GBIC. I strongly suggest updating the NIC driver (if there is
>> >>> >> one), upgrading IOS, or changing your NIC to check it out...
>> >>> >>
>> >>> >>
>> >>> >> Regards,
>> >>> >> Masood Ahmad Shah
>> >>> >>
>> >>> >> -----Original Message-----
>> >>> >> From: cisco-nsp-bounces at puck.nether.net
>> >>> >> [mailto:cisco-nsp-bounces at puck.nether.net] On Behalf Of Adam Greene
>> >>> >> Sent: Wednesday, August 15, 2007 8:43 PM
>> >>> >> To: cisco-nsp at puck.nether.net
>> >>> >> Subject: [c-nsp] 7204vxr freeze-up question
>> >>> >>
>> >>> >> Hi,
>> >>> >>
>> >>> >> I'm running into an issue with a 7204VXR/NPE-300 router with 128MB RAM.
>> >>> >>
>> >>> >> A 1000Base-SX GBIC is plugged into one of the slots (not sure of the
>> >>> >> part # of the card into which the GBIC plugs).
>> >>> >>
>> >>> >> We were running some dueling gateways speed tests with the router (a
>> >>> >> packet stream is sent via iPerf to router A, which forwards it to router
>> >>> >> B, which forwards it back to router A, which forwards it back to router
>> >>> >> B, until TTL is decremented to 0).
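
To make the amplification in the dueling gateways test concrete, here is a small sketch that counts how many times a single packet crosses the link between router A and router B before its TTL expires. The starting TTL of 64 in the usage line is an assumption for illustration (the real value depends on the sending host), and the function name is made up.

```python
def link_crossings(ttl_on_arrival):
    """Count how often one packet crosses the A-B link before it is dropped.

    Each router decrements TTL and forwards only if the result is still > 0.
    """
    crossings = 0
    ttl = ttl_on_arrival
    while True:
        ttl -= 1          # the router holding the packet decrements TTL
        if ttl <= 0:
            break         # TTL expired: packet is discarded, not forwarded
        crossings += 1    # packet is forwarded across the A-B link
    return crossings

# Assumed starting TTL of 64 when the packet first reaches router A.
print(link_crossings(64))   # prints: 63
```

Under that assumption every packet sent by iPerf is carried across the link dozens of times, so even a modest offered rate quickly drives the link to whatever the slowest hop allows.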
>> >>> >>
>> >>> >> Soon after I start sending 75Mbps - 80Mbps of traffic to the router's gig
>> >>> >> interface via iPerf, the gig interface stops sending / receiving any
>> >>> >> traffic whatsoever. The CLI of the router remains up, the gig interface
>> >>> >> reports it is up / up, memory and cpu utilization remain low. No logs are
>> >>> >> generated. Traffic on other interfaces is unaffected. I shut / no shut the
>> >>> >> gigabit interface, but traffic still refuses to pass. Only a reload of the
>> >>> >> router rectifies the issue.
>> >>> >>
>> >>> >> I wonder if there is a debug command that could provide some insight into
>> >>> >> the problem. At this point I am suspecting a hardware issue (GBIC, card,
>> >>> >> or backplane).
>> >>> >>
>> >>> >> Thanks for any insights ....
>> >>> >>
>> >>> >> Adam
>> >>> >>
>> >>> >> _______________________________________________
>> >>> >> cisco-nsp mailing list cisco-nsp at puck.nether.net
>> >>> >> https://puck.nether.net/mailman/listinfo/cisco-nsp
>> >>> >> archive at http://puck.nether.net/pipermail/cisco-nsp/