[c-nsp] Strange issues with new WS-X6704-10GE card

Wed Dec 4 19:12:58 EST 2013

Hi,

We are running a 7613 on IOS 15.2(4)S3a with the following hardware:

Mod Ports Card Type                              Model
  1    0  4-subslot SPA Interface Processor-400  7600-SIP-400
  2    0  2 port adapter Enhanced FlexWAN        WS-X6582-2PA
  3    1  1-subslot SPA Interface Processor-600  7600-SIP-600
  4    1  1-subslot SPA Interface Processor-600  7600-SIP-600
  6    0  2 port adapter Enhanced FlexWAN        WS-X6582-2PA
  7    2  Supervisor Engine 720 (Hot)            WS-SUP720-3BXL
  8    2  Supervisor Engine 720 (Active)         WS-SUP720-3BXL
  9    4  CEF720 4 port 10-Gigabit Ethernet      WS-X6704-10GE
 10   20  ESM20G                                 7600-ES20-GE3CXL
 11   48  CEF720 48 port 1000mb SFP              WS-X6748-SFP
 12   48  CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX
 13   48  CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX

We have about 250 W of power available.

A couple of days ago we took out another FlexWAN card and replaced the
ESM20G that was in slot 9 with a WS-X6704-10GE. While replacing the card we
got logs like these (some of them appeared during the first SSO switchover):

%MFI_LABEL_BROKER-3-MULTIPLE_BIND appeared repeatedly, followed by many of
these:

%CPU_MONITOR-SP-6-NOT_HEARD: CPU_MONITOR messages have not been heard for
30 seconds [9/0]

GMT+1: %FABRIC-SP-6-TIMEOUT_ERR: Fabric in slot 8 reported timeout error
for channel 3 (Module 9, fabric connection 0)

%CPU_MONITOR-SP-6-NOT_HEARD: CPU_MONITOR messages have not been heard for
60 seconds [9/0]

%FABRIC-SP-6-TIMEOUT_ERR: Fabric in slot 8 reported timeout error for
channel 12 (Module 9, fabric connection 1)

%CPU_MONITOR-SP-6-NOT_HEARD: CPU_MONITOR messages have not been heard for
90 seconds [9/0]

%CPU_MONITOR-SP-6-NOT_HEARD: CPU_MONITOR messages have not been heard for
120 seconds [9/0]

%ICC-SP-5-HUGE_BUFFER: Class [L2-DRV(FC)] with Request id 38 requested a
huge buffer of Size 47280

%OIR-SP-6-DOWNGRADE_EARL: Module 10 DFC installed is not identical to
system PFC and will perform at current system operating mode.

%IPV6_INTF-4-L2_MISMATCH: High load on interface events (LI-Null0),
auto-recovery complete.

In addition, many LDP and IS-IS sessions went down. It’s really hard to
pinpoint the cause.
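
One way to narrow this down might be to tally the messages by tag and by the
module they mention. Below is a minimal Python sketch, assuming the logs have
been saved as plain text. The function name tally_syslog, the two regular
expressions, and the SAMPLE string are illustrative assumptions, not part of
any Cisco tool. The sample reuses four of the messages quoted above, and only
the literal "Module N" references are counted.

```python
import re
from collections import Counter

# Assumed tag shape, based only on the messages quoted above:
# %FACILITY[-SUBLOCATION]-SEVERITY-MNEMONIC, e.g. %FABRIC-SP-6-TIMEOUT_ERR.
TAG_RE = re.compile(r"%([A-Z0-9_]+)(?:-([A-Z0-9_]+))?-(\d)-([A-Z0-9_]+)")
# Literal module references such as "Module 9" inside a message body.
MODULE_RE = re.compile(r"\bModule (\d+)\b")


def tally_syslog(text):
    """Count messages per tag and per referenced module.

    A message runs from one tag to the next, so bodies that were wrapped
    across several lines are still treated as a single message.
    """
    by_tag = Counter()
    by_module = Counter()
    matches = list(TAG_RE.finditer(text))
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Collapse line wraps so "Module\n9" would still be recognised.
        body = " ".join(text[match.end():end].split())
        by_tag[match.group(0)[1:]] += 1
        for module in MODULE_RE.findall(body):
            by_module[int(module)] += 1
    return by_tag, by_module


SAMPLE = """\
%CPU_MONITOR-SP-6-NOT_HEARD: CPU_MONITOR messages have not been heard for
30 seconds [9/0]
%FABRIC-SP-6-TIMEOUT_ERR: Fabric in slot 8 reported timeout error
for channel 3 (Module 9, fabric connection 0)
%FABRIC-SP-6-TIMEOUT_ERR: Fabric in slot 8 reported timeout error for
channel 12 (Module 9, fabric connection 1)
%OIR-SP-6-DOWNGRADE_EARL: Module 10 DFC installed is not identical to
system PFC and will perform at current system operating mode.
"""

if __name__ == "__main__":
    tags, modules = tally_syslog(SAMPLE)
    for tag, count in tags.most_common():
        print(f"{count:4d}  {tag}")
    for module, count in sorted(modules.items()):
        print(f"module {module}: {count} message(s)")
```

Run against the full log buffer, a tally like this would show whether the
errors cluster around module 9 or are spread across the chassis.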

During the first switchover, one of the SUP720s showed a MAJOR error, while
the other cards showed PASS on the self-test. After that, we did another
switchover and everything went back to normal: no logs, no errors. We found
this really strange.

That was a couple of nights ago. Then today one of our eBGP sessions
(sourced from an SVI) started flapping. We saw no drops or errors on the
physical link. We then removed the MQC policer, which was policing inbound
and outbound traffic to 80 Mbps, and everything was fine.

"show policy-map interface" showed that no counters were increasing at all.
Other than that, we have no issues with this router, as far as we can see.

What I would like to know is whether there could be an incompatibility
with the new WS-F6700-CFC we installed. The SVI that had the policer in its
policy-map was running over the WS-X6748-SFP in slot 11. Could it be
something between this card and the new card installed in slot 9?

Coincidentally, this SVI was the only interface on this card with an MQC
policer on it; the rest are switchports, all with xconnects.

Or are there perhaps too many CFC cards installed for the SUP720 to handle?

Any pointers would help, since we don’t know what to look for.

Regards,