[c-nsp] 12810 oddness for halloween =)

Eninja eninja at gmail.com
Sat Oct 30 17:54:23 EDT 2010


Drew,

Your limited logs indicate the following:

1 - The CSC in slot 17 is analyzed (out of curiosity, was it recently inserted?)
2 - A bit mask of 0x7F indicates that all fabric cards are present.
3 - CSC0 (slot 16) is primary and the fabric clock is redundant, i.e. the two CSCs are online and working fine.
4 - We receive a FIA HALT, in this case from the RP in slot 8.
5 - As designed, this triggers a CSC switchover because of fears that if the primary CSC is compromised, all traffic traversing the system will be compromised.
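
Point 2 can be checked by hand. A minimal Python sketch, assuming each set bit in the %MBUS-6-FABCONFIG mask stands for one installed fabric card (CSC or SFC) and that a full chassis carries seven of them. The bit-to-slot mapping (bit 0 = slot 16) and the names below are my assumptions for illustration; the log itself does not state them.

```python
# Decode the switch-card bitmask reported by %MBUS-6-FABCONFIG.
# ASSUMPTION: bit 0 corresponds to slot 16, bit 1 to slot 17, and so on.
FIRST_FABRIC_SLOT = 16  # hypothetical mapping, for illustration only
EXPECTED_CARDS = 7      # assumed: 2 CSCs + 5 SFCs in a full chassis


def present_fabric_slots(bitmask: int) -> list[int]:
    """Return the slot numbers whose bit is set in the bitmask."""
    return [
        FIRST_FABRIC_SLOT + bit
        for bit in range(bitmask.bit_length())
        if bitmask & (1 << bit)
    ]


slots = present_fabric_slots(0x7F)
print(slots)                         # [16, 17, 18, 19, 20, 21, 22]
print(len(slots) == EXPECTED_CARDS)  # True
```

A mask with a cleared bit (0x7D, say) would come back one slot short, which is the case to look for if a card were missing or not yet analyzed.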

The question now is whether or not an actual CSC switchover occurred. If a switchover occurred, traffic should be passing fine and the failed CSC will now be the backup. If traffic is halted, the failed CSC is still the primary and would need to be removed. 
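
If you want to pull the relevant events out of a longer "sh log", something along these lines may help. It is only a sketch: the two patterns are copied from the messages in your excerpt, the function name is made up, and a logged switchover message is a hint, not proof that traffic is passing.

```python
import re

# Message formats are taken from the log excerpt quoted further down.
FORCED = re.compile(
    r"%FABRIC-3-ERR_HANDLE: Force CSC switchover on error (.+) from slot (\d+)"
)
SWITCHED = re.compile(
    r"%FABRIC-3-ERR_HANDLE: Primary CSC switched over to slot (\d+)"
)


def summarize_switchover(lines):
    """Return (trigger, source_slot, new_primary_slot); missing parts are None."""
    trigger = source = new_primary = None
    for line in lines:
        m = FORCED.search(line)
        if m:
            trigger, source = m.group(1), int(m.group(2))
            continue
        m = SWITCHED.search(line)
        if m:
            new_primary = int(m.group(1))
    return trigger, source, new_primary


log = [
    "Oct 30 11:29:55 EDT: %FABRIC-3-ERR_HANDLE: Force CSC switchover on error FIA HALT from slot 8",
    "Oct 30 11:29:55 EDT: %FABRIC-3-ERR_HANDLE: Primary CSC switched over to slot 16",
]
print(summarize_switchover(log))  # ('FIA HALT', 8, 16)
```

A result whose last field is None would mean a switchover was forced but no new primary was ever reported, which is the case worth chasing.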

See http://bit.ly/ceqnCA for more.

Feel free to unicast the following if you still need a hand.

-sh contr fia (from RP) - twice
-sh contr fia (from an "attach" session to any two LCs in slots 0 - 7) - twice
-sh diag
-sh log

eninja

www.multiven.com - simple, efficient & affordable maintenance for all networks™



On Oct 30, 2010, at 9:57 AM, Drew Weaver <drew.weaver at thenap.com> wrote:

> So,
> 
> Oct 30 11:29:55 EDT: %MBUS-6-FABANALYZED: Switch card in slot 17 analyzed
> Oct 30 11:29:55 EDT: %MBUS-6-FABCONFIG: Switch Cards 0x7F (bitmask)
> Primary Clock is CSC_0
> Fabric Clock is Redundant
> Bandwidth Mode : 40Gbps Bandwidth
> Oct 30 11:29:55 EDT: %FABRIC-3-ERR_HANDLE: Force CSC switchover on error FIA HALT from slot 8
> Oct 30 11:29:55 EDT: %FABRIC-3-ERR_HANDLE: Primary CSC switched over to slot 16
> 
> Didn't look like it was traffic impacting, and that is great news!
> 
> The not so great news is, after this happened if I send SNMP requests to the router:
> 
> [root at d3 ~]# /usr/local/nagios/libexec/check_ifstatus -C comm -H lo0.ip.add.ress
> ERROR: No snmp response from lo0.ip.add.ress (alarm timeout)
> 
> [root at d3 ~]# /usr/local/nagios/libexec/check_ifstatus -C comm -H other.interface.ip.address
> OK: host 'other.interface.ip.address', interfaces up: 7, down: 0, dormant: 0, excluded: 0, unused: 0 |up=7,down=0,dormant=0,excluded=0,unused=0
> 
> The SNMP requests to the lo0 ip address don't always time out, but when they do work they are always about 10 seconds slower than the ones to the normal interfaces.
> 
> Prior to this CSC switchover, this nagios plugin and everything worked fine.
> 
> My concern is not so much that I can't send SNMP requests to the loopback IP on the router, but what other traffic could be similarly impacted in its current state.
> 
> I'm running ver: 12.0(33)S5 which isn't super old but should be stable.
> 
> Has anyone seen anything like this before or am I "just lucky?"
> 
> -Drew
> 
> _______________________________________________
> cisco-nsp mailing list  cisco-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/cisco-nsp
> archive at http://puck.nether.net/pipermail/cisco-nsp/


