[c-nsp] "%HARDWARE-1-TCAM_ERROR: Found error in HFTM TCAM Space and not able to recover the error" + server losing default GW

James S. Smith JSmith at WindMobile.ca
Sat Mar 10 22:40:20 EST 2012


Did the Solaris system have the gateway in the defaultrouter file, or did it need to be added?  

It's possible that it never did have a default gateway, and your local router was doing proxy ARP.  I've run into that a few times: a server isn't given the proper gateway but still ends up with connectivity because the local router is answering its ARP requests.  Or perhaps someone added the default route from the CLI and never added it to the defaultrouter file, and then it somehow got lost.
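
To make the question above concrete, here is a rough sketch (in Python) of comparing the running default route with the persistent one. It assumes Solaris-style `netstat -rn` output in which the default route is a row starting with `default`, and assumes /etc/defaultrouter holds one gateway per line with `#` comments. The function names and sample addresses are made up for illustration.

```python
def running_default_gateways(netstat_output):
    """Return the gateway column of every 'default' row in `netstat -rn` text."""
    gateways = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed layout: Destination, Gateway, Flags, ...
        if len(fields) >= 2 and fields[0] == "default":
            gateways.append(fields[1])
    return gateways


def persistent_default_gateways(defaultrouter_text):
    """Return entries from the text of /etc/defaultrouter (blank lines and # comments skipped)."""
    entries = []
    for line in defaultrouter_text.splitlines():
        entry = line.split("#", 1)[0].strip()
        if entry:
            entries.append(entry)
    return entries


def diagnose(netstat_output, defaultrouter_text):
    """Summarize how the running and persistent default gateways relate."""
    running = set(running_default_gateways(netstat_output))
    persistent = set(persistent_default_gateways(defaultrouter_text))
    if not running and not persistent:
        return "no default gateway in the running table or in the file"
    if running and not persistent:
        return "default route only in the running table; it will not survive a reboot"
    if persistent and not running:
        return "gateway is in the file but missing from the running table"
    if running != persistent:
        return "running and persistent gateways differ"
    return "running and persistent gateways match"
```

Feeding it the saved output of `netstat -rn` and the contents of /etc/defaultrouter from the affected server would show which of the scenarios above applies.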

It's an odd chain of events, but proxy ARP shouldn't cause issues with the TCAM.


----- Original Message -----
From: Stefan [mailto:netfortius at gmail.com]
Sent: Saturday, March 10, 2012 05:30 PM
To: cisco-nsp at puck.nether.net <cisco-nsp at puck.nether.net>
Subject: [c-nsp] "%HARDWARE-1-TCAM_ERROR: Found error in HFTM TCAM Space and not able to recover the error" + server losing default GW

Problem: Solaris server connected to a port on a 3750 switch.

Reported problem: Solaris server lost the ability to communicate over
the network (checks performed from a remote location / different VLAN -
important to know!)

Immediate reaction - network folks engaged: switch investigation
reveals error from $subj:

%HARDWARE-1-TCAM_ERROR: Found error in HFTM TCAM Space and not able to
recover the error

so decision taken to immediately reload the switch

Phase II: switch recovers, no more errors, server still reported
unreachable from monitoring tool; a quick test from within switch
reveals reachability of server from within its own VLAN, though (all
tests = ICMP)!
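
Why the same-VLAN test succeeds while the remote one fails can be sketched with a simplified model of the server's forwarding decision for its ICMP replies. This is only an illustration in Python: the addresses are invented, and the model ignores other static routes and proxy ARP.

```python
import ipaddress


def next_hop(destination, interface_cidr, default_gateway=None):
    """Simplified host forwarding decision: on-link, via the default gateway, or unreachable."""
    iface = ipaddress.ip_interface(interface_cidr)
    dest = ipaddress.ip_address(destination)
    if dest in iface.network:
        # Same subnet/VLAN: the host resolves the peer with ARP and replies directly.
        return "on-link"
    if default_gateway is not None:
        return "via " + default_gateway
    # Off-subnet peer and no default route: the reply has nowhere to go.
    return "unreachable"


# Hypothetical server address 10.1.1.10/24 with no default gateway configured.
print(next_hop("10.1.1.1", "10.1.1.10/24"))              # ping from within the VLAN
print(next_hop("10.2.2.5", "10.1.1.10/24"))              # ping from the remote monitoring tool
print(next_hop("10.2.2.5", "10.1.1.10/24", "10.1.1.1"))  # after the gateway is restored
```

In this model the echo request from the remote VLAN still reaches the server; it is the echo reply that cannot be sent, which looks the same to the monitoring tool as a dead server.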

Phase III: finally server folks involved - reached out to "down"
server via another one, on the same VLAN, connected to the same switch
- found missing gateway on the "down" server (allegedly there for the
last 4xx days of uptime)

Phase IV - post-mortem monitoring: no more TCAM errors and (obviously)
no more problems after re-adding the default GW on the server

What we are missing: the tests at the time of the reported failure in
communication with the server did not include an ICMP test from within
its own VLAN (as the apparent problem was the TCAM error reported on
the switch)

My question to the audience: having done a little research on old
Solaris behavior (which is what we have), I found this:

http://www.tek-tips.com/viewthread.cfm?qid=211132

and now I wonder - is it possible that Solaris, lacking the default GW,
spewed whatever traffic and so caused the TCAM issue, or did the TCAM
issue (and if so, how) cause the "disappearance" of the Solaris
default GW?

Has anybody experienced the problem described?

***Stefan