[c-nsp] Very strange ME3600 err-disabled on Te0/1 Te0/2 problem

Thu Oct 1 12:37:06 EDT 2015

Does it affect both Te ports at EXACTLY the same time? Or does one go
down and then the other a few seconds later? If it's both at the same
time, that smells a bit like an ASIC issue; as Adam said, the TenG
ports share an ASIC.
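
If it helps to answer that from logs rather than from memory, here is a
rough Python sketch. It assumes you have already pulled the err-disable
events out of syslog as (timestamp, interface) pairs; the function name,
the one-second threshold and the sample timestamps are all made up for
illustration, not taken from your switches.

```python
from datetime import datetime, timedelta

def paired_failures(events, max_gap=timedelta(seconds=1)):
    """Find Te0/1 and Te0/2 err-disable events that occurred within max_gap.

    events: iterable of (datetime, interface_name) tuples, assumed to be
    already extracted from syslog by some other means.
    Returns a list of (te1_time, te2_time, gap) tuples.
    """
    te1 = sorted(t for t, port in events if port == "Te0/1")
    te2 = sorted(t for t, port in events if port == "Te0/2")
    pairs = []
    for t1 in te1:
        for t2 in te2:
            gap = abs(t2 - t1)
            if gap <= max_gap:
                pairs.append((t1, t2, gap))
    return pairs

# Hypothetical sample data: one simultaneous failure, one staggered by 7s.
events = [
    (datetime(2015, 10, 1, 3, 15, 2), "Te0/1"),
    (datetime(2015, 10, 1, 3, 15, 2), "Te0/2"),
    (datetime(2015, 10, 1, 9, 40, 10), "Te0/1"),
    (datetime(2015, 10, 1, 9, 40, 17), "Te0/2"),
]

for t1, t2, gap in paired_failures(events):
    print(f"Te0/1 at {t1}, Te0/2 at {t2}, gap {gap.total_seconds():.0f}s")
```

Failures that only show up with a wider max_gap (a few seconds apart)
would point away from a shared-ASIC cause and more towards something
like a protocol timeout.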

When you say this has happened on 5 switches, all 5 were ME3600-X's?

Has it ever happened on more than one switch at the same time? For
example, have two neighbouring switches had all their TenG interfaces
cease to function at the same time?

Do you have out of band access to these PoPs? When you talk about
getting TAC via WebEx and rebooting the switches I assume you do. Is
there anything else on the switch that isn't working? I.e. are all the
1G ports (or however your customers are connected) working? BGP
sessions up etc?

Do any of these PoPs where you have had the issue have low enough
traffic that you could switch to a 1G link between them as a trial for
a while?

Adam pointed out the RPs could be too busy to service the BFD
requests. Is the CPU high when the issue occurs? Are you able to
mirror (SPAN) the 10G ports at one of the PoPs, to see if any traffic
is actually coming over from the neighbouring PoP, or being sent from
the local switch?

You said the switches seem to fail randomly. Can you increase your NMS
polling frequency (since this sounds like a bug being triggered) to
correlate the interface shutdowns with something like a spike in
traffic, a spike in latency between PoPs, an IGP/BGP update/topology
change coming through, or an LDP/RSVP update coming through?
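
As a rough idea of what that correlation could look like once you have
the polled data exported, here is a small Python sketch. It assumes the
NMS samples are available as (timestamp, metric, value) tuples and that
you have some baseline per metric; the metric names, the 2x threshold,
the five-minute window and the sample values are all hypothetical.

```python
from datetime import datetime, timedelta

def spikes_before(event_time, samples, baselines, factor=2.0,
                  window=timedelta(minutes=5)):
    """Return samples in the window before event_time that exceed
    factor * baseline for their metric.

    samples:   iterable of (datetime, metric_name, value) tuples.
    baselines: dict mapping metric_name to its normal value.
    """
    hits = []
    for t, metric, value in samples:
        in_window = event_time - window <= t <= event_time
        if in_window and value > factor * baselines[metric]:
            hits.append((t, metric, value))
    return hits

# Hypothetical baselines and polled samples around one shutdown.
baselines = {"cpu_percent": 25.0, "latency_ms": 3.0}
samples = [
    (datetime(2015, 10, 1, 3, 9, 0), "cpu_percent", 20.0),
    (datetime(2015, 10, 1, 3, 14, 0), "cpu_percent", 95.0),
    (datetime(2015, 10, 1, 3, 14, 30), "latency_ms", 4.0),
]
shutdown = datetime(2015, 10, 1, 3, 15, 2)

for t, metric, value in spikes_before(shutdown, samples, baselines):
    print(f"{t}: {metric} = {value} shortly before shutdown at {shutdown}")
```

The shorter the polling interval, the narrower you can make the window,
which is the reason for polling more often in the first place.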

Cheers,
James.