[cisco-voip] Analyzing registration failures after the fact

Domain Admin daniel at scopsblog.com
Mon Jul 15 11:07:50 EDT 2013


Hey folks,

In short, how can I troubleshoot IP phones that failed to register to
CallManager, more than a day after the issue was resolved?  Background
below.  CallManager version is 8.6.

I just started following this mailing list last week.  I've been learning
UCM administration for about a year, and I inherited a 5-node cluster to
worry about around six months ago.  I'm doing okay with the day-to-day
stuff, but there's still a lot of maintenance and administration stuff that
I really need to learn more about.

Our CM cluster runs on two ESX clusters (the publisher and 2 subs on one; 2
subs on the other for DR), and our engineer asked for my help to bring down
the side with the publisher so he could upgrade ESX from 4.1 to 5.0.

ICM and CVP (also my responsibility...whoo...) were on the same ESX cluster,
and I was mainly worried about those coming back up, so I didn't pay enough
attention to the CM nodes to make sure the phones re-registered.  Luckily,
the call center, which runs off the first subscriber, came up fine.

I didn't find out until Sunday afternoon that all except for a handful of
phones at HQ failed to register.  A developer who came in to work on the
weekend noticed that the phones were displaying, "VPN Authentication
Failed".  Sure enough, I could see the phones' IP addresses from the
publisher, but they were listed as Unregistered.  I checked the status
messages on one of the phones' web interfaces, and it reported, "All
Concentrators Failed."  This occurred on all except 2 of about 200 on-site
phones.  The phones that DO connect via VPN all appeared to be registered
by the time I took a look.

I went in to the office on Sunday and manually reset all the phones (**#**)
to make sure there was no service disruption on Monday morning, but now my
manager is asking me to find the root cause of this issue, and I'm not sure
where to start.  The logs on the individual phones don't appear to go back
far enough, and I'm not well versed enough in RTMT to know where to look.

Do you guys know where I can look to try to find out what happened here?
I suspect that it was the Auto-Network Detect feature in the VPN profile
that got me.  I turned it off in one of my troubleshooting steps, but I'd
like to be able to say for sure when I present it to the company.

What sucks is that the CM groups are configured so that Subscriber 1 on the
first ESX cluster should fail over to Sub 3 on the second, and Sub 2 (the
one that should have registered the HQ phones) should have failed over to
Sub 4, also on the second ESX cluster.  I do not know why the phones would
have failed to register in the first place.
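
To spell out the failover order I expected, here is a rough Python sketch.
It only models the priority lists described above; the group names
("call_center", "hq"), the node labels, and the helper function are made up
for illustration and are not anything CUCM itself exposes.

```python
# Hypothetical model of the CM group priority lists described above.
# Group names and node labels are illustrative, not real CUCM objects.
CM_GROUPS = {
    "call_center": ["Sub 1", "Sub 3"],  # primary, then backup
    "hq": ["Sub 2", "Sub 4"],
}

# Nodes hosted on the ESX cluster that was taken down for the upgrade.
FIRST_ESX_CLUSTER = {"Publisher", "Sub 1", "Sub 2"}


def expected_registration(group, down_nodes):
    """Return the first node in the group's list that is still up, or None."""
    for node in CM_GROUPS[group]:
        if node not in down_nodes:
            return node
    return None


if __name__ == "__main__":
    for group in CM_GROUPS:
        print(group, "->", expected_registration(group, FIRST_ESX_CLUSTER))
```

With the first ESX cluster down, this picks Sub 3 for the call center group
and Sub 4 for the HQ group, which is where I expected the HQ phones to land.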

Thanks!
Daniel