[cisco-voip] Trace files for CUCM/NTP problem

Tue Dec 14 11:15:58 EST 2010

Wes,

This is what one of the pubs looks like:
admin:utils ntp status
ntpd (pid 14883) is running...

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 127.127.1.0     LOCAL(0)        10 l   21   64    0    0.000    0.000 4000.00
 10.0.1.132      .STEP.          16 u   77   64    0    0.000    0.000 4000.00
 10.0.1.140      .STEP.          16 u  108   64    0    0.000    0.000 4000.00
 10.0.1.148      .STEP.          16 u  117   64    0    0.000    0.000 4000.00
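
A table like this can also be checked by script when comparing several nodes. The following is a rough Python sketch, assuming the ten-column ntpq-style layout shown above with the tally code (if any) in the first column. parse_peers, Peer and SAMPLE are made-up names for illustration, not part of any Cisco tool.

```python
from typing import NamedTuple

# Characters ntpq may print in the first column to mark a peer's status.
TALLY_CODES = "*+-#xo."

# Peer table copied from the 'utils ntp status' output in this post.
SAMPLE = """
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 127.127.1.0     LOCAL(0)        10 l   21   64    0    0.000    0.000 4000.00
 10.0.1.132      .STEP.          16 u   77   64    0    0.000    0.000 4000.00
 10.0.1.140      .STEP.          16 u  108   64    0    0.000    0.000 4000.00
 10.0.1.148      .STEP.          16 u  117   64    0    0.000    0.000 4000.00
"""


class Peer(NamedTuple):
    tally: str    # '*' marks the peer currently selected for sync
    remote: str
    refid: str
    stratum: int  # 16 means the peer is treated as unsynchronized
    reach: int    # parsed from octal; 0 means no recent replies


def parse_peers(text: str) -> list[Peer]:
    """Parse peer rows, skipping the header, separator and blank lines."""
    peers = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 10 or fields[0] == "remote":
            continue
        tally, remote = " ", fields[0]
        # Assumption: column one is always the tally column, so a tally
        # character there belongs to the status, not to the host name.
        if line[0] in TALLY_CODES:
            tally, remote = line[0], remote[1:]
        peers.append(Peer(tally, remote, fields[1],
                          int(fields[2]), int(fields[6], 8)))
    return peers


if __name__ == "__main__":
    peers = parse_peers(SAMPLE)
    if not any(p.tally == "*" for p in peers):
        print("no peer is currently selected for synchronization")
    for p in peers:
        if p.reach == 0:
            print(f"{p.remote}: reach 0, stratum {p.stratum}, refid {p.refid}")
```

Run against the table above, this reports that no peer is selected and that all four entries are at reach 0.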

From my googling, the "reach" field is an octal bitmap that shows the
results of the last 8 polls. 377 (all eight bits set) is the normal
value. Zero is obviously bad. I can watch these values climb from 0
up to 377, and then everything syncs back up. This happens on a
somewhat periodic basis. What isn't clear to me yet is whether Call
Manager is restarting the synchronization because it was getting values
it didn't like, or whether it truly lost all connectivity with
NTP for 8ish minutes (8 polls at a 64 sec polling interval).
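
To make the reach arithmetic concrete, here is a minimal Python sketch. It treats the value the way ntpq prints it: an octal rendering of an 8-bit shift register whose lowest bit is the most recent poll. decode_reach and seconds_to_age_out are hypothetical helper names, not anything CUCM provides.

```python
def decode_reach(reach: str) -> list[bool]:
    """Return the last 8 poll results, most recent poll first."""
    value = int(reach, 8) & 0xFF  # reach is displayed in octal
    return [bool((value >> bit) & 1) for bit in range(8)]


def seconds_to_age_out(poll_interval: int) -> int:
    """Time for a fully reachable peer (377) to decay to 0 if every
    reply is lost: eight consecutive missed polls."""
    return 8 * poll_interval


if __name__ == "__main__":
    for reach in ("377", "376", "0"):
        history = decode_reach(reach)
        # Print oldest poll on the left, newest on the right.
        bits = "".join("1" if ok else "0" for ok in reversed(history))
        print(f"reach {reach}: {bits}")
    print(f"64 sec poll: {seconds_to_age_out(64)} sec to age out")
```

Note that the register also starts at 0 whenever ntpd is restarted, so a reach of 0 by itself does not distinguish eight lost polls from a daemon restart.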

I'm running utils network capture port 123 on one of the pubs -
perhaps this will indicate if the connectivity is dropping.

I would point CUCM to a different NTP source for troubleshooting, but I
don't have any maintenance windows to reset all the servers until
after the holidays, so I'm trying to attack it from a different angle for
now.

On Mon, Dec 13, 2010 at 11:50 AM, Wes Sisk <wsisk at cisco.com> wrote:
> what version of CM?  Many changes to NTP, especially this one:
> CSCsk70971    publisher NTP down if configured NTP down or unreliable
>
> my interpretation:
> something on the network NTP source changed
> now subscribers giving error that pub is unreliable
>
> this is expected if pub cannot sync to NTP source. what changes did they
> make? is it still a viable NTP source for the publisher? if not, publisher
> will use local clock which makes it an invalid source for all subs.
>
> http://www.cisco.com/en/US/docs/voice_ip_comm/cucm/srnd/8x/netstruc.html#wpmkr1185636
>
> /Wes
>
> Ed Leatherman wrote:
>
> Hi folks,
>
> Our operations team updated the NTP service recently (infoblox), and
> right after that happened, I started getting syslog errors per below
> on two different CUCM 7 clusters, both of which use that NTP server.
>
> ntpRunningStatus.sh: Primary node NTP server, OWP-PUB, is currently
> inaccessible or down. Verify the network between the primary and
> secondary nodes.  Check the status of NTP on both the primary and
> secondary nodes via CLI 'utils ntp status'.  If the network is fine,
> try restarting NTP using CLI 'utils ntp restart'.
>
> Looking at the status on these servers, the pub looks OK, but
> utils ntp status on all secondary nodes comes up with (example):
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
> *127.127.1.0     LOCAL(0)        10 l   32   64  377    0.000    0.000   0.004
>  10.192.20.10    .STEP.          16 u  488  512  376    0.244   16.553   0.052
>
> Restarting NTP on all nodes fixes the problem temporarily (NTP status
> goes back to normal) but only for a short time.
>
> The NTP logs don't show anything other than what appears to be the NTP
> service restarting every 30 minutes. Is this normal?
> 11/16/2010 23:00:02 sd_ntp|*********************************************************|<LVL::Info>
> 11/16/2010 23:00:02 sd_ntp|          Running sd_ntp. Process Id=12302                |<LVL::Info>
> 11/16/2010 23:00:02 sd_ntp|*********************************************************|<LVL::Info>
> 11/16/2010 23:00:02 sd_ntp||<LVL::Info>
> 11/16/2010 23:00:02 sd_ntp|[528] Command Line parameters: -list -s|<LVL::Info>
> 11/16/2010 23:00:02 sd_ntp|[585] The file /etc/ntp.conf exists|<LVL::Debug>
> 11/16/2010 23:00:02 sd_ntp|[421] /etc/ntp/drift file is not changed|<LVL::Debug>
> 11/16/2010 23:00:02 sd_ntp|[603] Listing all the servers|<LVL::Debug>
> 11/16/2010 23:00:02 sd_ntp|sd_ntp exitinng normally.|<LVL::Info>
>
> In both clusters, the pub and most or all of the subs are on the same
> VLAN and physical switch.
>
> What other traces can I look at on CM to troubleshoot this? Anyone
> know if there is a debug for the process that's generating my syslog
> errors?
>
> I want to make sure it's not an error on my end and hopefully have
> some better information on what's broken before I go back to the
> operations group. All the IOS routers using infoblox for NTP appear to
> be working just fine, so they see no problems :)
>
> Thanks in advance!
>
>

-- 
Ed Leatherman