Ed,<br><br>One more thing. From all of your debug output here we can see something different is happening with the "RefID" field.<br><br>The "RefID" field shows us the next upstream hop past the server we're pointing to. Consider something like this:<br>
<br>Note: don't ever point to this server. I'm just using it as an example.<br><br><a href="http://time.nist.gov">time.nist.gov</a> (Stratum 1) <-- Your Local NTP Master (Stratum 2) <-- CUCM Pub (Stratum 3) <-- CUCM Subs (Stratum 4)<br>
<br>So on your CUCM pub in this instance we would see:<br><br>Remote: Local NTP Master<br>Stratum: 2<br>RefID: <a href="http://time.nist.gov">time.nist.gov</a><br><br>This tells us your CUCM server is pointing to your local NTP server. Your local NTP master is just 1 hop away from the root server, <a href="http://time.nist.gov">time.nist.gov</a>.<br>
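<br>To make the relationship between Remote, Stratum and RefID concrete, here is a rough Python sketch of the example chain above. It is a simplified model, not how ntpd represents anything: the function names are made up for illustration, and a real RefID is usually an IP address or a reference clock code rather than a hostname.<br>

```python
# Hypothetical model of the example chain above, listed root first.
CHAIN = ["time.nist.gov", "Local NTP Master", "CUCM Pub", "CUCM Subs"]


def strata(chain, root_stratum=1):
    """Each hop downstream is one stratum higher than the server it syncs to."""
    return {name: root_stratum + hops for hops, name in enumerate(chain)}


def upstream_view(client, chain):
    """Return (remote, remote_stratum, refid) as the client would report them.

    remote is the server the client points to; refid is the next
    upstream hop past that server (None if the remote is the root).
    """
    i = chain.index(client)
    if i == 0:
        raise ValueError("the root of the chain has no upstream server")
    remote = chain[i - 1]
    refid = chain[i - 2] if i >= 2 else None
    return remote, strata(chain)[remote], refid


if __name__ == "__main__":
    print(upstream_view("CUCM Pub", CHAIN))
```

For "CUCM Pub" this gives Remote "Local NTP Master", Stratum 2, RefID "time.nist.gov", the same values listed above.<br>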
<br>In your case though, you see Stratum 16. Stratum 16 means "unsynchronized", much like a metric of 16 means "unreachable" in RIP. NTP is throwing its hands in the air and saying "I have no idea what the hell time it is".<br><br>Further, the RefID it reports is ".STEP." This is a special keyword that means the time is off by further than NTP can adjust for in one single shot.<br>
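<br>If you want to check this from a script, a minimal Python sketch along these lines would flag an unsynchronized peer. It assumes a complete peer row in the usual "remote refid st t when poll reach delay offset jitter" layout. The remote address 192.0.2.10 is a made-up placeholder (the remote column is missing from the output in this thread), and check_peer is a hypothetical helper name.<br>

```python
UNSYNCHRONIZED = 16  # stratum value NTP uses for "not synchronized"


def check_peer(line):
    """Parse one peer row and report whether the peer is usable."""
    fields = line.split()
    remote, refid, stratum = fields[0], fields[1], int(fields[2])
    # A leading tally character such as '*' or '+' marks the peer's selection status.
    tally = remote[0] if remote[0] in "*+-#" else ""
    return {
        "remote": remote[len(tally):],
        "refid": refid,
        "stratum": stratum,
        "selected": tally == "*",
        "usable": not stratum >= UNSYNCHRONIZED,
    }


if __name__ == "__main__":
    # Values from the thread, with a placeholder remote address added.
    row = " 192.0.2.10 .STEP. 16 u 488 512 376 0.244 16.553 0.052"
    print(check_peer(row))
```

A row with stratum 16 comes back with "usable" set to False, whatever the RefID says.<br>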
<br>This is consistent with NTP restarting every 30 minutes. If we're further out of sync than NTP can correct for, then we need to restart NTP so the ntponeshot command can run to step the clock forwards or backwards by several seconds or minutes at a time, which NTP can't do on its own without the restart.<br>
<br>Here is an interesting experiment to find out what's happening. Remove the NTP server entries from CUCM right at the start of the hour. Wait 4 hours. Then find out how far off the CUCM clock is from your watch (or PC clock) via "show status". Then look at the NTP server and find out if its time still matches what you expect.<br>
<br>What is probably causing this is a hardware clock, on either the CUCM server (motherboard) or the NTP server, drifting faster than NTP can correct for.<br><br>NTP can correct for frequency errors of up to 500 parts per million, which works out to about 43 seconds of drift in 24 hours. If after 4 hours your CUCM server clock is off by more than 7 seconds, then you need a new motherboard. It might be best to wait 24 hours instead of 4 hours, just to get a more accurate idea of how fast your clock might be drifting.<br>
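<br>The arithmetic behind those numbers is simple enough to sketch in Python. The 500 ppm limit is the figure quoted above; the function names and the 10-second sample offset are made up for illustration.<br>

```python
MAX_CORRECTION_PPM = 500  # correction limit quoted above
HOUR = 3600  # seconds


def max_correctable_drift(elapsed_seconds, ppm=MAX_CORRECTION_PPM):
    """Largest drift (in seconds) NTP could absorb over the given interval."""
    return elapsed_seconds * ppm / 1_000_000


def observed_ppm(offset_seconds, elapsed_seconds):
    """Drift rate implied by a measured offset after a free-running interval."""
    return abs(offset_seconds) / elapsed_seconds * 1_000_000


if __name__ == "__main__":
    print(max_correctable_drift(24 * HOUR))  # 43.2 seconds in 24 hours
    print(max_correctable_drift(4 * HOUR))   # 7.2 seconds in 4 hours
    # Hypothetical measurement: clock is 10 seconds off after 4 hours.
    rate = observed_ppm(10, 4 * HOUR)
    print(round(rate), "ppm, beyond the limit:", rate > MAX_CORRECTION_PPM)
```

Plug in whatever offset you measure after the 4 or 24 hours. Anything that works out above 500 ppm points at the hardware clock.<br>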
<br>I'd be really interested to see what you find. I've replaced a few motherboards in IBM servers for this exact problem.<br><br>-Burns<br><br><div class="gmail_quote">On Tue, Dec 14, 2010 at 11:23 AM, Ed Leatherman <span dir="ltr"><<a href="mailto:ealeatherman@gmail.com">ealeatherman@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">I was about to say IOS devices are OK, but I noticed the poll value<br>
was 64 on the one I was reviewing... on a stable environment that<br>
should be 1024 "steady-state" it sounds like. Some mischief is afoot.<br>
Thanks for the NTP tips.<br>
<div><div></div><div class="h5"><br>
On Mon, Dec 13, 2010 at 7:12 PM, Jason Aarons (US)<br>
<<a href="mailto:jason.aarons@us.didata.com">jason.aarons@us.didata.com</a>> wrote:<br>
> Have you pointed a different router/switch to your NTP server? Are they<br>
> getting 16 as well? I recall a high offset/variation from clock can also<br>
> make it 16.<br>
> An IOS device initially polls every 64 seconds; as the NTP server and client<br>
> become better synced and packets aren't dropped, this interval increases to a<br>
> maximum of 1024 seconds.<br>
> <a href="http://www.nil.si/ipcorner/BeOnTime/" target="_blank">http://www.nil.si/ipcorner/BeOnTime/</a><br>
> <a href="http://www.cisco.com/en/US/products/sw/iosswrel/ps1818/products_tech_note09186a008015bb3a.shtml" target="_blank">http://www.cisco.com/en/US/products/sw/iosswrel/ps1818/products_tech_note09186a008015bb3a.shtml</a><br>
> "while the highest level (stratum 16) usually indicates that the clock is<br>
> not working or unaccessible"<br>
> From: <a href="mailto:cisco-voip-bounces@puck.nether.net">cisco-voip-bounces@puck.nether.net</a><br>
> [mailto:<a href="mailto:cisco-voip-bounces@puck.nether.net">cisco-voip-bounces@puck.nether.net</a>] On Behalf Of Jason Burns<br>
> Sent: Monday, December 13, 2010 6:57 PM<br>
> To: Wes Sisk<br>
> Cc: Cisco VOIP<br>
> Subject: Re: [cisco-voip] Trace files for CUCM/NTP problem<br>
> Ed,<br>
> CUCM is preferring the local clock, because your NTP reference has a Stratum<br>
> of 16!<br>
>      .STEP.          16 u  488  512  376    0.244   16.553   0.052<br>
> Fix your NTP server and you'll fix your CUCM.<br>
> -Burns<br>
> On Mon, Dec 13, 2010 at 11:50 AM, Wes Sisk <<a href="mailto:wsisk@cisco.com">wsisk@cisco.com</a>> wrote:<br>
> What version of CM? There have been many changes to NTP, especially this one:<br>
> CSCsk70971 publisher NTP down if configured NTP down or unreliable<br>
> My interpretation:<br>
> Something on the network NTP source changed.<br>
> Now the subscribers are giving an error that the pub is unreliable.<br>
> This is expected if the pub cannot sync to its NTP source. What changes did they<br>
> make? Is it still a viable NTP source for the publisher? If not, the publisher<br>
> will use its local clock, which makes it an invalid source for all subs.<br>
> <a href="http://www.cisco.com/en/US/docs/voice_ip_comm/cucm/srnd/8x/netstruc.html#wpmkr1185636" target="_blank">http://www.cisco.com/en/US/docs/voice_ip_comm/cucm/srnd/8x/netstruc.html#wpmkr1185636</a><br>
> /Wes<br>
> Ed Leatherman wrote:<br>
> Hi folks,<br>
> Our operations team updated the NTP service recently (infoblox), and<br>
> right after that happened, I started getting syslog errors per below<br>
> on two different CUCM 7 clusters, both of which use that NTP server.<br>
> ntpRunningStatus.sh: Primary node NTP server, OWP-PUB, is currently<br>
> inaccessible or down. Verify the network between the primary and<br>
> secondary nodes. Check the status of NTP on both the primary and<br>
> secondary nodes via CLI 'utils ntp status'. If the network is fine,<br>
> try restarting NTP using CLI 'utils ntp restart'.<br>
> Looking at the status on these servers, the pub looks OK but the subs show:<br>
> utils ntp status on all secondary nodes comes up with (example):<br>
>      remote           refid      st t when poll reach   delay   offset  jitter<br>
> ==============================================================================<br>
> *     LOCAL(0)        10 l   32   64  377    0.000    0.000   0.004<br>
>      .STEP.          16 u  488  512  376    0.244   16.553   0.052<br>
> Restarting NTP on all nodes fixes the problem temporarily (NTP status<br>
> goes back to normal) but only for a short time.<br>
> The NTP logs don't show anything other than what appears to be the NTP<br>
> service restarting every 30 minutes.. is this normal?<br>
> 11/16/2010 23:00:02<br>
> sd_ntp|*********************************************************|<LVL::Info><br>
> 11/16/2010 23:00:02 sd_ntp|          Running sd_ntp. Process Id=12302<br>
>                |<LVL::Info><br>
> 11/16/2010 23:00:02<br>
> sd_ntp|*********************************************************|<LVL::Info><br>
> 11/16/2010 23:00:02 sd_ntp||<LVL::Info><br>
> 11/16/2010 23:00:02 sd_ntp|[528] Command Line parameters: -list<br>
> -s|<LVL::Info><br>
> 11/16/2010 23:00:02 sd_ntp|[585] The file /etc/ntp.conf exists|<LVL::Debug><br>
> 11/16/2010 23:00:02 sd_ntp|[421] /etc/ntp/drift file is not<br>
> changed|<LVL::Debug><br>
> 11/16/2010 23:00:02 sd_ntp|[603] Listing all the servers|<LVL::Debug><br>
> 11/16/2010 23:00:02 sd_ntp|sd_ntp exitinng normally.|<LVL::Info><br>
> In both clusters, the pub and most or all of the subs are on the same<br>
> VLAN and physical switch.<br>
> What other traces can I look at on CM to troubleshoot this? Anyone<br>
> know if there is a debug for the process that's generating my syslog<br>
> errors?<br>
> I want to make sure it's not an error on my end and hopefully have<br>
> some better information on what's broken before I go back to the<br>
> operations group. All the IOS routers using infoblox for NTP appear to<br>
> be working just fine, so they see no problems :)<br>
> Thanks in advance!<br>
> _______________________________________________<br>
> cisco-voip mailing list<br>
> <a href="mailto:cisco-voip@puck.nether.net">cisco-voip@puck.nether.net</a><br>
> <a href="https://puck.nether.net/mailman/listinfo/cisco-voip" target="_blank">https://puck.nether.net/mailman/listinfo/cisco-voip</a><br>
<font color="#888888">Ed Leatherman<br>