[c-nsp] c877 and ntp oddness

Tue Jul 14 12:34:06 EDT 2009

I have a bizarre NTP issue with 877 routers running the 12.4(T) train.

The network setup is simple:

   [HUB]---[S2 NTPD]-->[S1 NTPD]
  /  |  \
[S] [S] [S]

It is a private hub/spoke network where the hub is a 7200 and the spokes are
the 877 routers in question.

Connected to the hub router is a FreeBSD box running the latest build of ntpd
(recently upgraded), which is happily serving other clients as a stratum
2 box.

A large percentage of the 87x routers will sync happily with the S2 box
and stay in sync with it for their lifetimes.

A small percentage sync initially but then lose sync after 10 minutes.

On the happy boxes:

#sh ntp assoc

address         ref clock     st  when  poll reach  delay  offset   disp
*~<S2>           <S1>         2    28   512  377     8.5    0.13     7.5

On the sad boxes:

#sh ntp assoc

address    ref clock       st   when   poll reach  delay  offset   disp
~<S2>       <S1>            2     43     64   377  0.000 134559. 1938.5

#sh ntp assoc det
<S2> configured, insane, invalid, stratum 2
ref ID <S1>  , time CE071C7B.D722D2EE (16:02:19.840 BST Tue Jul 14 2009)
our mode client, peer mode server, our poll intvl 64, peer poll intvl 64
root delay 0.00 msec, root disp 15.53, reach 377, sync dist 2.38
delay 0.00 msec, offset 134559.7237 msec, dispersion 1938.59
precision 2**18, version 4
org time CE071F24.C3B751E7 (16:13:40.764 BST Tue Jul 14 2009)
rec time CE071E9D.B07AD5A3 (16:11:25.689 BST Tue Jul 14 2009)
xmt time CE071E9D.A8FD405C (16:11:25.660 BST Tue Jul 14 2009)
filtdelay =     0.02    0.05    0.02    0.00    0.00    0.00    0.00    0.00
filtoffset =  135.08  134.81  134.55    0.00    0.00    0.00    0.00    0.00
filterror =     0.00    0.00    0.00   16.00   16.00   16.00   16.00   16.00
minpoll = 6, maxpoll = 10
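
As a sanity check on the detail output above, here is a minimal Python sketch that decodes the hex NTP timestamps and compares them with the filtoffset samples. It assumes the usual NTP timestamp layout (32-bit seconds since 1900 and a 32-bit binary fraction, both in hex), that filtoffset is in seconds here, and that neighbouring samples are one 64 s poll apart. The output does not confirm that spacing, so treat the rate as a rough figure. The helper names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)  # start of NTP era 0


def ntp_hex_to_seconds(stamp: str) -> float:
    """Convert 'SSSSSSSS.FFFFFFFF' (hex seconds, hex 2**-32 fraction) to seconds."""
    sec_hex, frac_hex = stamp.split(".")
    return int(sec_hex, 16) + int(frac_hex, 16) / 2**32


def ntp_hex_to_utc(stamp: str) -> datetime:
    """Convert a hex NTP timestamp to an aware UTC datetime."""
    return NTP_EPOCH + timedelta(seconds=ntp_hex_to_seconds(stamp))


def step_ppm(a: float, b: float, interval_s: float) -> float:
    """Size of the offset change between two samples, as parts per million."""
    return abs(a - b) / interval_s * 1e6


# Values copied from the "sh ntp assoc det" output of a sad box.
org = "CE071F24.C3B751E7"
rec = "CE071E9D.B07AD5A3"
gap_s = ntp_hex_to_seconds(org) - ntp_hex_to_seconds(rec)

filtoffset = [135.08, 134.81, 134.55]
POLL_S = 64  # assumption: samples are one poll interval apart
steps = [step_ppm(a, b, POLL_S) for a, b in zip(filtoffset, filtoffset[1:])]

print(ntp_hex_to_utc(org).strftime("%H:%M:%S UTC"))  # 15:13:40 UTC, i.e. 16:13:40 BST
print(f"org - rec = {gap_s:.3f} s")
print("offset change per poll (ppm):", [round(s) for s in steps])
```

The org/rec gap comes out at about 135.075 s, which lines up with the first filtoffset sample. Under the 64 s assumption the offset is moving by roughly 4000 ppm, well beyond the 500 ppm frequency tolerance NTP is designed around.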

*Jul 14 15:45:47.737: NTP recv pkt on v4 socket, pak = 0x83E79C78.
*Jul 14 15:45:47.737: NTP message received from <S2> on interface 'Dialer0':
*Jul 14 15:45:47.737:
 NTP Header:
   Leap = 00, Version = 4, Mode = 4,
   Stratum = 2,
   Poll Interval = 6,
   Precision = -18,
   Root Delay = 0.82,
   Root Dispersion = 0.1755,
   refid = <S1>,
   Last update reftime = 3456574670.3602360983,
   Originated time = 3456575147.3064944142,
   Received time = 3456575152.3162200771,
   Transmit time = 3456575152.3162396127.
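
The packet timestamps can be checked the same way. This is a rough Python sketch, assuming the digits after the dot in the debug output are the raw 32-bit NTP fraction printed in decimal (the values are all below 2**32, which fits) and that the fields follow the standard NTP server reply: Originated is stamped by the router's clock, Received and Transmit by the S2 server's clock. The function name is hypothetical.

```python
def ntp_dec_to_seconds(stamp: str) -> float:
    """Convert 'seconds.fraction' to seconds.

    Assumes the part after the dot is a 32-bit fraction in units of 2**-32 s,
    printed in decimal, not ordinary decimal digits.
    """
    sec, frac = stamp.split(".")
    return int(sec) + int(frac) / 2**32


# Values copied from the debug output above.
t1 = ntp_dec_to_seconds("3456575147.3064944142")  # Originated: router clock
t2 = ntp_dec_to_seconds("3456575152.3162200771")  # Received: server clock
t3 = ntp_dec_to_seconds("3456575152.3162396127")  # Transmit: server clock

# t2 - t1 is the one-way network delay plus the clock offset between the boxes.
print(f"server receive - router transmit = {t2 - t1:.3f} s")
# t3 - t2 is how long the server held the packet before replying.
print(f"server turnaround = {(t3 - t2) * 1e6:.1f} us")
```

With a path delay in the millisecond range, a t2 - t1 of about 5 s would mean the router clock was already roughly 5 s behind S2 when this packet was sent.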

To get it back, I simply remove the "clock-period" line and reconfigure the
ntp server, and I get another 10 minutes of working NTP.

This is only happening to a very small percentage of routers from a new
batch recently purchased. I'm wondering whether the "clock-period"
calculation is wrong?
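
To compare the "clock-period" values on working and non-working routers, it may help to express them as a deviation from nominal. The Python sketch below assumes the documented IOS meaning of the value: units of 2**-32 seconds per clock tick, with a default of 17179869 (about 4 ms). The second sample value is invented purely for illustration and does not come from any of these routers.

```python
TICK_S = 0.004              # assumed nominal tick length behind the IOS default
NOMINAL = 2**32 * TICK_S    # about 17179869.184 units of 2**-32 s


def clock_period_to_ppm(clock_period: int) -> float:
    """Deviation of an 'ntp clock-period' value from nominal, in ppm."""
    return (clock_period / NOMINAL - 1.0) * 1e6


for value in (17179869, 17181587):  # IOS default, then a hypothetical value
    print(value, f"{clock_period_to_ppm(value):+.2f} ppm")
```

A stored value more than a few hundred ppm away from nominal on the sad boxes, but not on the happy ones, would point at the frequency estimate rather than at the network path.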

Stuff that is the same between working/nonworking routers:

- clock/timezone config
- latency and network quality between router and S2 server
- receipt of NTP packets (debug ntp pack shows *all* are being received
and processed, so not an ACL/filtering issue)

Bugtool seems to be broken when searching for the keyword "NTP" across the
whole 12.4(T) train (it just gives me a blank screen in multiple browsers);
I've reported this. The release notes do not show anything of interest.

Is anybody with good NTP foo able to look at this and immediately
spot something obvious? Or could there be a hardware problem in
this batch?

Footnotes:

- Upgraded to 12.4(22)T, where clock-period is no longer configurable by
the operator; the same problem occurs.

- Only seems to affect a small percentage of 877 routers;
878s, 1800s and 2800s seem to be fine.

Dave.