exec.htm [plain text]

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" &amp;lt;html>
<html>
<head>
<meta name="generator" content="HTML Tidy, see www.w3.org">
<title>Executive Summary - Computer Network Time
Synchronization</title>
</head>
<body>
<h3>Executive Summary - Computer Network Time Synchronization</h3>

<img align="left" src="pic/alice12.gif" alt="gif"><a href=
"pictures.htm">from <i>Alice's Adventures in Wonderland</i>, Lewis
Carroll</a> 

<p>The executive is the one on the left.<br clear="left">
</p>

<hr>
<h4>Introduction</h4>

<p>The standard timescale used by most nations of the world is
Coordinated UniversalTime (UTC), which is based on the Earth's
rotation about its axis, and the Gregorian Calendar, which is based
on the Earth's rotation about the Sun. The UTC timescale is
disciplined with respect to International Atomic Time (TAI) by
inserting leap seconds at intervals of about 18 months. UTC time is
disseminated by various means, including radio and satellite
navigation systems, telephone modems and portable clocks.</p>

<p>Special purpose receivers are available for many
time-dissemination services, including the Global Position System
(GPS) and other services operated by various national governments.
For reasons of cost and convenience, it is not possible to equip
every computer with one of these receivers. However, it is possible
to equip some number of computers acting as primary time servers to
synchronize a much larger number of secondary servers and clients
connected by a common network. In order to do this, a distributed
network clock synchronization protocol is required which can read a
server clock, transmit the reading to one or more clients and
adjust each client clock as required. Protocols that do this
include the Network Time Protocol (NTP), Digital Time
Synchronization Protocol (DTSS) and others found in the literature
(See "Further Reading" at the end of this article.)</p>

<h4>Protocol Design Issues</h4>

<p>The synchronization protocol determines the time offset of the
server clock relative to the client clock. The various
synchronization protocols in use today provide different means to
do this, but they all follow the same general model. On request,
the server sends a message including its current clock value or <i>
timestamp</i> and the client records its own timestamp upon arrival
of the message. For the best accuracy, the client needs to measure
the server-client propagation delay to determine its clock offset
relative to the server. Since it is not possible to determine the
one-way delays, unless the actual clock offset is known, the
protocol measures the total roundtrip delay and assumes the
propagation times are statistically equal in each direction. In
general, this is a useful approximation; however, in the Internet
of today, network paths and the associated delays can differ
significantly due to the individual service providers.</p>

<p>The community served by the synchronization protocol can be very
large. For instance, the NTP community in the Internet of 1998
includes over 230 primary time servers, synchronized by radio,
satellite and modem, and well over 100,000 secondary servers and
clients. In addition, there are many thousands of private
communities in large government, corporate and institution
networks. Each community is organized as a tree graph or <i>
subnet</i>, with the primary servers at the root and secondary
servers and clients at increasing hop count, or stratum level, in
corporate, department and desktop networks. It is usually necessary
at each stratum level to employ redundant servers and diverse
network paths in order to protect against broken software, hardware
and network links.</p>

<p>Synchronization protocols work in one or more association modes,
depending on the protocol design. Client/server mode, also called
master/slave mode, is supported in both DTSS and NTP. In this mode,
a client synchronizes to a stateless server as in the conventional
RPC model. NTP also supports symmetric mode, which allows either of
two peer servers to synchronize to the other, in order to provide
mutual backup. DTSS and NTP support a broadcast mode which allows
many clients to synchronize to one or a few servers, reducing
network traffic when large numbers of clients are involved. In NTP,
IP multicast can be used when the subnet spans multiple
networks.</p>

<p>Configuration management can be a serious problem in large
subnets. Various schemes which index public databases and network
directory services are used in DTSS and NTP to discover servers.
Both protocols use broadcast modes to support large client
populations; but, since listen-only clients cannot calibrate the
delay, accuracy can suffer. In NTP, clients determine the delay at
the time a server is first discovered by polling the server in
client/server mode and then reverting to listen-only mode. In
addition, NTP clients can broadcast a special "manycast" message to
solicit responses from nearby servers and continue in client/server
mode with the respondents.</p>

<h4>Security Issues</h4>

<p>A reliable network time service requires provisions to prevent
accidental or malicious attacks on the servers and clients in the
network. Reliability requires that clients can determine that
received messages are authentic; that is, were actually sent by the
intended server and not manufactured or modified by an intruder.
Ubiquity requires that any client can verify the authenticity of
any server using only public information. This is especially
important in such ubiquitous network services as directory
services, cryptographic key management and time
synchronization.</p>

<p>NTP includes provisions to cryptographically authenticate
individual servers using symmetric-key cryptography in which
clients authenticate servers using shared secret keys. However, the
secret keys must be distributed in advance using secure means
beyond the scope of the protocol. This can be awkward and fragile
with a large population of potential clients, possibly intruding
hackers.</p>

<p>Modern public-key cryptography provides means to reliably bind
the server identification credentials and related public values
using public directory services. However, these means carry a high
computing cost, especially when large numbers of time-critical
clients are involved as often the case with NTP servers. In
addition, there are problems unique to NTP in the interaction
between the authentication and synchronization functions, since
each requires the other for success.</p>

<p>The recent NTP Version 4 includes a revised security model and
authentication scheme supporting both symmetric and public-key
cryptography. The public-key variant is specially crafted to reduce
the risk of intrusion, minimize the consumption of processor
resources and minimize the vulnerability to hacker attack.</p>

<h4>Computer Clock Modelling and Error Analysis</h4>

Most computers include a quartz resonator-stabilized oscillator and
hardware counter that interrupts the processor at intervals of a
few milliseconds. At each interrupt, a quantity called <i>tick</i>
is added to a system variable representing the clock time. The
clock can be read by system and application programs and set on
occasion to an external reference. Once set, the clock readings
increment at a nominal rate, depending on the value of <i>tick</i>.
Typical Unix system kernels provide a programmable mechanism to
increase or decrease the value of <i>tick</i> by a small, fixed
amount in order to amortize a given time adjustment smoothly over
multiple <i>tick</i> intervals. 

<p>Clock errors are due to variations in network delay and
latencies in computer hardware and software (jitter), as well as
clock oscillator instability (wander). The time of a client
relative to its server can be expressed</p>

<center><i>T</i>(<i>t</i>) = <i>T</i>(<i>t</i><sub>0</sub>) + <i>
R</i>(<i>t - t</i><sub>0</sub>) + 1/2 <i>D</i>(<i>t -
t</i><sub>0</sub>)<sup>2</sup>,</center>

<p>where <i>t</i> is the current time, <i>T</i> is the time offset
at the last measurement update <i>t</i><sub>0</sub>, <i>R</i> is
the frequency offset and <i>D</i> is the drift due to resonator
ageing. All three terms include systematic offsets that can be
corrected and random variations that cannot. Some protocols,
including DTSS, estimate only the first term in this expression,
while others, including NTP, estimate the first two terms. Errors
due to the third term, while important to model resonator aging in
precision applications, are neglected, since they are usually
dominated by errors in the first two terms.</p>

<p>The synchronization protocol estimates <i>
T</i>(<i>t</i><sub>0</sub>) (and <i>R</i>(<i>t</i><sub>0</sub>),
where relevant) at regular intervals <font face="symbol">t</font>
and adjusts the clock to minimize <i>T</i>(<i>t</i>) in future. In
common cases, <i>R</i> can have systematic offsets of several
hundred parts-per-million (PPM) with random variations of several
PPM due to ambient temperature changes. If not corrected, the
resulting errors can accumulate to seconds per day. In order that
these errors do not exceed a nominal specification, the protocol
must periodically re-estimate <i>T</i> and <i>R</i> and compensate
for variations by adjusting the clock at regular intervals. As a
practical matter, for nominal accuracies of tens of milliseconds,
this requires clients to exchange messages with servers at
intervals in the order of tens of minutes.</p>

<p>Analysis of quartz-resonator stabilized oscillators show that
errors are a function of the averaging time, which in turn depends
on the interval between corrections. At correction intervals less
than a few hundred seconds, errors are dominated by jitter, while,
at intervals greater than this, errors are dominated by wander. As
explained later, the characteristics of each regime determine the
algorithm used to discipline the clock. These errors accumulate at
each stratum level from the root to the leaves of the subnet tree.
It is possible to quantify these errors by statistical means, as in
NTP. This allows real-time applications to adjust audio or video
playout delay, for example. However, the required statistics may be
different for various classes of applications. Some applications
need absolute error bounds guaranteed never to exceeded, as
provided by the following correctness principles.</p>

<h4>Correctness Principles</h4>

<p>Applications requiring reliable time synchronization such as air
traffic control must have confidence that the local clock is
correct within some bound relative to a given timescale such as
UTC. There is a considerable body of literature that studies these
issues with respect to various failure models such as fail-stop and
Byzantine disagreement. While these models inspire much confidence
in a theoretical setting, most require multiple message rounds for
each measurement and would be impractical in a large computer
network such as the Internet. However, it can be shown that the
worst-case error in reading a remote server clock cannot exceed
one-half the roundtrip delay measured by the client. This is a
valuable insight, since it permits strong statements about the
correctness of the timekeeping system.</p>

<p>In the Probabilistic Clock Synchronization (PCS) scheme devised
by Cristian, a maximum error tolerance is established in advance
and time value samples associated with roundtrip delays that exceed
twice this value are discarded. By the above argument, the
remaining samples must represent time values within the specified
tolerance. As the tolerance is decreased, more samples fail the
test until a point where no samples survive. The tolerance can be
adjusted for the best compromise between the highest accuracy
consistent with acceptable sample survival rate.</p>

<p>In a scheme devised by Marzullo and exploited in NTP and DTSS,
the worst-case error determined for each server determines a
correctness interval. If each of a number of servers are in fact
synchronized to a common timescale, the actual time must be
contained in the intersection of their correctness intervals. If
some intervals do not intersect, then the clique containing the
maximum number of intersections is assumed correct <i>
truechimers</i> and the others assumed incorrect <i>
falsetickers</i>. Only the truechimers are used to adjust the
system clock.</p>

<h4>Data Grooming Algorithms</h4>

By its very nature, clock synchronization is a continuous process,
resulting in a sequence of measurements with each of possibly
several servers and resulting in a clock adjustment. In some
protocols, crafted algorithms are used to improve the time and
frequency estimates and refine the clock adjustment. Algorithms
described in the literature are based on trimmed-mean and median
filter methods. The clock filter algorithm used in NTP is based on
the above observation that the correctness interval depends on the
roundtrip delay. The algorithm accumulates offset/delay samples in
a window of several samples and selects the offset sample
associated with the minimum delay. In general, larger window sizes
provide better estimates; however, stability considerations limit
the window size to about eight. 

<p>The same principle could be used when selecting the best subset
of servers and combining their offsets to determine the clock
adjustment. However, different servers often show different
systematic offsets, so the best statistic for the central tendency
of the server population may not be obvious. Various kinds of
clustering algorithms have been found useful for this purpose. The
one used in NTP sorts the offsets by a quality metric, then
calculates the variance of all servers relative to each server
separately. The algorithm repeatedly discards the outlyer with the
largest variance until further discards will not improve the
residual variance or until a minimum number of servers remain. The
final clock adjustment is computed as a weighted average of the
survivors.</p>

<p>At the heart of the synchronization protocol is the algorithm
used to adjust the system clock in accordance with the final
adjustment determined by the above algorithms. This is called the
clock discipline algorithm or simply the discipline. Such
algorithms can be classed according to whether they minimize the
time offset or frequency offset or both. For instance, the
discipline used in DTSS minimizes only the time offset, while the
one used in NTP minimizes both time and frequency offsets. While
the DTSS algorithm cannot remove residual errors due to systematic
frequency errors, the NTP algorithm is more complicated and less
forgiving of design and implementation mistakes.</p>

<p>All clock disciplines function as a feedback loop, with measured
offsets used to adjust the clock oscillator phase and frequency to
match the external synchronization source. The behavior of feedback
loops is well understood and modelled by mathematical analysis. The
significant design parameter is the time constant, or
responsiveness to external or internal variations in time or
frequency. Optimum selection of time constant depends on the
interval between update messages. In general, the longer these
intervals, the larger the time constant and vice versa. In practice
and with typical network configurations the optimal poll intervals
vary between one and twenty minutes for network paths to some
thousands of minutes for modem paths.</p>

<h4>Further Reading</h4>

<ol>
<li>
<p>Cristian, F. Probabilistic clock synchronization. In Distributed
Computing 3, Springer Verlag, 1989, 146-158.</p>
</li>

<li>
<p>Digital Time Service Functional Specification Version T.1.0.5.
DigitalEquipment Corporation, 1989.</p>
</li>

<li>
<p>Gusella, R., and S. Zatti. TEMPO - A network time controller for
a distributed Berkeley UNIX system. IEEE Distributed Processing
Technical Committee Newsletter 6, NoSI-2 (June 1984), 7-15. Also
in: Proc. Summer 1984 USENIX (Salt Lake City, June 1984).</p>
</li>

<li>
<p>Kopetz, H., and W. Ochsenreiter. Clock synchronization in
distributed real-time systems. IEEE Trans. Computers C-36, 8
(August 1987), 933-939.</p>
</li>

<li>
<p>Lamport, L., and P.M. Melliar-Smith. Synchronizing clocks in the
presence of faults. JACM 32, 1 (January 1985), 52-78.</p>
</li>

<li>
<p>Marzullo, K., and S. Owicki. Maintaining the time in a
distributed system. ACM Operating Systems Review 19, 3 (July 1985),
44-54.</p>
</li>

<li>
<p>Mills, D.L. Adaptive hybrid clock discipline algorithm for the
Network Time Protocol. <i>IEEE/ACM Trans. Networking 6, 5</i>
(October 1998), 505-514.</p>
</li>

<li>
<p>Mills, D.L. Improved algorithms for synchronizing computer
network clocks. <i>IEEE/ACM Trans. Networks 3, 3</i> (June 1995),
245-254.</p>
</li>

<li>
<p>Mills, D.L. Internet time synchronization: the Network Time
Protocol. IEEE Trans. Communications COM-39, 10 (October 1991),
1482-1493. Also in: Yang, Z., and T.A. Marsland (Eds.). Global
States and Time in Distributed Systems, IEEE Press, Los Alamitos,
CA, 91-102.</p>
</li>

<li>
<p>Mills, D.L. Modelling and analysis of computer network clocks.
Electrical Engineering Department Report 92-5-2, University of
Delaware, May 1992, 29 pp.</p>
</li>

<li>
<p>NIST Time and Frequency Dissemination Services. NBS Special
Publication432 (Revised 1990), National Institute of Science and
Technology, U.S. Department of Commerce, 1990.</p>
</li>

<li>
<p>Schneider, F.B. A paradigm for reliable clock synchronization.
Department of Computer Science Technical Report TR 86-735, Cornell
University, February 1986.</p>
</li>

<li>
<p>Srikanth, T.K., and S. Toueg. Optimal clock synchronization.
JACM 34, 3 (July 1987), 626-645.</p>
</li>

<li>
<p>Stein, S.R. Frequency and time - their measurement and
characterization (Chapter 12). In: E.A. Gerber and A. Ballato
(Eds.). Precision Frequency Control, Vol. 2, Academic Press, New
York 1985, 191-232, 399-416. Also in: Sullivan, D.B., D.W. Allan,
D.A. Howe and F.L. Walls (Eds.). Characterization of Clocks and
Oscillators. National Institute of Standards and Technology
Technical Note 1337, U.S. Government Printing Office (January,
1990), TN61-TN119.</p>
</li>
</ol>

<hr>
<a href="index.htm"><img align="left" src="pic/home.gif" alt=
"home"></a> 

<address><a href="mailto:mills@udel.edu">David L. Mills
&lt;mills@udel.edu&gt;</a></address>
</body>
</html>