Clocks, Time Error, and Noise

Date: Jan 8, 2022. Sample Chapter is provided courtesy of Cisco Press.

In the world of electronics, the term clock refers to a microchip that generates a clock signal, which is used to regulate the timing and speed of the components on a circuit board. This sample chapter from Synchronizing 5G Mobile Networks explains the different metrics used to quantify time error and how to measure them.

In this chapter, you will learn the following:

  • Clocks: Covers clocks, clock signals, and the key components of clocks.

  • Time error: Explains what time error is, the different types of metrics to quantify time error, and how these metrics are useful in defining clock accuracy and stability.

  • Holdover performance: Explains holdover for clocks, applicable whenever a synchronization reference is temporarily lost, and why the holdover capability of a clock becomes critical to ensure optimal network and application functioning.

  • Transient response: Examines what happens when a slave clock changes its input reference.

  • Measuring time error: Describes how to determine and quantify the key metrics of time error.

In earlier chapters, you read that any clock can only be near perfect (none are perfect)—there are always inherent errors in clocks. In this chapter, you will discover the various components of a clock and understand how these components contribute to removing or introducing certain errors. Because these errors can adversely impact the consumer of the synchronization services (the end application), it is important to track and quantify them. Once measured, these errors are compared against some defined performance parameters to determine how much (if any) impact they have on the end application. This chapter also explains the different metrics used to quantify time error and how to measure them.

Clocks

In everyday usage, the term clock refers to a device that maintains and displays the time of day and perhaps the date. In the world of electronics, however, clock refers to a microchip that generates a clock signal, which is used to regulate the timing and speed of the components on a circuit board. This clock signal is a waveform that is generated either by a clock generator or the clock itself—the most common form of clock signal in electronics is a square wave.

This type of clock is able to generate clock signals of different frequencies and phases as may be required by separate components within an electronic circuit or device. The following are some examples showing the functions of a clock:

  • Most sophisticated electronic devices require a clock signal for proper operation. These devices require that the clock signal delivered to them adheres to a core set of specifications.

  • All electronic devices on a circuit board communicate with each other to accomplish certain tasks. Each device might require clock signals with a different specification; providing the needed signals allows these devices to interoperate with each other.

In both cases, a clock device on the circuit board provides such signals.

When discussing network synchronization or designing a timing distribution network, the timing signals need to travel much further than a circuit board. In this case, nodes must transfer clock signal information across the network. To achieve this, the engineer designates a clock as either a master clock or a slave clock. The master clock is the source for the clock signals, and a slave clock then synchronizes or aligns its clock signals to that of the master.

A clock signal relates to a (hardware) clock subsystem that generates a clocking signal, but often engineers refer to it simply using the term clock. You might hear the statement, “the clock on node A is not synchronized to a reference clock,” whereas the real meaning of clock in this sentence is that the clock signals are not synchronized. So, clock signal and clock are technically different terms with different meanings, but because the common usage has made one refer to the other, this chapter will also use the term clock to refer to a clock signal.

Oscillators

An electronic oscillator is a device that “oscillates” when an electric current is applied to it, causing the device to generate a periodic and continuous waveform. This waveform can be of different wave shapes and frequencies, but for most purposes, the clock signals utilized are sine waves or square waves. Thus, oscillators are a simple form of clock signal generation device.

There are a few different types of oscillators (as described in Chapter 3, “Synchronization and Timing Concepts”), but in modern electronics, crystal oscillators (referred to as XO) are the most common. The crystal oscillator is built around a special type of crystal that is piezoelectric in nature, meaning that when an electric current is applied to it, the crystal oscillates and emits a signal at a very specific frequency. The frequency can vary from a few tens of kHz to hundreds of MHz depending on the physical properties of the crystal.

Quartz is one example of a piezoelectric crystal and is commonly used in many consumer devices, such as wristwatches, wall clocks, and computers. Similar devices are also used in networking devices such as switches, routers, radio base stations, and so on. Figure 5-1 shows a typical crystal commonly utilized in such a device.

Figure 5-1 16-MHz Crystal Oscillator

The quartz used in crystal oscillators is a naturally occurring mineral, although manufacturers grow their own for purity. The natural frequency of the clock signal generated by a crystal depends on the shape or physical properties (sometimes referred to as the cut) of the crystal.

On the other hand, the stability of the output signal is also heavily influenced by many environmental factors, such as temperature, humidity, pressure, vibration, and magnetic and electric fields. Engineers refer to this as the sensitivity of the oscillator to environmental factors. For a given oscillator, the sensitivity to one factor is often dependent on the sensitivity to another factor, as well as the age of the crystal or device itself.

As a real-life example, if your wristwatch is using a 32,768-Hz quartz crystal oscillator, the accuracy of the wristwatch in different environmental conditions will vary. The same behavior also applies to other electronic equipment, including transport devices and routers in the network infrastructure. This means that when electronic devices are used to synchronize devices to a common frequency, phase, or time, the environmental conditions adversely impact the stability of synchronization.

There have been many innovations to improve the stability of crystal oscillators deployed in unstable environmental conditions. One common approach in modern designs is for the hardware designer to add a circuit that varies the voltage applied to the oscillator to adjust its frequency in small amounts. This class of crystal oscillator is known as the voltage-controlled crystal oscillator (VCXO).

Of the many environmental factors that affect the stability and accuracy of a crystal oscillator, the major one is temperature. To provide better stability of the crystal oscillator against temperature variations, two additional types of oscillators have emerged in the market:

  • Temperature-compensated crystal oscillators (TCXO) are crystal oscillators designed to provide improved frequency stability despite wide variations in temperature. TCXOs have a temperature compensation circuit together with the crystal, which measures the ambient temperature and compensates for any change by altering the voltage applied to the crystal. By aligning the voltage to values within the possible temperature range, the compensation circuit stabilizes the output clock frequency at different temperatures.

  • Oven-controlled crystal oscillators (OCXO) are crystal oscillators where the crystal itself is placed in an oven that attempts to maintain a specific temperature inside the crystal housing, independent of the temperature changes occurring outside. This reduces the temperature variation on the oscillator and thereby increases the stability of the frequency. As you can imagine, oscillators with additional heating components end up bulkier and costlier than TCXOs.

The basic approach with the TCXO is to compensate for measured changes in temperature by applying appropriate changes in voltage, whereas for the OCXO, the temperature is controlled (by being elevated above the expected operating temperature range). Figure 5-2 shows a typical OCXO.

Figure 5-2 Typical OCXO

An oscillator is the core component of a clock, which alone can significantly impact the quality of the clock. An approximate comparison of stability between these different oscillators suggests that the stability of an OCXO might be 10 to 100 times higher than a TCXO class device. Table 3-2 in Chapter 3 outlines the characteristics of the common types of oscillator.

The stability of an oscillator type also gets reflected in the cost. As a very rough estimate, cesium-based oscillators cost about $50,000 and rubidium-based oscillators around $100, whereas an OCXO costs around $30 and a TCXO would be less than $10.

PLLs

A phase-locked loop (PLL) is an electronic device or circuit that generates an output clock signal that is phase-aligned as well as frequency-aligned to an input clock signal. As shown in Figure 5-3, in its simplest form, a PLL circuit consists of three basic elements, as described in the list that follows.

Figure 5-3 PLL Building Blocks

  • Voltage-controlled oscillator (VCO): A special type of oscillator that changes frequency with changes in an input voltage (in this case from the loop filter). The frequency of the VCO with a nominal control signal applied is called the free-running frequency, indicated by the symbol f0.

  • Phase comparator: Compares the phase of two signals (input clock and local oscillator) and generates a voltage according to the phase difference detected between the two signals. This output voltage is fed into the loop filter.

  • PLL loop filter: Primary function is to detect and filter out undesired phase changes passed on by the phase comparator in the form of voltage. This filtered voltage is then applied to the VCO to adjust the frequency. It is important to note that if the voltage is not filtered appropriately, it will result in a signal that exactly follows the input clock, inheriting all the variations or errors of the input clock reference. Thus, the properties of the loop filter directly affect the stability and performance of a PLL and the quality of the output signal.

When a PLL is initially turned on, the VCO with a nominal control signal applied will provide its free-running frequency (f0). When fed an input signal, the phase comparator measures the phase difference compared to the VCO signal. Based on the size of the phase difference between the two signals, the phase comparator generates a correcting voltage and feeds it to the loop filter.

The loop filter removes (or filters out) the noise and passes the filtered voltage to the VCO. With the new voltage applied, the VCO output frequency begins to change. Assuming the input signal and VCO frequency are not the same, the phase comparator sees this as a phase shift, and the output of the loop filter will be an increasing or decreasing voltage depending on which signal has higher frequency.

This voltage adjustment causes the VCO to continuously change its frequency, reducing the difference between VCO and input frequency. Eventually the size of changes in the output voltage of the loop filter are also reduced, resulting in ever smaller changes to the VCO frequency—at some point achieving a “locked” state.

Any further change in input or VCO frequency is also tracked by a change in loop filter output, keeping the two frequencies very closely aligned. This process continues as long as an input signal is detected. The filtering process is covered in detail in the following section in this chapter.

Recall that the input signal, even one generated from a very stable source (such as an atomic clock), would have accumulated noise (errors) on its journey over the clocking chain. The main purpose of a PLL is to align frequency (and phase) to the long-term average of the input signal and ignore (filter out) short-term changes.
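The locking process described above can be illustrated with a minimal discrete-time simulation. This is a sketch, not a circuit model from the book; the gains, frequencies, and update interval are illustrative assumptions chosen so the loop settles quickly:

```python
# Minimal discrete-time PLL sketch. The gains (kp, ki), frequencies, and update
# interval are illustrative assumptions, not values from any real device.

def simulate_pll(f_ref=10_000_092.0, f0=10_000_000.0,
                 kp=2.0, ki=1.0, dt=1e-3, steps=20_000):
    """Return the VCO output frequency after `steps` update intervals."""
    f_vco = f0            # VCO starts at its free-running frequency f0
    phase_err = 0.0       # accumulated phase difference (cycles)
    integral = 0.0        # integral term of the loop filter
    for _ in range(steps):
        # Phase comparator: the phase difference grows with the frequency offset
        phase_err += (f_ref - f_vco) * dt
        # Loop filter (proportional-integral): smooths the comparator output
        integral += phase_err * dt
        control = kp * phase_err + ki * integral
        # VCO: output frequency moves in proportion to the control signal
        f_vco = f0 + control
    return f_vco

simulate_pll()   # converges to approximately 10,000,092 Hz (locked to f_ref)
```

The phase error first grows, then decays toward zero as the integral term takes over the steady-state correction, mirroring the transient-to-locked progression described above.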

Now a couple of questions arise:

  • Will the loop filter react and vary the voltage fed to the VCO for every phase variation seen on the input signal?

  • When should the PLL declare itself in locked state or, conversely, if already in locked state, under what conditions could the PLL declare that it has lost its lock with the input reference?

The first question raised is answered in the next section, but the second question needs a discussion on PLL states and the regions of operation of a PLL. The PLL is said to be in the transient state when the output is not locked to the input reference and it is in the process of locking to the input signal. Alternatively, the steady state is when the PLL is locked with the input reference. As explained earlier, even during steady-state operations, the VCO will keep adjusting the frequency to match the input frequency based on the differential voltage being fed from the loop filter.

These states are governed by three different frequency regions, known as hold-in range, pull-in range, and pull-out range. Figure 5-4 illustrates these frequency ranges.

Figure 5-4 Frequency Ranges of PLL Operations

These frequency ranges are always quoted relative to the free-running frequency of the VCO (f0) and are specified in parts per million (ppm) or parts per billion (ppb). Because these are ranges, there needs to be both minimum and maximum values. If not explicitly specified, the range is interpreted as including everything between the positive and negative values of the specified number. For example, if the range is specified as 4.6 ppm (or 4.6 × 10⁻⁶), it is assumed to cover everything between –4.6 ppm and +4.6 ppm around f0.
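The interpretation of a ppm figure as a symmetric window around f0 can be sketched as follows (the 10-MHz VCO frequency is an illustrative assumption):

```python
# Sketch: interpreting a frequency range quoted in ppm relative to f0.
# A range quoted as "4.6 ppm" is taken as ±4.6 ppm around the free-running frequency.

def ppm_range(f0_hz, range_ppm):
    """Return (low, high) frequency bounds in Hz for a ±range_ppm window around f0."""
    delta = f0_hz * range_ppm * 1e-6
    return f0_hz - delta, f0_hz + delta

low, high = ppm_range(10_000_000, 4.6)   # 10-MHz VCO, ±4.6 ppm
# low ≈ 9,999,954 Hz, high ≈ 10,000,046 Hz
```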

As per ITU-T G.810, the definitions of these frequency regions are as follows:

  • Hold-in range: “The largest offset between a slave clock’s input signal and the nominal (natural) frequency of its VCO, within which the slave clock maintains lock as the frequency varies arbitrarily slowly over the frequency range.”

    This is the range of difference between the nominal frequency and input frequency for which the PLL can steadily maintain phase tracking while in a steady (or locked) state. If the frequency of the input reference is slowly reduced or increased within the range, the PLL can still track it. The edge of the hold-in range is the point at which the PLL will lose the locked state.

  • Pull-in range: “The largest offset between a slave clock’s input signal and the nominal (natural) frequency of its VCO, within which the slave clock will achieve locked mode.”

    This is the range of difference between the nominal frequency and the input frequency for which the PLL will always become locked throughout an acquisition (or tracking) process. Note that during this acquisition process there might be more than one cycle slip, but the PLL will always lock to the input signal. This is the range of frequencies within which a PLL can transition from the transient state to steady (locked) state.

  • Pull-out range: “The offset between a slave clock’s input signal and the nominal (natural) frequency of its VCO, within which the slave clock stays in the locked mode and outside of which the slave clock cannot maintain locked mode, irrespective of the rate of the frequency change.”

    This can be seen as the range of frequency step that, if applied to a steady-state PLL, still leaves the PLL in the steady (or locked) state. The PLL declares itself not locked if the input frequency step is outside of this range.

Taking an example from ITU-T G.812, both the pull-in and hold-in ranges for a Type III clock are defined as 4.6 × 10⁻⁶, which is the same as ±4.6 ppm. Table 3-3 in Chapter 3 provides a quick reference of these frequency ranges for different types of clock nodes.
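A simple check of whether an input signal falls within a clock's hold-in range can be sketched like this (the 10-MHz nominal frequency is an illustrative assumption; the ±4.6 ppm figure is the ITU-T G.812 Type III value quoted above):

```python
# Sketch: checking whether an input frequency's offset from the nominal
# frequency f0 falls within a hold-in range of ±4.6 ppm (ITU-T G.812 Type III).

def within_hold_in(f_input_hz, f0_hz, hold_in_ppm=4.6):
    """True if the input's offset from f0 is inside the ±hold_in_ppm window."""
    offset_ppm = abs(f_input_hz - f0_hz) / f0_hz * 1e6
    return offset_ppm <= hold_in_ppm

within_hold_in(10_000_040, 10_000_000)   # 4.0 ppm offset → True
within_hold_in(10_000_060, 10_000_000)   # 6.0 ppm offset → False
```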

When the PLL is tracking an input reference, it is the loop filter that is enforcing the limits of these frequency ranges. The loop filter is usually a low-pass filter, and as the name suggests, it allows only low-frequency (slow) variations to pass through. That means it removes high-frequency variation and noise in the reference signal. Conversely, a high-pass filter allows high-frequency variations to pass and removes the low-frequency changes.

While these filters are discussed in the next section in detail, it is important to note that “low-frequency variations” does not refer to the frequency of the clock signal, but the rate with which the frequency or phase of the clock signal varies. Low rate (less frequent) changes are a gradual wander in the signal, whereas high rate changes are a very short-term jitter in the signal. You will read more about jitter and wander later in this chapter.

Because PLLs synchronize local clock signals to an external or reference clock signal, these devices have become one of the most commonly used electronic circuits on any communication device. Out of several types of PLL devices, one of the main types of PLL used today is the digital PLL (DPLL), which is used to synchronize digital signals.

It is worthwhile noting that, just like any other electronic circuits and devices, PLLs have also been evolving. Designers of modern communications equipment are incorporating the latest PLL devices into them, circuits that now contain multiple PLLs (analogue or digital).

To reduce real-estate requirements on circuit boards, newer-generation devices can operate with lower-frequency oscillators that replace expensive high-frequency oscillators. They also output signals with the ultra-low levels of jitter required to meet the tight jitter specifications of some equipment designs (remembering that jitter is high-frequency noise).

Low-Pass and High-Pass Filters

A low-pass filter (LPF) is a filter that filters out (removes) signals that are higher than a fixed frequency, which means an LPF passes only signals that are lower than a certain frequency—hence the name low-pass filter. For the same reasons, sometimes LPFs are also called high-cut filters because they cut off signals higher than some fixed frequency. Figure 5-5 illustrates LPF filtering where signals with lower frequency than a cut-off frequency are not attenuated (diminished). The pass band is the range of frequencies that are not attenuated, and the stop band is the range of frequencies that are attenuated.

Meanwhile, the range up to the cut-off frequency becomes the clock bandwidth, which also matches the width of a filter’s pass band. For example, as shown in Figure 5-5, in the case of an LPF, the clock bandwidth is the range of frequencies that constitute the pass band. The clock bandwidth is typically a configurable parameter based on the capabilities of a PLL.

Figure 5-5 Low-Pass Filter

Similarly, a high-pass filter (HPF) is a device that filters out signals that are lower than some fixed frequency, which means that an HPF passes only signals that are higher than a certain frequency. And again, HPFs are also sometimes called low-cut filters because they cut off signals lower than some fixed frequency. Figure 5-6 depicts an HPF, showing that the pass band and stop band are a mirror image of the LPF case.

Figure 5-6 High-Pass Filter

It is interesting to note that by combining both filters (LPF and HPF) in a circuit, you can design a system that allows only a certain range of frequencies through and filters out the rest. Such a combination of LPF and HPF behaves as shown in Figure 5-7 and is called a band-pass filter. The band-pass name comes from the fact that it allows a certain band of frequencies (from lower cut-off to higher cut-off) to pass and attenuates the rest of the spectrum.

Figure 5-7 Band-Pass Filter

These filters are of the utmost importance when trying to filter errors out from a clock signal. To understand this process in more detail, this chapter next defines these errors, primarily jitter and wander, more technically.

Jitter and Wander

Jitter and wander are phase variations in the clock signal timing instants, as illustrated in Figure 5-8. Also, refer to the introduction to jitter and wander in Chapter 3. This variance and error, commonly called clock signal noise, can be caused by many factors, one being the quality of the clock components. Another factor is the noise accumulating from one node to the next when distributing timing signals through a chain of clocks.

Figure 5-8 Jitter Versus Ideal Versus Wander

Low rate (less frequent, slower) changes are a gradual wander in the signal, whereas high rate changes are a very short-term jitter in the signal. The ITU-T specifies (in G.810) that 10 Hz is the dividing line between jitter and wander (a long-standing convention in the telecom industry). And so, phase variations occurring at a rate higher than 10 Hz are described as jitter, and variations at a rate lower than 10 Hz are described as wander.

When all the phase variations of a clock signal as compared to its reference signal are measured and plotted in a graph, for jitter the resultant frequency (or rate) of the phase variations is higher than 10 Hz (ten variations per second). Figure 5-9 shows one such example of jitter, where the y-axis shows the phase variations (in ns), and the x-axis shows the time of variation (in seconds) itself. As depicted in Figure 5-9, the rate of phase variations recorded is much higher than 10 Hz, and such phase variations are classified as jitter.

Figure 5-9 Plot of Phase Deviations Showing Jitter in a Clock Signal

For wander, in a similar approach, a graph plotting the phase variations will show the frequency (or rate) of less than 10 Hz. It is important to note that the rate of phase variations for wander could go down to mHz or μHz (rate down to once in several minutes or hours). For this reason, it is always recommended to run wander measurement tests for long periods of time (hours or days).

As you read in Chapter 3, the jitter (and wander to some extent) can be filtered out, and to do that, filters are used. If one configures an LPF with 10 Hz as the cut-off frequency, it will eliminate phase variation with a rate of variation higher than 10 Hz. Because jitter is defined as phase variation above 10 Hz, any LPF configured this way filters out jitter.

Similarly, an HPF configured with a 10-Hz cut-off will filter out wander, because the HPF will filter out variations with a rate of 10 Hz or lower.
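This separation can be illustrated with a small sketch. It uses a hypothetical first-order IIR low-pass filter (an assumption for illustration, not the measurement equipment described in the book); the high-pass (jitter) component is simply the residual left after low-pass filtering, so a single filter yields both views of the data:

```python
import math

# Sketch: splitting phase-error samples into wander and jitter around a 10-Hz
# cut-off using a first-order IIR low-pass filter. The sample rate must be well
# above the fastest jitter rate of interest for this to be meaningful.

def split_jitter_wander(samples, sample_rate_hz, cutoff_hz=10.0):
    """Return (wander, jitter) lists from phase-error samples (e.g. in ns)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)   # filter time constant
    dt = 1.0 / sample_rate_hz
    alpha = dt / (rc + dt)                   # first-order IIR coefficient
    wander, jitter = [], []
    y = samples[0]
    for x in samples:
        y += alpha * (x - y)     # low-pass output: slow variations (wander)
        wander.append(y)
        jitter.append(x - y)     # high-pass residual: fast variations (jitter)
    return wander, jitter
```

For a steady input the wander track simply follows the signal and the jitter residual is zero; for real phase data the two lists correspond to the filtered views shown in Figure 5-11 and Figure 5-12.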

Phase variations from a real-life test are captured in Figure 5-10, Figure 5-11, and Figure 5-12, which show, respectively: 1) phase deviations of a clock signal with jitter and wander present with no filter; 2) with an HPF filter applied; and 3) with an LPF filter applied. In these figures, the y-axis shows phase variations (in ns), and the x-axis shows the time elapsed since the measurement was started (in minutes).

The graph shown in Figure 5-10 captures all the phase variations of a clock signal (low and high rate phase variations plotted in a single graph), and so it is not easy to visualize the jitter and the wander of a clock signal.

Figure 5-10 Plot of Phase Deviations of a Clock Signal Without Any Filters

In order to clearly visualize (and analyze) the jitter and wander, filters are applied to the phase variation measurements. In Figure 5-11, you can see that after the HPF is applied (which filters out the wander), the remaining noise is jitter (frequency of phase variations higher than 10 Hz).

Figure 5-11 Plot of Phase Deviations of a Clock Signal After HPF Applied

Similarly, in Figure 5-12, after the LPF is applied (which filters out the jitter), the remaining noise is wander.

Figure 5-12 Plot of Phase Deviations of a Clock Signal After LPF Applied

After reading the preceding section about clock bandwidth and the PLL loop filter, there are two questions that could arise. First, why not keep the LPF cut-off frequency very low to filter the jitter and also limit the wander? Recall that jitter is filtered with the LPF. And if the LPF cut-off frequency is kept low, it could also filter some range of wander. Of course, this means that the pass band for the LPF becomes very small. Secondly, why not do the same on every clock node in the chain?

To understand the answer to the second question, you first need to appreciate the following aspects of PLLs:

  • Not all PLLs can keep LPF cut-off frequency very low. Wander is classified as low-rate phase variations, which can reach extremely small values—10 Hz down to microhertz. So, there will always be some phase noise (in this case wander) within the LPF clock bandwidth and so will always be tracked by the PLL.

  • A PLL combines two signals: 1) the input reference signal and 2) the clock signal from the local oscillator (VCO) to output a synchronized signal. When the PLL loop filter (LPF) blocks the signal from the input reference, the output signal is constructed using the local oscillator.

    So the process of a PLL filtering the noise from the input signal substitutes noise from the local oscillator in its place. Taken to a theoretical corner case, if the clock bandwidth of an LPF is made zero, all that the PLL will output is the signal from the local oscillator—which defeats the purpose of having a reference signal.

    So, if using a very low cut-off frequency for the LPF, the PLL needs to be matched to a good-quality local oscillator, so that the noise added by the local oscillator is reduced. For the hardware designer, this has obvious cost ramifications—to get better noise filtering, you need to spend more money on the oscillator.

  • The time taken for a PLL to move from transient state to the steady (or locked) state depends on the clock bandwidth (as well as the input frequency and quality of the local oscillator). The narrower the clock bandwidth for the LPF, the longer time it takes for the PLL to move to the steady state.

    For example, a synchronization supply unit (SSU) type I clock or building integrated timing supply (BITS) stratum 2 clock with an LPF configured for bandwidth of 3 mHz will take several minutes to lock to an input signal. However, telecom networks are very widely distributed and can consist of long chains of clock nodes. If all the clocks in the chain had low bandwidth, the complete chain could take many hours to settle to a steady state. Similarly, it might take several hours for the last node of the chain to settle down after any disruption to the distribution of clock.

It is for these reasons that a clock node with better filtering capabilities should have a good-quality oscillator and should be placed in a chain of clock nodes at selected locations. These factors also explain why SSU/BITS clock nodes (which have stratum 2–quality oscillators and better PLL capabilities) are recommended only after a certain number of SDH equipment clock (SEC) nodes. The section “Synchronization Network Chain” in Chapter 6, “Physical Frequency Synchronization,” covers this limit and recommendations by ITU-T in greater detail.

To ensure interoperability between devices and to minimize the signal degradation due to jitter and wander accumulation across the network, the ITU-T recommendations (such as G.8261 and G.8262) specify jitter and wander performance limits for networks and clocks. The normal network elements (NE) and synchronous Ethernet equipment clocks (EEC) are usually allocated the most relaxed limits.

For example, ITU-T G.8262 specifies the maximum amount of peak-to-peak output jitter (within a defined bandwidth) permitted from an EEC. This is to ensure that the amount of jitter never exceeds the specified input tolerance level for subsequent EECs. Chapter 8, “ITU-T Timing Recommendations,” covers the ITU-T recommendations in greater detail.
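The peak-to-peak quantity that such limits are written against is straightforward to compute from a series of phase-variation samples; the values below are hypothetical, for illustration only:

```python
# Sketch: peak-to-peak jitter from a set of phase-variation samples (in ns),
# the quantity that specifications such as ITU-T G.8262 place limits on.
# The sample values are hypothetical.

def peak_to_peak(samples_ns):
    """Peak-to-peak value: maximum minus minimum of the series."""
    return max(samples_ns) - min(samples_ns)

peak_to_peak([-3.2, 1.1, 4.0, -0.5])   # → 7.2
```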

Frequency Error

While jitter and wander are both metrics to measure phase errors, the frequency error (or accuracy) also needs to be measured.

The frequency error (also referred to as frequency accuracy) is the degree to which the frequency of a clock can deviate from a nominal (or reference) frequency. The metric to measure this degree is called the fractional frequency deviation (FFD) or sometimes just the frequency offset. This offset is also referred to as the fractional frequency offset (FFO).

The basic definition is given in ITU-T G.810 by the following equation:

y(t) = (v(t) – vnom) / vnom

where:

  • y(t) is the FFD at time t

  • v(t) is the frequency being measured, and

  • vnom is the nominal (or reference) frequency

FFD is often expressed in parts per million (ppm) or parts per billion (ppb). For example, the free-running frequency accuracy of a synchronous Ethernet (SyncE) clock is < 4.6 ppm, while the carrier frequency accuracy required in cellular mobile radio is < 50 ppb.

Taking an example calculation, if the nominal frequency of an oscillator is 20 MHz (which is 20,000,000 Hz and represents vnom in the previous ITU-T G.810 formula) and the measured frequency is 20,000,092 Hz, then the FFD for this case (in ppm) will be

y = (20,000,092 – 20,000,000) / 20,000,000 = 92 / 20,000,000 = 4.6 × 10⁻⁶ = +4.6 ppm

Using the same formula, a measured frequency of 19,999,908 Hz and nominal frequency of 20,000,000 Hz gives an FFD of –4.6 ppm.

FFD can also be expressed in units of time. For example, an FFD of +4.6 ppm means that the measured signal gains 4.6 μs for each second of elapsed time, whereas an FFD of –4.6 ppm means that the measured signal loses 4.6 μs for each second. After 10 s of real time, the signal would be off by 46 μs compared to the reference signal. This accumulation is known as the frequency drift, and the relation is as follows:

accumulated time offset = FFD × elapsed time
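The FFD calculation and its conversion to an accumulated time offset can be sketched as follows, reproducing the 20-MHz worked example:

```python
# Sketch of the ITU-T G.810 FFD formula and the resulting time drift.

def ffd_ppm(measured_hz, nominal_hz):
    """Fractional frequency deviation y = (v - vnom) / vnom, expressed in ppm."""
    return (measured_hz - nominal_hz) / nominal_hz * 1e6

def time_drift_us(ffd_in_ppm, elapsed_s):
    """Accumulated time offset (µs): 1 ppm gains/loses 1 µs per second."""
    return ffd_in_ppm * elapsed_s

ffd = ffd_ppm(20_000_092, 20_000_000)    # → +4.6 ppm
drift = time_drift_us(ffd, 10)           # → 46 µs after 10 s of real time
```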

ITU-T clock specifications such as ITU-T G.812, define the holdover specification (also called long-term transient response) using frequency accuracy. This sets the limits of the drift of a clock away from the frequency to which it was previously locked. Also, other ITU-T clock specifications, such as ITU-T G.813, define the frequency accuracy of the free-running clock. This defines the required accuracy of the frequency of a clock at start-up, before it has been locked to any other reference.

Time Error

So far, this chapter has outlined the mechanisms that help to synchronize a local clock to a given external reference clock signal. However, even when a local clock is determined to be synchronized in both phase and frequency to a reference clock, there are always errors that can be seen in the final synchronized clock signal.

These errors can be attributed to various factors such as the quality and configuration of input circuit being used as a reference; any environmental factors that influence the synchronization process; and the quality of oscillator providing the initial clock.

To quantify the error in any synchronized clock, various parameters have been defined by the standard development organizations (SDO). To understand the errors and the various metrics to quantify these errors, refer to Figure 5-13, which illustrates the output of a clock synchronized to an external reference clock. Because the measurement is taken from the synchronized clock that is receiving a reference signal, this synchronized clock will be referred to as the measured clock.

Figure 5-13 Comparing Clock Signals

It is also important to note that the error being referred to here is an error in significant instants or phase of a clock. This is called phase error, which is a measure of the difference between the significant instants of a measured signal and a reference signal. Phase error is usually expressed in degrees, radians, or units of time; however, in telecommunication networks, phase error is typically represented as a unit of time (such as ns).

The two main aspects that you need to keep in mind for understanding clock measurements are as follows:

  • Just as the measured clock is synchronized to a reference clock, the measured clock is always compared to a reference clock (ideally the same one). Therefore, the measurement represents the relative difference between the two signals. Much like checking the accuracy of a wristwatch, measurement of a clock has no meaning unless it is compared to a reference.

  • The error in a clock varies over time, so it is important that clock errors are measured over a sufficiently long time period to thoroughly understand the quality of a clock. To illustrate, most quartz watches gain/lose some time (seconds) daily. For such a device, a measurement done every few minutes or every few hours might not be a good indication of the quality. For such watches, a good time period might be one month, over which it could wander by up to (say) 10 seconds.

For illustration purposes, Figure 5-13 shows only four clock periods or cycle times, Tc, of a clock. These clock periods are labeled (n – 1), (n), and (n + 1). Notice that the measured clock is not perfectly aligned to the reference clock; that difference is called clock error or time error (TE). In the figure, the TEs at each successive instance of the clock period are marked as x(n – 1), x(n), and x(n + 1), where x(n – 1) is the error for instance (n – 1) of the clock period, x(n) the error for period (n), and so on. The TE is simply the time of the measured clock minus the time of the reference clock.

In Figure 5-13, the measured clock signal at instances (n – 1) and (n) lags (arrives later than) the reference clock signal, which by convention is a negative time error. For the interval (n + 1), the clock signal leads the reference, which is referred to as a positive time error. Of course, the time error measured at different times varies and can be either negative or positive.

TE measurements in the time domain are normally specified in seconds, or some fraction thereof, such as nanoseconds or microseconds. As mentioned previously, errors can be positive or negative, so it is imperative to indicate this with a sign. For example, the time error for one interval might be written as +40 ns, whereas for another interval it might be –10 ns.

An engineer can measure the TE value (against the reference) at every edge (or significant instant) of the clock over an extended observation interval. If these values are plotted in a graph against time on the horizontal axis, it might look something like the graph shown in Figure 5-14. Each additional measurement (+ or –) is added to the graph at the right. A value of zero means that for that measurement, the offset between the signals was measured as zero.

Figure 5-14 Graph Showing cTE, dTE, and max|TE|

This figure also includes several statistical derivations of TE, the main one being max|TE|, which is defined as the maximum absolute value of the time error observed over the course of the measurement. The following sections explain these measures and concepts. Refer to ITU-T G.810 and G.8260 to get more details about the mathematical models of timing signals and TE.

Maximum Absolute Time Error

The maximum absolute time error, written as max|TE|, is the maximum TE value that the measured clock has produced during a test measurement. This is a single value, expressed in units of seconds or fractions thereof, and represents the TE value that lies furthest away from the reference. Note that although the TE itself can be positive or negative, the max|TE| is taken as an absolute value; hence, the signed measured value is always written as a positive number. In Figure 5-14, the max|TE| is measured as 50 ns.
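As a minimal illustration, max|TE| can be computed from a series of signed TE samples (the values below are hypothetical, in nanoseconds):

```python
# Hypothetical signed TE samples in nanoseconds, e.g. one per clock edge
te_ns = [10, -25, 40, -50, 30, 5, -15]

# max|TE| is the largest absolute deviation from the reference,
# reported as an unsigned (positive) value
max_abs_te = max(abs(te) for te in te_ns)
print(max_abs_te)  # 50
```

Note that even though the sample furthest from the reference here is –50 ns, the max|TE| is reported as 50 ns.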

Time Interval Error

The time interval error (TIE) is the measure of the change in the TE over an observation interval. Recall that the TE itself is the phase error of the measured clock as compared to the reference clock. So, the change in the TE, or the TIE, is the difference between the TE values of a specific observation interval.

The observation interval, also referred to as τ (tau), is a time interval that is generally fixed before the start of a measurement test. The TE is measured and recorded at the beginning of the test and after each observation interval time has passed; and the difference between the corresponding time error values gives the TIE for that observation interval.

As an example, assuming an observation interval consists of k cycle times of a clock, the calculations for TE and TIE for the clock signals shown in Figure 5-15 are as follows (the measurement process starts at clock period n or (n)th Tc):

Figure 5-15 Time Error for k Cycle Periods as an Observation Interval

  • TE at (n)th Tc = difference between measured clock and reference = x(n)

  • TE at (n + 1)th Tc = difference between measured clock and reference = x(n + 1)

  • And similarly, TE at (n + k)th Tc = x(n + k) and so on

  • TIE for first observation interval (k cycles between n and n + k) = x(n + k) – x(n)

To characterize the timing performance of a clock, the timing test device measures (observes) and records the TIE as the test proceeds (as the observation interval gets longer and longer). So, if the clock period (Tc) is 1 s, then the first observation interval after the start of the test will be at t = 1 s, then the second will be at t = 2 s, and so on (assuming the observation interval steps are the same as the clock period).

This way, TIE measures the total error that a clock has accumulated as compared to a reference clock since the beginning of the test. Also, because TIE is calculated from TE, TIE is also measured using units of seconds. For any measurement, by convention the value of TIE is defined to be zero at the start of the measurement.
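A minimal sketch of this relationship, assuming hypothetical TE samples taken once per observation-interval step: the TIE at each step is simply the TE at that step minus the TE at the start of the test, which forces the TIE to zero at t = 0.

```python
# Hypothetical TE samples in nanoseconds, one per observation-interval step
te_ns = [10, 15, 0, -5, 20]

# TIE is referenced to the TE at the start of the measurement,
# so it is zero at t = 0 by convention
tie_ns = [te - te_ns[0] for te in te_ns]
print(tie_ns)  # [0, 5, -10, -15, 10]
```

The TIE series follows the same contour as the TE series, only offset so that it starts at zero.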

In comparison, the TE is the instantaneous measure between the two clocks; there is no interval being observed. So, while the TE is the recorded error between the two clocks at any one instance, the TIE is the error accumulated over the interval (length) of an observation. Another way to think of it is that the TE is the relative time error between two clocks at a point (instant) of time, whereas TIE is the relative time error between two clocks accumulated between two points of time (which is an interval).

Taking a simple example where the steps in the observation interval for a TIE measurement are the same as the period of a TE measurement, the TIE will exactly follow the contour of the TE. One case might be where the TE is being measured every second, and the TIE observation interval is increasing by one second at every observation. Although the curve will be the same, there is an offset, because the TIE value was defined to be zero at the start of the test.

Figure 5-16 shows this example case, where the plotted TIE graph looks the same as a plot of time error with a constant offset because the TIE value starts at 0. The list that follows explains this figure in more detail.

Figure 5-16 Graph Showing TE and TIE Measurements and Plots

  • As shown in Figure 5-16, the TE is +10 ns at the start of a measurement (TE0), at t = 0. Therefore, TE0 is +10 ns (measured), and the TIE is zero (by convention).

  • Assume that the observation interval is 1 second. After the first observation interval, from Figure 5-16, TE1 is +15 ns. The TIE calculated for this interval will be (TE1 – TE0) = +5 ns.

  • After the second observation interval, assume TE2 is 0 ns. The TIE calculated now will be (TE2 – TE0) = –10 ns. Note that this calculation is for the second observation interval and the interval for calculations really became 2 seconds.

  • The observation interval keeps increasing until the end of the entire measurement test.

There are other clock metrics (discussed later in this chapter), such as MTIE and TDEV, that are calculated and derived from the TIE measurements. TIE, which is primarily a measurement of phase accuracy, is the fundamental measurement underlying these clock metrics.

Some conclusions that you can draw based on plotted TIE on a graph are as follows:

  • An ever-increasing trend in the TIE graph suggests that the measured clock has a frequency offset (compared to the reference clock). You can infer this because the frequency offset will be seen in each time error value and hence will get reflected in TIE calculations.

  • If the TIE graph shows a large value at the start of the measurement interval and starts converging slowly toward zero, it might suggest that the clock or PLL is not yet locked to the reference clock or is slow in locking or responding to the reference input.

Note that the measurement of the change in TE at each interval really becomes a measurement of short-term variation, or jitter, in the clock signals, or what is sometimes called timing jitter. And as TIE is a measurement of the change in phase, it becomes a perfect measurement for capturing jitter.

Now it is time to revisit the concepts of accuracy and stability from Chapter 1, “Introduction to Synchronization and Timing.” These are two terms that have a very precise meaning in the context of timing.

For synchronization, accuracy measures how close one clock is to the reference clock. And this measurement is related to the maximum absolute TE (or max|TE|). So, a clock that closely follows the reference clock with a very small offset is an accurate clock.

On the other hand, stability refers to the change and the speed of change in the clock during a given observation interval, while saying nothing about how close it is to a reference clock.

As shown in Figure 5-17, an accurate clock might not be a very stable clock, or a very stable clock might not be accurate, and so on. As the end goal is to have the most stable as well as the most accurate clock possible, a clock is measured on both aspects. The metrics that quantify both these aspects become the timing characteristics of a clock.

Figure 5-17 Considering Clock Accuracy and Stability

As you should know by now, the timing characteristics of a clock depend on several factors, including the performance of the in-built oscillator. So, it is possible to use that to differentiate between different classes of performance that a clock exhibits. Consequently, the ITU-T has classified clocks by the difference in expected timing characteristics—based on the oscillator quality. For example, a primary reference clock or primary reference source (PRC/PRS), based on a cesium atomic reference, is expected to provide better accuracy and stability when compared to a clock based on a stratum 3E OCXO oscillator. Chapter 3 thoroughly discussed these different classes of oscillator, so now it is time to define some metrics to categorize these qualities more formally.

Constant Versus Dynamic Time Error

Constant time error (cTE) is the mean of the TE values that have been measured. The TE mean is calculated by averaging measurements over either some fixed time period (say 1000 seconds) or the whole measurement period. When calculated over all the TE measurements, cTE represents an average offset from the reference clock as a single value. Figure 5-14 showed this previously, where the cTE is shown as a line on the graph at +15 ns. Because it measures an average difference from the reference clock, cTE is a good measure of the accuracy of a clock.

Dynamic time error (dTE) is the variation of TE over a certain time interval (you may remember, the variation of TE is also measured by TIE). Additionally, the variation of TE over a longer time period is known as wander, so the dTE is effectively a representation of wander. Figure 5-14 showed this previously, where the dTE represents the difference between the minimum and maximum TE during the measurement. Another way to think about it is that dTE is a measure of the stability of the clock.
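A short sketch of both metrics over a hypothetical set of TE samples: cTE is the plain average of the TE values, and peak-to-peak dTE is the spread between the largest and smallest TE observed.

```python
te_ns = [12, 18, 9, 21, 15, 10, 17]  # hypothetical TE samples in ns

# cTE: average offset from the reference over the whole measurement (accuracy)
cte = sum(te_ns) / len(te_ns)

# dTE, peak-to-peak: spread between max and min TE (stability)
dte_pp = max(te_ns) - min(te_ns)
print(round(cte, 2), dte_pp)  # 14.57 12
```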

These two metrics are very commonly used to define timing performance, so they are important concepts to understand. Normally, dTE is further statistically analyzed using MTIE and TDEV; and the next section details the derivations of those two metrics.

Maximum Time Interval Error

As you read in the “Time Interval Error” section, TIE measures the change in time error over an observation interval, and the TIE is measured as the difference between the TE at the start and end of the observation interval. However, during this observation interval, there would have been other TE values that were not considered in the TIE calculation.

Taking the example as shown in Figure 5-18, there were k time error values (k also being the observation interval). These time error values were denoted as x(n), x(n + 1), x(n + 2),…, x(n + k). The TIE calculation only considered x(n) and x(n + k) because the observation interval was k cycles starting at cycle n. It could be that these two values do not capture the maximum or minimum time error values. To determine the maximum variation during this time, one needs to find the maximum TE (x(max)) and minimum TE (x(min)) during the same observation interval.

Figure 5-18 TIE and MTIE Example

This maximum variation of the TIE within an observation interval is known as the maximum time interval error (MTIE). The MTIE is defined in ITU-T G.810 as “the maximum peak-to-peak delay variation of a given timing signal” for a given observation interval. As this essentially represents the variation in the clock, it is also referred to as the maximum peak-to-peak clock variation.

The peak-to-peak delay variation captures the two inverse peaks (low peak and high peak) during an observation interval. This is calculated as the difference between the maximum TE (at one peak) and minimum TE (another peak) for a certain observation interval. Figure 5-18 shows an example of TIE and of MTIE for the same observation interval, illustrating the maximum peak-to-peak delay variation.

The example in Figure 5-18 clearly shows that during an observation interval k, while the TIE calculation is done based on the interval between x(n) and x(n + k), there are peaks during this interval that are the basis for the MTIE calculations. These peaks are denoted by P(max) and P(min) in the figure.

Note also that as the MTIE is the maximum value of delay variation, it is recorded as a running maximum value, not only for one observation interval but across all observation intervals. This means that if a higher value of MTIE is measured for subsequent observation intervals, it is recorded as a new maximum; otherwise, the MTIE value remains the same. This in turn means that the MTIE values never decrease over longer and longer observation intervals.

For example, if the MTIE observed over all the 5-second periods of a measurement run is 40 ns, then this value will not go to less than 40 ns when measured for any subsequent (longer) periods. For example, any MTIE that is calculated using a 10-s period will be the same or more than any maximum value found for the 5-s periods. This applies similarly for larger values.

Therefore, MTIE can only stay the same or increase if another higher value is found during the measurement using a longer observation interval; hence, the MTIE graph will never decline as it moves to the right (with increasing tau).

The MTIE graph remaining flat means that the maximum peak-to-peak delay variation remains constant; or in other words, no new maximum delay variation was recorded. On the other hand, if the graph increases over time, it suggests that the test equipment is recording ever-higher maximum peak-to-peak delay variations. Also, if the graph resembles a line increasing linearly, it shows that the measured clock is not locked to the reference clock at all, and continually wandering off without correction. MTIE is often used to define a limit on the maximum phase deviation of a clock signal.

A simplified algorithm to record and plot MTIE is explained as follows:

  1. Determine all the TE values for every 1-second interval (the values may have been sampled at a higher rate, such as 1/30 of 1 second).

  2. Find the maximum peak-to-peak delay variation within each observed 1-s observation interval (MTIE values for 1-s intervals).

  3. Record the highest value at the 1-s horizontal axis position in the MTIE graph.

  4. Determine all the TE values for all 2-s intervals.

  5. Find the maximum peak-to-peak delay variation within each observed 2-s interval (MTIE values for 2-s intervals).

  6. Record the highest value against the 2-s horizontal axis position in the MTIE graph.

  7. Repeat for other time intervals (4 s, 8 s, 16 s, etc.).
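The steps above can be sketched as a sliding-window computation over TE samples (the values here are hypothetical; a real test device samples far faster and over far longer runs):

```python
def mtie(te, window):
    """MTIE for one observation-interval length: the largest
    peak-to-peak TE spread over every window of that length."""
    return max(max(te[i:i + window]) - min(te[i:i + window])
               for i in range(len(te) - window + 1))

te_ns = [0, 5, -3, 8, 2, -6, 4, 1]  # hypothetical 1-s TE samples
for w in (2, 4, 8):
    print(w, mtie(te_ns, w))  # MTIE never decreases as the window grows
```

Running this shows the non-decreasing property described above: widening the window can only expose equal or larger peak-to-peak variations.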

Figure 5-19 shows the MTIE limit for a primary reference time clock (PRTC), as specified in ITU-T G.8272.

Figure 5-19 MTIE Limits for Two Classes of PRTC Clock (Figure 1 from ITU-T G.8272)

The flat graph required by the recommendation for the MTIE values suggests that after a certain time period, there should not be any new maximum delay variation recorded for a PRTC clock. This maximum MTIE value, as recommended in ITU-T G.8272, is 100 ns for a PRTC class A (PRTC-A) clock and 40 ns for a PRTC class B (PRTC-B) clock. This line on the graph represents the limits below which the calculated values must remain and is referred to as a mask.

The standards development organizations have specified many MTIE masks—for example, the ITU-T has written them into recommendations G.812, G.813, G.823, G.824, and G.8261 for frequency performance of different clock and network types.

To summarize, MTIE values record the peak-to-peak phase variations or fluctuations, and these peak-to-peak phase variations can point to frequency shift in the clock signal. Thus, MTIE is very useful in identifying a shift in frequency.

Time Deviation

Whereas MTIE shows the largest phase swings for various observation intervals, time deviation (TDEV) provides information about phase stability of a clock signal. TDEV is a metric to measure and characterize the degree of phase variations present in the clock signal, primarily calculated from TIE measurements.

Unlike MTIE, which records the difference between the high and low peaks of phase variations, TDEV primarily focuses on how frequently such phase variations occur and how stable (or unstable) they are over a given time; note the importance of "over a given time." This matters because both the "how frequent" and "how stable" parameters would change with the time duration of the measurements. For example, a significant error occurring every second of a test can be considered a frequently occurring error. But if that was the only error to occur for the next hour, it cannot be considered a frequently occurring error.

Consider a mathematical equation that divides the total phase variation (the TE values) by the total test time; if there are fewer phase variations over a long time, this equation produces a low value. The result increases if there are more phase variations during the same test time. This sort of equation is used in TDEV calculations to determine the quality of the clock.

Low-frequency phase variations can appear to occur randomly. Only if such events occur less often over a longer time can they be marked as outliers, and this builds confidence in the long-term quality of the clock. Think of tossing five coins simultaneously, and on the first toss, all five coins come down heads! Without tossing those coins many more times, you cannot be sure whether that was an outlier (a random event) or something strange is wrong with the coins.

The TDEV of a clock defines the limit of any randomness (or outliers) in the behavior of a clock. The degree of this randomness is a measure of the low frequency phase variation (wander), which can be statistically calculated from TIE measurements and plotted on graphs, using a TDEV calculation.

According to ITU-T G.810, TDEV is defined as “a measure of the expected time variation of a signal as a function of integration time.” So, TDEV is particularly useful in revealing the presence of several noise processes in clocks over a given time interval; and this calculated measurement is compared to the limit (specified as a mask in the recommendations) for each clock type.

Usually, instability of a system is expressed in terms of some statistical representation of its performance. Mathematically, variance or standard deviation are usual metrics that would provide an adequate view to express such instability. However, for clocks, it has been shown that the usual variance or standard deviation calculation has a problem—the calculation does not converge.

The idea of convergence can be explained with a generic example. A dataset composed of a large collection of values with a Gaussian distribution will have a specific mean and variance. The mean shows the average value, and the variance shows how closely the values cluster around the mean—much like the stability of a clock. Now, if one were to calculate these values using a small subset or sample of these values, then the level of confidence in those metrics would be low—much like the toss of five coins. And as these parameters are calculated using larger and larger samples of this example dataset, the confidence in these metrics increases. The increase in confidence here means that variance moves toward a true value for the given dataset—this is referred to as convergence of variance. However, for clocks and oscillators, no matter how large the sample of variations is, apparently there is no evidence of this variance value converging. To deal with this deficiency, one of the alternative approaches is to use TDEV.

Rather than understanding the exact formula for calculating TDEV, it might be better to see the TDEV measure as a root mean square (RMS) type of metric to capture clock noise. Mathematically, RMS is calculated by first taking a mean (average) of square values of the sample data and then taking a square root of it. The key aspect of this calculation is that it averages out extremes or outliers in the sample (much more than the normal mean). Interestingly, the extremes average out more and more as the sample data increases.

Thus, TDEV becomes a highly averaged value. And if the noise in the clock occurs rarely, the degree (or magnitude) of this rarely occurring error diminishes over longer periods of measurement. An example might be an error introduced by environmental changes over the course of the day (diurnal rhythms), such as the afternoon sun striking the exterior metal panel of the clock.

So, to gain confidence in the TDEV values (primarily to discard the rare noise occurrence in clock), the tests are usually quite long, at least several times longer than the usual MTIE observation intervals. For example, ITU-T G.811 calls for running TDEV tests for 12 times the maximum observation interval.
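For the curious, the standard form of the TDEV estimator (see ITU-T G.810 for the formal definition) can be sketched from phase (TE) samples spaced τ0 apart. The inner sum is an averaged second difference of the phase, which is why TDEV averages out outliers, and why a pure frequency offset (a linear phase ramp) contributes nothing:

```python
def tdev(x, n):
    """TDEV estimate for observation interval n*tau0, given phase
    samples x spaced tau0 apart (standard estimator form)."""
    terms = []
    for j in range(len(x) - 3 * n + 1):
        # averaged second difference of the phase over span n
        s = sum(x[i + 2 * n] - 2 * x[i + n] + x[i] for i in range(j, j + n))
        terms.append(s * s)
    return (sum(terms) / (6 * n * n * len(terms))) ** 0.5

# A pure frequency offset (linear phase ramp) yields TDEV = 0:
ramp = [2.0 * i for i in range(30)]
print(tdev(ramp, 3))  # 0.0
```

This is only a sketch of the estimator; production test equipment applies it across many observation intervals and much longer sample runs.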

Compared to TDEV, MTIE is perfectly suited to catching outliers in samples of TIE measurements. MTIE is a detector of peaks, whereas TDEV is a measure of wander over varying observation time intervals and is well suited to the characterization of wander noise. It can do that because it is very good at averaging out short-term and rare variations.

So, it makes sense that TDEV is used alongside MTIE as a metric to characterize the quality of a clock. As is the case for MTIE, ITU-T recommendations also specify TDEV masks for the same purpose. These masks show the maximum permissible values for both TDEV and MTIE for different clock types and in different conditions.

Noise

In the timing world, noise is an unwanted disturbance in a clock signal—the signal is still there, and it is good, but it is not as perfect as one would like. Noise arises over both the short term, say for a single sampling period, as well as over much longer timeframes—up to the lifetime of the clock. Because there are several factors that can introduce noise, the amount and type of noise can vary greatly between different systems.

The metrics that are discussed in this chapter try to quantify the noise that a clock can generate at its output. For example, max|TE| represents the maximum amount of noise that can be generated by a clock. Similarly, cTE and dTE also characterize different aspects of the noise of a clock.

Noise is classified as either jitter or wander. As seen in the preceding section “Jitter and Wander,” by convention, jitter is the short-term variation, and wander is the long-term variation of the measured clock compared to the reference clock. The ITU-T G.810 standard defines jitter as phase variation with a rate of change greater than or equal to 10 Hz, whereas it defines wander as phase variation with a rate of change less than 10 Hz. In other words, slower-moving, low-frequency jitter (less than 10 Hz) is called wander.

One analogy is that of a driver behind the wheel of a car driving along a straight road across a featureless plain. To keep within the lane, the driver typically “corrects” the steering wheel with very short-term variations in the steering input and force (unconsciously, drivers do these corrections many times per second). This is to correct the “jitter” of the car moving around within the lane.

But now imagine that the road is gently turning to the right, and the driver begins to apply a little more force on the steering wheel in that direction to effect the turn. This is an analogy for wander. The jitter is the short-term variance trying to take you out of your lane, whereas the long-term direction being steered is wander.

The preceding sections explained that MTIE and TDEV are metrics that can be used to quantify the timing characteristics of clocks. Therefore, it is no surprise that the timing characteristics of these clocks based on the noise performance (specifically noise generation) have been specified by ITU-T in terms of MTIE and TDEV masks. These definitions are spread across numerous ITU-T specifications, such as G.811, G.812, G.813, and G.8262. Chapter 8 covers these specifications in more detail.

Because these metrics are so important, any timing test equipment used to measure the quality of clocks needs to be able to generate MTIE and TDEV graphs as output. The results on these graphs are then compared to one of the masks from the relevant ITU-T recommendation. However, it is imperative that the engineer uses the correct mask to compare the measured results against.

For example, if the engineer is measuring SyncE output from a router, the MTIE data needs to be compared to the SyncE mask from G.8262 and not from some other standard like G.812 (which covers SSU/BITS clocks).

Figure 5-20 represents the various types of categories of noise performance and the point at which they are relevant (and could be measured).

Figure 5-20 Noise Generation, Transfer, and Tolerance

The sections that follow cover each of these categories in more detail.

Noise Generation

Figure 5-21 illustrates the noise generation at the output of Network Element-1 (NE-1) and hence the measurement point for noise generation. Noise generation is the amount of jitter and wander that is added to a perfect reference signal at the output of the clock. Therefore, it is clearly a performance characteristic of the quality or fidelity with which a clock can receive and transmit a time signal. Figure 5-21 depicts the noise that gets added and is observed at the output when an ideal (perfect) reference signal is fed to the clock.

Figure 5-21 Noise Generation

One analogy that might help is that of a photocopier. If you take a document and run it through a photocopier, it will generate a good (or even excellent) copy of the document. The loss of quality in that process is an analogy for noise generation. But now, take a photocopy of that copy of the document, and a photocopy of that copy. Repeat that process, and after four to five generations of copies, the quality will have degraded substantially. The lower the noise generated in each cycle of the copy process, the more generations of copies can be made with acceptable quality.

For transport of time across a network, lower noise generation in each clock translates into a better clock signal at the end of the chain, or the ability to have longer chains. That is why noise generation is a very important characteristic for clocks.

Noise Tolerance

A slave clock can lock to the input signal from a reference master clock, and yet every clock (even reference clocks) generates some additional noise on its output. As described in the previous section, there are metrics that quantify the noise that is generated by a clock. But how much noise can a slave clock receive (tolerate) on its input and still be able to maintain its output signal within the prescribed performance limits?

This is known as the noise tolerance of a clock. Figure 5-20 shows the noise tolerance at the input of Network Element-3 (NE-3). It is simply a measure of how bad an input signal can become before the clock can no longer use it as a reference.

Going back to the photocopier analogy, the noise tolerance is the limit at which the photocopier can only just read the input document well enough to produce a readable copy. Any further degradation and it cannot produce a satisfactory output.

Like noise generation, noise tolerance is also specified as MTIE and TDEV masks for each class of clock performance. As long as the noise applied at the input of a clock remains within the maximum permissible values specified in the ITU-T recommendations, the output of the clock should continue to operate within the expected performance limits. These masks are specified in the same ITU-T specifications that cover noise generation, such as G.812, G.813, G.823, G.824, and G.8262.

What happens when the noise at the input of a clock exceeds the permissible limits—what should be the action of a clock? The clock could do one or more of the following:

  • Report an alarm, warning that it has lost the lock to the input clock

  • Switch to another reference clock if there are backup references available

  • Go into holdover (operating without an external reference input) and raise an alarm warning of the change of status and quality

Noise Transfer

To synchronize network elements to a common timing source, there are at least two different approaches: external clock distribution and line timing clock distribution.

An external clock distribution or frequency distribution network (refer to the section “Legacy Synchronization Networks” in Chapter 2, “Usage of Synchronization and Timing”) becomes challenging because the network elements are normally geographically dispersed. So, building a dedicated synchronization network to each node could become very costly, and hence this method is often not preferred.

The other approach is to cascade the network elements and use the line clocking (again refer to the section “Legacy Synchronization Networks” in Chapter 2), where the clock is passed from one network element to the next element, as in a daisy chain. Figure 5-20 shows the clock distribution model using the line clocking approach, where NE-1 is passing the clock to NE-2 and then subsequently to NE-3.

From the previous section on noise generation, you know that every clock will generate additional noise on its output compared to its input. When networks are designed as a daisy chain clock distribution, the noise generated at one clock is passed as input to the next clock synchronized to it. This way, noise not only cascades down to all the clocks below it in the chain but also accumulates during transfer. To lessen that, the accumulated noise needs to be reduced or filtered at the output of the clock (on each network element).

This propagation of noise from input to output of a node is called noise transfer and can be defined as the noise that is seen at the output of the clock due to the noise that is fed to the input of the clock. The filtering capabilities of the clock determine the amount of noise being transferred through a clock from the input to the output. Obviously, less noise transfer is desirable. Figure 5-20 shows the noise that NE-2 transferred to NE-3 based on the noise it was subjected to from NE-1.

Again, as with the ITU-T clock specifications for noise generation and tolerance, MTIE and TDEV masks are used to specify the allowed amount of noise transfer. Remember that each clock node has a PLL with filtering capabilities (LPF and HPF) that can filter out noise based on the clock bandwidth (refer to the earlier section “Low-Pass and High-Pass Filters”).

The main metric used to describe how noise is passed from input to output is clock bandwidth, which was explained in the section “Low-Pass and High-Pass Filters.” For example, the noise transfer of an Option 1 EEC is described in clause 10.1 of ITU-T G.8262 as follows: “The minimum bandwidth requirement for an EEC is 1 Hz. The maximum bandwidth requirement for an EEC is 10 Hz.”
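To make the role of clock bandwidth concrete, the following Python sketch models a clock's tracking behavior as a single-pole low-pass filter with a 10 Hz bandwidth (the EEC upper limit). A real PLL is considerably more complex, and all amplitudes and frequencies here are invented for illustration:

```python
import math

def lowpass(phase_samples, f3db_hz, fs_hz):
    """Single-pole low-pass filter: a crude model of a slave clock's PLL
    tracking its input phase with the given 3 dB bandwidth."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * f3db_hz / fs_hz)
    out, y = [], phase_samples[0]
    for x in phase_samples:
        y += alpha * (x - y)
        out.append(y)
    return out

# Input phase: slow wander (0.1 Hz) plus fast jitter (100 Hz), sampled at 1 kHz.
fs = 1000.0
t = [n / fs for n in range(5000)]
wander = [50e-9 * math.sin(2 * math.pi * 0.1 * ti) for ti in t]
jitter = [20e-9 * math.sin(2 * math.pi * 100.0 * ti) for ti in t]
noisy = [w + j for w, j in zip(wander, jitter)]

# A 10 Hz clock follows the wander but strips most of the fast jitter.
out = lowpass(noisy, 10.0, fs)
residual_jitter = max(abs(o - w) for o, w in zip(out[1000:], wander[1000:]))
print(residual_jitter < 10e-9)  # True: fast jitter is strongly attenuated
```

A narrower bandwidth would filter even more of the incoming noise, at the cost of tracking the reference more slowly.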

Holdover Performance

Given an accurate reference clock as an input signal, one can achieve a well-synchronized clock. This is the fundamental principle of timing synchronization whereby a master clock drives the slave clock, always assuming that the master clock itself is stable and accurate.

Occasionally, there will be cases where the master clock becomes unavailable. During such periods, the local clock is no longer disciplined by a master (or reference) clock. Most synchronization-based applications need to know the amount of frequency drift that can occur during such events.

The period during which a clock has no reference clock to synchronize to is called clock holdover. In this state, the clock behaves like a flywheel that keeps spinning at a constant speed even when it is not being actively driven, so it is sometimes also referred to as flywheel mode. The measure of how quickly a slave clock drifts away from the reference or master clock is called its holdover performance. The clock's ability to maintain the same frequency over an interval of time without a reference frequency being available is called frequency holdover. Similarly, time or phase holdover is the ability to maintain phase accuracy over an interval of time without an external phase reference.

This synchronization across network elements is achieved either via a physical signal (such as SyncE) or via precision time protocol (PTP) packets carrying timing synchronization information. There is even the ability to combine both approaches in a hybrid clocking architecture, whereby frequency is carried with SyncE and phase synchronization with PTP packets. Chapter 7, “Precision Time Protocol,” delves into this a little more.

A physical link failure (or fault) will break the frequency distribution via SyncE or interrupt the PTP packet path between slave and master. During such events, the slave clock will start slowly “drifting” away from the reference or master clock. This frequency and phase drift, besides depending on many external factors, also depends on components used for the local clock and the characteristics of the PLL circuitry.
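The drift during holdover is often modeled, to first order, as the combination of an initial phase offset, a residual frequency offset, and a linear frequency drift; temperature effects and oscillator noise are ignored. The Python sketch below uses illustrative parameter values, not figures from any datasheet or standard:

```python
def holdover_phase_error(t_s, x0=0.0, y0=1e-11, drift=1e-15):
    """First-order holdover model:
        x(t) = x0 + y0*t + 0.5*D*t**2
    x0: phase offset at the start of holdover (s)
    y0: residual fractional frequency offset (dimensionless)
    drift: linear frequency drift D (1/s)
    Temperature effects and oscillator noise are ignored."""
    return x0 + y0 * t_s + 0.5 * drift * t_s ** 2

# Accumulated phase error after 1 hour and after 24 hours of holdover:
for hours in (1, 24):
    t = hours * 3600.0
    print(f"{hours:2d} h: {holdover_phase_error(t) * 1e9:.0f} ns")
# With these example values, 24 h of holdover costs roughly 4.6 microseconds.
```

Note how the quadratic drift term dominates over long holdover periods, which is why oscillator aging and stability matter so much more than the initial offset.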

Obviously, it is desirable for the drift from the reference clock during holdover to be small, although "small" is not a helpful measure, so it is defined in more quantitative terms. Holdover performance is measured by removing one or more of the clock sources (frequency and/or phase) and comparing the output of the slave clock against a reference signal from the master clock.

The ITU-T recommendations specify MTIE and TDEV masks to compare the accepted holdover performance for each clock type (the masks are not as strict as the masks used when the clocks are locked to an external reference clock).

Two major factors determine holdover performance. These are primarily the internal components of a clock or PLL and secondly the external factors that could impact the clock output signals. Among the components, the oscillator (chiefly its stability) plays the most important role in providing better holdover performance from a clock. For example, a clock based on a rubidium (stratum 2) oscillator or an OCXO (stratum 3E) oscillator will provide better holdover characteristics than a stratum 4 oscillator.

Similarly, smaller variability in environmental conditions (especially temperature) during the holdover period results in better holdover characteristics. For example, rubidium oscillators are very stable, but their stability degrades when they are exposed to temperature changes. Figure 5-22 provides a very rough comparison of the holdover performance of clocks with different classes of oscillator. Note that Figure 5-22 is a graphical representation showing the magnitude of deviation with respect to time; either positive or negative deviations are possible for different oscillators.

Figure 5-22 Relative Holdover Performance for Different Classes of Oscillator

Hardware designers are constantly innovating to improve holdover performance. One such innovation is to observe and record the variations in the local oscillator while the reference signal is available, sometimes also referred to as history. When the reference signal is lost, this historical information is used to improve the preservation of the accuracy of the output signal. When using such techniques, the quality of this holdover data plays an important role in the holdover performance.

There are cases where multiple paths of synchronization are available, and not all are lost simultaneously. For the hybrid synchronization case, where frequency synchronization is achieved via physical network (SyncE) and phase synchronization is achieved via PTP packets, it may be that only one of the synchronization transport paths breaks (such as, only the PTP packet path or just the frequency synchronization path). For such cases, one of the synchronization modes could move into holdover.

For example, it could be that the PTP path to the master is lost (for example, due to logical path failure to the PTP master) but frequency synchronization (SyncE) is still available. In this case, the frequency stays locked to the reference and just the phase/time moves into holdover.

Why is holdover performance important, and how long should good holdover performance be required?

As an example, consider the case of a 4G LTE-Advanced cell site, which may have a requirement to keep its phase aligned within ±1.5 μs of its master at all times. This cell site is equipped with a GPS receiver and external antenna to synchronize its clock to this level of phase accuracy. Suppose this GPS antenna fails for some reason (say, a lightning strike). In that case, the clock in the cell site equipment immediately moves into holdover. The cell site can continue operating only so long as the phase alignment of its radio stays within ±1.5 μs of accuracy.

Should this GPS antenna failure happen during the night, it might take some time for operations staff to discover and locate the failure, and then dispatch a technician to the correct location. If it was a long weekend, it might take quite some time for the technician to reach the location and fix the issue (assuming the technician knows what the problem is and has spares for the GPS antenna readily available).

Of course, it would be very convenient if, during this whole duration, the clock were able to stay within a phase alignment of ±1.5 μs and provide uninterrupted service to that area. If the maximum time to detect and correct this fault is (say) 24 hours, then for uninterrupted service, the clock on this cell site requires a holdover of ±1.5 μs over a period of 24 hours. In fact, the generally accepted use case at the ITU-T is that a clock should provide holdover performance for up to 72 hours (to cover a long weekend).

Note that this requirement (±1.5 μs over 24 hours) is quite a strict holdover specification, and almost no currently deployed systems (cell site radios and routers) are able to achieve this level of holdover. Sustaining it for 72 hours is very difficult and requires a clock with a very high-quality oscillator, such as one based on rubidium.
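A back-of-the-envelope calculation shows how demanding this is: to stay within a ±1.5 μs budget for 24 hours on a constant frequency offset alone, that offset must be below roughly 1.7 × 10⁻¹¹, or about 0.017 ppb:

```python
# Largest constant fractional frequency offset (FFO) that keeps the
# accumulated phase error within budget: |y| <= limit / duration.
phase_limit_s = 1.5e-6      # allowed phase error budget (±1.5 us)
duration_s = 24 * 3600      # holdover duration (24 h)
max_ffo = phase_limit_s / duration_s
print(f"max FFO ~ {max_ffo:.2e} ({max_ffo * 1e9:.3f} ppb)")
# max FFO ~ 1.74e-11 (0.017 ppb)
```

For comparison, a free-running stratum 4 oscillator is specified only to parts-per-million accuracy, several orders of magnitude short of this figure.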

Adding to the difficulty, the weather conditions at a cell site (up some mountaintop) might be extremely challenging with only minimal control of environmental conditions for the equipment. Temperature is a very significant factor contributing to the stability of an oscillator, and that stability feeds directly into holdover performance. For this reason, the ITU-T G.8273.2 specification for a PTP boundary and slave clock defines the required holdover performance in absence of PTP packets for both cases—constant temperature and variable temperature.

Transient Response

Holdover is the state that a clock transitions to whenever the input reference clock is lost. As shown earlier, providing good, long-duration holdover after a failure of the reference signal is neither easy nor inexpensive. However, multiple reference clocks may be simultaneously available to a slave clock, offering backup signals in the case of failure. Consequently, network designers prefer a design where multiple references are available to the network elements, providing redundancy and backup for timing synchronization.

The question arises, how should the clock behave when an input reference clock is lost and the clock decides to select the second-best clock source as the input reference? The behavior during this event is characterized as the transient response of a clock.

Multiple reference inputs to a clock could be traceable (via different paths) to the same source of time (such as a PRTC) or back to different sources. The clock quality information (such as Ethernet synchronization messaging channel (ESMC) and PTP packets) sent through the network shows the quality of the source of time that these separate references offer. The software managing the slave clocks then decides which backup reference clock to use based on the quality information. As expected, a high-quality reference is preferred over a lower-quality reference clock.

For cases where there are multiple inputs, the network designer might assign the input references a priority, so that the slave clock can decide in what order the input is preferred. So, when the highest priority (say priority 1) reference clock fails, the slave must decide whether there is another reference available to provide an input signal. The slave selects one of the remaining sources available out of those signaling the highest quality level. If more than one source signal is available at that quality level, the slave selects the available reference with the highest priority (say, priority 2). The slave will then start to acquire this new input signal and continue to provide an output signal but now aligned to the new reference.
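The selection behavior described above can be sketched as a simple quality-then-priority sort. This is an illustration of the logic only, not any vendor's implementation; the Reference class and its field conventions are invented for the example (lower numbers mean better quality and higher priority):

```python
from dataclasses import dataclass

@dataclass
class Reference:
    name: str
    quality: int      # lower number = better quality level (as signaled via ESMC/PTP)
    priority: int     # lower number = preferred by the operator
    available: bool

def select_reference(refs):
    """Pick the best available input: highest quality level first,
    then the operator-assigned priority as the tie-breaker."""
    candidates = [r for r in refs if r.available]
    if not candidates:
        return None  # no input left: the clock enters holdover
    return min(candidates, key=lambda r: (r.quality, r.priority))

refs = [
    Reference("SyncE-1", quality=2, priority=1, available=False),  # failed input
    Reference("SyncE-2", quality=2, priority=2, available=True),
    Reference("BITS",    quality=4, priority=3, available=True),
]
print(select_reference(refs).name)  # SyncE-2
```

When the priority-1 reference fails, the slave falls back to the next-best source of equal quality rather than the lower-quality one, exactly as described above.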

This switchover, referred to as reference clock change, is known as a transient event and the change in timing characteristics of a clock during such an event is referred to as transient response. This is a very different event from a clock moving into the holdover state, because the change in reference clock happens very quickly, resulting in the slave clock being in the holdover state for only a very brief time.

These transients are thus divided in the ITU-T recommendations into two categories: long-term transients and short-term transients. Switching between two reference inputs is an example of a short-term transient. Long-term transient is another name for holdover.

Transient response specifies the noise that a slave clock generates during such events. Slave clocks are unlikely to be transparent during such events and could either transfer or generate additional noise, which is measured as the transient response. Again, for each clock type, there are MTIE and TDEV masks in the ITU-T standards to specify the accepted transient response. For example, you can find MTIE and TDEV masks for transient response for clocks supporting SDH (E1) and SONET (T1) in clause 11 of ITU-T G.812.

Measuring Time Error

This chapter has explored several definitions and limits of time error, numerous metrics to quantify it, and the impact of time error on applications. In summary, time error, essentially the difference between a slave clock and a reference or master clock, is primarily measured using four metrics: cTE, dTE, max|TE|, and TIE. From the basic TIE measurement, engineers statistically derive MTIE and TDEV, in combination with the application of different filters.

ITU-T recommendations specify the allowed limits for these different metrics of time error. The metrics cTE and max|TE| are defined in terms of time (such as nanoseconds). dTE, being the change of the time error function over time, is characterized using TIE and is then compared to a mask expressed in MTIE and TDEV, which defines the permissible limits for that type of clock.
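As a rough illustration of how these metrics relate, the following Python sketch derives max|TE|, cTE, and dTE from a toy series of time error samples, and computes MTIE for one observation window. Real measurements apply the averaging windows and filtering defined in the ITU-T recommendations; this shows only the basic arithmetic:

```python
def time_error_metrics(te_samples):
    """Derive the basic time error metrics from a series of TE samples
    (slave minus reference, in seconds). Simplified: in practice cTE is
    an average over a defined window, and dTE is further filtered before
    comparison against MTIE/TDEV masks."""
    max_abs_te = max(abs(x) for x in te_samples)   # max|TE|
    cte = sum(te_samples) / len(te_samples)        # constant part (mean)
    dte = [x - cte for x in te_samples]            # dynamic part
    return max_abs_te, cte, dte

def mtie(tie_samples, window):
    """MTIE for one observation window: the largest peak-to-peak swing
    over any interval of `window` consecutive samples."""
    return max(
        max(tie_samples[i:i + window]) - min(tie_samples[i:i + window])
        for i in range(len(tie_samples) - window + 1)
    )

te = [10e-9, 12e-9, 9e-9, 15e-9, 11e-9, 8e-9]      # toy TE series (seconds)
max_te, cte, dte = time_error_metrics(te)
print(round(max_te * 1e9, 2), round(cte * 1e9, 2))  # 15.0 10.83
print(round(mtie(te, 3) * 1e9, 2))                  # 7.0 (worst 3-sample swing, in ns)
```

TDEV is similarly computed from TIE, but as an RMS-style statistic over many observation intervals rather than a peak-to-peak one.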

Some ITU-T recommendations define allowed limits for a single standalone clock (a single network element), and others specify clock performance at the final output of a timing chain in an end-to-end network. You should not mix these two types of specifications, because limits that apply to the standalone case do not apply to the network case. Similarly, masks used to define the limits on a standalone clock do not apply to the network. This is especially the case when discussing the testing of timing performance, as in Chapter 12, “Operating and Verifying Timing Solutions.”

For each type and class of standalone clock, all four metrics, max|TE|, cTE, dTE, and TIE, are specified by these standards, while for the network case, max|TE| and dTE are specified. The cTE is not listed for network limits because the limits of max|TE| for the network case automatically include cTE, which might be added by any one clock in the chain of clocks.

The ITU-T specifies the various noise and/or time error constraints for almost all the clock types. For each one of the numerous clock specifications, the ITU-T recommendations define all five aspects for timing characterization of a clock:

  • Noise generation

  • Noise tolerance

  • Noise transfer

  • Transient response

  • Holdover performance

Chapter 8 will cover these specifications in more detail.

Topology

As shown in Figure 5-23, the setup required for time error measurement testing includes

  • Timing testing equipment that can synthesize ideal reference clock signals.

  • Clock under test that would normally be embedded in a network element.

  • Clock capture and measurement equipment. Most often this is the same equipment that generates the reference clock so that it can do an easy comparison to the output signal returned from the clock under test.

    Note that if the tester is a different piece of equipment from the reference input clock, the same clock signal also needs to be passed to the tester to allow it to compare the measured clock to the reference clock. You cannot use two separate signals, one as reference and one in the tester.

Figure 5-23 Time Error Measurement Setup

The testing equipment that is generating the reference clock is referred to as a synthesized reference clock in Figure 5-23 because of the following:

  • For noise transfer tests, the reference clock needs to be synthesized such that jitter and wander are introduced (under software control). The tester generates the correct amount of jitter and wander, according to the ITU-T recommendations, for the type of clock under test. The clock under test is required to filter out the introduced input noise and produce an output signal that is within the permissible limits from the standards. This limit is defined as a set of MTIE and TDEV masks.

  • For noise tolerance tests, the reference clock needs to emulate the input noise that the clock under test could experience in real-world deployments. For this purpose, the testing equipment simulates a range of predefined noise or time errors in the reference signal to the clock under test. To test the maximum noise that the clock can tolerate as input, the engineer verifies that the clock does not generate any alarms, switch to another reference, or go into holdover mode.

  • For noise generation tests, the reference clock fed to the clock under test should be ideal, such as one sourced from a PRTC. So, for this measurement, there is no need to artificially synthesize the reference clock, unlike the noise transfer and noise tolerance cases, where the reference is synthesized. The testing equipment then simply compares the reference input signal to the noisy signal from the clock under test.
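Checking a measurement against the limits is conceptually simple: for every observation interval, the measured value must stay at or below the mask. The sketch below uses made-up mask and measurement values, not the actual figures from any ITU-T recommendation:

```python
def passes_mask(observed, mask):
    """Compare measured MTIE values against a limit mask: for each
    observation interval tau, the measurement must not exceed the mask.
    All numbers below are illustrative, not taken from G.8262."""
    return all(observed[tau] <= limit for tau, limit in mask.items())

# tau (s) -> allowed MTIE (ns); purely illustrative values
mask = {0.1: 40, 1: 40, 10: 80, 100: 160}
measured_ok  = {0.1: 12, 1: 18, 10: 35, 100: 90}
measured_bad = {0.1: 12, 1: 18, 10: 95, 100: 90}   # fails at tau = 10 s
print(passes_mask(measured_ok, mask), passes_mask(measured_bad, mask))  # True False
```

Real test equipment performs the same comparison, but against a continuous mask curve defined over the full range of observation intervals.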

References in This Chapter

Open Circuit. "16MHz Crystal Oscillator HC-49S." Picture. https://opencircuit.shop/Product/16MHz-Crystal-Oscillator-HC-49S

The International Telecommunication Union Telecommunication Standardization Sector (ITU-T).

Chapter 5 Acronyms Key

The following list expands the key acronyms used in this chapter.

μs: microsecond or 1 × 10⁻⁶ second

4G: 4th generation (mobile telecommunications system)

BITS: building integrated timing supply (SONET)

cTE: constant time error

DPLL: digital phase-locked loop

dTE: dynamic time error

E1: 2-Mbps (2048-kbps) signal (SDH)

EEC: Ethernet equipment clock

ESMC: Ethernet synchronization messaging channel

FFD: fractional frequency deviation

FFO: fractional frequency offset

GPS: Global Positioning System

HPF: high-pass filter

ITU: International Telecommunication Union

ITU-T: ITU Telecommunication Standardization Sector

LPF: low-pass filter

LTE: Long Term Evolution (mobile communications standard)

max|TE|: maximum absolute time error

MTIE: maximum time interval error

NE: network element

OCXO: oven-controlled crystal oscillator

PLL: phase-locked loop

ppb: parts per billion

ppm: parts per million

PRC: primary reference clock (SDH)

PRS: primary reference source (SONET)

PRTC: primary reference time clock

PRTC-A: primary reference time clock, class A

PRTC-B: primary reference time clock, class B

PTP: precision time protocol

RMS: root mean square

SDH: synchronous digital hierarchy (optical transport technology)

SDO: standards development organization

SEC: SDH equipment clock

SONET: synchronous optical network (optical transport technology)

SSU: synchronization supply unit (SDH)

SyncE: synchronous Ethernet (a set of ITU-T standards)

T1: 1.544-Mbps signal (SONET)

TCXO: temperature-compensated crystal oscillator

TDEV: time deviation

TE: time error

TIE: time interval error

VCO: voltage-controlled oscillator

VCXO: voltage-controlled crystal oscillator

XO: crystal oscillator

Further Reading

Refer to the following recommended sources for further information about the topics covered in this chapter.

