Abstract—The design and simulation of a 10Gb/s Tx/Rx serial link has been performed in 90nm CMOS. The transmitter contains 2:1 data multiplexing and 4-tap transmit equalization with current mode drivers. The receiver has 1:4 downsampling with variable gain control, predictive decision feedback equalization and a 50 ohm termination. Clocking was provided by a custom PLL and DLL with phase interpolation. The link was simulated in a lossy channel modeled for resistive, capacitive and inductive losses.

I. INTRODUCTION

The origins of electronics were formed from the need to transmit information over long distances. Since the success of the first telegraph operation, engineers have been attempting to relay data faster, with lower power, and more reliably to meet growing consumer demand. Today, serial links can provide data transmission at rates up to tens of Gb/s per channel through channels with responses not much different from those of previous generations. This progress comes from innovative and careful design of the transmit and receive circuitry, which is investigated in this document.

This report will cover the design of a 10Gb/s serial link in 90nm CMOS. Topics covered include the channel response, high level architectural design choices, transmitter design, receiver characterization, equalization schemes, clocking methodologies, and finally a performance summary.

II. CHANNEL CHARACTERISTICS

The channel used for this design was modeled from an integrated circuit chip to PCB model shown in Figure 1 while the transmission line characteristics are listed in Table 1.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Specified Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Distributed Capacitance</td>
<td>148 pF/m</td>
</tr>
<tr>
<td>Mutual Dist. Capacitance</td>
<td>6.6 pF/m</td>
</tr>
<tr>
<td>Distributed Inductance</td>
<td>302 nH/m</td>
</tr>
<tr>
<td>Mutual Dist. Inductance</td>
<td>13.4 nH/m</td>
</tr>
<tr>
<td>Characteristic Impedance</td>
<td>45 Ohm</td>
</tr>
<tr>
<td>Distributed Resistance</td>
<td>4.375 Ohm/m</td>
</tr>
<tr>
<td>Dist. Skin Effect Resistance</td>
<td>330 uOhm/m</td>
</tr>
</tbody>
</table>
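
As a quick consistency check on Table 1, the characteristic impedance and propagation delay can be estimated from the distributed inductance and capacitance. The sketch below uses the lossless approximation (it ignores the resistive and skin-effect terms and the mutual coupling), and the function name `line_parameters` is purely illustrative. The 0.5 m length is the 50 cm channel assumed later in the transmitter section.

```python
import math

# Values from Table 1 (per metre of line).
C_PER_M = 148e-12   # distributed capacitance, F/m
L_PER_M = 302e-9    # distributed inductance, H/m
R_PER_M = 4.375     # distributed DC resistance, ohm/m


def line_parameters(length_m):
    """Return (Z0 in ohms, one-way delay in s, DC series resistance in ohms).

    Lossless approximation: Z0 = sqrt(L/C), delay = length * sqrt(L*C).
    The DC resistance is reported separately and is not used in Z0.
    """
    z0 = math.sqrt(L_PER_M / C_PER_M)
    delay = length_m * math.sqrt(L_PER_M * C_PER_M)
    r_dc = R_PER_M * length_m
    return z0, delay, r_dc


z0, delay, r_dc = line_parameters(0.5)
print(f"Z0 = {z0:.1f} ohm, delay = {delay * 1e9:.2f} ns, R_dc = {r_dc:.2f} ohm")
```

Under these assumptions the computed impedance comes out at roughly 45 Ω, in line with the value listed in Table 1.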

The channel was initially tested by injecting 5 ns and 500 ps pulses through a PMOS current mode driver to examine the attenuation and distortion of the line. Various input driver device sizes were simulated, and their responses are shown in Figure 2 and Figure 4. It is clear that driver sizes and currents must be large to transmit high frequency data effectively. While reflections contributed to the channel degradation, capacitive and resistive losses tended to dominate.

III. SERIAL LINK ARCHITECTURE

The serial link design consists of a transmitter with a 2:1 input data multiplexer, current mode driver and transmit equalization. Clocking for the transmitter is provided by a PLL running at 10GHz with an inverter based VCO. The transmitter output feeds into the channel described above and then into the receiver. The receiver consists of a variable gain amplifier followed by 1:4 demultiplexing with four comparators and predictive decision feedback equalization. Data output is clocked with an inverter based DLL and comes out of the link at 2.5 Gb/s. A model of the link is shown in Figure 5.

IV. TRANSMITTER DESIGN

The basic transmitter is composed of three components: a CML (current mode logic) multiplexer, a CML buffer, and the CML driver itself. A CML topology was chosen because each of these structures needs to operate at 10 GHz, which means very high bandwidth is required. CML provides high bandwidth, but the circuits consume static power as a price. CML uses differential pair structures that produce signals with less than rail-to-rail swing, but can also receive such signals. Since the link uses differential signaling, the differential structure of CML makes it a natural fit in the transmitter.

A. The 4:1 Multiplexer

Since the link operates at 10 GHz, which is faster than typical CMOS clock speeds for this process, multiplexing is required to combine slower data into one 10 GHz bit stream. Assuming the CMOS clock is 2.5 GHz, the CML 4x1 multiplexer drives four bits every 2.5 GHz cycle.

Four 2.5 GHz clock phases are the selection signals to the multiplexer. For example, when phases 0 and 1 are high, bit 0 is selected by the multiplexer. When phases 1 and 2 are high, bit 1 is selected. The multiplexer performs an “AND” operation of two phases to simulate having a clock with a 25% duty cycle. This allows each of the four incoming 2.5 GHz bit streams to be selected 25% of the time.

The multiplexer uses a CML topology. The typical differential pair structure is modified to allow eight different current paths. Stacks of two NMOS switches perform the AND operation on two clock phases and allow current to flow through only two of these paths at any one time. The current value of the selected bit stream then steers current through only one of the two remaining paths. Figure 3 shows a schematic of the 4x1 multiplexer.
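
The phase-ANDing selection described above can be sketched behaviorally. This is a logic-level illustration only, not a model of the CML circuit: each 2.5 GHz cycle is divided into four bit slots, each phase is assumed to be a 50% duty-cycle clock shifted by one slot, and the names `phase_high` and `mux4` are hypothetical.

```python
def phase_high(k, slot):
    """50% duty-cycle clock phase k, high for two of the four bit slots.

    Assumed alignment: phase k is high during slots k-1 and k (mod 4), so
    phases k and k+1 overlap only in slot k.
    """
    return slot % 4 in ((k - 1) % 4, k % 4)


def mux4(lanes):
    """Serialize four equal-length 2.5 Gb/s bit lists into one 10 Gb/s list.

    Lane i is selected while phases i and i+1 are both high, which emulates
    a 25% duty-cycle select signal built from two 50% phases.
    """
    out = []
    for cycle in range(len(lanes[0])):
        for slot in range(4):
            for i in range(4):
                if phase_high(i, slot) and phase_high((i + 1) % 4, slot):
                    out.append(lanes[i][cycle])
    return out


lanes = [[1, 0], [0, 0], [1, 1], [0, 1]]
print(mux4(lanes))
```

With this alignment exactly one lane is selected in every slot, so the output is simply the four lanes interleaved in order.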

B. The Buffer and Driver

Both the buffer and driver use the typical CML differential pair structure. The driver’s output resistance must match the characteristic impedance of the transmission line, which is 50 Ω. The driver’s current is then chosen to provide about 200 mV of differential voltage swing at the receiver. Assuming a channel with a length of 50 cm, about 400 mV of swing is required at the driver due to losses in the channel. With such a small load, a sizeable amount of current is needed to produce these swings (about 4 mA), which means the input transistors have large widths (36 µm).

Since the whole CML driver is large with large parasitic capacitances, the output of the multiplexer must be buffered to reach the required bandwidths. Another CML differential pair is used with smaller currents and transistor sizes, but larger load resistors to bridge the gap between the smaller multiplexer and the large driver. Figure 8 shows a schematic of the topology used in both the buffer and driver.

Simulations showed the total power consumption of the transmitter without equalization is 11.94 mW. Figure 6 shows the eye diagram at the transmitter and receiver without equalization.

V. TRANSMITTER EQUALIZATION

The link utilizes feed-forward equalization (FFE) to improve the eye size at the output. Feed-forward equalization applies a high-pass filter to the data to cancel the low-pass characteristic of the channel. It implements an FIR filter by exploiting the fact that the transmitter knows both the data it is sending and the data it is about to transmit. In this design, one pre-tap and two post-taps are used. The form of the z-domain response is:

\[ -az + 1 - bz^{-1} - cz^{-2} \]

The summation portion of the FIR filter is done in the current domain by attaching additional differential pairs to the two 50 Ω loads of the transmitter. These additional differential pairs are scaled relative to the main driver to provide the coefficients. The optimal coefficients were found to be 1/3 and 1/9 for the post-taps and 1/9 for the pre-tap. The polarity of the data is also swapped so that the equalization taps are effectively subtracted from the bit currently being transmitted.
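
The effect of these taps can be illustrated numerically. The sketch below applies the response above to ±1 symbols, normalized to the main tap. It assumes that 1/3 is the first post-tap and 1/9 the second (the text does not state the order), and the function name `ffe` is illustrative.

```python
from fractions import Fraction

A = Fraction(1, 9)  # pre-tap coefficient a
B = Fraction(1, 3)  # first post-tap coefficient b (assumed order)
C = Fraction(1, 9)  # second post-tap coefficient c (assumed order)


def ffe(bits):
    """Apply y[n] = -a*x[n+1] + x[n] - b*x[n-1] - c*x[n-2] to +/-1 symbols.

    Symbols outside the sequence are treated as 0 (no drive).
    """
    x = [1 if b else -1 for b in bits]

    def sym(n):
        return x[n] if 0 <= n < len(x) else 0

    return [-A * sym(n + 1) + sym(n) - B * sym(n - 1) - C * sym(n - 2)
            for n in range(len(x))]


# A long run of ones is de-emphasized, while a transition is driven harder.
print(ffe([1, 1, 1, 1]))
print(ffe([0, 0, 0, 1, 1]))
```

In the middle of a run of identical bits the output level settles to 1 - 1/9 - 1/3 - 1/9 = 4/9 of the main tap, which is the high-pass behavior the equalizer relies on.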

The total power consumption with equalization is 18.03 mW, an increase of 6.09 mW over the 11.94 mW consumed without equalization. However, as the previous eye diagram shows, the inter-symbol interference is too large to operate this link without equalization: there was effectively no eye at the output before FFE, and an approximately 190 mV eye with equalization. Figure 9 shows a block diagram of the FFE scheme, and Figure 7 shows the input and output eye diagrams.

VI. RECEIVER DESIGN

The purpose of the receiver is to detect and store all incoming data at a 1:4 demultiplexing rate. The incoming data is passed through a variable gain amplifier, which amplifies the signal by a factor of up to three. The variable gain amplifier is based on the structure proposed by Buchazelli [1], with variable source degeneration on the input transistors. After amplification, the data passes to four comparator paths, which are clocked off four phases of a 10GHz DLL. The output is then latched and passed on for further digital processing. Additionally, a DFE with skewed preamplifiers precedes the sense amplifiers. A full receiver block diagram with decision feedback equalization is shown in Figure 11.

The sense amplifier is a modified StrongARM design [2] that is reset every clock cycle. Design considerations were made to ensure that the speed and drive strength of the comparator were maximized while reducing dynamic power consumption. Ideally, there should be nearly no leakage power loss in this sense amplifier structure. In order to reduce the bit error rate, the comparator was designed to allow digital capacitive trimming on the outputs for offset correction. The full transistor level comparator design with sizes is shown in Figure 10.

VII. DECISION FEEDBACK EQUALIZATION

Decision feedback equalization (DFE) attempts to improve the bit error rate by using feedback from the recovered data to cancel any inter-symbol interference that remains after other methods of equalization have been performed. Unfortunately, because this equalization uses feedback, the loop delay must be less than one 10 GHz period (100 ps). Latency in the comparators is usually greater than one bit period, which prevents simple decision feedback from functioning.

However, an alternative exists for simple one-bit DFE. Instead of using feedback, two samplers sample the incoming data, one assuming the last bit was a one and the other assuming it was a zero. While these comparators are resolving the data, the previous set of comparators comes to a decision as to whether the previous bit was a one or a zero. Based on the previous bit, the correct comparator output can be selected and then passed on to the next pair of samplers. This allows DFE to operate regardless of the delay in the comparators. The block diagram in Figure 11 includes a diagram of the lookahead DFE topology.
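
A behavioral sketch of this one-tap lookahead scheme is given below. It is an idealized model rather than the circuit in Figure 11: the channel is assumed to add a single post-cursor term `tap` times the previous ±1 symbol, the two speculative samplers are modeled as slicers with thresholds shifted by that amount, and the names `channel` and `lookahead_dfe` are hypothetical.

```python
def channel(bits, tap):
    """Toy channel: each +/-1 symbol plus `tap` times the previous symbol.

    The symbol before the first bit is assumed to be -1 (a zero).
    """
    out = []
    prev = -1.0
    for b in bits:
        s = 1.0 if b else -1.0
        out.append(s + tap * prev)
        prev = s
    return out


def lookahead_dfe(samples, tap, prev_bit=0):
    """One-tap lookahead (speculative) DFE.

    Both candidate decisions are formed for every sample, and the previous
    decision only selects between them, so no analog feedback is needed.
    """
    bits = []
    for v in samples:
        d_if_one = 1 if v - tap > 0 else 0    # sampler assuming last bit = 1
        d_if_zero = 1 if v + tap > 0 else 0   # sampler assuming last bit = 0
        prev_bit = d_if_one if prev_bit else d_if_zero
        bits.append(prev_bit)
    return bits


sent = [1, 0, 1, 1, 0, 0, 1, 0]
received = channel(sent, tap=1.2)  # exaggerated ISI so plain slicing fails
print(lookahead_dfe(received, tap=1.2) == sent)
```

The tap value of 1.2 is deliberately exaggerated so that a plain zero-threshold slicer makes errors while the speculative selection still recovers the data. It is not a value taken from this design.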

The pre-amps perform the equalization by purposely adding an offset to the data. This offset is either positive or negative depending on what the pre-amp assumes the last bit to be. The pre-amp with the negative offset drives half the samplers, while the pre-amp with the positive offset drives the other half. The pre-amp topology is an imbalanced CML differential pair that is sized to drive the input capacitances of the samplers. Overall, the entire DFE receiver system uses four pairs of samplers running at 2.5 GHz and consumes only 3.38 mW of power. Figure 12 shows the outputs of the two pre-amps without FFE equalization on a 20 cm channel (DFE is not beneficial if the signal is already equalized), and Figure 13 shows transmitted and recovered data.

VIII. CLOCK GENERATION: PLL

One option for generating the required clock phases for the transmitter or receiver is a phase locked loop (PLL). The PLL’s input is a lower frequency clock (1.25 GHz). Since multiple phases are required, a ring oscillator is used so the phases can be tapped off different locations in the ring. The PLL can also perform clock multiplication. Since four 2.5 GHz phases are required at both the transmitter and receiver, the PLL needs to multiply the input frequency by two.

While PLLs have higher phase noise and greater stability problems than DLLs, PLLs do not suffer from phase spacing errors the way DLLs do. Since DLLs use a voltage-controlled delay line, any inaccuracy in the delay will mean errors in the spacing of the four phases. PLLs lock to both phase and frequency, which means that while a PLL may have a phase offset, its phase spacing will be much more accurate. Also, DLLs cannot perform the clock multiplication from 1.25 GHz to 2.5 GHz.

The PLL design is a standard analog charge pump PLL. The loop filter capacitor and resistor are chosen to give the PLL a phase margin of 70° with a bandwidth of 50 MHz. The phase/frequency detector is a typical three-state detector using the glitch-latch topology. The charge pump is a standard design and does introduce some phase error due to current mismatch.

The VCO is a pseudo-differential topology that uses cross-coupled PMOS transistors as a latching mechanism to ensure that the oscillator oscillates in a differential manner. The frequency is controlled by adjusting the gate voltage of the PMOS loads. A delay cell is shown in Figure 14.

The entire oscillator contains 8 stages, with every fourth stage's two outputs tapped to generate the four phases. In this design, the VCO is unregulated, which means it is very sensitive to supply noise. A regulator would need to be included to reduce jitter caused by power supply noise. However, since the oscillator is not supply regulated, its output is rail-to-rail, which eliminates the requirement for a level-shifter. Two plots demonstrate the functionality of the PLL: Figure 15 shows an example of the PLL's control voltage settling behavior, and Figure 17 shows the generated 2.5 GHz clock phases.
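
The tap arrangement can be checked with the standard ring oscillator relation f = 1/(2·N·t_d), where N is the number of stages and t_d the per-stage delay. The sketch below is an idealized calculation that assumes identical stage delays and exactly complementary outputs at each tapped stage. The function name `ring_osc_phases` is illustrative.

```python
def ring_osc_phases(n_stages=8, f_osc=2.5e9, tap_every=4):
    """Return (per-stage delay in s, sorted tap phases in degrees).

    Uses f = 1 / (2 * N * t_d) for a differential ring of N identical
    stages, so each stage contributes 180/N degrees of phase shift.
    Each tapped stage provides a true and a complementary output.
    """
    t_stage = 1.0 / (2 * n_stages * f_osc)
    deg_per_stage = 180.0 / n_stages
    phases = set()
    for stage in range(tap_every, n_stages + 1, tap_every):
        p = (stage * deg_per_stage) % 360.0
        phases.add(p)                      # true output
        phases.add((p + 180.0) % 360.0)    # complementary output
    return t_stage, sorted(phases)


t_stage, phases = ring_osc_phases()
print(f"stage delay = {t_stage * 1e12:.1f} ps, phases = {phases}")
```

Under these assumptions each stage must contribute 25 ps of delay at 2.5 GHz, and the two tapped stages with their complements yield four phases spaced 90° apart.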

IX. CLOCK GENERATION: DLL

A second option for generating the required clocks, also employed in this design, is the delay locked loop (DLL). Rather than the VCO of a PLL, the DLL simply has a voltage-controlled delay line (VCDL) that creates the needed phases, and it feeds the output of the VCDL back to a phase detector, as shown in Figure 16. The DLL here uses an XOR phase detector for simplicity. This is not optimal because of the natural phase offset of the phase detector, which causes a phase error at the output of the DLL. A better phase detector would have been a three-state Alexander.

The VCDL is shown in Figure 18 and contains 16 tunable inverter stages, with the output phases taken every four inverters. The tuning is done with a varactor by controlling the voltage coming from the phase detector and locking the input frequency to the output. Between the first 8 and the second 8 inverters are cross-coupling inverters, allowing for an even number of phase outputs. One disadvantage of the DLL is that it requires a 2.5 GHz clock to run the receiver side and cannot multiply clock frequencies.
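
The sensitivity of the DLL phase spacing to delay errors, noted in the PLL section, can be illustrated with a small calculation. The sketch below assumes that the loop locks the total line delay to one 400 ps period (it ignores the XOR detector's phase offset), and the per-inverter delays in the mismatched case are invented for illustration. The function name `vcdl_tap_phases` is hypothetical.

```python
def vcdl_tap_phases(stage_delays_ps, period_ps=400.0, tap_every=4):
    """Return the phase in degrees at every `tap_every`-th inverter output.

    The phase at a tap is the accumulated delay up to that tap expressed
    as a fraction of the clock period.
    """
    phases = []
    elapsed = 0.0
    for i, delay in enumerate(stage_delays_ps, start=1):
        elapsed += delay
        if i % tap_every == 0:
            phases.append(360.0 * elapsed / period_ps)
    return phases


ideal = [25.0] * 16
# Hypothetical mismatch: first four inverters slow, last four fast,
# with the total delay still locked to one period.
skewed = [27.0] * 4 + [25.0] * 8 + [23.0] * 4

print(vcdl_tap_phases(ideal))
print(vcdl_tap_phases(skewed))
```

In the ideal case the taps fall at multiples of 90°. In the mismatched case the loop still locks the total delay, but the intermediate taps shift by about 7°, which is the kind of spacing error a PLL-based ring avoids.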

X. PERFORMANCE SUMMARY AND CONCLUSION

The final performance of the serial link is summarized in Table 2. This report has demonstrated and analyzed a potential architecture for a high speed digital serial link in 90nm CMOS and has shown it is possible to operate at high frequencies with the correct design choices. The link operated at the desired 10Gb/s frequency with less than 30mW of total power.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Channel Speed</td>
<td>10Gb/s</td>
</tr>
<tr>
<td>Equalization Methods</td>
<td>Tx, P. DFE, VGA</td>
</tr>
<tr>
<td>Tx Input Muxing</td>
<td>2:1</td>
</tr>
<tr>
<td>Tx Driver Mode</td>
<td>Current Mode</td>
</tr>
<tr>
<td>Tx Power Dissipation</td>
<td>18mW (12 mW w/o Equaliz.)</td>
</tr>
<tr>
<td>Rx Data Rate</td>
<td>2.5 Gb/s (After 1:4 Demux)</td>
</tr>
<tr>
<td>Max VGA Gain</td>
<td>≈ 3x</td>
</tr>
<tr>
<td>Rx Power Dissipation</td>
<td>3.38 mW (13 mW w/3x VGA)</td>
</tr>
<tr>
<td>Clocking Type</td>
<td>Plesiochronous</td>
</tr>
<tr>
<td>PLL BW</td>
<td>50 MHz</td>
</tr>
</tbody>
</table>

As the scaling of transistors continues, even higher data rates may need to be achieved, making the serial link a major bottleneck in the transmission of data. With time, other schemes will need to be employed and alternative channel media may need to be considered.

XI. REFERENCES


XII. AUTHORS

Brian Drost received the B.S. degree in Electrical Engineering from Oregon State University in 2009 and is currently working towards a Masters in Electrical Engineering from Oregon State University.

He is currently a research member of the Analog and Mixed Signal group at Oregon State University in Corvallis, Oregon with a focus in the areas of clocking, phase locked loops, power management, and low-noise circuit design.

Jon Guerber (S’05) received the B.S. degree in Electrical Engineering from Oregon State University in 2008 and is currently working towards a Masters in Electrical Engineering from Oregon State University.

During the summer of 2008 he was with Teradyne Corp developing high frequency signal tracking and active power management solutions for semiconductor test devices. During 2007 he was with Intel Corp investigating high performance, small form factor motherboard architectures to support future PC microprocessor requirements. He is currently a research member of the Analog and Mixed Signal group at Oregon State University.

He is currently working on research related to the design and implementation of active power management solutions for future PC microprocessors.

Mr. Guerber is a life member of the Eta Kappa Nu Electrical Engineering Society and an active Wikipedia electronics contributor.