DDR3 Length Matching – ZYNQ 7000

Posted by Bijan in Electrical Engineering

Lately DDR3 is becoming more prevalent in new custom designs however I find that there isn’t much comprehensive document available for newcomers to the DDR franchise. Here I wrote down my own bootstrap on learning DDR3 jargons and way up to design and understand underline rationale behind the DDR3 length matching.

Why DDR

The first question you might ask is why to use DDR. All signals we send on wire or through air are occuping some bandwidth. On spectrum side, the signal bandwidth are so invaluable but what about on the PCBs. Is it important? Yes. The higher the bandwidth, the more delicate design is required and a higher loss would be observed on the signal path. This means that the higher frequencies portion would be attenuated more and to compensate that you need some sort of equlizer circuits like analog DFEs and … . But why bother to send higher bandwidth signals? In short, to acheive more data rate. So heres the thing. In these cases, DDR is used to maintain the bandwidth and at the same time doubles the data rate! How? You know…, by send and receive signal in both clock edges. and you know the rest.

DDR1, 2 and 3 Whats the difference

After that lets talk about the analogy between each ddr generation. In each leap in ddr generation the bus frequency would become higher and Vcc will be lower but except from these, the main improvement in each new version of DDR is incremental increases in prefetch bit size. Before I explain this more, you should know that apart from memory data rate, the internal clk frequency of DDR chip is not more than of 200MHz. So how a 200MHz memory could transfer data at rate of up to 102.4 Gb/s . The answer is by using as many as pins as possible, and using a serialising/deserialising technique.

DDR1 had prefetch size of 2 bit so the IO rate is doubled of internal rate by multiplexing 2 bit and send it in higher frequency on a single line. DDR2 has prefetch size of 4 bit and DDR3, 8 bit. Thats how DDR memories maintain their internal clk while acheiveing higher troughput in each new generation.

DDR Pins

Back to the main topic, lets introduce signaling of DDR chips. I assume a DDR3 chip here but you should able to generalize it to other DDR generation as well. At the top level, The DDR signals are divided into two categories, Data and nonData(CNTL/CLK/Address). The first thing is although it may not seem like it but these two categories have no relation with each other. Data category or DQ for short (Q used becuase of the same reason it used in Flip-Flop) have it’s own clock, DQS (Data strobe), while CNTL/CLK/Address are syncronized with other signal which is CLK. Therefore for routing (layout phase) on PCB each category should length match to ONLY it’s own members.

Behind The Scene

If you are not intrested in knowing how this is possible, please move on. So how Data send and recieved without syncronizing with addresses and control signals? The answer is they are synced at inside of the chip. DDR used a DLL (Delay Locked Loop) which add digital delay to signal to sync data and other signal from inside controller point of view. This pull out the pressure of matching all signal flight times and allow more degree of freedom and ease of use for DDR routing at the board level. The delay value will determined during Memory Training phase. This process is run after power up sequence and calibrate the delay of data group to the clk. The only requirment here is to have CLK be delayed longer than the longest DQS pair.

Byte Lane

Lukily data signals also breaks down into smaller group and each group has it’s own data strobe (CLK). With each byte lane have an independent clk, the contraint for length mathing diminish once more and only length matching inside a byte lane is required.

Data Group: $M D Q S (8 : 0), \bar{M D Q S} (8 : 0), M D M (8 : 0), M D Q (63 : 0), M E C C (7 : 0)$

Address/CMD Group: $M B A (2 : 0), M A (15 : 0), \bar{M R A S}, \bar{M C A S}, \bar{M W E}$

Control Group: $\bar{M C S} (3 : 0), M C K E (3 : 0), M O D T (3 : 0)$

Clock Group: $M C K (5 : 0)$ $\bar{M C K} (5 : 0)$

Final Note

Following checklist should be served as a good epilogue for this post:

Transmission Line Impedance should be 40Ohm with 10 percent tolerance
CLK trace should be longer than the longest DQS path
DQ signal are allowed to swap inside of a byte lane
Cross talk is an important issues in DDR3 SI. micron have a good constraint about how much signals can be tightening together.
For ZYNQ-7000 10ps is the length mathing margin. It translate to 10mils on a FR4 substrate.

Useful Links

StackExchange – Rams and DQ lines

GitHub – Understanding DDR Memory Training

Xilinx – DQS to CK maximum on Zynq PS DDR Controller

Is it good practice to length match all traces of DDR3