Mike Field

Building a High-Speed Serial Link the Hard Way

Brian Davis
2/14/2013 8:56:08 PM
User Rank
Clever Clogs
Uncrossing LVDS pairs internally with IBUF{G}DS_DIFF_OUT
@hamster
"Now that I am sending real data down the link (well, sync + "Hello"), I have discovered that I did have my wires crossed..."

A handy tip for cases where a PCB layout has the pairs swapped, accidentally or intentionally: there are variants of the Xilinx differential input buffers that provide both the true and complementary phases of the input signal. Look up the following in the Xilinx Libraries Guide:

ibufds_diff_out

ibufgds_diff_out

(Sometimes you can recover from a pair swap internally with a downstream logic inversion, but in certain clocking and ISERDES-pairing use cases, direct access to the inverted signal at the IOB is required or more useful.)

hamster
2/14/2013 8:10:30 PM
User Rank
Blogger
Re: Clock recovery with a DCM
@Brian, "swapping the +/- pairs and inverting the recovered data would patch things up"

Funny you should mention that. My initial testing used only 8b/10b sync codewords, and when those are sent back-to-back, every other one is inverted to preserve the running disparity (DC balance), so a swapped pair went unnoticed.

Now that I am sending real data down the link (well, sync + "Hello"), I have discovered that I did have my wires crossed...
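Why a sync-only test cannot reveal a pair swap can be checked in a few lines of Python. This sketch assumes the sync codeword is K28.5, whose RD- and RD+ forms (0011111010 and 1100000101, in abcdei fghj order) are exact bit complements; the post does not say which comma was used, and the function names are illustrative. Inverting an alternating K28.5 stream just gives the same stream starting one symbol later, so a receiver that only hunts for commas sees nothing wrong.

```python
# K28.5 comma in both running-disparity forms (abcdei fghj bit order).
K28_5_RD_MINUS = "0011111010"  # six 1s: disparity +2, flips RD to positive
K28_5_RD_PLUS = "1100000101"   # four 1s: disparity -2, flips RD back to negative


def sync_stream(n, start_rd_minus=True):
    """n back-to-back K28.5 symbols. Each one flips the running disparity,
    so the encoder alternates between the RD- and RD+ forms."""
    out = []
    rd_minus = start_rd_minus
    for _ in range(n):
        out.append(K28_5_RD_MINUS if rd_minus else K28_5_RD_PLUS)
        rd_minus = not rd_minus
    return "".join(out)


def invert(bits):
    """What a swapped +/- pair does to the received bit stream."""
    return "".join("1" if b == "0" else "0" for b in bits)


received = invert(sync_stream(8))
# The swapped stream is a valid comma sequence, merely starting in the other phase.
print(received == sync_stream(8, start_rd_minus=False))
print(received == sync_stream(9)[10:])
```

Only once data characters such as "Hello" go through does the inversion become visible.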

Brian Davis
2/14/2013 7:50:03 PM
User Rank
Clever Clogs
Re: Clock recovery with a DCM
@hamster
"Seems perfectly workable if a DCM / PLL is only sensitive to the rising edges of the clock"

As I originally envisioned it, that clock recovery scheme would require[1] a rising-edge-sensitive input reference divider in order to recover a stable half-rate reference clock.

Although, as mentioned in the linked post:
>
> If the {D|P}LL phase detector & ctl logic only uses
> leading edges, and doesn't mind the wild duty cycle
> swings, you could skip the divide-by-2 and double steps.
>
-----------

The Spartan-3E DCM CLKIN divide-by-2 is mentioned in:
  Table  29 of DS312 v4.0
  Table 3-7 of UG331 v1.8
  Table   4 of XAPP462 v1.1

http://www.xilinx.com/support/documentation/data_sheets/ds312.pdf
http://www.xilinx.com/support/documentation/user_guides/ug331.pdf
http://www.xilinx.com/support/documentation/application_notes/xapp462.pdf

It is enabled with a generic on the DCM:
 I_DCM1 : DCM
   generic map
   (
      CLKIN_DIVIDE_BY_2 => TRUE,                  -- divide CLKIN by 2 ahead of the DCM
      DESKEW_ADJUST     => "SOURCE_SYNCHRONOUS",  -- better centers IOB FF setup/hold on the clock edge
      ...
   )

But I have not verified whether the S3E DCM divide-by-2 is rising edge triggered.

Notes:
 - stepping 0 Spartan-3E parts have a *severely* limited DCM frequency range

 - that SOURCE_SYNCHRONOUS DCM setting better centers IOB FF setup & hold around the active clock edge

 - Clock input Pulse Width and Jitter requirements are in Table 104 of DS312

-----------
> Sweet!

At least on paper :)

My original goal for this was to be able to use a SATA or eSATA cable for ~100 Mbit/s communication between bog-standard FPGAs having only:

1) LVDS I/O pins
2) a typical FPGA DCM or PLL clock generator

-Brian


[1] Although if, for some reason, the reference divider is falling-edge sensitive, swapping the +/- pairs and inverting the recovered data would patch things up.

devel@latke.net
2/14/2013 7:36:40 PM
User Rank
Guru
Re: I'm very impressed
Hamster:

> I could never quite understand why the designers of USB1.x decided to use bit stuffing to ensure that regular clock transitions occurred. If five '1's or five '0's are to be sent, a dummy bit is inserted to ensure that the receiver has enough transitions to recover the clock. This makes it slower to send all zeros or all ones to a USB1 device!

Actually, USB inserts a stuffed '0' into the bit stream only after it has sent six consecutive '1' bits. That's because USB uses NRZI, where a '0' bit is represented by a transition and a '1' bit is represented by no transition.

Obviously, when sending zeros there are plenty of transitions for the clock recovery circuit to lock on to, and the resulting square wave also has good DC balance. Thus no bit stuffing is required.

Sending a lot of ones means no transitions during that time. The bit stuffing of a '0' forces a transition which keeps the clock recovery happy.

See the Wikipedia entry on NRZI.

Anyway, the absolute worst case (sending a whole lot of 0xFF data) costs one stuffed bit for every six data bits: 16.67% more bits on the wire, or about 14.3% of the line bandwidth lost. But I would guess that for random data the bit stuffing results in negligible bandwidth loss.
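The worst-case overhead can be checked with a small model of USB-style NRZI with bit stuffing (a 0 is sent as a transition, a 1 as no transition, and a 0 is inserted after six consecutive 1s). This is a sketch for illustration only: it ignores SYNC, EOP and the rest of the USB packet framing, and the function names and starting line level are assumptions.

```python
def bit_stuff(bits):
    """Insert a 0 after every run of six consecutive 1s (USB-style stuffing)."""
    out = []
    ones = 0
    for b in bits:
        out.append(b)
        if b == 1:
            ones += 1
            if ones == 6:
                out.append(0)  # stuffed bit forces an NRZI transition
                ones = 0
        else:
            ones = 0
    return out


def nrzi_encode(bits, level=1):
    """NRZI as USB uses it: a 0 toggles the line level, a 1 leaves it alone.
    The starting level is an arbitrary assumption."""
    out = []
    for b in bits:
        if b == 0:
            level ^= 1
        out.append(level)
    return out


def longest_run(levels):
    """Longest stretch of bit times with no transition on the line."""
    best = run = 1
    for prev, cur in zip(levels, levels[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


data = [1] * 48                 # six bytes of 0xFF
line = bit_stuff(data)
print(len(line))                # 48 data bits -> 56 line bits
print(1 - len(data) / len(line))  # fraction of line bandwidth lost (1/7)
print(longest_run(nrzi_encode(line)))
```

Six bytes of 0xFF become 56 line bits, so only 6/7 of the line rate carries data, and the longest stretch without a transition is bounded at seven bit times. A run of 0s, by contrast, toggles the line on every bit and never needs stuffing.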

hamster
2/13/2013 12:47:23 PM
User Rank
Blogger
Re: Clock recovery with a DCM
I had a read of the scheme. Seems perfectly workable if a DCM / PLL is only sensitive to the rising edges of the clock...

Pretty cunning - rising edge is used for clocking, timing of the falling edge is used for sending data. Sweet!

Brian Davis
2/12/2013 9:52:56 PM
User Rank
Clever Clogs
Clock recovery with a DCM
@hamster

I posted a slightly off-kilter (and completely untested) clock recovery scheme to comp.arch.fpga some years back for a one-LVDS-pair forwarded-clock data link that was designed to be DCM-friendly:

https://groups.google.com/group/comp.arch.fpga/msg/f475fafc9fdd07a9

By phase-modulating just the falling edges of the 'clock', the modulation can be stripped off by the DCM's reference input divide-by-two. This clean, divided reference clock is then used by the DCM to generate a phase-shifted 2x clock for receiver data sampling.

This scheme wastes bandwidth compared to a 'real' clock recovery circuit; e.g., a 200 MHz clock with a DDR output flip-flop (400 Mbps) will produce a 100 Mbps raw data rate after recovery.

This modulation scheme preserves the DC balance of the input, so an 8b10b encoder/decoder would be a useful addition for some applications.

-Brian
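A minimal bit-level model of this scheme, assuming each data bit is sent as one 'clock' period of four DDR line bits (so 200 MHz DDR gives 400 Mb/s on the line and 100 Mb/s of data, matching the figures above), with the rising edge fixed at the start of each symbol and the falling edge early for a 0 and late for a 1. The symbol shapes, sampling offset, and function names are hypothetical; the linked post may differ in detail.

```python
# Hypothetical symbols: one 'clock' period = four DDR line bits.
# The rising edge is always at the start; only the falling edge moves.
SYMBOL = {1: (1, 1, 1, 0),   # late falling edge
          0: (1, 0, 0, 0)}   # early falling edge
BITS_PER_SYMBOL = 4


def modulate(data):
    """Transmitter: one symbol per data bit."""
    return [b for d in data for b in SYMBOL[d]]


def divide_by_2(line):
    """Toggle flip-flop that changes state only on 0 -> 1 transitions,
    standing in for a rising-edge-sensitive DCM CLKIN divide-by-2."""
    level, out = 0, []
    for i, b in enumerate(line):
        if i > 0 and line[i - 1] == 0 and b == 1:
            level ^= 1
        out.append(level)
    return out


def demodulate(line, offset=2):
    """Receiver: sample once per symbol, between the two possible
    falling-edge positions (stands in for the DCM's phase-shifted clock)."""
    return [line[i + offset] for i in range(0, len(line), BITS_PER_SYMBOL)]


def dc_sum(line):
    """Net DC content: +1 per high line bit, -1 per low line bit."""
    return sum(1 if b else -1 for b in line)


data = [1, 0, 1, 1, 0, 0, 1, 0]
line = modulate(data)
print(demodulate(line) == data)
print(divide_by_2(line))  # toggles every 4 line bits regardless of the data
print(dc_sum(line))       # 0 here because the data has equal 1s and 0s
```

Because a 1 symbol contributes +2 and a 0 symbol -2, the line is only DC balanced when the data is, which is why an 8b/10b encoder pairs well with it.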

hamster
2/12/2013 5:22:03 PM
User Rank
Blogger
Re: I'm very impressed
I have only just started to get to grips with the whole subject of clock recovery. It is all very interesting.

But I do like how useful DC-balanced codes are. If the coded bitstream has more ones than zeros, the Automatic Gain Control in the physical layer will wander off the ideal gain for receiving the signal.

I could never quite understand why the designers of USB1.x decided to use bit stuffing to ensure that regular clock transitions occurred. If five '1's or five '0's are to be sent, a dummy bit is inserted to ensure that the receiver has enough transitions to recover the clock. This makes it slower to send all zeros or all ones to a USB1 device!

JezmoSSL
2/12/2013 4:30:50 PM
User Rank
Blogger
Re: I'm very impressed
This is why high-speed serial interfaces are encoded with Manchester code or something similar, which allows you to recover the clock at the receiver end. Fast Ethernet uses 4B/5B coding (with NRZI on 100BASE-FX). You can recover a Manchester-coded signal with a PLL and an XOR gate.
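The XOR idea can be illustrated with a two-samples-per-bit model. It assumes the IEEE 802.3 convention (a 1 is a low-to-high transition at mid-bit, a 0 is high-to-low) and a receiver whose PLL has already recovered a clock aligned to the bit boundaries; the PLL itself is not modelled, and the names are illustrative.

```python
# Two samples per bit; the clock is high in the first half of each bit period.
CLOCK = (1, 0)


def manchester_encode(data):
    """Each data bit XORed with the clock: 1 -> [0, 1] (low-to-high),
    0 -> [1, 0] (high-to-low), as in the IEEE 802.3 convention."""
    return [d ^ c for d in data for c in CLOCK]


def manchester_decode(line, recovered_clock=CLOCK):
    """XOR with the recovered, phase-aligned clock. Both half-bits then
    equal the data bit, so keep the first half of each pair."""
    decoded = [s ^ recovered_clock[i % 2] for i, s in enumerate(line)]
    return decoded[0::2]


data = [1, 0, 1, 1, 0, 0, 0, 1]
line = manchester_encode(data)
print(line)
print(manchester_decode(line) == data)
```

Every bit period contains a mid-bit transition and exactly one high half, which is what gives the PLL something to lock to and keeps the line DC balanced, at the cost of doubling the line rate.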

Max Maxfield
2/12/2013 12:58:38 PM
User Rank
Blogger
I'm very impressed
Hi Hamster (all hail the Mighty Hamster) -- I just wanted to say that I am always tremendously impressed by the way you leap into the fray with gusto and abandon -- whenever you want to learn something new you really dive down deep -- thanks so much for sharing all this stuff with us in your blogs

thrakkor
2/12/2013 12:46:15 PM
User Rank
Blogger
good project
I did a similar thing way back when with three Virtex-II devices on two different boards, linked over semi-rigid cable using LVDS.

It was 4:1, with DDR for 8:1 total per line. I had many, many pairs to line up on the receive end. We ended up using the DCM phase-increment capability to dial in the clock so that all of the parallel SERDES links were happy, since at that time there were no fancy I/O features to do so.
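One common way to use a phase-shift control for this kind of alignment is an eye-centering scan: step the sampling clock through its phase taps, record which taps give error-free data on each link (for example by checking a known training pattern), and pick the middle of the widest range of taps where every link passes. The comment does not say how the phase was chosen, so this is a hypothetical sketch with made-up pass/fail data, not the original design.

```python
def common_pass(results):
    """results: one pass/fail list per link, indexed by phase tap.
    Returns a list that is True only where every link passed."""
    n_taps = len(results[0])
    return [all(link[t] for link in results) for t in range(n_taps)]


def best_tap(results):
    """Center of the longest contiguous run of taps where all links pass,
    or None if no single tap works for every link."""
    ok = common_pass(results)
    best_start, best_len = None, 0
    start = None
    for t, good in enumerate(ok + [False]):  # trailing False closes a final run
        if good and start is None:
            start = t                        # a passing run begins
        elif not good and start is not None:
            if t - start > best_len:         # run covered taps start .. t-1
                best_start, best_len = start, t - start
            start = None
    if best_start is None:
        return None
    return best_start + (best_len - 1) // 2  # middle tap (rounds down)


# Made-up scan results for three links over ten phase taps (1 = no errors).
links = [
    [0, 1, 1, 1, 1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 1],
]
print(best_tap(links))  # taps 2..4 pass on every link
```

Picking the center of the shared window, rather than the first passing tap, leaves the most margin on both sides for drift in temperature and voltage.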

All Programmable Planet