In this blog, I'm continuing a mini-series on the development of Serializer Deserializer (SerDes or SERDES) functions embedded into FPGAs.
In my previous post on this topic, I discussed some of the background on SerDes technology in FPGAs with Mike Peng Li, Altera Fellow, and Salman Jiva, senior marketing manager, high-end products, from Altera. Today, we continue the discussion and review some of the key challenges facing SerDes technology in FPGAs. (If you are new to SerDes technology, a good resource to review the basics is a DesignCon 2004 paper from Dave Lewis, titled SerDes Architectures and Applications.)
Today's challenges
In my previous post, we discussed how process node parity, market need for more flexible solutions, a compelling time to market imperative as standards were solidifying, and FPGA company involvement in determining and defining key elements of new high-speed serial interconnect standards created a perfect market window for FPGA companies. The result is that SerDes technology in FPGAs is now a key market driver. This doesn't mean that FPGA companies can rest on their laurels, however; more than ever, they must push forward with new features and capabilities to stay ahead of alternative implementation choices. Perhaps the best way to identify some of today's challenges is to review some key features that have been recently added to FPGA SerDes functions. This may also give us a good idea as to where this capability will be going in the future.
One of the most obvious challenges is the increasing number of serial communications standards that an FPGA must be able to address -- legacy, established, and emerging. For example, at the 8Gbit/s rate we see a range of established and legacy standards: PCIe 3.0, PCIe 2.0, Interlaken, Serial RapidIO, CPRI+, OBSAI 4.0+, SATA 3.0, SATA 2.0, SPAUI, DDR-XAUI, QPI, HyperTransport 3.0+, HighGig+, HighGig2+, OIF/CEI 6G-SR, OIF/CEI 6G-LR, and 4G FC (to name just a few).
Similarly, at 10Gbit/s we have legacy and established standards for: IEEE 802.3ba 40G/100G/10GBASE-R/10GBASE-KR, 10G PON/EPON, OIF SFI-S, OIF SFI-5.2 (40G), 10G Interlaken, SONET/SDH OC-192 (10G/40G), SFP+, XFP, OIF/CEI 11G-SR/LR, OTU2/3/4, 10G SDI, and 10G Infiniband.
And, as if all of this weren't enough, at 28Gbit/s we see the following emerging standards: OIF/CEI 28G-SR/VSR, IEEE 802.3ba 100G, 32G Fibre Channel, and 25G Infiniband (so far).
Many of these standards have very specific requirements, so the SerDes designs used in FPGAs must satisfy them with a very flexible implementation, but without using too much power, adding jitter, or raising error rates. This is a tough challenge, but one that the FPGA manufacturers have some experience (and a few tricks up their sleeves) in overcoming.
Building a solid base
Perhaps one of the key core competencies the FPGA folks have built up over the years is the ability to create solid "foundation" capabilities on top of which the more specific features demanded by individual SerDes standards can easily be layered. As examples, very low-jitter clocking and low-power output drivers were used by Altera in the Stratix V SerDes designs. Low clock jitter helps improve all the other clock-related functions (and there are many) in SerDes designs. Low-power drivers improve the power efficiency of the typically power-hungry high-speed SerDes outputs on any standard interface implementation. Let's look at these two examples in more detail.
Control your jitter, Captain
A major component of jitter in a transceiver typically comes from the oscillator located within the PLL (phase-locked loop) circuit. Two of the most widely used oscillator circuits are the Voltage Controlled Ring Oscillator (VCRO) and the LC tank oscillator (LC). VCROs have a wide frequency tuning range, with a lower limit of 1 to 100MHz and an upper limit of 1 to 10GHz. This covers a tremendously wide range of data rates, but the VCRO is sensitive to front-end noise spurs, power supply noise, and substrate noise. An LC oscillator offers superior phase-noise performance thanks to its highly selective, high-Q tank circuit. With finer process geometries, it is now possible to integrate inductors on chip that occupy only a small amount of die real estate. Stratix V offers both a ring oscillator and an LC oscillator as clock sources for the transmit (TX) clock. This allows the application to optimize for both the required frequency range and a small jitter budget.
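To put a rough number on what a "small jitter budget" means, here is a minimal Python sketch using the common dual-Dirac approximation, in which total jitter at a target bit error rate is the deterministic jitter plus the random jitter scaled by a Gaussian tail multiplier. The function names and the jitter figures are illustrative assumptions only; they are not Stratix V specifications, and some conventions add a transition-density correction that is left out here.

```python
from statistics import NormalDist


def q_factor(ber: float) -> float:
    """Gaussian tail multiplier Q such that P(x > Q * sigma) equals ber."""
    # By symmetry of the normal distribution, Q = -inv_cdf(ber).
    return -NormalDist().inv_cdf(ber)


def unit_interval_ps(rate_gbps: float) -> float:
    """Length of one bit period (unit interval) in picoseconds."""
    return 1000.0 / rate_gbps


def total_jitter_ps(rj_rms_ps: float, dj_pp_ps: float, ber: float = 1e-12) -> float:
    """Dual-Dirac estimate: TJ = DJ(pk-pk) + 2 * Q(BER) * RJ(rms)."""
    return dj_pp_ps + 2.0 * q_factor(ber) * rj_rms_ps


def eye_margin_ui(rate_gbps: float, rj_rms_ps: float, dj_pp_ps: float,
                  ber: float = 1e-12) -> float:
    """Fraction of the unit interval left open after subtracting total jitter."""
    ui = unit_interval_ps(rate_gbps)
    return 1.0 - total_jitter_ps(rj_rms_ps, dj_pp_ps, ber) / ui


if __name__ == "__main__":
    # Hypothetical jitter figures, chosen only to show the trend with line rate.
    for rate in (8.0, 10.0, 28.0):
        margin = eye_margin_ui(rate, rj_rms_ps=0.5, dj_pp_ps=5.0)
        print(f"{rate:5.1f} Gbit/s: UI = {unit_interval_ps(rate):6.1f} ps, "
              f"eye margin = {margin:.2f} UI")
```

The same absolute jitter eats a much larger share of the bit period at 28Gbit/s than at 8Gbit/s, which is one way to see why a lower-noise oscillator matters more as line rates climb.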
Adam Taylor 2/21/2013 3:19:30 AM User Rank Blogger
My Challenge
SerDes, especially the MGTs, are a very interesting technology; we are already investigating high-end ADCs and DACs for flight that interface using this technology.
The only problem I have at the moment is finding suitable cabling solutions for breaking out and fanning out a large number of these MGT links.
I would like an example (link), and why it is (so) poor, so I may go fix the problem, if at all possible.
Here's a specific example. In UG381 (Spartan 6 SelectIO Resources), in the chapter on "Select IO Logic Resources," there's a discussion called I/O DELAY Overview. It's rather baffling in some respects. For example, there's the list of signals in Table 2-8 (in version 1.4 of this doc). The descriptions of IOCLK0 and IOCLK1 have no correlation to how those inputs are used in an example.
IOCLK0: "This is the primary clock input when the clock doubler is engaged."
IOCLK1: "This is the secondary clock input and is only used when the clock doubler is engaged."
Fantastic. What the heck does that mean? And then there's the signal:
CLK: "This is the clock for the FPGA logic interconnect domain." How about saying what that clock is used for in the IODELAY2 block?
Also, apparently if you have a differential input clock, you're supposed to use two IODELAY2 primitives, one for the inverting side and another for the non-inverting side, and you also need an IBUFGDS_DIFF_OUT on the input. The mapper wouldn't accept that.
UG382 (Spartan-6 FPGA Clocking Resources) mentions all over the place that you can use IODELAY2 on clock inputs as well as data inputs. But there are no pictures that show usable configurations. Page 30 of v1.7 of that guide says SDR Data Rate (FD register in IOB, no IOSERDES2) and there are two pictures. The descriptions of both figures include the line, "Works With Or Without IODELAY2." Same for the DDR, and for the ISERDES2 pictures. BUT -- show me where I put the IODELAY2 for these guys. A picture, please.
"I would like an example (link), and why it is (so) poor, so I may go fix the problem, if at all possible."
Chapter 3 of "UG381 Spartan-6 FPGA SelectIO Resources" includes timing diagrams, but it does not show how you can implement those timings.
You need to have studied Figure 4, page 34 of the UG382 Spartan 6 Clocking Resources User Guide if you are to stand a chance of implementing it!
I don't want to be mean, but since you offered....
Oh, and just open "UG625 Constraints Guide" to pretty much any page in chapter 3. It reads like an internal engineering document that sneaked out into the wild.
It perfectly describes how to specify a constraint but puts zero context around it. Take for example the entry for "IOB", a relatively simple constraint:
This option allows flip-flop or latch primitives to be pushed into the following on a global scale: • Input IOB (i) • Output IOB (o) • Input/output IOB (b)
What? Since when is "pushed" in an FPGA end-user's vocab? What does "global scale" mean - Will it really push a flipflop to the other side of the planet? I pick Paris. I've always wanted a latch in Paris.
How about something like:
This option ensures that the associated flip-flop or latch will be located within the device's I/O Block. It can be applied to input, output or bidirectional signals.
Doing so ensures that minimal delays occur between the device pin and the sink or source of the signal within the device.
Not long ago, I spent a good week working through getting a simple deserializer (without any gigabit stuff, either, just the standard ISERDES) with IODELAY2 to work in a Spartan6. The documentation for the clocking and the data path is woefully inadequate. Following what was noted in the user guide simply wouldn't even get past the mapper. This was with locating everything, too, according to the documented rules.
I sent the design to my local FAE and I also opened a WebCase on it. It took the FAE a couple of days to work it all out, and he said that, "you started with what was in the examples and the docs and it should work, but ..."
Sorry, but in this specific instance the docs were a total fail.
BTW, it was XAPP1064 I was using as well as the clocking and SelectIO user guides.
How about this: absolutely piss-poor documentation of the gigabit transceivers themselves, and even worse documentation for the clocking mechanisms needed to run the transceivers?
I've used the Spartan 6's SERDES2 to send TMDS output.
Getting the clocking to work was painful. I found out a lot more about the clocking infrastructure of the IO Banks than I ever wanted to know! Unless I really need the speed I am going to stick with DDR while I play with my little hobby designs.