Well, it's taken a lot of work, but I have achieved failure and am so happy! I've finally been able to demonstrate a metastable error in my own design. It took me a while, but the learning has been worth it.
Here's a quick recap of the design (the current design is Project 2 in the Meta Test portion of my Hamster Wiki):
- The input test signal is first buffered by an input buffer
- This signal is then fed to a flip-flop called "FF_L," where it is captured on the inverted 213MHz clock
- The "Q" output from "FF_L" is passed to flip-flops "FF_A" through "FF_H," where it is captured on the "normal" 213MHz clock
- The "Q" outputs from "FF_A" through "FF_H" are then passed through one more flip-flop, and then compared to check that they are all 1s or all 0s
- If there is an error, the error counter is incremented, and this value is displayed on the LEDs
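The checking step at the end of that chain can be modelled in a few lines of Python. This is only a behavioural sketch of the "all 1s or all 0s" comparison and the error counter, not the HDL used in the project; the function names and the sample data are made up for illustration.

```python
def all_same(bits):
    """True when every sampled bit agrees (all 1s or all 0s)."""
    return len(set(bits)) <= 1


def count_errors(samples):
    """Count the samples in which the eight flip-flops disagree."""
    errors = 0
    for bits in samples:
        if not all_same(bits):
            errors += 1
    return errors


# Hypothetical captures from FF_A..FF_H on three successive clocks.
samples = [
    (1, 1, 1, 1, 1, 1, 1, 1),  # all agree: no error
    (0, 0, 0, 0, 0, 0, 0, 0),  # all agree: no error
    (1, 1, 1, 0, 1, 1, 1, 1),  # one flip-flop disagrees: error
]
print(count_errors(samples))  # 1
```

A disagreement only shows up when the flip-flops resolve the same input differently, which is why the design fans "FF_L" out to eight of them.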
Below is a technology-level view of part of the design, with the important signal path highlighted:
While trying to demonstrate metastability, I've had to refine the design many times, starting with moving from an initial 32MHz clock to the 224MHz clock. In doing so, I've reduced the time that a flip-flop has available to "snap out" of a metastable state from 31ns to under 5ns. All Programmable Planet member Bob Elkind had an excellent suggestion of using the inverted clock on the first flip-flop, which halved this time again to a little under 2.5ns.
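The timing figures above are just clock periods, so they are easy to sanity-check. The sketch below uses the 32MHz and 224MHz values quoted in this paragraph; the helper name is my own.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


slow = period_ns(32)    # full period at the original clock
fast = period_ns(224)   # full period at the faster clock
half_fast = fast / 2    # inverted-clock capture leaves half a period

print(f"{slow:.2f} {fast:.2f} {half_fast:.2f}")  # 31.25 4.46 2.23
```

Plugging in the 213MHz figure from the recap instead gives about 4.69ns and 2.35ns, so either way the resolution time is "a little under 2.5ns."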
I've also discovered how to control what has to go on in that 2.5ns -- and after trying to read the Xilinx Constraints Guide, I really mean "discovered." By careful placement of the flip-flops, I've spread the "pin delays" from 0.549ns to 1.276ns. In my simplistic thinking, this should give a region of about 0.7ns (out of 2.5ns) where a change in a metastable signal might be detected.
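For what it's worth, the "about 0.7ns" figure is simply the difference between the longest and shortest pin delays. A small sketch of that arithmetic, using only the numbers quoted above (the constant names are mine):

```python
MIN_DELAY_NS = 0.549   # shortest pin delay reported
MAX_DELAY_NS = 1.276   # longest pin delay reported
WINDOW_NS = 2.5        # approximate half clock period

spread_ns = MAX_DELAY_NS - MIN_DELAY_NS
fraction = spread_ns / WINDOW_NS

print(f"{spread_ns:.3f} ns, {fraction:.0%}")  # 0.727 ns, 29%
```

So the staggered flip-flops cover a little under a third of the available half period.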
The placement of the flip-flops requires a bit of magic wand-waving in the UCF file in order to instruct the place-and-route process where to locate the resources:
Once built, my project sneaks in at having no timing errors, but with only 0.013ns of slack in the path out of "FF_L," it is as close to the wire as I could make it. Having said this, after a night of running... I still didn't detect any errors (sigh).
A little despondently, I moved on to a related project that might help me out -- an eight-digit frequency counter using a cheap seven-segment display from Deal Extreme (you can see this project on my Hamster Wiki). After a weekend of hacking about, I had this working correctly when measuring a signal derived from the development board's own clock. However, when I connected the test signal I was using to the counter, it counted only 60,000 pulses per second -- not the 25,000,000 I was expecting.
"Damn you metastability," I cried. And then it dawned on me... I reprogrammed the FPGA board generating the test signal with my "Test Signal" design, and guess what? The frequency counter now read 25,000,494pps. "I really am a Muppet," I muttered to myself.
Once I'd gotten over the fun of watching the displayed frequency change when I put my warm finger on the board's crystal oscillator, I set the "meta_test2" experiment up and left it running. Nothing happened for the first 10 minutes or so, but then there was a burst of errors. After two hours, I showed 124 errors, but then it struck me that I have only eight LEDs, so maybe I have had thousands of errors that I cannot display. I watched the display a wee while longer -- nothing changed.
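The worry about the eight LEDs can be made concrete. Assuming the LEDs show only the low eight bits of the error count and that the count wraps (or is wider than the display), which I haven't confirmed here, a reading of 124 is consistent with a whole family of true counts. A minimal sketch, with hypothetical helper names:

```python
LED_BITS = 8  # eight LEDs, so only 256 distinct values can be shown


def displayed(true_count, bits=LED_BITS):
    """Value visible on the LEDs if only the low `bits` bits are shown."""
    return true_count % (1 << bits)


def candidates(shown, max_wraps=3, bits=LED_BITS):
    """True counts that would all produce the same LED pattern."""
    return [shown + k * (1 << bits) for k in range(max_wraps + 1)]


print(candidates(124))  # [124, 380, 636, 892]
```

A wider counter, or a sticky "overflow" bit, would be one way to tell these cases apart.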
What is slightly confusing to me is that the errors are coming in bursts. I've never actually observed the display tick over, so I don't know how closely the errors are clustered. Are the bursts due to the two signals heterodyning? Does it take more than one cycle for the system to recover from a metastability event? Could it be that due to the relative frequencies of the clocks, an error on "FF_A" is soon followed by an error on "FF_B" and so on?
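One way to poke at the heterodyning idea is to model how the test signal's edges drift relative to the sampling clock. The sketch below counts, per bucket of consecutive signal edges, how many land within a small window of a clock edge. It is a toy model under loud assumptions: ideal jitter-free clocks, the 224MHz and 25,000,494Hz figures quoted in this post treated as exact integers, and a window far wider than any real metastability window. The function name and parameters are my own.

```python
def near_edge_counts(f_clk_hz, f_sig_hz, window, n_edges, bucket):
    """Count signal edges close to a sampling clock edge, per bucket.

    `window` is a fraction of one clock period; an edge counts as
    "near" if it lies within that distance of a clock edge.
    Frequencies must be integers so the phase stays exact.
    """
    denom = 2 * f_sig_hz       # two signal edges per signal period
    step = f_clk_hz % denom    # phase advance per edge, in 1/denom periods
    limit = window * denom
    counts = []
    phase = 0
    for start in range(0, n_edges, bucket):
        hits = 0
        for _ in range(min(bucket, n_edges - start)):
            # Distance to the nearest clock edge, wrapping at the period.
            dist = min(phase, denom - phase)
            if dist <= limit:
                hits += 1
            phase = (phase + step) % denom
        counts.append(hits)
    return counts


# Tiny example that is easy to check by hand: every other edge
# coincides exactly with a clock edge.
print(near_edge_counts(10, 2, 0.1, 8, 4))  # [2, 2]

# Assumed figures from this post; the window is exaggerated.
counts = near_edge_counts(224_000_000, 25_000_494, 0.01, 5000, 100)
print(counts)
```

In a model like this the near-coincidences tend to come in clumps as the two frequencies slide past each other, rather than being spread evenly. Whether that has anything to do with the bursts seen on the real board is a separate question, since the two boards run from separate crystals whose exact ratio drifts with temperature.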
For the moment, I don't care -- I'm going to have a beer. But I would be very interested to hear any theories you have.