Well, it's taken a lot of work, but I have achieved failure and am so happy! I've finally been able to demonstrate a metastable error in my own design. It took me a while, but the learning has been worth it.
Here's a quick recap of the design (the current design is Project 2 in the Meta Test portion of my Hamster Wiki):
The input test signal is first buffered by an input buffer
This signal is then fed to a flip-flop called "FF_L," where it is captured on the inverted 213MHz clock
The "Q" output from "FF_L" is passed to flip-flops "FF_A" through "FF_H," where it is captured on the "normal" 213MHz clock
The "Q" outputs from "FF_A" through "FF_H" are then passed through one more flip-flop, and then compared to check that they are all 1s or all 0s
If there is an error, the error counter is incremented, and this value is displayed on the LEDs
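The recap above can be sketched roughly in VHDL as follows. This is only an illustration under stated assumptions: the entity, port, and signal names are hypothetical and not taken from the actual project, and the input buffer and clock generation are left out.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sketch of the capture chain; names are illustrative only.
entity meta_test_sketch is
  port (
    clk      : in  std_logic;                    -- fast sampling clock
    test_sig : in  std_logic;                    -- asynchronous test signal (already buffered)
    leds     : out std_logic_vector(7 downto 0)  -- error count shown on the LEDs
  );
end entity meta_test_sketch;

architecture rtl of meta_test_sketch is
  signal ff_l      : std_logic := '0';
  signal ff_a_h    : std_logic_vector(7 downto 0) := (others => '0');
  signal stage2    : std_logic_vector(7 downto 0) := (others => '0');
  signal err_count : unsigned(7 downto 0) := (others => '0');
begin

  -- FF_L: captures the asynchronous input on the inverted (falling) clock edge
  process (clk)
  begin
    if falling_edge(clk) then
      ff_l <= test_sig;
    end if;
  end process;

  process (clk)
  begin
    if rising_edge(clk) then
      -- FF_A..FF_H: eight flip-flops sample FF_L's output half a period later
      ff_a_h <= (others => ff_l);
      -- one more flip-flop on each path before the comparison
      stage2 <= ff_a_h;
      -- all eight copies should agree; anything else counts as an error
      if stage2 /= "00000000" and stage2 /= "11111111" then
        err_count <= err_count + 1;
      end if;
    end if;
  end process;

  leds <= std_logic_vector(err_count);

end architecture rtl;
```

Left like this, a synthesis tool may well merge the eight identical flip-flops into one, so a real build needs some way of keeping them separate; how the original project does that is not shown here.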
Below is a technology-level view of part of the design, with the important signal path highlighted:
While trying to demonstrate metastability, I've had to refine the design many times, starting with moving from an initial 32MHz clock to the 224MHz clock. In doing so, I've reduced the time that a flip-flop has available to "snap out" of a metastable state from 31ns to under 5ns. All Programmable Planet member Bob Elkind had an excellent suggestion of using the inverted clock on the first flip-flop, which halved this time again to a little under 2.5ns.
I've also discovered how to control what has to go on in that 2.5ns -- and after trying to read the Xilinx Constraints Guide, I really mean "discovered." By careful placement of the flip-flops, I've spread the "pin delays" from 0.549ns to 1.276ns. In my simplistic thinking, this should give a region of about 0.7ns (out of 2.5ns) where a change in a metastable signal might be detected.
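As a rough check on the figures above, here is the arithmetic written out. It assumes the 224MHz clock quoted in the previous paragraph and treats the spread in pin delays as the whole detection window, which is a simplification:

```latex
\begin{align*}
T_{32} &= \frac{1}{32\ \text{MHz}} = 31.25\ \text{ns} \\
T_{224} &= \frac{1}{224\ \text{MHz}} \approx 4.46\ \text{ns},
  \qquad \tfrac{1}{2}T_{224} \approx 2.23\ \text{ns} \\
\Delta t_{\text{pin}} &= 1.276\ \text{ns} - 0.549\ \text{ns} = 0.727\ \text{ns} \\
\frac{\Delta t_{\text{pin}}}{\tfrac{1}{2}T_{224}} &\approx \frac{0.727}{2.23} \approx 0.33
\end{align*}
```

So, on these assumptions, the spread in pin delays covers roughly a third of the half-period available.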
The placement of the flip-flops requires a bit of magic wand-waving in the UCF file in order to instruct the place-and-route process where to locate the resources:
Once built, my project sneaks in at having no timing errors, but with only 0.013ns of slack in the path out of "FF_L," it is as close to the wire as I could make it. Having said this, after a night of running... I still didn't detect any errors (sigh).
A little despondently, I moved on to a related project that might help me out -- an eight-digit frequency counter using a cheap seven-segment display from Deal Extreme (you can see this project on my Hamster Wiki). After a weekend of hacking about, I had this working correctly when measuring a signal derived from the development board's own clock. However, when I connected the test signal I was using to the counter, it counted only 60,000 pulses per second -- not the 25,000,000 I was expecting.
"Damn you metastability," I cried. And then it dawned on me... I reprogrammed the FPGA board generating the test signal with my "Test Signal" design, and guess what? The frequency counter now read 25,000,494pps. "I really am a Muppet," I muttered to myself.
Once I'd gotten over the fun of watching the displayed frequency change when I put my warm finger on the board's crystal oscillator, I set the "meta_test2" experiment up and left it running. Nothing happened for the first 10 minutes or so, but then there was a burst of errors. After two hours, I showed 124 errors, but then it struck me that I have only eight LEDs, so maybe I have had thousands of errors that I cannot display. I watched the display a wee while longer -- nothing changed.
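One way to remove that doubt would be an error counter that saturates instead of wrapping around. The following is a minimal sketch with hypothetical names (it is not the counter used in the project): once the count reaches 255 it stays there, so any reading below 255 is the true total.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical saturating 8-bit error counter; names are illustrative only.
entity sat_error_counter is
  port (
    clk        : in  std_logic;
    error_seen : in  std_logic;                    -- '1' for one clock per detected error
    leds       : out std_logic_vector(7 downto 0)  -- eight LEDs
  );
end entity sat_error_counter;

architecture rtl of sat_error_counter is
  signal count : unsigned(7 downto 0) := (others => '0');
begin

  process (clk)
  begin
    if rising_edge(clk) then
      -- stop at 255 rather than rolling over to 0
      if error_seen = '1' and count /= 255 then
        count <= count + 1;
      end if;
    end if;
  end process;

  leds <= std_logic_vector(count);

end architecture rtl;
```

With a plain wrapping counter, a display of 124 could equally mean 124, 380, 636, and so on; with saturation, only an all-ones display is ambiguous.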
What is slightly confusing to me is that the errors are coming in bursts. I've never actually observed the display tick over, so I don't know how closely the errors are clustered. Are the bursts due to the two signals heterodyning? Does it take more than one cycle for the system to recover from a metastability event? Could it be that due to the relative frequencies of the clocks, an error on "FF_A" is soon followed by an error on "FF_B" and so on?
For the moment, I don't care -- I'm going to have a beer. But I would be very interested to hear any theories you have.
Adam Taylor 11/17/2012 6:18:59 AM User Rank Blogger
Re: Unexpected code!
Brian, good points. I did blog about handling metastability in gate-level simulations a few months ago; like you say, it needs careful thought and handling.
Brian Davis 11/16/2012 9:01:35 PM User Rank Clever Clogs
Re: Unexpected code!
@Hamster,
"when a flip-flop goes metastable. I don't think simulators allow STD_LOGIC to have a "metastable, so flip a coin" state :-)"
In a timing simulation of the post-implementation design exported by the tools, that sort of thing can be simulated by replacing the vendor FF primitive model with one of your own nefarious design.
If you are using the Xilinx tools, look up "ASYNC_REG" in the constraints guide, which turns off the normal "X" propagation timing checks for setup/hold violations.
Hamster, I understand and appreciate your eagerness and enthusiasm to prove that some of the old unwritten theorems are not sufficient or correct. I know that in certain cases, in combined logic circuits, FFs exhibit some unexpected behaviors without any direct impact.
I am working on a blog which talks a bit about testbench techniques, because debugging FPGAs is actually pretty hard unless you know all your modules work as you expect. Here is a very simple example I wrote for an LFSR clock divider. Testbenches can get very complex, and are often more complex than the design they are checking. You could develop this one to add setup and hold time checking on all the signals, for example, or to check that the division ratio is correct.
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity test_bench is
      -- there are no ports to declare, so the entity is empty
    end test_bench;

    architecture behavioural of test_bench is
      -- call the device under test from the standard 'work' library
      component lfsr_counter
        port(
          rst, clk : in  std_logic;
          clk_out  : out std_logic);
      end component;
      -- declare inputs and initialise them
      signal clock_in  : std_logic := '0';
      signal reset     : std_logic := '0';
      -- declare outputs and initialise them
      signal clock_out : std_logic := '0';
      -- clock period definition
      constant clk_period : time := 10 ns;

    begin

      -- instantiate the unit under test
      uut: lfsr_counter port map(
        rst     => reset,
        clk     => clock_in,
        clk_out => clock_out);

      -- generate the clock with 50% duty cycle
      -- this runs for ever; it has no sensitivity list
      clk_process : process
      begin
        clock_in <= '0';
        wait for clk_period/2;
        clock_in <= '1';
        wait for clk_period/2;
      end process;

      -- generate the stimulus for the unit under test
      stim_process : process
      begin
        wait for 100 ns;
        reset <= '1';
        wait for clk_period * 2;
        reset <= '0';
        wait;
      end process;

    end architecture behavioural;
I don't think failure is a good option to have around. True, you cannot get through all the time, but failing to achieve many times is something which should not be tolerated.
jandecaluwe 11/8/2012 5:10:13 AM User Rank Blogger
Re: Failure in Design
"In this case, all I wanted to do was create a design where I can prove to myself that metastability exists"
@hamster I think the goal was very clear and I especially like the attitude of not taking anything for granted. If @Myplanet reads your posts more closely, I'm sure he will understand why he should really congratulate you on your failure :-)
In this case, all I wanted to do was create a design where I can prove to myself that metastability exists, and that a single flip-flop on an input isn't enough to protect my designs from the errors that metastability can induce.
Even with the silly stuff-up I made when tinkering with the error counter, I've done it. I can put my hand on my heart and honestly say that metastability-induced errors exist, I've seen them in one of my very own designs, and I now have a better understanding of what is really going on.
For example, when somebody says "sometimes my reset push-button doesn't work correctly - it is due to a metastability" I know they are most likely mistaken.
I also have a lot more understanding of the issues about releasing async resets, have a working frequency counter design that uses GPS for a timebase, learnt how to do manual placement of flip-flops, and a lot more about routing delays within a design.
jandecaluwe 11/8/2012 3:25:31 AM User Rank Blogger
Re: Unexpected code!
"Maybe somebody who knows the ISE toolset could help us out with how to better manage errors and warnings?"
@hamster To avoid misunderstandings, let me point out that I wasn't targeting any synthesis tool in particular. Actually, all synthesis tools that I know have the same flaw of letting incomplete sensitivity lists pass with a warning.
There is a lot that can be said about warnings, but let it be clear that this particular case is different from most other ones. In this case, it is impossible for a synthesis tool to generate an implementation whose behavior matches that of the model.
The equivalence with gcc would be a situation where it compiles to object code that is guaranteed to be incorrect, and then warns you about it.
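A minimal illustration of the kind of mismatch being discussed, as an assumed example rather than code from this thread (the entity and signal names are made up):

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example of an incomplete sensitivity list.
entity sens_list_demo is
  port (
    a, b : in  std_logic;
    y    : out std_logic
  );
end entity sens_list_demo;

architecture rtl of sens_list_demo is
begin

  -- 'b' is read inside the process but is missing from the sensitivity list.
  -- Simulation: y is only re-evaluated when 'a' changes, so a change on 'b'
  -- alone leaves y holding a stale value.
  -- Synthesis: tools typically build a plain AND gate that follows both
  -- inputs, and report the missing signal as a warning.
  process (a)
  begin
    y <= a and b;
  end process;

end architecture rtl;
```

Writing the list as `process (a, b)`, or `process (all)` in VHDL-2008, brings the simulated model and the synthesized logic back into agreement.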
"Well, it's taken a lot of work, but I have achieved failure and am so happy!"
Mike, normally developers are happy when there are no failures or errors in their design or code. But here you are so happy when you trace some errors in your own design. What is the relevance of finding an error in your own design, which means that you had a poor design?
I do the same as you with gcc, and even go one step further. I use -Werror, which makes warnings into errors. This forces my programming team to pay attention to all warnings, and helps us all improve our coding... even though it initially causes some groaning when people aren't used to it yet.