I remember the first very large FPGA design I attempted. I did not really floorplan the design well; instead, I relied on the synthesis and place-and-route (P&R) engines to "Make the Magic Happen." The design ended up having a 6-bit bus go all the way across the chip to a block RAM and then come back again. As things turned out, the part was large enough that it had enough routing reserves to pull this off. Also, we were lucky that the design only needed to run at 20MHz -- so we managed to get away with it.
Fortunately, Xilinx offers the PlanAhead and PlanAhead Lite floorplanning tools to help alleviate these, and other, issues. Some of the benefits of floorplanning an FPGA design include the following:
In the process of floorplanning, partitioning between major functional boundaries allows timing compliance to be considered on a block-by-block basis.
A floorplan targeted at minimizing the trace lengths of high-activity nets can reduce the switching power consumption of a design.
Some of the key points to remember when creating a floorplan are as follows:
The floorplan is a critical link in the timing closure loop.
A bad floorplan can dramatically reduce the performance of a design.
Floorplanning is a good fit for layouts dominated by routing delay or for highly pipelined designs.
A floorplan usually includes the data path but excludes the control and glue logic.
A floorplan should consider the FPGA resources such as Block RAM, DSP slices, carry chains, CPU cores, etc.
High fanout nets may be worth floorplanning.
Keeping all this in mind, let's correct my old design problem on (virtual) paper. First, consider the original design:
The original, un-floorplanned design.
As we see, the dataflow in the original design goes back and forth across the FPGA rather than forging a simple, clean path. Observe that the length of the datapath is close to three times what it needs to be. Also think about the impact on timing and power consumption associated with switching (charging and discharging) these long routes in the FPGA fabric.
Now let's consider the corrected floorplan as illustrated below:
The new, improved floorplanned design.
In this new, improved floorplan, the dataflow path is reduced and does not traverse back and forth across the FPGA. Observe how the routing paths are reduced, which means there is less path length to charge and discharge, thereby improving the timing (increasing performance) and reducing power consumption.
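To put rough numbers on the comparison above, here is a minimal Python sketch. It is not taken from the original design: the grid coordinates, the capacitance per grid unit, the supply voltage, and the activity factor are all illustrative assumptions. It measures each datapath as a Manhattan route between hypothetical block positions and applies the usual dynamic power estimate, P = activity * C * V^2 * f, with route capacitance assumed proportional to route length.

```python
# Illustrative comparison of datapath route length and switching power.
# Every coordinate and electrical value here is an assumption for the sketch.

def manhattan_path_length(points):
    """Sum of Manhattan distances between consecutive placement points."""
    return sum(abs(x2 - x1) + abs(y2 - y1)
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


def switching_power(cap_farads, vdd_volts, freq_hz, activity):
    """Dynamic power estimate: P = activity * C * V^2 * f."""
    return activity * cap_farads * vdd_volts ** 2 * freq_hz


# Hypothetical block positions on a 100 x 100 grid.
# Original: the bus crosses the chip, comes back, then crosses again.
original = [(0, 50), (100, 50), (0, 50), (100, 50)]
# Floorplanned: the blocks sit in dataflow order along one line.
floorplanned = [(0, 50), (33, 50), (66, 50), (100, 50)]

BUS_WIDTH = 6            # the 6-bit bus from the story
CAP_PER_UNIT = 0.2e-12   # assumed farads per grid unit, per wire
VDD = 1.2                # assumed core voltage in volts
FREQ = 20e6              # the 20 MHz clock from the story
ACTIVITY = 0.25          # assumed fraction of cycles in which a wire toggles

len_orig = manhattan_path_length(original)
len_plan = manhattan_path_length(floorplanned)

p_orig = switching_power(len_orig * CAP_PER_UNIT * BUS_WIDTH, VDD, FREQ, ACTIVITY)
p_plan = switching_power(len_plan * CAP_PER_UNIT * BUS_WIDTH, VDD, FREQ, ACTIVITY)

print(f"route length: {len_orig} vs {len_plan} grid units")
print(f"length ratio: {len_orig / len_plan:.1f}x")
print(f"power ratio:  {p_orig / p_plan:.1f}x")
```

Because capacitance is assumed to scale linearly with length, the power ratio simply tracks the length ratio. Real routes add switch and buffer capacitance, so treat the result as a trend rather than a prediction.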
You are definitely right: there probably is a Rent's Rule equivalent for FPGAs, but it is also design dependent. A rat's nest of combinatorial logic is different from a pipelined data-flow block, which is different from a packet switch, which is different from a DSP chain, and so on. There are tools in R&D at the universities that will work out the best floorplan, based on the design type, and pass it off to the floorplanner.
Rent's Rule can be used to estimate the average wirelength and the wirelength distribution in VLSI chips, and I assume the same thing holds for FPGAs.
In keeping with Rent's rule, small designs will be constrained by logic speeds, but as a design grows it will become constrained by routing.
That is why, until I started working on larger designs on larger chips, I was convinced that floorplanning was something you do to get the last 10% of performance. Now that I have a few larger devices, it seems controlling placement is essential to getting the last 50%!
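As a rough illustration of that scaling argument, the Python sketch below evaluates Rent's rule, T = t * g^p, together with the commonly quoted Donath-style result that average wirelength grows roughly as g^(p - 0.5) when p is above 0.5. The values t = 4.0 and p = 0.6 are placeholder assumptions, not measurements of any FPGA or design, and the wirelength function gives relative scaling only.

```python
# Rent's rule sketch. The parameters t and p are illustrative assumptions.

def rent_terminals(gates, t=4.0, p=0.6):
    """Rent's rule: external terminals T = t * g**p for a block of g gates."""
    return t * gates ** p


def relative_avg_wirelength(gates, p=0.6):
    """Relative average wirelength, assumed to scale as g**(p - 0.5).

    This is a proportionality for p > 0.5, not an absolute length.
    """
    if p <= 0.5:
        raise ValueError("this scaling form assumes p > 0.5")
    return gates ** (p - 0.5)


base = relative_avg_wirelength(1_000)
for g in (1_000, 100_000, 10_000_000):
    terminals = rent_terminals(g)
    growth = relative_avg_wirelength(g) / base
    print(f"{g:>10,} gates: ~{terminals:,.0f} terminals, "
          f"avg wirelength x{growth:.2f} vs 1,000 gates")
```

Under these assumptions the average wire gets longer as the design grows while gate delay stays fixed, which is the intuition behind routing eventually dominating timing.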
However, even if you don't do it explicitly, you end up controlling placement if you use non-global FPGA resources.
- If you use a CLB's carry chain you tie the CLBs into the same column.
- If you use the DSP 'cascade' features, you are constraining the physical placement of the DSP blocks to be directly above each other.
- In some devices, DSP blocks and RAM blocks have dedicated routing to some ports or other dependencies that can restrict placement.
Being aware of these features and the implicit constraints they create can make a big difference, and can even be a factor in allowing a design to fit.
It also explains why Xilinx takes a lot of time explaining how these features work. A DSP function implemented as an adder tree will use generic routing resources and have much lower performance than a design that leverages the cascade ports, and the DSP blocks might be at either end of the FPGA die!
Talking of placement, I wonder how good XST would be at playing Tetris? :-)
I think the newer tools are able to handle much larger designs without floorplanning them. I've also gotten older and wiser and get as much of the design working on a Starter Kit, or Prototyping System before locking down the pins on a board layout. It also helps to not squeeze the last ounce of logic out of a part and keep it at 80% or so.
My general approach is to let the tools worry about internal placement, until they fail. A suboptimal placement is fine if you meet your timing constraints.
Now it turns out that, at least with S3, the tools are too stupid to realize that specific BUFGs go with specific GCLK pins, and I've seen the placer choose a BUFG on the other side of the chip from the GCLK input. So in this case you have to lock down the BUFG and the DCM.
With S6 I had to lock down specific BUFIOs and delay elements, again because the tools were stupid.
If I have to dig into the floorplanner for logic-timing reasons, I likely have bigger issues.
Adam Taylor 12/20/2012 12:01:46 PM User Rank Blogger
Very Interesting
William, this is a very interesting subject. Many people think that writing the RTL is all that is really needed and that the tool chains handle the rest. However, if you really want to achieve the best performance from these devices and use the more advanced features, you are going to have to get involved in floorplanning and timing closure.
I am a big fan of PlanAhead; it is useful when I/O planning a device or creating a complicated design that requires multi-iteration place and route.
I think floorplanning is a special case. As designers, we must rely on the normal implementation tools (synthesis, MAP, PAR) to do their jobs, as well as designing in a way to allow them to do so.
For Xilinx designs, I think experimenting with different "strategies" and making design edits is a better way to close the timing closure loop. Try different max fanouts in synthesis and rerun different backend strategies, say with SmartXplorer.
Try area constraints before floorplanning. This has a floorplanning-like effect, but stays flexible as the design and netlist change, whereas a floorplan could get screwed up if the netlist changes and you don't take steps to guard against it.
By all means, learn how to floorplan, but I think starting out with floorplanning is a waste of time. I really think floorplanning is a last resort.
And PlanAhead, I think, is pretty convoluted. I much preferred FPGA Editor, PACE, and Floorplanner as independent tools.