Many an engineer has designed an FPGA that has stretched the power supply capability of his board. Alternatively (and perhaps more likely), due to thermal constraints and the fact that leakage current goes up as the board gets hot, he'll have created a design that just does not like to run when the FPGA gets hot. What is one to do about this, given the difficulty of forecasting FPGA power requirements prior to having a completed design?
Well, many FPGA designers are not aware that there is an option to "Synthesize for Low Power" buried in the settings of most of the current synthesis tools. This can take a design that just will not work power-wise or thermal budget-wise and bring it into specification. What are some of the ways that these tools can do this?
During synthesis: The first technique is by means of clock gating -- many of the synthesizer tools are able to see when clocks are not needed and gate them off, cutting the power requirements significantly for those sections of the design.
The next approach is to use some form of "sleep mode." In this case, many of the synthesizers are able to see when a resource such as a multiplier is not in use and to de-select its chip-enable, thereby cutting power.
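As a rough illustration of why gating a clock helps, the sketch below uses the usual first-order approximation for CMOS switching power, P = activity x C x Vdd^2 x f. The capacitance, voltage, frequency, and enable fraction are hypothetical values chosen for the example; they do not come from any particular device or tool.

```python
def dynamic_power_w(activity, capacitance_f, vdd_v, freq_hz):
    """First-order CMOS switching power: P = activity * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


def gated_clock_power_w(enable_fraction, capacitance_f, vdd_v, freq_hz):
    """Clock-net power when the clock only toggles while the block is enabled."""
    # An ungated clock switches every cycle (activity = 1.0); gating scales
    # that by the fraction of cycles in which the block is actually needed.
    return dynamic_power_w(enable_fraction, capacitance_f, vdd_v, freq_hz)


# Hypothetical numbers, for illustration only.
CLOCK_CAP_F = 50e-12   # assumed clock-tree capacitance of one block (50 pF)
VDD_V = 1.2            # assumed core supply voltage
FREQ_HZ = 100e6        # assumed 100 MHz clock

ungated = gated_clock_power_w(1.0, CLOCK_CAP_F, VDD_V, FREQ_HZ)
gated = gated_clock_power_w(0.25, CLOCK_CAP_F, VDD_V, FREQ_HZ)
print(f"ungated: {ungated * 1e3:.2f} mW, enabled 25% of the time: {gated * 1e3:.2f} mW")
```

In this simplified model the saving scales directly with the fraction of time the clock is gated off. A real design also has leakage and gating-logic overhead, which the sketch ignores.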
Post synthesis: Following the main synthesis process, the tool can perform additional gate-level optimizations on the netlist before sending it to place-and-route (P&R). Some of these techniques include:
Sizing the gate for the load that it presents or is driving.
Swapping pins on gates so that high frequency output signals are presented with the lowest capacitance routes and input pins.
Removal of unessential buffers and gates. To quote this whitepaper from Cadence on its tool: "Sometimes timing optimization adds buffers to shield the critical path from a high capacitive load. During timing optimization, the critical path itself could shift. The result is that there are sometimes unnecessary buffers on noncritical paths."
Improving slew rates on slow-rising signals by adding buffers where the power they save exceeds the power they consume.
Performing logic restructuring. This is a complex process, but the main idea is that portions of the design are replaced with circuits that use less power but that are functionally identical with regard to their outputs.
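To make "functionally identical" concrete, here is a minimal sketch that checks two small Boolean functions against each other over every input combination. The two functions and the helper name `equivalent` are made up for this example. Real synthesis tools work on whole netlists with far more sophisticated equivalence checking than this brute-force loop.

```python
from itertools import product


def original(a, b, c):
    # Two AND gates feeding an OR gate.
    return (a and b) or (a and c)


def restructured(a, b, c):
    # One OR gate feeding one AND gate: same outputs, one gate fewer.
    return a and (b or c)


def equivalent(f, g, n_inputs):
    """Return True if f and g agree on every combination of Boolean inputs."""
    return all(
        f(*bits) == g(*bits)
        for bits in product([False, True], repeat=n_inputs)
    )


print(equivalent(original, restructured, 3))
```

Exhaustive checking like this only scales to a handful of inputs, but it shows the contract: the replacement circuit may differ internally as long as its outputs match for all inputs.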
In practice, later versions of the Xilinx ISE Design Suite were able to take a 250K-gate Spartan-3E from over 300 mW to about 225 mW when running a custom 100 MHz Ethernet MAC application with the "Synthesize for Low Power" option selected.
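For a sense of scale, the figures quoted above work out to a saving of roughly a quarter. The small helper below (its name is hypothetical) does the arithmetic, treating "over 300 mW" as 300 mW, so the true saving would be at least this large.

```python
def power_saving_percent(before_mw, after_mw):
    """Percentage reduction in power going from before_mw to after_mw."""
    return 100.0 * (before_mw - after_mw) / before_mw


# 300 mW is used as a floor for "over 300 mW"; 225 mW is the quoted result.
saving = power_saving_percent(300.0, 225.0)
print(f"saving: about {saving:.0f}%")
```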
Have you had the opportunity to optimize for power, and -- if so -- what have been your experiences?
Max wrote: "I was just thinking about how originally gate count was the limiting factor, then timing and area, now power... what next?"
Related to the power issues are bandwidth and latency issues. Like power (connections and heat removal), off-chip bandwidth is limited by surface area (optical interconnects could be a big help in the not too distant future).
On-chip bandwidth and latency are also influenced (like power density) by the shrinking transistor--while keeping chip area more-or-less constant. The number of times a transistor can switch in the time required to send a signal across the chip has been increasing. The number of nodes of N transistors that can fit on a chip is increasing, so a direct all-to-all network will take up more space and power (and more space and power leads to higher latency). A simpler network like a grid will have an impact on latency and bandwidth between arbitrary nodes.
3D can help with "on chip" bandwidth and latency, but would seem to make off-chip bandwidth worse (n-cubed computation and local storage with n-squared surface area for off-chip interconnect vs. n-squared for both with 2D). The same applies to 3D and power. Local communication will use less power, and reduced latency and improved bandwidth will mean less semi-idle time, with implications for static power consumption. But power use and heat production [barring dark silicon] would increase at n-cubed, while power input and heat extraction would increase at n-squared.
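The scaling argument can be put in numbers with a small sketch. It assumes an idealized chip of edge length n (in arbitrary units) whose off-chip interconnect and heat removal scale with one n-by-n face. The function name and the unit-free model are assumptions made for illustration.

```python
def compute_per_io(n, stacked):
    """Ratio of on-chip resources to off-chip surface for edge length n.

    stacked=False models a 2D chip (resources ~ n^2);
    stacked=True models a 3D stack (resources ~ n^3).
    In both cases off-chip I/O and heat extraction are assumed to scale as n^2.
    """
    resources = n ** 3 if stacked else n ** 2
    surface = n ** 2
    return resources / surface


for n in (1, 2, 4, 8):
    print(n, compute_per_io(n, stacked=False), compute_per_io(n, stacked=True))
```

Under these assumptions the 2D ratio stays constant while the 3D ratio grows linearly with n, which is the mismatch between computation (or heat) and surface area described above.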
One might also note the "software wall". The constraints on single-thread performance (power and memory walls being major contributors) seem to be pushing for explicit parallelism (vectorization and multithreading/multiprogramming), which introduces difficulties in programming (better languages/libraries and programmer education can help, hardware features can help [e.g., performance counters and transactional memory], but some problems are probably just difficult). Use of more power-efficient accelerators can also help, but that also has implications for programming difficulty.
The technology for power-efficiency optimization of software is likely extremely immature (even less mature than that for parallelization).
Nowadays there are three orders of magnitude more gates in each class of device than there were a few years back -- power has to matter -- or the devices would be hotter than the surface of the sun!
Max Maxfield 8/5/2012 11:06:45 AM User Rank Blogger
Before power was a consideration
It's funny -- when I started designing ASICs back in 1980, the one thing we never worried about was power -- everything we designed was the size of a fridge/freezer and power simply wasn't on our list of things to worry about. I wonder what the next thing will be...
That white paper looks like another good trick for the new parts. It runs after synthesis, unlike the Synopsys or Cadence optimizations, which run during synthesis.
As I mentioned under Warren's blog, the race is on. By eliminating gates in, e.g., 8051 IP, you can get 2,000 gates instead of 10K or 15K. And then you get just 0.081 mW/MHz in a 180 nm process.
Max Maxfield 8/2/2012 8:57:45 AM User Rank Blogger
Strange coincidence
@Warren: Thanks for this article -- very timely -- and what a strange coincidence that this came so close to Warren's blog: FPGAs & Low Power: Where Have We Been? http://bit.ly/MeyXrm