As I outlined in a previous What's Next blog, I will be talking to some of the visionaries of the programmable industry to get their ideas on 1) Where have we been? 2) What are today's challenges? and 3) Where might we end up in the next several years? This will provide the rest of us with a starting point for our own discussions, thoughts, and prognostications. In five years' time we can look back and see who was the closest (and who was the furthest) from where we actually end up.
Last week I posted some comments from Steve Trimberger, Xilinx Research Labs fellow, in which we mused about "Where have we been?" with regard to programmable fabric. (This might be a good time to remind you that our very own Max Maxfield has been blogging furiously on FPGA fabric. If your knowledge in this area is a little rusty, it might be a good idea to peruse Max's last few columns to brush up on your fabric terminology before proceeding further.) Today's blog covers some of Steve's thoughts with regard to "Today's challenges" and, as usual, I have started a new message board for us to continue the discussion.
The importance of power
Steve says that he sees a couple of key challenges that will require innovative responses in the future. The most important of these appears to be the operating power. As we add more transistors to each generation of FPGAs (Moore's Law continues to apply to FPGAs), the amount of power required goes up. If this power increase isn't mitigated somehow, then it may become the limiting factor with regard to FPGA capacity. Of course, power and performance are related -- if the power requirement becomes too large, then the performance must be scaled back. For example, signal lines that cross much of the chip may require buffers to reduce signal delay. These buffers require extra power. If this is needed on a large number of signals, the overall device power requirement can skyrocket.
From Steve's point of view, power is just one more variable that needs to be balanced with cost, capacity, and features. Having said this, Steve believes that all of the easy power-saving techniques have already been found. Circuit tricks and optimizations have allowed FPGAs to make incremental power savings in each generation. Voltage scaling in particular has helped tremendously, but how much further can we scale? More recently, architecture changes (like those discussed in my previous blog, such as larger lookup tables with more inputs, dedicated logic functions for DSP and arithmetic, better usage of block memory, etc.) have allowed the FPGA fabric to do more with less power. So where can the next round of power innovations come from?
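Both halves of this argument (more transistors pushing power up, voltage scaling pulling it down) fall out of the standard first-order model for CMOS switching power, P ≈ α·C·V²·f, where α is the activity factor, C the switched capacitance, V the supply voltage, and f the clock frequency. Because V enters squared, a modest supply reduction buys a disproportionate saving. The sketch below uses purely illustrative numbers, not data for any real FPGA, and ignores static (leakage) power entirely.

```python
def dynamic_power(alpha: float, cap_farads: float, vdd: float, freq_hz: float) -> float:
    """First-order CMOS switching power: P = alpha * C * V^2 * f (watts)."""
    return alpha * cap_farads * vdd ** 2 * freq_hz


def relative_power(v_new: float, v_old: float, cap_scale: float = 1.0) -> float:
    """Ratio new/old switching power when only supply voltage and capacitance change."""
    return cap_scale * (v_new / v_old) ** 2


# Hypothetical values: 12.5% activity, 10 nF switched, 1.0 V supply, 200 MHz clock.
base = dynamic_power(0.125, 10e-9, 1.0, 200e6)
print(f"{base:.2f} W")  # 0.25 W

# Dropping the supply from 1.0 V to 0.9 V alone cuts switching power by about 19%...
print(f"{relative_power(0.9, 1.0):.2f}")  # 0.81
# ...but doubling the switched capacitance (more transistors) at the same time erases the gain.
print(f"{relative_power(0.9, 1.0, cap_scale=2.0):.2f}")  # 1.62
```

Once the supply can no longer drop much further, the only remaining first-order levers are activity (α) and capacitance (C), which is roughly where the architectural changes mentioned above come in.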
Another key point about power optimization is that it is important to provide designers with easy-to-use tools. It isn't an optimal solution, from Steve's perspective, if the tools require too much specialized knowledge and/or designer intervention in order to control the FPGA's power consumption. The tools should allow the customer to identify goals, levels of importance, and other system-level characteristics, but they should not require the user to operate at the gate level to control power. This looks to be another area for innovation. Creating tools that allow customers to innovate, but that also provide automated control for power optimization (along with the normal capacity and performance goals) may require some additional capabilities that don't currently exist. It all comes down to the right combination of fabric and tools.
Are there other FPGA challenges you think will be important in the next few years? What types of design goals have been difficult for you to achieve with current FPGA devices? Please jump to the FPGA Fabric: What Are Today's Challenges? message board and add your thoughts and comments.
Max Maxfield 6/27/2012 10:24:16 PM
Some thoughts
There are two main portions to the FPGA: the hard-core functions, such as a microcontroller subsystem, and the regular programmable fabric.
In the case of the MCU, it should be possible to completely power down any unused functions, such as a UART.
In the case of the programmable fabric, there are a number of techniques that can be used. One is disabling clock signals and sub-signals when they aren't being used. Another involves identifying critical paths and making their logic blocks switch faster but use more power; the blocks used for non-critical paths could be set to switch slower and use less power. There was one company that used this technique ... I think it was Altera on some of their parts ... but I don't know if they still do it in the latest parts ... I'll have to ask them and then report back further.
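A rough way to see why disabling idle clocks pays off: if a block's clock toggles only during the cycles in which the block is actually enabled, its clock-related switching power scales with that enable fraction. The sketch below treats each block's clock power and duty cycle as hypothetical inputs (the block names and milliwatt figures are made up) and ignores the small overhead of the gating logic itself.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    clock_power_mw: float    # clock power if the clock ran every cycle (hypothetical)
    enabled_fraction: float  # fraction of cycles in which the block needs its clock


def gated_clock_power(blocks: list[Block]) -> float:
    """Total clock power when each block's clock is disabled while the block is idle."""
    return sum(b.clock_power_mw * b.enabled_fraction for b in blocks)


design = [
    Block("datapath", 40.0, 1.0),  # always busy, so nothing to gain
    Block("uart", 5.0, 0.02),      # mostly idle
    Block("crypto", 30.0, 0.25),   # used in bursts
]

ungated = sum(b.clock_power_mw for b in design)
gated = gated_clock_power(design)
print(f"ungated {ungated:.1f} mW, gated {gated:.1f} mW")  # ungated 75.0 mW, gated 47.6 mW
```

The saving comes almost entirely from the bursty and idle blocks, which is why gating is most attractive for peripherals and accelerators rather than for a continuously busy datapath.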
Max Maxfield 6/28/2012 3:13:15 PM
Re: Some thoughts
With regard to my previous comment, I touched base with someone I know at Altera who brought me up to date. Apparently they have several patented technologies around this area; they call it Programmable Power Technology, and they offer it in their high-end device families (namely Stratix). They introduced it in their Stratix III (65nm) parts and have since migrated it to their later 40nm and now 28nm Stratix device families.
In a nutshell, this technology allows users to maximize performance and minimize power consumption. Users (by which we mean the engineers creating the design) have the ability to turn up the performance levels of certain logic blocks that require high performance and turn down the performance level of logic blocks that don't need it.
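The per-block trade-off described here can be sketched as a simple greedy pass over timing paths: start with every block in low-power mode, then promote blocks to high-speed mode only on paths that miss the clock period. This is a hypothetical model, not Altera's actual algorithm or tool flow; the 1.25x low-power delay penalty, the block delays, and the path list are all made-up parameters.

```python
HIGH_SPEED, LOW_POWER = "high-speed", "low-power"
LOW_POWER_DELAY_FACTOR = 1.25  # assumed: a block in low-power mode is 25% slower


def path_delay(path, delays, modes):
    """Sum of block delays along a path, stretched for blocks left in low-power mode."""
    return sum(
        delays[b] * (LOW_POWER_DELAY_FACTOR if modes[b] == LOW_POWER else 1.0)
        for b in path
    )


def assign_modes(paths, delays, clock_period_ns):
    """Start with every block in low-power mode; speed up blocks only on failing paths."""
    modes = {b: LOW_POWER for b in delays}
    # Visit the worst paths first.
    for path in sorted(paths, key=lambda p: path_delay(p, delays, modes), reverse=True):
        # Promote the slowest blocks on this path until it meets timing.
        # Promotion only ever shortens paths, so earlier paths stay within the period.
        for b in sorted(path, key=lambda blk: delays[blk], reverse=True):
            if path_delay(path, delays, modes) <= clock_period_ns:
                break
            modes[b] = HIGH_SPEED
    return modes


# Hypothetical block delays (ns) and two timing paths, checked against a 6 ns clock.
delays = {"a": 2.0, "b": 3.0, "c": 1.0, "d": 1.5}
paths = [["a", "b"], ["c", "d"]]
print(assign_modes(paths, delays, 6.0))
# {'a': 'low-power', 'b': 'high-speed', 'c': 'low-power', 'd': 'low-power'}
```

Only block "b" on the failing path gets the power-hungry setting; a real flow would also weigh the leakage saved per block and flag paths that fail even at full speed.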
My contact says that this significantly reduces the power consumption in their high-end device family. (Actually he also says "It's a kick ass technology").
An improvement in memory technology could significantly impact power use. A persistent (non-volatile) memory would tend to use significantly less power than SRAM. For typical LUT uses, one could even sacrifice write energy and time. A denser memory would also tend to reduce the distance between logic elements and so reduce power consumption.
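For readers less familiar with why LUT memory matters here: a k-input LUT is simply a 2^k-bit memory whose address lines are the logic inputs, so every configuration bit is a memory cell that has to be held (and, with SRAM, leaks) for as long as the device is powered. A minimal model, with the class name chosen purely for illustration:

```python
class LUT:
    """A k-input lookup table modelled as a 2**k-entry, 1-bit-wide memory."""

    def __init__(self, k: int, truth_table: list[int]):
        if len(truth_table) != 2 ** k:
            raise ValueError(f"a {k}-input LUT needs {2 ** k} configuration bits")
        self.k = k
        self.bits = [b & 1 for b in truth_table]  # the configuration memory

    def __call__(self, *inputs: int) -> int:
        # The inputs form the memory address; the first input is the least significant bit.
        address = sum((x & 1) << i for i, x in enumerate(inputs))
        return self.bits[address]


# Configure a 2-input LUT as XOR: addresses 0, 1, 2, 3 hold 0, 1, 1, 0.
xor2 = LUT(2, [0, 1, 1, 0])
print([xor2(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]

# Configuration bits per LUT grow exponentially with input count: 16 for 4 inputs, 64 for 6.
print(2 ** 4, 2 ** 6)  # 16 64
```

The exponential growth is why a denser, lower-leakage cell would matter more as LUTs get wider.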
It might be practical to combine memory technologies within a LUT such that a dense (and low-power) memory that is perhaps less reprogrammable (or has some other disadvantage) is used for some of the configuration. Such a scheme might be useful in conjunction with more complex logic elements; the dense memory might be used to establish a 'family' of logical operations while the less dense memory is used to select a specific instance.
There might even be a place for EPROM-style memory where the memory cannot be erased in the field. This would avoid some of the issues with OTP memory, though one might need a slower and/or higher-power SRAM-based version for development.
In the more distant future, there might be ways to directly rewire logic and routing (rather than having a memory that provides a zero or one, the "memory" provides a left or right) and perhaps even (eventually) to convert individual transistors (e.g., by somehow changing N regions into P regions and vice versa).
One might also be able to save power by clocking tricks. There is a many-core design (the name does not come to mind) that uses a clock wave rather than a clock tree. By only guaranteeing minimal clock skew at neighboring nodes in a grid layout, power can be saved.
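The distinction here is between bounding clock skew globally and bounding it only between neighbours. In a hypothetical grid where the clock ripples outward from one corner, arrival time grows with distance, so the chip-wide skew is large, yet any two adjacent nodes differ by only one hop. The sketch below computes both numbers; the grid size and the 20 ps per-hop delay are assumed values, not figures from the design mentioned above.

```python
HOP_DELAY_PS = 20  # assumed clock propagation delay between adjacent grid nodes


def arrival_times(rows: int, cols: int) -> list[list[int]]:
    """Clock arrival time at each node when the clock enters at (0, 0) and ripples outward."""
    return [[(r + c) * HOP_DELAY_PS for c in range(cols)] for r in range(rows)]


def global_skew(t: list[list[int]]) -> int:
    """Largest arrival-time difference between any two nodes on the chip."""
    flat = [x for row in t for x in row]
    return max(flat) - min(flat)


def max_neighbor_skew(t: list[list[int]]) -> int:
    """Largest arrival-time difference between horizontally or vertically adjacent nodes."""
    rows, cols = len(t), len(t[0])
    worst = 0
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows:
                worst = max(worst, abs(t[r][c] - t[r + 1][c]))
            if c + 1 < cols:
                worst = max(worst, abs(t[r][c] - t[r][c + 1]))
    return worst


grid = arrival_times(8, 8)
print(global_skew(grid), max_neighbor_skew(grid))  # 280 20
```

Logic that only communicates with adjacent nodes needs only the local bound to hold, so the distribution network does not have to be balanced chip-wide the way a conventional clock tree is.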
Various forms of asynchronous design could also be useful.
I do not know if wave pipelining (where multiple waves of operation can coexist within a single clock cycle, avoiding the need for finer-grained latches and a faster clock) would apply to FPGAs, but such could reduce delay or power slightly.
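In wave pipelining, the clock period is limited, to first order, not by the longest path delay but by the spread between the longest and shortest paths, because the next wave may launch as soon as it can no longer overtake the previous one. The sketch below uses that simplified constraint and lumps register setup/hold and clock skew into a single assumed overhead term; it is an illustration, not a complete timing analysis, and the delay values are hypothetical.

```python
import math


def conventional_min_period(d_max_ns: float, overhead_ns: float) -> float:
    """Ordinary pipelining: the whole logic delay must fit in one clock period."""
    return d_max_ns + overhead_ns


def wave_min_period(d_max_ns: float, d_min_ns: float, overhead_ns: float) -> float:
    """First-order wave-pipelining bound: only the path-delay spread must fit."""
    return (d_max_ns - d_min_ns) + overhead_ns


def waves_in_flight(d_max_ns: float, period_ns: float) -> int:
    """Approximate number of data waves travelling through the logic at once."""
    return math.ceil(d_max_ns / period_ns)


# Hypothetical logic cone: longest path 10 ns, shortest 7 ns, 1 ns of setup/hold/skew overhead.
d_max, d_min, overhead = 10.0, 7.0, 1.0
t_wave = wave_min_period(d_max, d_min, overhead)
print(conventional_min_period(d_max, overhead), t_wave, waves_in_flight(d_max, t_wave))
# 11.0 4.0 3
```

The benefit depends entirely on keeping the shortest path close to the longest one, which is why delay balancing is central to the technique.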
The regularity of the logic array might make timing-based design less impractical (I am guessing).
A denser LUT memory might also make something like Tabula's multiple configurations used at different times even more practical. If I recall correctly, Tabula claims their system increases utilization of logic elements and reduces area (which reduces communication power/delay).
How aggressive are the circuit optimizations of current FPGAs? I could imagine that one could make tradeoffs between speed, area, switching power, and static power. These tradeoffs could be targeted toward typical use rather than arbitrary use, penalizing atypical use by time/performance and/or power while optimizing for the typical use. (E.g., there is a technique which reduces leakage for an SRAM cell with a specific value. By biasing the use of values, some leakage power use could be avoided.)
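One way the value-biasing idea could be applied to LUT configuration memory, assuming hypothetical SRAM cells that leak less when holding 0: if a LUT's truth table is mostly 1s, store its complement and set one extra invert bit that flips the LUT output. The logic function is unchanged, but at most half of the stored bits (plus the invert bit) sit in the high-leakage state. This illustrates the principle only and is not a description of any shipping FPGA; the function names are made up.

```python
def encode_for_low_leakage(truth_table: list[int]) -> tuple[list[int], int]:
    """Return (stored_bits, invert_flag) so that at most half the stored bits are 1.

    Assumes (hypothetically) that a cell holding 0 is the low-leakage state.
    """
    ones = sum(truth_table)
    if ones > len(truth_table) // 2:
        return [1 - b for b in truth_table], 1
    return list(truth_table), 0


def lut_output(stored_bits: list[int], invert_flag: int, address: int) -> int:
    """Read the stored bit; the invert flag restores the original function at the output."""
    return stored_bits[address] ^ invert_flag


# A 3-input OR is 1 in seven of its eight entries, so it is stored inverted.
or3 = [0, 1, 1, 1, 1, 1, 1, 1]
stored, inv = encode_for_low_leakage(or3)
print(stored, inv)  # [1, 0, 0, 0, 0, 0, 0, 0] 1
```

Here the number of cells holding 1 drops from seven to two (one stored bit plus the invert bit), at the cost of one extra cell and an XOR on the output.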
I suspect that the use of larger logic elements would increase the opportunities for optimization.
There might even be a place for heterogeneous logic elements and routing where the placement of operations could significantly impact performance or power. (I do not know what granularities of placement would be appropriate [e.g., one could imagine row-level granularity where every Nth row has logic elements with different functions]. There would not seem to be a difference between larger logic elements and localized heterogeneity, though looking at it this way might make some optimizations more apparent.)
As an example of heterogeneous routing, there might be a place for permutation or bit rotating units scattered through the fabric. (I seem to recall reading that one weakness of FPGA processors is in handling variable shifts/rotates.)
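The usual hardware structure for a variable rotate is a logarithmic barrel rotator: log2(n) stages of n two-input multiplexers, where stage s rotates by 2^s positions when bit s of the rotate amount is set. For a 32-bit rotate that is 5 stages and 160 multiplexers, with wiring that spans progressively wider distances at each stage, which hints at why it maps awkwardly onto general fabric. A bit-level model of that structure (helper names are illustrative):

```python
def barrel_rotate_left(bits: list[int], amount: int) -> list[int]:
    """Rotate a bit vector left using log2(n) mux stages, as a hardware barrel rotator would.

    bits[0] is the least significant bit; the width must be a power of two.
    """
    n = len(bits)
    stages = n.bit_length() - 1
    if n != 1 << stages:
        raise ValueError("width must be a power of two")
    for s in range(stages):
        shift = 1 << s
        select = (amount >> s) & 1  # control bit for this stage
        # Each output bit is a 2:1 mux between the unrotated bit and the bit 2**s places lower.
        bits = [bits[(i - shift) % n] if select else bits[i] for i in range(n)]
    return bits


def to_bits(value: int, n: int) -> list[int]:
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits: list[int]) -> int:
    return sum(b << i for i, b in enumerate(bits))


n = 32
stages = n.bit_length() - 1
print(stages, stages * n)  # 5 160 (mux stages, total 2:1 multiplexers)
print(hex(from_bits(barrel_rotate_left(to_bits(0x80000001, n), 4))))  # 0x18
```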
Connecting hardened circuits with the programmable logic also seems to be an area of potential improvement. Placement of the hardened circuits would seem to be important. There might also be ways to connect portions of a hardened design to more programmable logic. E.g., some DSP functionality could actually be designed into hardened DSPs. Such hardened DSPs would be less area- and power-efficient than a DSP design that did not also connect its multipliers to the external fabric, but the sharing might be a net gain in some cases. (Some of the Sun Niagara processors shared the processor multipliers with the encryption co-processors. I wonder if altering FPGA multipliers to better support encryption would be useful. I seem to recall that the Sun method added about 5% to the area of the multipliers.)
Max Maxfield 6/29/2012 10:20:38 PM
Re: LUT memory technology
@Paul: Yes, Tabula have what they call their Spacetime architecture. Think of a traditional programmable logic block as comprising a LUT, a register, a multiplexer, and so forth. What Tabula do is run a 1.6 GHz clock and switch the logic block between 8 different contexts (which they call "folds") during each user clock cycle, which means that the fabric still appears to run at a respectable 200 MHz as far as the user is concerned.
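The clock arithmetic here is simply 1.6 GHz / 8 folds = 200 MHz. The sketch below models the idea at a purely functional level: one physical logic block with eight stored configurations is reused once per fold within each user clock cycle, so it stands in for eight conventional blocks. The class name and the per-fold functions are made-up examples; this is not Tabula's actual architecture or tooling.

```python
FABRIC_CLOCK_HZ = 1.6e9
FOLDS = 8
USER_CLOCK_HZ = FABRIC_CLOCK_HZ / FOLDS  # 200 MHz as seen by the user


class FoldedBlock:
    """One physical logic block that switches to a different stored configuration each fold."""

    def __init__(self, contexts):
        if len(contexts) != FOLDS:
            raise ValueError(f"expected {FOLDS} contexts")
        self.contexts = contexts  # held in small local on-chip RAM in the real device

    def user_cycle(self, inputs):
        # Within one user clock, the block is reconfigured FOLDS times (once per fabric
        # clock) and evaluates a different logical function in each fold.
        return [ctx(*inputs[fold]) for fold, ctx in enumerate(self.contexts)]


# Hypothetical per-fold functions standing in for eight different logic blocks.
contexts = [
    lambda a, b: a & b, lambda a, b: a | b, lambda a, b: a ^ b, lambda a, b: 1 - (a & b),
    lambda a, b: a, lambda a, b: b, lambda a, b: 1 - a, lambda a, b: 1 - b,
]
block = FoldedBlock(contexts)
print(USER_CLOCK_HZ / 1e6)  # 200.0
print(block.user_cycle([(1, 0)] * FOLDS))  # [0, 1, 1, 1, 1, 0, 0, 1]
```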
Of course, this doesn't mean that you get a 1/8-size chip. You still have to store the 8 different contexts in local on-chip RAM, but this is much smaller than regular programmable fabric. The end result is to shrink a traditional architecture to approximately 1/3 of the size. The big thing here is that this reduces the average track length by more than 70%, which the folks at Tabula say makes achieving timing closure much easier.
Also, as opposed to shrinking a regular chip to 1/3 of its size, another way to look at this is that you can build a chip the same size as the regular chip but with 3X the capacity.
Oh yes, I almost forgot. The RAM blocks in regular FPGAs are dual-port RAMs, but the same blocks in Tabula's chips act like 8-port RAMs, which I believe significantly reduces routing congestion (but don't ask me why).