Warren Miller

What's Next: The Future of FPGA Fabric & Today's Challenges

Max Maxfield
6/29/2012 10:20:38 PM
User Rank
Blogger
Re: LUT memory technology
@Paul: Yes, Tabula have what they call their Spacetime architecture. Think of a traditional programmable logic block as comprising a LUT, a register, a multiplexer, and so forth. What Tabula do is run a 1.6 GHz clock and switch the logic block between 8 different contexts (which they call "folds") for each user clock, which means that the fabric still seems to run at a respectable 200 MHz as far as the user is concerned. Of course this doesn't mean that you get a 1/8 size chip. You still have to store the 8 different contexts in local on-chip RAM, but this is much smaller than regular programmable fabric. The end result is to shrink a traditional architecture to approximately 1/3 of the size. The big thing here is that this reduces the average track length by more than 70%, which the folks at Tabula say makes achieving timing closure much easier. Also, as opposed to shrinking a regular chip to 1/3 of its size, another way to look at this is that you can build a chip the same size as the regular chip but with 3X the capacity. Oh yes, I almost forgot. The RAM blocks in regular FPGAs are dual-port RAMs, but the same blocks in Tabula's chips act like 8-port RAMs, which I believe significantly reduces routing congestion (but don't ask me why).
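
To make the time-multiplexing idea concrete, here is a rough Python sketch. It is only a toy model under my own assumptions, not Tabula's actual design: the FoldedLut class, the 2-input LUT size, and the chaining scheme are all hypothetical. It shows one physical LUT being reloaded with a different truth table on each of the 8 folds, so that a single block does the work of an 8-stage chain within one user clock.

```python
FABRIC_CLOCK_HZ = 1.6e9  # internal clock quoted in the comment above
FOLDS = 8                # contexts ("folds") per user clock

# Truth tables for a 2-input LUT, indexed by (a << 1) | b.
AND = (0, 0, 0, 1)
XOR = (0, 1, 1, 0)


def user_clock_hz(fabric_clock_hz=FABRIC_CLOCK_HZ, folds=FOLDS):
    """Clock rate the user sees when every user cycle is split into folds."""
    return fabric_clock_hz / folds


class FoldedLut:
    """Hypothetical model: one physical LUT with one stored context per fold."""

    def __init__(self, truth_tables):
        if len(truth_tables) != FOLDS:
            raise ValueError("need exactly one truth table per fold")
        self.truth_tables = list(truth_tables)

    def evaluate(self, fold, a, b):
        # Select the context for this fold, then do an ordinary LUT lookup.
        return self.truth_tables[fold][(a << 1) | b]


def chain_through_folds(lut, bits):
    """Fold i combines the running result with bits[i + 1].

    Nine input bits pass through eight folds of the same physical LUT,
    which is the work of eight LUTs in a conventional fabric.
    """
    acc = bits[0]
    for fold in range(FOLDS):
        acc = lut.evaluate(fold, acc, bits[fold + 1])
    return acc


# A 9-bit parity tree built from a single physical LUT.
parity_lut = FoldedLut([XOR] * FOLDS)
```

Calling user_clock_hz() reproduces the 200 MHz figure (1.6 GHz divided by 8 folds), and chain_through_folds(parity_lut, bits) returns the parity of nine input bits. The sketch ignores the cost of storing the contexts, which is why the real saving is closer to 1/3 than 1/8.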

Paul A. Clayton
6/28/2012 7:38:46 PM
User Rank
Beginner
Circuit optimizations?
How aggressive are the circuit optimizations of current FPGAs?  I could imagine that one could make tradeoffs between speed, area, switching power, and static power.  These tradeoffs could be targeted toward typical use rather than arbitrary use, penalizing atypical use in time/performance and/or power while optimizing for the typical use.  (E.g., there is a technique which reduces leakage for an SRAM cell holding a specific value.  By biasing the use of values, some leakage power could be avoided.)
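
One way the value-biasing idea might look for a LUT, as a sketch only: I am assuming (hypothetically) that a cell leaks less when it holds a 0 and that an output inverter is available, so the tool can store either the truth table or its complement, whichever has more low-leakage bits. The function names are made up for illustration.

```python
def bias_truth_table(table, low_leak_value=0):
    """Pick the stored polarity that maximizes low-leakage bits.

    Returns (stored_bits, invert_output). If fewer than half of the bits
    already equal low_leak_value, store the complement and flag the
    output for inversion so the logic function is unchanged.
    """
    matches = sum(1 for bit in table if bit == low_leak_value)
    if matches * 2 >= len(table):
        return list(table), False
    return [1 - bit for bit in table], True


def read_lut(stored_bits, invert_output, index):
    """Look up one entry, undoing the inversion if the table was flipped."""
    bit = stored_bits[index]
    return 1 - bit if invert_output else bit


# A 2-input OR has three 1s, so it is stored inverted as a NOR.
or_table = [0, 1, 1, 1]
stored, inverted = bias_truth_table(or_table)
```

Here stored holds three 0s instead of three 1s, and read_lut still returns the original OR function. Whether the inverter costs more than the leakage saved is exactly the kind of tradeoff I am asking about.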

I suspect that the use of larger logic elements would increase the opportunities for optimization.

There might even be a place for heterogeneous logic elements and routing where the placement of operations could significantly impact performance or power.  (I do not know what granularities of placement would be appropriate [e.g., one could imagine row-level granularity where every Nth row has logic elements with different functions].  There would not seem to be a difference between larger logic elements and localized heterogeneity, though looking at the problem this way might make some optimizations more apparent.)

As an example of heterogeneous routing, there might be a place for permutation or bit rotating units scattered through the fabric.  (I seem to recall reading that one weakness of FPGA processors is in handling variable shifts/rotates.)
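
To illustrate why variable rotates are awkward in generic fabric, here is a small Python model of a logarithmic barrel rotator. It is a sketch under my own assumptions: the width is a power of two, and the mux count is a simple one-mux-per-bit-per-stage estimate, not a figure for any real device.

```python
def rotate_left(value, amount, width=8):
    """Rotate left using log2(width) conditional stages, as a barrel shifter does.

    Assumes width is a power of two. Stage k rotates by 2**k when bit k of
    amount is set; higher bits of amount are ignored.
    """
    mask = (1 << width) - 1
    value &= mask
    stages = (width - 1).bit_length()
    for stage in range(stages):
        step = 1 << stage
        if (amount >> stage) & 1:
            value = ((value << step) | (value >> (width - step))) & mask
    return value


def mux_count(width):
    """Rough 2:1 mux estimate: one mux per bit in each of the log2(width) stages."""
    return width * (width - 1).bit_length()
```

For a 32-bit rotate this estimate gives mux_count(32) == 160 two-input muxes plus the routing between stages, all built from general-purpose LUTs. A small hardened permutation or rotate unit would replace that with dedicated wiring.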

Connecting hardened circuits with the programmable logic also seems to be an area of potential improvement.  Placement of the hardened circuits would seem to be important.  There might also be ways to connect portions of a hardened design to more programmable logic.  E.g., some DSP functionality could actually be designed into hardened DSPs.  Such hardened DSPs would be less area and power efficient than a DSP design that did not also connect multipliers to the external fabric, but the sharing might be a net gain in some cases.  (Some of the Sun Niagara processors shared the processor multipliers with the encryption co-processors.  I wonder if altering FPGA multipliers to better support encryption would be useful.  I seem to recall that the Sun method added about 5% to the area of the multipliers.)

Paul A. Clayton
6/28/2012 6:41:46 PM
User Rank
Beginner
Re: LUT memory technology
A denser LUT memory might also make something like Tabula's multiple configurations used at different times even more practical.  If I recall correctly, Tabula claims their system increases utilization of logic elements and reduces area (which reduces communication power/delay).

Paul A. Clayton
6/28/2012 6:31:53 PM
User Rank
Beginner
Clock power saving
One might also be able to save power by clocking tricks.  There is a many-core design (the name does not come to mind) that uses a clock wave rather than a clock tree.  By only guaranteeing minimal clock skew at neighboring nodes in a grid layout, power can be saved. 

Various forms of asynchronous design could also be useful.

I do not know if wave pipelining (where multiple waves of operation can coexist within a single clock cycle, avoiding the need for finer-grained latches and a faster clock) would apply to FPGAs, but such could reduce delay or power slightly.

The regularity of the logic array might make timing-based design less impractical (I am guessing).

Paul A. Clayton
6/28/2012 6:16:13 PM
User Rank
Beginner
LUT memory technology
An improvement in memory technology could significantly impact power use.  A persistent (non-volatile) memory would tend to use significantly less power than SRAM.  For typical LUT uses, one could even sacrifice write energy and time.  A denser memory would also tend to reduce the distance between logic elements and so reduce power consumption.

It might be practical to combine memory technologies within a LUT such that a dense (and low-power) memory that is perhaps less reprogrammable (or has some other disadvantage) is used for some configuration.  Such might be useful in conjunction with more complex logic elements; the dense memory might be used to establish a 'family' of logical operations while the less dense memory is used to select a specific instance.

There might even be a place for EPROM-style memory where the memory cannot be erased in the field.  This would avoid some of the issues with OTP memory, though one might need a slower and/or higher-power SRAM-based version for development.

In the more distant future, there might be ways to directly rewire logic and routing (rather than having a memory that provides a zero or one, the "memory" provides a left or right) and perhaps even (eventually) to convert individual transistors (e.g., by somehow changing N regions into P regions and vice versa).

Max Maxfield
6/28/2012 3:13:15 PM
User Rank
Blogger
Re: Some thoughts
With regard to my previous comment, I touched base with someone I know at Altera who brought me up to date. Apparently they have several patented technologies in this area; they call it Programmable Power Technology, and they offer it in their high-end device families (namely Stratix). They introduced it in their Stratix III (65nm) parts and they've migrated it to their later 40nm and now 28nm Stratix device families.

In a nutshell, this technology allows users to maximize performance and minimize power consumption. Users (by which we mean the engineers creating the design) have the ability to turn up the performance level of logic blocks that require high performance and turn down the performance level of logic blocks that don't need it.

My contact says that this significantly reduces the power consumption in their high-end device family. (Actually he also says "It's a kick ass technology").

Max Maxfield
6/27/2012 10:24:16 PM
User Rank
Blogger
Some thoughts
There are two main portions to the FPGA -- the hard core functions like a microcontroller subsystem and the regular programmable fabric. In the case of the MCU it should be possible to completely power down any unused functions, like a UART for example. In the case of the programmable fabric, there are a number of techniques that can be used. One is disabling clock signals and sub signals when they aren't being used. Another involves identifying critical paths and making their logic blocks switch faster but use more power; the blocks used for non-critical paths could be set to switch slower and use less power. There was one company that used this technique ... I think it was Altera on some of their parts ... but I don't know if they still do it in the latest parts ... I'll have to ask them and then report back further.
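
Here is a rough Python sketch of the critical-path idea. It is purely illustrative: the delay and power numbers are made up, the greedy pass is my own simplification, and it does not represent how any vendor's tools actually do this. Every block starts in the fast mode, and each one is switched to the slow, low-power mode only if every path through it still meets the clock period.

```python
FAST = "fast"
SLOW = "slow"

# Hypothetical per-block figures, chosen only for illustration.
DELAY_NS = {FAST: 1.0, SLOW: 1.5}
POWER_MW = {FAST: 2.0, SLOW: 1.0}


def path_delay(path, modes):
    """Sum the block delays along one register-to-register path."""
    return sum(DELAY_NS[modes[block]] for block in path)


def assign_modes(paths, clock_period_ns):
    """Greedily slow down every block whose paths have enough slack."""
    blocks = sorted({block for path in paths for block in path})
    modes = {block: FAST for block in blocks}
    for block in blocks:
        modes[block] = SLOW
        if any(path_delay(path, modes) > clock_period_ns for path in paths):
            modes[block] = FAST  # this change would break timing, so revert
    return modes


def total_power(modes):
    """Total power for a given mode assignment."""
    return sum(POWER_MW[mode] for mode in modes.values())


# One 4-block critical path with no slack, and one short path with plenty.
paths = [["a", "b", "c", "d"], ["e", "f"]]
modes = assign_modes(paths, clock_period_ns=4.0)
```

With these numbers the four blocks on the critical path stay fast and the two blocks on the short path drop to the slow mode, so total_power(modes) is 10.0 mW instead of 12.0 mW with everything fast.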
