As you may recall, I ended my previous blog, Can You Say 'Statically Determinable'?, with a question about what other types of coding style would make a program difficult for high-level synthesis.
My All Programmable Planet co-blogger, Hamster, identified a couple -- recursion (which can be dealt with if the depth of recursion is limited), and issues created by data dependencies.
I would add to that list anything that uses memory allocation, because the maximum size of the array that has to be created is no longer known. This is a problem with standards such as TLM 2.0 -- a very important standard for the creation of transaction-level models and virtual prototypes. It was the creation of this standard that really brought the whole virtual prototype space together and made it possible for models to be exchanged. But here, just like we have seen so many times in the past, the standard only considers verification and not synthesis.
TLM 2.0 contains a variable length payload and so immediately means that a model created for a virtual prototype cannot be synthesized without manual intervention. Neither can that model be automatically mapped into an emulator or FPGA prototype, because they also rely on being able to synthesize one side of this interface.
Because of this, companies created their own proprietary subsets of TLM 2.0 that do support synthesis, and there is now a draft OSCI standard that defines the synthesizable subset. This went out for review in August 2009, but a standard has not emerged, meaning there must still be some significant unresolved issues. Does anyone know what is happening in this group? If so, please share.
In this blog, I want to concentrate on Hamster's second point -- data dependency within a loop. Consider a simple loop as follows:
In this loop, the calculation of "a[i]" is dependent on the value calculated in the previous iteration "a[i-1]" of the loop. For the purpose of these discussions, let's ignore what happens for the first iteration, because this would require a value to come from "a[0]," which -- in this simple example -- has not been set. This may get flagged as an error if the high-level synthesis program is doing array bound checking, or result in highly spurious results if not checked and not initialized. As a minimum, it wastes a memory location, and it could be worse because a 33-word memory may consume 64 words in order to keep things aligned.
If we were to unroll and pipeline this design with an Initiation Interval (II) of 1, we would get the following:
In this case, the value of "a[1]" is required before it has been calculated. This is sometimes called a "Strongly Connected Component," although I just see it as a data dependency that limits the way in which things can be implemented. Thus, in order to pipeline this, we would need to use an Initiation Interval of 2 as illustrated below:
However, we have now lost the ability to share the adder. In the previous case, we required only one multiplier and one adder. Now we need a multiplier and two adders (unless the multiplier is capable of performing an add operation as well).
This provides a perfect example that illustrates how adjusting the Initiation Interval can impact the amount of resource sharing that can be performed, and that many aspects of system optimization are nor linear in nature. This is another reason for it being called "architectural exploration."
Related posts: