Thus far in our discussions about pipelines, we have concentrated on a single loop. We have assumed that every stage of a pipeline takes the same amount of time, and we have made many other simplifications.
Life, of course, is never that simple, and the algorithms that go into sophisticated products are much more complex than this. Multiple loops, nested loops, sequences, and many other things are the reality. Questions about pipeline stalls and back pressure have come up, and that is most certainly one of those realities. We had also basically ignored notions of feedback until my previous blog, in which we touched on dependencies within a loop iteration.
Now, I want to once again tackle a general theme that has appeared in many of the comments on my blogs. People have said things such as "I would never trust transaction-level synthesis" or "I can do a better job" or "This does not make it possible for a software engineer to build hardware." In fact, I had a discussion with a few colleagues just a few days ago about how the same things were said of RTL synthesis when it first came out. One of those people said, "We used the optimization stuff, but we would never use the synthesis capabilities." Another said, "It is crazy to think that software can do a better job than engineers at turning Karnaugh maps into hardware."
It took engineers a long time to find the right way to write RTL code to get good results from synthesis, and for style guidelines to become commonplace. Exactly the same thing will apply to transaction-level synthesis (TLS). The users of today are blazing the trail and finding out what does and does not work. The vendors are watching them closely and adjusting their tools accordingly. Perhaps in another 10 years' time, 99 percent of engineers will be using TLS. However, I don't think it will happen that quickly, because we don't have the verification tools in place.
Another reason the transition will take longer is that the gap between a transaction-level description and RTL is much larger than the gap was from RTL to gates. At the same time, we are finding out that without considering very low-level effects of the fabrication processes, it is not always possible to know what the best synthesis results would be. Thus RTL synthesis has had to become layout-dependent.
Try to think of it this way. Transaction-level exploration effectively lets you select which algorithm you want to use and the best overall structure for that algorithm. While the implementation will have some impact on the final figures, selecting the algorithm produces changes one or more orders of magnitude larger than routing delays and their impact on the micro-architecture. Also, software is software. We are not synthesizing software to make hardware. We are describing hardware using a language similar to one used for software.
Let me make this clear by providing a real example. Consider the AES encryption standard. Each round is a four-stage transformation, and one of those stages -- the byte substitution, or SubBytes -- is shown below:
This involves a non-linear substitution of each of the bytes forming the 4x4 state array. The reference software implementation for this is as follows:
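The original listing has not survived here, but the FIPS-197 reference style of this step is essentially one table lookup per byte of the state. A minimal Python sketch of that shape (illustrative only: my `SBOX` holds just the first 16 entries of the standard 256-entry table, so the demo state is limited to byte values 0-15):

```python
# First 16 entries of the AES S-box (FIPS-197); the real table has 256.
SBOX = [0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5,
        0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76]

def sub_bytes(state):
    """Replace every byte of the 4x4 state with its S-box entry."""
    for r in range(4):
        for c in range(4):
            state[r][c] = SBOX[state[r][c]]
    return state

demo = [[0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
        [12, 13, 14, 15]]
sub_bytes(demo)
```

Synthesized literally, each iteration becomes a memory read, a table access, and a write-back, which is exactly the serial behavior discussed next.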
Synthesizing that code would have to access memory for each element in the array and -- using a lookup table -- find the new value to write back into the array. The resulting hardware would basically run at the same speed as the software. But that reference implementation was defined as a table lookup precisely because computing the substitution function directly, as hardware does, was too difficult and time-consuming in software. The hardware implementation is as follows:
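Again the listing itself is missing here, but the gist can be sketched. A hardware S-box is usually pure combinational logic: the multiplicative inverse in GF(2^8) followed by an affine transform with the constant 0x63, with no table and no memory access. The Python below is an illustrative model of that computation (the function names are mine, and `gf_inv` uses a brute-force search for clarity; real hardware computes the inverse with composite-field logic):

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B  # reduce by the AES field polynomial
        b >>= 1
    return p

def gf_inv(a):
    """Multiplicative inverse in GF(2^8); by convention 0 maps to 0."""
    if a == 0:
        return 0
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)

def aes_sbox(a):
    """S-box value: affine transform (constant 0x63) of the GF(2^8) inverse."""
    x = gf_inv(a)
    out = 0
    for i in range(8):
        # bit_i = x_i ^ x_(i+4) ^ x_(i+5) ^ x_(i+6) ^ x_(i+7) ^ c_i  (mod 8)
        bit = ((x >> i) ^ (x >> ((i + 4) % 8)) ^ (x >> ((i + 5) % 8))
               ^ (x >> ((i + 6) % 8)) ^ (x >> ((i + 7) % 8)) ^ (0x63 >> i)) & 1
        out |= bit << i
    return out
```

This reproduces the standard S-box (for instance, the FIPS-197 worked example S(0x53) = 0xED), and every byte of the state can go through an independent copy of this logic in parallel. No mechanical translation of the lookup-table code could ever discover this structure.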
There is no way to translate, synthesize, or whatever you want to call it, from one description to the other, yet they both perform the same function at the end of the day. The bottom line is that a software implementation is not a hardware specification. Please can we move on? I will not address this issue again -- I promise.