In my earlier blogs, we have primarily been talking about a single block or function, and very simple ones at that. Most systems, however, contain a number of these blocks chained together.
Consider the H.264 algorithm, for example. Within that algorithm will be a pixelPipeline, and inside of that a quantization step. Handing the whole algorithm to the ESL synthesis engine is probably too big a job, so instead it has to be broken down into pieces that are synthesized independently and then connected back together.
Each of the pieces operates independently and concurrently, and they may even operate at different clock frequencies. The connection between them has to be considered, and this may require partitioning the source code. Often the algorithm was developed with a particular implementation in mind, in which case it may already be naturally partitioned; however, that partitioning may not be optimal for a particular form of implementation.
In the remainder of this column, we will briefly examine three ways in which blocks may be connected, although other approaches are possible. The simplest technique is to place a buffer between the two functions, as shown below:
Of course, this method assumes that the blocks operate at the same frequency and that they produce and consume data at the same rate across the boundary. Note that when I say "buffer," I do not just mean a register, but whatever size is necessary to pass the maximum amount of data from one block to another that represents a packet or transaction. This could be a single unit of data or an entire video frame, for example.
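As a software sketch of this handoff (the names and the 64-word packet size are my own invention, not tied to any particular ESL tool), Block A fills a shared packet buffer sized for the largest transaction, and Block B then drains it:

```c
#include <stddef.h>

/* Hypothetical packet buffer sized for the largest transaction
 * passed between the two blocks (here, 64 data words). */
#define PACKET_WORDS 64

typedef struct {
    int    data[PACKET_WORDS];
    size_t len;  /* number of valid words in the current packet */
} packet_buffer;

/* Block A writes a complete packet into the buffer. */
static void block_a_produce(packet_buffer *buf, const int *src, size_t n) {
    size_t count = (n < PACKET_WORDS) ? n : PACKET_WORDS;
    for (size_t i = 0; i < count; i++)
        buf->data[i] = src[i];
    buf->len = count;
}

/* Block B consumes the packet; returns the number of words read. */
static size_t block_b_consume(const packet_buffer *buf, int *dst) {
    for (size_t i = 0; i < buf->len; i++)
        dst[i] = buf->data[i];
    return buf->len;
}
```

Because both blocks run at the same rate, A's write of packet N and B's read of packet N-1 never collide in this idealized model; the later techniques exist precisely because that assumption often fails.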
In a previous blog (Can You Say 'Statically Determinable'?) we talked a little bit about a modification to this in cases where Block A can be stalled if Block B is not ready to accept more data; this modification can be applied to all of the interfacing techniques discussed here.
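A minimal sketch of that stall mechanism (the names here are invented for illustration) is a ready flag that Block B raises when it can accept another packet; Block A checks the flag and simply does not write until it is set:

```c
#include <stdbool.h>

/* Hypothetical handshake: Block B raises 'ready' when it can accept
 * another packet; Block A checks it before writing and stalls otherwise. */
typedef struct {
    bool ready;   /* set by Block B, cleared when a write is accepted */
    int  payload; /* single data word, for illustration */
} handshake_channel;

/* Returns true if the write was accepted, false if Block A must stall. */
static bool block_a_try_write(handshake_channel *ch, int value) {
    if (!ch->ready)
        return false;      /* Block B not ready: stall */
    ch->payload = value;
    ch->ready = false;     /* consume the ready token */
    return true;
}

/* Block B signals it can accept the next packet. */
static void block_b_set_ready(handshake_channel *ch) {
    ch->ready = true;
}
```

In hardware this corresponds to the familiar ready/valid style of handshake; the same stall idea applies unchanged to the FIFO and ping-pong schemes discussed next.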
In many systems, the production and consumption rates of different blocks will not be the same. Of course, if Block A always produces data faster than Block B can consume it, there is a fundamental problem with the design. But what often happens is that extra processing is required for some of the data packets. Thus, although the blocks may be balanced overall, there will be times when one block temporarily runs faster than the other. In such cases, a FIFO offers a better interfacing solution because it provides a degree of isolation between the two blocks, as illustrated below:
Now, Block A is free to produce multiple packets of data, so long as Block B eventually catches up. But herein lies the problem -- what is the maximum disparity between the two blocks, and therefore how deep does the FIFO need to be? This is a classic mathematical problem, and one model of computation associated with it is the Kahn Process Network (KPN). I will talk about this more extensively in another blog, but let me provide a teaser -- in general there is no way to statically guarantee that the buffers will not overflow, so these almost always have to be combined with some form of flow control.
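As a concrete sketch of that combination, a bounded circular FIFO whose push operation fails when full gives both the buffering and the flow control in one structure (the depth of 8 is arbitrary; sizing it correctly is exactly the hard problem described above):

```c
#include <stdbool.h>
#include <stddef.h>

#define FIFO_DEPTH 8  /* arbitrary depth; choosing it is the hard part */

typedef struct {
    int    data[FIFO_DEPTH];
    size_t head, tail, count;
} fifo;

/* Push from Block A; returns false (back-pressure) when the FIFO is full. */
static bool fifo_push(fifo *f, int value) {
    if (f->count == FIFO_DEPTH)
        return false;
    f->data[f->tail] = value;
    f->tail = (f->tail + 1) % FIFO_DEPTH;
    f->count++;
    return true;
}

/* Pop from Block B; returns false when the FIFO is empty. */
static bool fifo_pop(fifo *f, int *value) {
    if (f->count == 0)
        return false;
    *value = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}
```

The failed push is the flow-control signal: in hardware it would stall Block A for a cycle rather than return false.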
In both of the methods introduced above, we have not talked about how the buffer or FIFO is actually implemented. If they are memory-based rather than register-based, there may be a restriction that only one process can access them at a time, unless we are using dual-port memories. Care also has to be taken to ensure that half-written data from one block is not read by the other. One way to overcome these restrictions is to use ping-pong buffers, as illustrated below:
The idea here is that Block A writes data into one of the buffers. When a packet is complete, the buffers are flipped so that Block B starts to read from that buffer while Block A writes into the other one. This guarantees that only complete data is read and allows a simultaneous read and write. Of course, flow control is also required in this type of system to determine when each buffer is ready to be written to or read from.
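A software sketch of the flip (again with invented names and an arbitrary 16-word packet) keeps two buffers and a selector bit; toggling the selector after each complete packet hands the finished buffer to Block B while giving Block A a fresh one:

```c
#include <stddef.h>

#define BUF_WORDS 16  /* arbitrary packet size, for illustration */

typedef struct {
    int buf[2][BUF_WORDS]; /* the two halves of the ping-pong pair */
    int write_sel;         /* buffer Block A writes; B reads the other */
} pingpong;

/* Block A fills the currently selected write buffer with one packet. */
static void write_packet(pingpong *pp, const int *src, size_t n) {
    for (size_t i = 0; i < n && i < BUF_WORDS; i++)
        pp->buf[pp->write_sel][i] = src[i];
}

/* Flip after a complete packet: B now reads what A just wrote,
 * and A writes into the other buffer. */
static void flip(pingpong *pp) {
    pp->write_sel ^= 1;
}

/* Block B reads word i from its (read-side) buffer. */
static int read_word(const pingpong *pp, size_t i) {
    return pp->buf[pp->write_sel ^ 1][i];
}
```

Because the two blocks never touch the same buffer between flips, single-port memories suffice, and a reader can never observe a half-written packet.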