In my previous blog, I promised to talk about a formalism that is used to analyze the way in which functional blocks talk to each other. That model of computation is called Kahn Process Networks (KPN).
Fundamentally, for a given input stream, a process in a KPN will always produce exactly the same output stream: it is thus a function from input streams to output streams. Also, all communication channels in a KPN are unbounded and lossless FIFO channels. This means that all write operations are non-blocking and that all read operations on an empty channel will block (stall) until a value is written to the channel. KPNs have the desirable property of having a uniquely defined behavior, which depends on neither the timing characteristics of its processes and communication channels nor on their scheduling. For example, a KPN will produce identical results whether its processes are executed sequentially on a single processor or in parallel on multiple processors and/or hardware accelerators. This model of computation is therefore well-suited for the specification, hardware/software co-design, and design space exploration of embedded systems.
KPNs do, however, have practical limitations when faced with realities. First, they assume infinite-length (unbounded) FIFO channels, whereas clearly only finite-length channels are available in a real-world implementation. This is, of course, true whether the FIFO channels are implemented as dedicated hardware FIFOs or through a shared memory structure. Practical solutions to this rely on flow control, which was discussed in the comments on the previous blog. There are many ways to implement flow control, such as having the sender first send a given number of bytes, or a packet, to the receiver over the channel and then waiting for the receiver to reply with an acknowledge before the sender proceeds to send further data. Conversely, after having read the given number of bytes, the receiver may reply with an acknowledge to the sender over the channel. This flow control ensures that the number of bytes that needs to be stored in a channel is always bounded.
A second limitation of KPNs is that they cannot model several important concepts in real-time processing such as timeouts, polling, race conditions, or real-time constraints due to the absence of non-blocking read operations. Therefore, implementations of KPNs for hardware/software co-design extend the KPN model of computation by allowing non-blocking reads on a channel (polling). These non-blocking reads can be used to model non-deterministic input events and accesses to time-sensitive input/output peripherals.
However, this flexibility comes with a price: an embedded system that uses polling loses the formal properties of KPNs, and the functional behavior of the system may depend on the timing characteristics of its processes and channels or on their scheduling. One partial solution is to have the simulation platform enforce a minimum delay of one clock cycle between two non-blocking reads on the same channel by the same process. This ensures a full execution of the system by preventing processes from polling a channel an infinite number of times in the same clock cycle. In the terminology of timed process networks, the channels are "delta-causal." This ensures the system has a uniquely defined behavior for each given schedule and set of timing characteristics for its processes, channels, and input events.
Have you had any personal experiences with the KPN model of computation? If so, I would be very interested to hear about them in the comments below.
jandecaluwe 1/6/2013 11:41:12 AM User Rank Blogger
Re: personal experiences with the KPN model of computation
@foster "I noticed that you're both talking about the CODING STYLE."
No, I am not talking about coding style.
FSMDs and communicating clocked processes are RTL paradigms that help you to define an architecture for a concrete design problem.
Of course, HDLs support these paradigms very well, and I have indicated how. However, HDLs are in no way a requirement. It is perfectly possible to support these paradigms with a mainly visual system: block diagrams for clocked processs (e.g. as a refinement of a top-level block diagram), bubble diagrams for the FSM part, and then some simple concurrent language for the data operations in each state.
There are/have been commercial systems that work like this - I don't know their current state.
"I almost know the limitations of C++/SystemC and others but finding a general solution for Coding Style in this heterogeneous world is unproductive"
I am not talking about coding style and I am not claiming that what I described is a "general" solution. First, I have called it "partial" myself. Moreover, the only thing I claim is that it is a generic fall-back solution for synchronous digital design (and nothing else). I am not saying that it is a good solution for every case.
jandecaluwe 1/6/2013 11:24:14 AM User Rank Blogger
Re: FSM with data operations
@aser "In the method of FSMD, the FSM design means the operator sheduling. The resourse allocation is implemented by the way. Such a sheduling and allocation are hand made, very complex and depend on the designer skills. Often they does not provide effective solution."
All true - but the FSMD method at the RTL level does provide a generic fall-back solution for all cases of synchronous digital design, including those where HLS is not applicable or available for price reasons.
foster 1/2/2013 11:54:24 PM User Rank Clever Clogs
A good tool for KPN
A good tool for KPN:
C or C++ file as input and KPN description as output.
Having capability to compute FIFO size For each channel based on a deadlock-free schedule.
Note that this particular deadlock-free schedule may not be optimal, and thus the computed buffer sizes may not be valid for the optimal schedule. However, a valid schedule exists for the computed buffer sizes.
MikePDX 1/2/2013 6:24:51 PM User Rank Clever Clogs
Ambric was a KPN in silicon
KPN is an excellent model of computation for massively parallel streaming real-time computing like video, medical imaging, computer vision and wireless baseband. We proved it at Ambric, a chip startup that was killed by the 2008 economic crash.
Ambric's Am2045, released in 2007, was a KPN with bounded buffers, fully implemented in silicon. It was a medium-sized 130nm ASIC with 336 32-bit processor cores, linked through a programmable interconnect of 32-bit flow-controlled channels, and 336 configurable FIFO-or-RAM memory blocks. Each core was programmed in Java, with channel reads blocking on empty and channel writes blocking on full. Tools auto-placed the processors and memories and auto-routed the channels.
Dozens of customers and researchers found KPN a very powerful and easy to learn programming model. My favorite example is the X-Ray customer system with over 13,000 cores in 40 Am2045 chips, doing 3D medical image scan reconstruction in real time, in under 500W, in a single ATCA chassis. Ambric chips were displacing FPGAs and DSPs thanks mainly to higher developer productivity.
It's certainly possible to have deadlocks in any real parallel computing system, including a KPN. The great advantage compared with ordinary shared-memory multicore programming is the KPN naturally halts at a parallelism bug, where it's much easier to find than a bad pointer, unsynchronized access, etc. In real applications it's not very hard to get the FIFOs sized right to make it deadlock-free. And with strictly distributed memory, debugged reusable functions stay reliable.
It's important to stick to blocking inputs for KPN to keep its predictable deterministic properties. "No peeking without taking". For unpredictable inputs like you brought up, Ambric had a special channel element you could use, that always had either a new input or else a null token, so the KPN processor's blocking input would always get something.
Until VCs and acquisitions froze up in 2008 and we had to close, Ambric chips were displacing FPGAs and DSPs, thanks mainly to higher developer productivity. It proved the value of KPN for embedded computing. I still love it. Use KPN!
Compiling Stream Applications for Heterogeneous Architectures
A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Science and Engineering) in The University of Michigan 2011
As you know, Cycle accuracy in using C++ (not SystemC) for designs is a big problem and every HLS tool treats it differently. For example suppose, Block A in your design has 3 loops and they must finish their jobs in 100, 150 and 200 Cycles respectively. Suppose there is also another block named Block B, waiting for these data to consume them in unpredictabe cycles (randomly that you could not guess the right cycles but streamingly).
The first thing that the designer could think of, is using FIFO channels. Yes, he (not she) is almost right, but the big problem is how we should make constraints for our loops inside the both blocks (I mean unrolling, pipelining or using wait routines of course). Block A as distributer and Block B as consumer have different behaviours. You could not even know, in what exact time intervals, the data are read. So, you could never find the best FIFO configurations (both structure and its size) too.
If you try to code the above design in pure C++, you will never have full control on transactions cycle by cycle and if you use SystemC, you still have the problem of finding the best FIFO. If we also consider the Area Budget (like the regular improvements in e.g. the IEEE papers) and if we still insist to use HLS tools and their accompanying FIFOs, How KPN can help to solve this or millions of similar problems.
hash.era 12/31/2012 3:09:02 AM User Rank Clever Clogs
Re: personal experiences with the KPN model of computation
Foster , coding style differs from person to person but the logic remains the same. I think the style will not make that much of an effect on the application itself.
Re: personal experiences with the KPN model of computation
@foster: I am primarily focused on the design process BEFORE STARTING TO CODE.
If as much effort had been spent discussing the process as has already been wasted discussing coding style.............
Something I noticed is that forms of graph(ic)s are surfacing in these discussions. As far as connecting the system together Altera's QSYS looks like a good start and Xilinx probably has something also.
FPGAs are totally different from ASICs but the same tools are used for much of the design.
The FPGA consists of LUTS, dFFs, memory blocks, and interconnect, and is clocked.
There is almost a direct mapping to an OOP cycle simulator. The debugger in the IDE can easily be used to set breakpoints and view internal values.
Here are the base classes needed:
"Register" dFF with a width property, din property, q property, and clock method.
"Memory" array of values, width, count, dina, (dinb for dual port), qa, qb addra, and addrb properties and clock method.
Derived classes of the register and memory base classes are referenced by name.
LUTS are represented by properties in the "user defined" class. In the FPGA the LUT "drives" its output, but in the model when it is used in another assignment the program will evaluate and use its value as if it had been "driven" by the source.
The user class can use any public properties of the derived classes to assign its property values. And everything is connected by name by the compiler. Different custom user classes represent parts of the design.
I believe 3D ICs are basically the replacement for the PCB. In the near future, the PCB will become nothing other than a holder with the ability to add connectors and perhaps a few components that cannot be economically integrated within the chip package.
Is the recent news that Altera will be using the Intel 14nm node with TriGate technology for their future FPGAs significant, or is it just industry noise?
Recent developments in high-level synthesis (HLS) and IP Integration technology mean that software developers can more easily create hardware to accelerate their applications.
To save this item to your list of favorite All Programmable Planet content so you can find it later in your Profile page, click the "Save It" button next to the item.
If you found this interesting or useful, please use the links to the services below to share it with other readers. You will need a free account with each service to share an item via that service.