A couple of years ago, I was lucky enough to be introduced to Andreas Olofsson, president and chief architect of Adapteva. This man single-handedly invented a new computer architecture and designed his own system-on-chip (SoC) from the ground up -- including learning how to use all the EDA tools. He then took the device all the way to working silicon and a packaged prototype. That's when things really started to get interesting!
After that introduction, I wrote an article about Andreas and his incredible design. Since then, I must admit that I've occasionally wondered what was going on at Adapteva. Then, out of the blue, Andreas called to bring me up to date, and you will not believe what he told me.
A few years ago, while working on various aspects of digital signal processing, he began to ponder a problem: The processing solutions on the market, though very versatile, were not inherently efficient in terms of floating-point operations per watt. He was targeting really complex floating-point problems that require a massive amount of flops. This includes the obvious suspects: radar, medical imaging, and communications infrastructure tasks like beam forming. But even battery-powered handheld applications increasingly must perform computationally intensive tasks while consuming as little power as possible.
He designed the Epiphany chip, an array of processor cores, each equipped with its own local memory and a single-precision floating-point engine. Everything is designed to offer optimum performance while consuming as little power as possible. The chip is extremely scalable -- the Epiphany-III (implemented at the 65nm node) boasts an array of 16 processors, while the Epiphany-IV (implemented at the 28nm node) features an array of 64 processors.
When operating at peak performance, running at 800MHz, the Epiphany-IV offers 100Gflops of raw computing power while consuming only 2W. At 50Gflops/Watt, the Epiphany-IV is 50 to 100X more efficient than anything else out there.
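For anyone who wants to check the arithmetic behind those numbers, here is a rough back-of-the-envelope sketch in Python. The core count, clock rate, 100Gflops, and 2W figures come from the paragraph above; the figure of two floating-point operations per core per clock cycle (one multiply plus one add) is an assumption made purely for illustration, and the function names are hypothetical.

```python
def peak_gflops(cores, clock_mhz, flops_per_cycle=2):
    """Theoretical peak in Gflops for an array of identical cores.

    flops_per_cycle=2 is an ASSUMPTION (one multiply plus one add per
    cycle per core); the article itself does not state this figure.
    """
    return cores * clock_mhz * flops_per_cycle / 1000.0


def gflops_per_watt(gflops, watts):
    """Power efficiency as Gflops delivered per watt consumed."""
    return gflops / watts


# Epiphany-IV figures quoted in the article: 64 cores at 800MHz,
# roughly 100Gflops at about 2W.
epiphany_iv_peak = peak_gflops(cores=64, clock_mhz=800)
epiphany_iv_efficiency = gflops_per_watt(gflops=100, watts=2)

print(f"Estimated peak: {epiphany_iv_peak:.1f} Gflops")
print(f"Efficiency:     {epiphany_iv_efficiency:.0f} Gflops/Watt")
```

Under that assumption, the estimated peak lands close to the quoted 100Gflops, and 100Gflops divided by 2W gives the 50Gflops/Watt figure directly.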
What does this have to do with us here on All Programmable Planet? The Epiphany was always conceived as operating as a co-processor that would offload the main processor. In addition to memory, early systems featured three main chips -- an MCU, an FPGA, and an Epiphany, where the FPGA was used to interface the MCU to the Epiphany.
And then Xilinx introduced the Zynq All Programmable SoC. This inspired Andreas to develop a personal supercomputer system called the Parallella, which is based on a combination of the Zynq and the Epiphany, as illustrated in the block diagram below.
Block diagram of the Zynq-based Parallella personal supercomputer.
Andreas told me the Parallella platform will be built on the following principles:
Open access: Absolutely no NDAs or special access are needed. All architecture and SDK documents will be published on the Web as soon as funding is available (more on that later).
Open-source: The Parallella platform will be based on free open-source development tools and libraries. All board design files will be provided on an open-source basis once the Parallella boards are released.
Affordability: Hardware and SDK costs have always been a huge barrier for developers looking to develop high-performance applications. The goal is to bring the Parallella high-performance computer cost below $100, making it an affordable platform for all.
Duane Benson 11/5/2012 6:56:15 PM User Rank Blogger
Parallel and memory
My understanding of multi-core is that for most applications, the gains from adding additional cores really level off after about six because of bus sharing issues. The Adapteva processor here has local memory for each core which makes a lot of sense and could help to mitigate that problem. But it's not a lot of memory, so I would guess that it's more of a cache than plain RAM.
Is there an FPGA or FPGAish system that can, through programmable logic, map RAM to different processor cores dynamically? I don't mean just dedicating an address range in a shared RAM bank, but actually creating a dedicated "local" RAM bank with its own dedicated data and address buses and sized as needed for each core.
Max Maxfield 10/29/2012 9:39:43 AM User Rank Blogger
Re: Finally $898,921 against the requirement of $750,000.
@Myplanet: I must admit that I am very happy with the final result -- Andreas over at Adapteva emailed me to say thanks to everyone for all the support -- I will report further when I have my Parallella in my hands :-)
Finally $898,921 against the requirement of $750,000.
The final number came to $898,921 against the requirement of $750,000. Thanks to all the contributors; I'm happy to have become a part of the project by contributing a little.
Max, it is no wonder that within a couple of years we can have a supercomputer as our desktop. The number of processors, and of cores per processor, is doubling every 18 months, according to Moore's law. But affordability may be a concern because only high-end systems will be available in the market.
If it is set up as it looks in the diagram, the memory is in the wrong place for graphics. :-)
An Intel i3-530 is rated about 28GFLOPS*, (vs 24GFLOPS for the 16 core board) so it won't be undreamed-of power - you won't be doing stuff that you can't already do on a top of the line PC, just the stuff you can't do on a Raspberry Pi.
Excluding, of course, the juicy helping of programmable logic for interfacing, high speed DACs, crypto acceleration, DSP and so on.
Oh, and it will be very fine-grained parallelism (more like FPGAs) rather than the coarse grained stuff that goes on in PC land (multi-threading, semaphores, IPC, spinlocks...). It is going to bring a few thousand people much closer to using programmable logic.
I'm sure somebody will make this board into an excellent workbench tool - high speed logic analyser, multi-channel audio / low bandwidth scope, frequency counter, function generator, spectrum analyser, slick GUI interface, just plug in a mouse, a spare screen and a cell phone charger.
Well, I am sure we agree on most things. The interesting thing is that Thales have up until now been doing all their crypto stuff on big lumps of software running on arrays of StrongARM processors on PCI cards, and if I were them I would be going, 'Oh look, that looks almost exactly like our architecture.' If we stuck a PCI interface on it, it would be the same.
But anyway, games would be good as well. I bet any 3d engine would go pretty fast.
The same conspiracy theories (about 'the spooks in the CIA', 'the government', 'the shop', or -- in the 1960s -- 'the phone company') have been running around for many many years.
Have you ever watched the movie The President's Analyst ? It was released in 1967 and yet it still seems eerily contemporary. It's still funny. The names have changed, but the unshakable and uneasy fear and suspicion remain mostly the same.
The internet and global commerce/competition have made this sort of conspiracy theory even less plausible. Think Wikileaks.
"I dunno why but I always find myself disagreeing with you ..."
Yah, various people have been telling me the same thing since the early 1980s. You are in very good company...
I dunno why but I always find myself disagreeing with you, nothing personal.
The application that is going to scare governments here is a crypto cracker, and if you look at the products out there on the market, I can name at least one crypto engine which is simply an array of StrongARM processors with a PCI interface; no mass storage there, I am afraid.
It could be that people such as Thales step in and buy them out, because it's putting the kind of technology which the US government tries to classify as munitions into the hands of people who might do naughty stuff with it.
It might be that if the likes of Thales buy them, you suddenly cannot get your hands on one.