Most FPGA processor cores provided by FPGA vendors are -- to a large extent -- "Black Boxes." One does not get to see inside the box to learn how to design logic. Similarly, many of the IP vendor processor cores come encrypted, or one must execute an NDA (non-disclosure agreement) and Evaluation License, which may be beyond the scope of what one wants to do for casual hobby learning.
The U.C. Riverside VHDL 8051 project is one 8051 core that -- with a little elbow grease -- can be simulated and synthesized with most FPGA vendor tools.
This project does not have the architectural sophistication or extensive tool support associated with the offerings from IP core vendors like Digital Core Design (DCD) or other commercially-available FPGA IP vendor processor cores -- and it does contain a number of limitations to its functionality and performance -- but it will run simple programs for the purposes of teaching oneself the basics of how a CPU core is "done."
The project consists of models of the ALU, internal RAM and ROM, and external RAM, as well as a testbench and support files. A number of sample programs and ROM *.hex images are included in the project. Also, a utility is included that allows you to create a new ROM containing your own program. You will need to compile this C/C++ file (say, "gcc -Wall i8051_mkr.c") and then run it with your *.hex file as a command line argument (e.g., "a.out myfile.hex"). This will then produce a corresponding ROM file in VHDL.
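To give a feel for what such a conversion involves, here is a minimal Python sketch. It is not the project's i8051_mkr.c: it assumes the *.hex file is in Intel HEX format, and the function names, the PROGRAM_ROM constant, and the layout of the generated VHDL are all made up for illustration. It reads the data records, checks each record's checksum, and prints the bytes as a VHDL constant array.

```python
def parse_intel_hex(text):
    """Return {address: byte} for the data records of an Intel HEX image."""
    memory = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if not line.startswith(":"):
            raise ValueError(f"not an Intel HEX record: {line!r}")
        raw = bytes.fromhex(line[1:])
        # All bytes of a record, checksum included, sum to 0 modulo 256.
        if sum(raw) & 0xFF != 0:
            raise ValueError(f"bad checksum: {line!r}")
        count, address, record_type = raw[0], (raw[1] << 8) | raw[2], raw[3]
        if record_type == 0x01:  # end-of-file record
            break
        if record_type == 0x00:  # data record
            for offset, byte in enumerate(raw[4:4 + count]):
                memory[address + offset] = byte
    return memory


def to_vhdl_rom(memory, name="PROGRAM_ROM"):
    """Format the bytes as a VHDL constant (hypothetical layout)."""
    if not memory:
        raise ValueError("no data records found")
    top = max(memory)
    lines = [
        f"type ROM_TYPE is array (0 to {top}) of std_logic_vector(7 downto 0);",
        f"constant {name} : ROM_TYPE := (",
    ]
    for address in sorted(memory):
        lines.append(f'    {address} => x"{memory[address]:02X}",')
    lines.append('    others => x"00"')
    lines.append(");")
    return "\n".join(lines)


# MOV A,#05h / ADD A,#03h / SJMP $ as a one-record image plus end-of-file.
SAMPLE_HEX = ":060000007405240380FEDC\n:00000001FF\n"

if __name__ == "__main__":
    print(to_vhdl_rom(parse_intel_hex(SAMPLE_HEX)))
```

The sketch only handles data and end-of-file records, which is enough for a program that fits in the 8051's 64K code space; extended-address records are silently ignored.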
The results of simulating 1ms of the included "sort.c" with the 8051 model are shown below:
In the above screenshot, the "reset" and "clock" signals are at the top of the waveform display window. The opcode mnemonics are spit out on the "dbg/logic" line at the bottom of the wave window (these are also written to a file). Other 8051 pins may be active depending on the code being executed. The model does not have a full set of peripherals -- such as timers, serial I/O, and analog-to-digital converters (ADCs) -- so it is very basic. It also does not have interrupts coded into the present version. Again, this is a basic educational project, but one can probe down into the model using a simulator and get the gist of how the ALU, ROM, and RAM function.
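For readers who want the gist of the fetch/decode/execute loop before opening a simulator, the following Python sketch steps through a handful of real 8051 opcodes and collects a mnemonic trace, loosely in the spirit of the debug line described above. It is an illustration only: the function name and trace format are invented here, flags and RAM are ignored, and it says nothing about how the VHDL model is actually structured.

```python
def run(rom, max_steps=20):
    """Execute a tiny subset of 8051 opcodes; return (accumulator, trace)."""
    pc, acc, trace = 0, 0, []
    for _ in range(max_steps):
        opcode = rom[pc]                      # fetch
        if opcode == 0x00:                    # NOP
            trace.append("NOP")
            pc += 1
        elif opcode == 0x04:                  # INC A
            acc = (acc + 1) & 0xFF
            trace.append("INC A")
            pc += 1
        elif opcode == 0x24:                  # ADD A,#data (carry flag ignored)
            acc = (acc + rom[pc + 1]) & 0xFF
            trace.append(f"ADD A,#{rom[pc + 1]:02X}h")
            pc += 2
        elif opcode == 0x74:                  # MOV A,#data
            acc = rom[pc + 1]
            trace.append(f"MOV A,#{rom[pc + 1]:02X}h")
            pc += 2
        elif opcode == 0x80:                  # SJMP rel (signed 8-bit offset)
            rel = rom[pc + 1]
            if rel >= 0x80:
                rel -= 0x100
            target = (pc + 2 + rel) & 0xFFFF
            trace.append(f"SJMP {target:04X}h")
            if target == pc:                  # jump-to-self: treat as halt
                break
            pc = target
        else:
            raise ValueError(f"opcode {opcode:02X}h not in this sketch")
    return acc, trace


# MOV A,#05h / ADD A,#03h / INC A / SJMP to itself
PROGRAM = bytes([0x74, 0x05, 0x24, 0x03, 0x04, 0x80, 0xFE])

if __name__ == "__main__":
    accumulator, mnemonics = run(PROGRAM)
    print(accumulator, mnemonics)
```

The real core does the same three things in hardware: the ROM supplies the opcode at the program counter, the decoder picks the operation, and the ALU updates the accumulator.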
With further work, one can separate out the non-synthesizable test-bench and the synthesizable model and build an FPGA with the code in it to play with as illustrated in the image below:
In order to get the code to synthesize in Synplify, one must bring out the "reset" and "clock" pins to a port on the top of the testbench file, converting it into a top level design file.
For those of you who are experienced with this sort of thing, what were some of the projects you used to learn about FPGA design and tinker with? Some of the other good ones include the LEON SPARC core by Aeroflex and some of the many cores on Open Cores. What are some of your favorites?
And for those of you just starting out, what kinds of questions do you have on processor cores and how simple ones work? Have you tried this core, and did you find any bugs you would care to share, or features you would like to add?
But what you implement depends on your design criteria. If your design criterion is to implement an 8051 so that you can use the existing code base, then you wind up with an 8051 with its features and limitations.
I wanted something suitable for FPGA housekeeping, so my criteria were to design something that was small and fast (so that timing is never a problem), that was in pure Verilog, that had a simple instruction set, and that could be easily used on either Linux or Windows boxes. I also wanted to reduce some of the nuisance work required to implement a processor.
What I designed is a 9-bit opcode, 8-bit data, stack-based processor that uses about 60 slices in a Spartan 6 and synthesizes to 160 MHz. A single command runs the Python scripts that construct the processor and its peripherals and runs the assembler.
If you're curious, this Small Stack-Based Computer Compiler is hosted at https://github.com/sinclairrf/SSBCC
@frisbee -- All good points for some commercial cores that do processors. However, some of the DCD 8051 cores, for example, are quite small, have excellent commercial-grade tools to work with, and are quite fast as well. One of the big disadvantages I've found with doing my own CPU core is the lack of tools to support it for any really complex SW effort that requires a CPU (C compiler, JTAG ICE, etc.). One is left with the options of modifying a version of SDCC or some other compiler and doing "Burn and Learn" debugging for the simple cores that one does oneself, or coding in machine code and suffering the lack of maintainability that comes with hand-rolled machine code.
rfrisbee 2/7/2013 5:54:17 AM User Rank Clever Clogs
CPUs in FPGAs
The problem with implementing soft core implementations of classic CPUs in FPGAs is that they often come out HUGE and/or SLOW. In addition, many 8-bit cores (including the 8051) are hideous from a programming point of view. The main advantages some of the old cores have going for them are large pre-existing code bases and high level language support.
The 8-bit, 2 clocks/instruction Picoblaze architecture is a good place to start for anyone wanting to design a simple softcore processor. With some significant tweaking of the instruction encoding and adding relative jump instructions, the 1kWord program space can be extended to 8kWord. Adding pointer hardware and a couple of additional instructions allows large amounts of external data memory to be accessed. My implementation has an fmax > 64MHz in the slowest speed grade Altera Cyclone and consumes fewer than 900LEs. The simplest versions come in at under 600LEs, a large fraction of which are consumed by the register file, since Altera Cyclone devices don't have the ability to use LEs as 16x1 RAMs as Xilinx FPGAs do.
I'm currently working on a 32-bit soft core based on a design by Wirth. My implementation runs at 4 clocks/instruction, has a barrel shifter, hardware signed and unsigned multiplication and division and 32 prioritized hardware interrupts. The core has an fmax > 64MHz and consumes about 2000LEs, including logic to boot from the FPGA configuration device.
For comparison, a 4 clock/instruction 8-bit megaAVR soft core with no peripherals that I've written consumes over 2000LEs.
Some may have noticed that all the cores I've written require multiple clock cycles to execute an instruction, whereas many modern processors are pipelined and can typically execute up to one instruction every clock cycle. The reasons I've gone down this route are:
1) If I need speed I've got FPGA hardware to perform or accelerate calculations. I'd rather not eat up logic and memory resources to implement a "fast" CPU to implement functionality that could be performed more efficiently with hardware.
2) Dealing with pipeline hazards requires significant amounts of extra logic and extensive testing to ensure correct operation under all sequences of instructions and interrupts. Code with lots of branches in it somewhat negates the advantages of pipelining unless extra logic is added to perform branch prediction. The cores I've written only go as far as overlapping the fetching of the next instruction with the execution of the current instruction.
3a) The systems I design FPGAs for typically don't have the memory bandwidth to feed instruction words faster than about 20MIPS.
3b) Adding useful amounts of cache logic would significantly increase the size of the core, consume many valuable internal memory blocks, and make the execution time non-deterministic.
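The arithmetic behind point 3a can be sketched in a few lines of Python. This is a rough illustration using only the figures quoted above (64MHz taken as a lower bound on fmax, 4 clocks/instruction, roughly 20MIPS of memory bandwidth); the single-cycle pipelined core is a hypothetical comparison point, not one of the cores described, and the function name is made up.

```python
def peak_mips(fmax_mhz, clocks_per_instruction):
    """Best-case instruction rate in MIPS, ignoring stalls and wait states."""
    return fmax_mhz / clocks_per_instruction


MEMORY_LIMIT_MIPS = 20                 # approximate instruction-fetch ceiling

multi_cycle = peak_mips(64, 4)         # the 32-bit core at 4 clocks/instruction
pipelined = peak_mips(64, 1)           # hypothetical 1 clock/instruction core

# What each core could actually sustain once memory bandwidth is the limit.
usable_multi_cycle = min(multi_cycle, MEMORY_LIMIT_MIPS)
usable_pipelined = min(pipelined, MEMORY_LIMIT_MIPS)

if __name__ == "__main__":
    print(multi_cycle, pipelined, usable_multi_cycle, usable_pipelined)
```

On these numbers the multi-cycle core peaks at 16MIPS, already under the memory ceiling, while the pipelined core's 64MIPS peak would be capped at about 20MIPS: a gain of only 4MIPS for all the extra logic.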
@Max -- Print Your Own ASIC -- not sure about a fully packaged die, but for trailing-edge stuff it might be possible to do a micro-fab of some shape or form for universities and research labs, in a small form factor and movable. The IC lab at my university took up about the floor area of two large shipping containers 30 years ago, so it would not have to be terribly large nowadays. The supplier might be able to supply "blank die" in a cartridge or tube that could be loaded into the machine, and the chemicals/materials could be in special sealed containers. The lithography might be done with scannable lasers and possibly a special set of masks that are for just one prototype die at a time. The device might take longer for each of the steps than a large fab. I see analog and RF/mixed-signal prototypes as the bigger users.
Max Maxfield 2/6/2013 11:19:54 AM User Rank Blogger
Re: my favourite
@William: So do you think there will ever come a day where designers have a photocopier-size machine in their office upon which they can "print" their own packaged ASICs/ASSPs/SoCs?
The real key is that most FPGA designs are a case where, to make $1 million, one must sell 1,000 $1,000 items, whereas most 8051 designs, sales-volume-wise, start in an FPGA but are converted to an ASIC, and are a case of selling 200K $5 items, or even more at even lower cost.
@Max -- The free tools for the 8051 (embedded compilers) vs. the GNU compiler for the ARM (also often free) also mean the 8051 image is many times smaller than the Flash/ROM image for an ARM -- this can be a significant cost edge as well. (Free or low-cost core, smaller ROM size, less power, less silicon area -- all vital when doing a SIM or USB-stick-like application where the OEM's selling price for the product is less than $1-5 and the performance is "Good Enough.") The real key is Good Enough for the right price, in time for the market.
Max Maxfield 2/5/2013 4:58:51 PM User Rank Blogger
Re: my favourite
@William: I probably should write about putting FreeRTOS on an FPGA
That would be an interesting one -- you could start with one blog by talking about RTOSs in general and lead into FreeRTOS -- then have a follow-up blog describing the process of implementing the port.
@Max 8051-RTOS -- Actually it was a FreeRTOS port (Richard Barry of the UK wrote FreeRTOS). It is quite easy to get a port of it up and running (less than a week). I probably should write about putting FreeRTOS on an FPGA -- there are even ports already done for Xilinx's CPUs. I just need a slightly better FPGA dev board to do it (LX-9 Micro Board); I may have to look into one. This would also run an 8051 core, but that would be quite a bit of work compared to uBlaze.