A fully functional RV32IM soft-core processor with Hardware Math, Dynamic Branch Prediction, and Full-Duplex UART, deployed on the Nexys A7 FPGA.
This repository contains a professional-grade, 5-stage pipelined RISC-V (RV32IM) processor written in Verilog. It is capable of executing bare-metal C applications compiled with the standard GCC toolchain. The processor is built for real-world deployment, featuring cycle-accurate hazard handling, MMIO peripherals, and robust FPGA synthesis configurations.
- Architecture: 5-Stage Pipeline (Fetch, Decode, Execute, Memory, Writeback).
- ISA Support: Full `RV32I` base integer instruction set + `M` Extension (Hardware Multiplication & Division).
- Advanced Pipelining:
  - Comprehensive Data Forwarding (RAW hazard resolution).
  - Stall/Flush logic for Load-Use and Control Hazards.
- Dynamic Branch Prediction: Integrated Branch Target Buffer (BTB) to minimize pipeline flushes on loops and conditional branches.
- Memory Subsystem: Configurable multi-cycle memory latency (currently 3-cycle) with internal byte-level bypassing.
- Peripherals: Memory-Mapped I/O (MMIO) featuring a highly robust Full-Duplex UART (TX/RX) with 16x oversampling and hardware interlocking.
- FPGA Ready: Designed and optimized for the Digilent Nexys A7 (Artix-7), utilizing DSPs for multiplication, properly buffered clock domains (`BUFG`), and dedicated constraints.
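
The forwarding and load-use logic listed above can be sketched as a small behavioral model in C. This is a textbook 5-stage formulation, not the repository's Verilog: the names `forward_select`, `load_use_stall`, and the `fwd_sel_t` encoding are hypothetical, and a design with multi-cycle memory would likely hold the stall for more than one cycle.

```c
#include <stdbool.h>
#include <stdint.h>

/* Where an Execute-stage operand comes from (illustrative encoding). */
typedef enum { FWD_NONE = 0, FWD_FROM_MEM = 1, FWD_FROM_WB = 2 } fwd_sel_t;

/* Choose the forwarding source for one operand.
 * rs      : source register read by the instruction in Execute
 * rd_mem  : destination register of the instruction in Memory
 * rd_wb   : destination register of the instruction in Writeback
 * we_mem / we_wb : whether those instructions write the register file */
fwd_sel_t forward_select(uint8_t rs, uint8_t rd_mem, bool we_mem,
                         uint8_t rd_wb, bool we_wb)
{
    if (rs == 0) {
        return FWD_NONE;            /* x0 is hard-wired to zero */
    }
    if (we_mem && rd_mem == rs) {
        return FWD_FROM_MEM;        /* the younger producer has priority */
    }
    if (we_wb && rd_wb == rs) {
        return FWD_FROM_WB;
    }
    return FWD_NONE;                /* use the register-file value */
}

/* Load-use hazard: a load in Execute feeds the instruction in Decode,
 * so the value is not available in time to forward. */
bool load_use_stall(bool ex_is_load, uint8_t ex_rd,
                    uint8_t id_rs1, uint8_t id_rs2)
{
    return ex_is_load && ex_rd != 0 &&
           (ex_rd == id_rs1 || ex_rd == id_rs2);
}
```

Checking the Memory stage before the Writeback stage matters when both hold a write to the same register: the Memory-stage result is the more recent one.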
| Directory | Description |
|---|---|
| `modules/` | Contains all the Verilog hardware modules (ALU, Pipeline Stages, Memory, UART, etc.) and the FPGA Constraints (`constraints.xdc`). |
| `simulation/` | Houses the testbenches, simulation models, and scripts for running cycle-accurate verification in Vivado (`xsim`) or Icarus Verilog. |
| `workloads/` | Contains the bare-metal C programs (e.g., `uart_hello.c`, Matrix Multiplication), the linker scripts, and the Makefile to compile standard C code into RISC-V Hex binaries. |
| `docs/` | Supplemental documentation, Block Diagrams, and project flow notes. |
- Multiplier (`rv32m_multiplier.v`): Synthesizes directly to FPGA DSP slices for single-cycle latency multiplication (`MUL`, `MULH`, `MULHU`, `MULHSU`).
- Divider (`rv32m_divider.v`): A 32-cycle sequential shift-register divider (`DIV`, `DIVU`, `REM`, `REMU`). It uses explicit hardware comparisons so that Vivado infers the intended logic during synthesis, preventing dropped or corrupted math sequences.
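
To illustrate what a 32-cycle shift-based divider computes, here is a minimal C model of unsigned restoring division, one loop iteration per cycle. It is a sketch of the general technique, not a transcription of `rv32m_divider.v`; the name `divu_restoring` and the result struct are hypothetical. The divide-by-zero results follow the RISC-V M-extension rules for `DIVU`/`REMU` (quotient all ones, remainder equal to the dividend); whether the hardware here handles that case the same way is an assumption.

```c
#include <stdint.h>

typedef struct {
    uint32_t quotient;
    uint32_t remainder;
} divu_result_t;

/* Unsigned restoring division: 32 shift/compare/subtract steps. */
divu_result_t divu_restoring(uint32_t dividend, uint32_t divisor)
{
    divu_result_t r;

    if (divisor == 0) {
        /* RISC-V M extension: DIVU by zero yields all ones,
         * REMU by zero yields the dividend. */
        r.quotient = UINT32_MAX;
        r.remainder = dividend;
        return r;
    }

    uint64_t rem = 0;   /* 64-bit so the left shift cannot overflow */
    uint32_t quo = 0;

    for (int i = 31; i >= 0; i--) {
        /* Shift in the next dividend bit, most significant first. */
        rem = (rem << 1) | ((dividend >> i) & 1u);
        quo <<= 1;
        /* The explicit comparison decides this quotient bit. */
        if (rem >= divisor) {
            rem -= divisor;
            quo |= 1u;
        }
    }

    r.quotient = quo;
    r.remainder = (uint32_t)rem;
    return r;
}
```

Signed `DIV`/`REM` can be built on the same loop by dividing magnitudes and fixing up the signs afterwards.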
- Memory Map:
  - `0x00010000`: TX Data (Write) / RX Data (Read)
  - `0x00010004`: UART Status (Read). Bit 0: `tx_ready`, Bit 1: `rx_ready`.
- Robustness:
  - The RX module uses a double-flop synchronizer and half-bit-period sampling to reject glitches on the incoming line.
  - The TX module uses an "Armed Latch" to synchronize asynchronous pipeline stalls with the physical hardware rate, so bytes are not dropped or double-fired.
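
A bare-metal driver for the memory map above could look roughly like the following. The two addresses and the status bit positions come from the list above; the function names, macros, and the choice to pass the register base as a pointer (which also lets the code run against a fake register array on a host) are assumptions for illustration.

```c
#include <stdint.h>

#define UART_BASE_ADDR    0x00010000u  /* TX/RX data register */
#define UART_DATA_WORD    0            /* word 0 -> 0x00010000 */
#define UART_STATUS_WORD  1            /* word 1 -> 0x00010004 */
#define UART_TX_READY     (1u << 0)    /* status bit 0: tx_ready */
#define UART_RX_READY     (1u << 1)    /* status bit 1: rx_ready */

/* Block until the transmitter is ready, then send one byte. */
void uart_putc(volatile uint32_t *uart, char c)
{
    while ((uart[UART_STATUS_WORD] & UART_TX_READY) == 0) {
        /* busy-wait on tx_ready */
    }
    uart[UART_DATA_WORD] = (uint8_t)c;
}

/* Block until a byte has been received, then return it. */
char uart_getc(volatile uint32_t *uart)
{
    while ((uart[UART_STATUS_WORD] & UART_RX_READY) == 0) {
        /* busy-wait on rx_ready */
    }
    return (char)(uart[UART_DATA_WORD] & 0xFFu);
}
```

On the target, the base pointer would be `(volatile uint32_t *)UART_BASE_ADDR`, for example `uart_putc((volatile uint32_t *)UART_BASE_ADDR, 'A');`. The `volatile` qualifier keeps the compiler from caching the status read inside the polling loop.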
The `workloads/` directory contains a Makefile that seamlessly cross-compiles your C code into a 32-bit little-endian hexadecimal file (`.hex`) readable by Verilog's `$readmemh`.

    cd workloads/
    make SRC=uart.c

What happens automatically:
- Compiles the C code using `riscv64-unknown-elf-gcc` (with `-O2 -march=rv32im`).
- Links it against a bare-metal startup script (sets stack pointer, etc.).
- Converts the resulting `.elf` into a cleanly formatted hex file using a Python script.
- Auto-copies the final `.hex` file into the simulation and top-level directories so the Instruction Memory initializes with it.
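
The conversion step boils down to packing each group of four bytes into one little-endian 32-bit word and printing it as hex text that `$readmemh` can load. The repository does this with a Python script; the C sketch below only illustrates the byte ordering, and it assumes one 8-digit word per line, which may differ from the script's actual output format. The name `format_hex_word` is hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Pack four bytes (lowest address first) into a little-endian word and
 * write it as 8 lowercase hex digits. `out` must hold at least 9 chars. */
void format_hex_word(const uint8_t bytes[4], char out[9])
{
    uint32_t word = (uint32_t)bytes[0]
                  | ((uint32_t)bytes[1] << 8)
                  | ((uint32_t)bytes[2] << 16)
                  | ((uint32_t)bytes[3] << 24);
    snprintf(out, 9, "%08" PRIx32, word);
}
```

For example, the bytes `13 00 00 00` (the encoding of `addi x0, x0, 0`) become the text word `00000013`.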
You can run a local terminal-based verification using Vivado's `xsim`.

    cd simulation/
    make TAR=Pipeline_Top_tb

- The testbench will execute the pipeline cycle-by-cycle, printing a live monitor of the `PCD`, current instruction, ALU results, and hazard statuses (Stalls/Flushes).
- When testing computationally heavy tasks (like matrix math), the testbench automatically dumps the final state of the RAM into `final_dmem_dump.hex`, which you can parse using the provided Python scripts.
- Clock Configuration: The project runs on a clock derived from the `100MHz` board oscillator. In `top_fpga.v`, a `BUFG` (Global Clock Buffer) routes it onto the global clock network to minimize clock skew across the pipeline.
- UART Setup: The UART operates at `28800` baud (8N1). Ensure your terminal application is configured to match, for example: `sudo picocom -b 28800 /dev/ttyUSB1`
- Interactive Mode: Flash the board and open your terminal. The `uart_hello.c` workload includes a fully interactive prompt where the RISC-V processor asks you for decimal numbers, hex numbers, and strings, performs hardware math, and echoes the results back over the USB cable in real time!
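
For a sense of how the baud rate relates to the system clock, the sketch below computes the clock divider for a 16x-oversampling tick. It assumes the UART is clocked at the full 100 MHz; the actual core clock is only described as derived from that oscillator, so the real divider may differ. The function name is hypothetical.

```c
#include <stdint.h>

/* Clock cycles per oversampling tick, rounded to the nearest integer. */
uint32_t uart_oversample_divisor(uint32_t clk_hz, uint32_t baud,
                                 uint32_t oversample)
{
    uint32_t ticks_per_sec = baud * oversample;
    return (clk_hz + ticks_per_sec / 2u) / ticks_per_sec;
}
```

With the assumed values, `uart_oversample_divisor(100000000u, 28800u, 16u)` gives 217, since 100,000,000 / (28,800 × 16) ≈ 217.01. The rounding error is well under 0.1%, far inside the tolerance of an 8N1 link.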