GRP-3-RISCV/HighPerformance_5stage_Pipeline_Processor

🚀 High-Performance 5-Stage RISC-V Pipeline Processor

A fully functional RV32IM soft-core processor with Hardware Math, Dynamic Branch Prediction, and Full-Duplex UART, deployed on the Nexys A7 FPGA.


📖 Overview

This repository contains a 5-stage pipelined RISC-V (RV32IM) processor written in Verilog. It executes bare-metal C applications compiled with the standard GCC toolchain. The design targets deployment on real hardware and includes hazard handling (forwarding, stalls, and flushes), MMIO peripherals, and FPGA synthesis constraints.

✨ Key Features

  • Architecture: 5-Stage Pipeline (Fetch, Decode, Execute, Memory, Writeback).
  • ISA Support: Full RV32I base integer instruction set + M Extension (Hardware Multiplication & Division).
  • Advanced Pipelining:
    • Comprehensive Data Forwarding (RAW hazard resolution).
    • Stall/Flush logic for Load-Use and Control Hazards.
  • Dynamic Branch Prediction: Integrated Branch Target Buffer (BTB) to minimize pipeline flushes on loops and conditional branches.
  • Memory Subsystem: Configurable multi-cycle memory latency (currently 3-cycle) with internal byte-level bypassing.
  • Peripherals: Memory-Mapped I/O (MMIO) with a Full-Duplex UART (TX/RX) that uses 16x oversampling and hardware interlocking.
  • FPGA Ready: Designed and optimized for the Digilent Nexys A7 (Artix-7), utilizing DSPs for multiplication, properly buffered clock domains (BUFG), and dedicated constraints.
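The forwarding and load-use rules listed under Advanced Pipelining can be illustrated with a small behavioral model in C. This is a minimal sketch of the textbook conditions for a 5-stage pipeline, not a transcription of this repository's Verilog; the names `forward_select`, `load_use_stall`, and the `fwd_sel_t` encoding are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding of where an Execute-stage operand comes from. */
typedef enum {
    FWD_REGFILE  = 0,  /* no hazard: use the value read in Decode     */
    FWD_FROM_WB  = 1,  /* forward the result sitting in Writeback     */
    FWD_FROM_MEM = 2   /* forward the result sitting in Memory        */
} fwd_sel_t;

/* Select the source for one Execute-stage operand register rs_ex.
 * The Memory stage holds the younger (more recent) result, so it wins
 * when both stages target the same register. x0 is never forwarded
 * because it is hard-wired to zero. */
fwd_sel_t forward_select(uint8_t rs_ex,
                         bool mem_reg_write, uint8_t mem_rd,
                         bool wb_reg_write,  uint8_t wb_rd)
{
    if (rs_ex != 0 && mem_reg_write && mem_rd == rs_ex) return FWD_FROM_MEM;
    if (rs_ex != 0 && wb_reg_write  && wb_rd  == rs_ex) return FWD_FROM_WB;
    return FWD_REGFILE;
}

/* Load-use hazard: the instruction in Execute is a load whose destination
 * is read by the instruction in Decode. Forwarding cannot help because the
 * loaded value does not exist yet, so the pipeline must stall. */
bool load_use_stall(bool ex_is_load, uint8_t ex_rd,
                    uint8_t rs1_d, uint8_t rs2_d)
{
    return ex_is_load && ex_rd != 0 && (ex_rd == rs1_d || ex_rd == rs2_d);
}
```

With a multi-cycle memory such as the 3-cycle configuration mentioned above, the real stall logic presumably holds the pipeline for longer than the single cycle of the textbook case; the model only captures when a stall is needed, not for how long.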

📂 Directory Structure

| Directory | Description |
|---|---|
| modules/ | Contains all the Verilog hardware modules (ALU, Pipeline Stages, Memory, UART, etc.) and the FPGA Constraints (constraints.xdc). |
| simulation/ | Houses the testbenches, simulation models, and scripts for running cycle-accurate verification in Vivado (xsim) or Icarus Verilog. |
| workloads/ | Contains the bare-metal C programs (e.g., uart_hello.c, Matrix Multiplication), the linker scripts, and the Makefile to compile standard C code into RISC-V Hex binaries. |
| docs/ | Supplemental documentation, Block Diagrams, and project flow notes. |

🛠️ Hardware Subsystems

1. Hardware Math (RV32M)

  • Multiplier (rv32m_multiplier.v): Synthesizes directly to FPGA DSP slices for single-cycle latency multiplication (MUL, MULH, MULHU, MULHSU).
  • Divider (rv32m_divider.v): A 32-cycle sequential shift-register divider (DIV, DIVU, REM, REMU). It uses explicit hardware comparisons so that Vivado infers the intended logic during synthesis, which prevents dropped or corrupted division sequences.
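To make the "32-cycle sequential" behavior concrete, here is a minimal software model of a generic restoring (shift-and-subtract) divider, with one loop iteration standing in for one clock cycle. It is a sketch under assumptions: it covers only the unsigned case (DIVU/REMU), the name `divu_restoring` is hypothetical, and the actual RTL in rv32m_divider.v may be organized differently.

```c
#include <stdint.h>

typedef struct {
    uint32_t quotient;
    uint32_t remainder;
} divu_result_t;

/* Unsigned restoring division, one quotient bit per iteration.
 * Each iteration models one cycle of a sequential divider:
 * shift the next dividend bit into the partial remainder, compare
 * against the divisor, and subtract when the comparison succeeds. */
divu_result_t divu_restoring(uint32_t dividend, uint32_t divisor)
{
    uint64_t rem = 0;  /* wider than 32 bits so the left shift cannot overflow */
    uint32_t quo = 0;

    for (int i = 31; i >= 0; --i) {
        rem = (rem << 1) | ((dividend >> i) & 1u);
        quo <<= 1;
        if (rem >= divisor) {   /* the explicit comparison step */
            rem -= divisor;
            quo |= 1u;
        }
    }

    divu_result_t r = { quo, (uint32_t)rem };
    return r;
}
```

A side effect of this algorithm is that a zero divisor yields an all-ones quotient and a remainder equal to the dividend, which matches the results the RISC-V specification requires for DIVU and REMU on division by zero. Signed DIV and REM need extra sign handling and the overflow case, which this sketch leaves out.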

2. Full-Duplex UART (uart_tx.v, uart_rx.v)

  • Memory Map:
    • 0x00010000: TX Data (Write) / RX Data (Read)
    • 0x00010004: UART Status (Read) — Bit 0: tx_ready, Bit 1: rx_ready.
  • Robustness:
    • The RX module uses a double-flop synchronizer and half-bit-period (mid-bit) sampling to reject glitches on the receive line.
    • The TX module uses an "Armed Latch" to synchronize asynchronous pipeline stalls with the physical transmit rate, so bytes are not dropped or sent twice.
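From the software side, the memory map above suggests a simple polled driver. The following is a minimal sketch, assuming the data and status registers are accessed as 32-bit words; the type `uart_regs_t` and the functions `uart_putc`, `uart_getc`, and `uart_puts` are hypothetical names and are not taken from the repository's workloads.

```c
#include <stdint.h>

/* Register layout derived from the memory map above. */
typedef struct {
    volatile uint32_t data;    /* 0x00010000: write = TX data, read = RX data */
    volatile uint32_t status;  /* 0x00010004: bit 0 = tx_ready, bit 1 = rx_ready */
} uart_regs_t;

#define UART_BASE_ADDR  0x00010000u
#define UART_TX_READY   (1u << 0)
#define UART_RX_READY   (1u << 1)

/* On the target, the register block would be obtained as:
 *   uart_regs_t *uart = (uart_regs_t *)UART_BASE_ADDR;
 * Passing the pointer in keeps the functions testable on a host. */

/* Block until the transmitter is ready, then send one byte. */
void uart_putc(uart_regs_t *uart, char c)
{
    while ((uart->status & UART_TX_READY) == 0) {
        /* busy-wait */
    }
    uart->data = (uint8_t)c;
}

/* Block until a byte has been received, then return it. */
char uart_getc(uart_regs_t *uart)
{
    while ((uart->status & UART_RX_READY) == 0) {
        /* busy-wait */
    }
    return (char)(uart->data & 0xFFu);
}

/* Send a NUL-terminated string one byte at a time. */
void uart_puts(uart_regs_t *uart, const char *s)
{
    while (*s != '\0') {
        uart_putc(uart, *s++);
    }
}
```

Polling tx_ready before every write means the software never outruns the transmitter, which complements the hardware interlocking described above.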

🚀 Getting Started

1. Compiling C Workloads

The workloads/ directory contains a Makefile that seamlessly cross-compiles your C code into a 32-bit little-endian hexadecimal file (.hex) readable by Verilog's $readmemh.

cd workloads/
make SRC=uart.c

What happens automatically:

  1. Compiles the C code using riscv64-unknown-elf-gcc (with -O2 -march=rv32im).
  2. Links it against a bare-metal startup script (sets stack pointer, etc.).
  3. Converts the resulting .elf into a cleanly formatted hex file using a Python script.
  4. Auto-copies the final .hex file into the simulation and top-level directories so the Instruction Memory initializes with it.
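The repository performs step 3 with a Python script. Purely to illustrate what "32-bit little-endian hex readable by $readmemh" means, the sketch below packs a raw byte image into 32-bit words and prints one word per line. The one-word-per-line layout, the zero padding of a trailing partial word, and the function names `pack_le32` and `write_readmemh` are assumptions for illustration, not a description of the actual script.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Combine four bytes into one word, lowest address in the lowest byte
 * (little-endian), as a RISC-V instruction fetch would see them. */
uint32_t pack_le32(const uint8_t b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Write the image as 8-digit hex words, one per line, which is a form
 * $readmemh accepts for a 32-bit-wide memory. A trailing partial word
 * is padded with zero bytes. Returns the number of words written. */
size_t write_readmemh(FILE *out, const uint8_t *image, size_t len)
{
    size_t words = 0;
    for (size_t i = 0; i < len; i += 4) {
        uint8_t b[4] = { 0, 0, 0, 0 };
        for (size_t j = 0; j < 4 && i + j < len; ++j) {
            b[j] = image[i + j];
        }
        fprintf(out, "%08lx\n", (unsigned long)pack_le32(b));
        ++words;
    }
    return words;
}
```

For example, the bytes 13 05 10 00 (in increasing address order) become the word 00100513, the encoding of `addi a0, zero, 1`.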

2. Simulation & Verification

You can run a local terminal-based verification using Vivado's xsim.

cd simulation/
make TAR=Pipeline_Top_tb

  • The testbench will execute the pipeline cycle-by-cycle, printing a live monitor of the PCD, current instruction, ALU results, and hazard statuses (Stalls/Flushes).
  • When testing computationally heavy tasks (like matrix math), the testbench automatically dumps the final state of the RAM into final_dmem_dump.hex, which you can parse using the provided Python scripts.

3. FPGA Deployment (Nexys A7)

  1. Clock Configuration: The project runs on a clock derived from the 100MHz board oscillator. In top_fpga.v, a BUFG (Global Clock Buffer) drives the clock onto the FPGA's low-skew global clock network so that all pipeline registers see a consistent clock edge.
  2. UART Setup: The UART operates at 28800 baud (8N1). Ensure your terminal application is configured correctly:
    sudo picocom -b 28800 /dev/ttyUSB1
  3. Interactive Mode: Flash the board and open your terminal. The uart_hello.c workload provides an interactive prompt: the RISC-V processor asks for decimal numbers, hex numbers, and strings, performs the math in hardware, and echoes the results back over the USB cable in real time.
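The 28800 baud setting interacts with the clock frequency and the 16x oversampling mentioned earlier. The sketch below shows how an integer tick divisor could be derived. It assumes, purely for illustration, that the UART is clocked at the full 100 MHz; the README only says the clock is derived from the 100MHz oscillator, so the real divisor in the RTL may differ. The function names are hypothetical.

```c
#include <stdint.h>

/* Number of clock cycles per oversampling tick, rounded to nearest.
 * clk_hz:     UART clock frequency (assumed, see the note above)
 * baud:       target baud rate
 * oversample: ticks per bit period (16 for 16x oversampling) */
uint32_t uart_tick_divisor(uint32_t clk_hz, uint32_t baud, uint32_t oversample)
{
    uint32_t ticks_per_sec = baud * oversample;
    return (clk_hz + ticks_per_sec / 2u) / ticks_per_sec;
}

/* Baud rate actually produced by an integer divisor (truncated to Hz). */
uint32_t uart_actual_baud(uint32_t clk_hz, uint32_t divisor, uint32_t oversample)
{
    return clk_hz / (divisor * oversample);
}
```

Under the 100 MHz assumption, `uart_tick_divisor(100000000, 28800, 16)` gives 217 and the resulting rate is about 28801 baud, an error far below what 8N1 framing tolerates. If the terminal shows garbled characters, a mismatch between the configured baud rate and the effective divisor is the first thing to check.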

"A processor built to demonstrate that high-performance pipelining, accurate hazard resolution, and robust hardware peripherals can beautifully coalesce in pure Verilog."
