VU0 inline-asm migration, cpu_vec_4/cpu_mat_44 fast paths, and unit-test harness#8

Open: fjtrujy wants to merge 1 commit into master from fix_asm

fjtrujy commented May 26, 2026

Summary

Migrate the ps2stuff VU0 vector / matrix primitives from the legacy SCEA
ee-gcc inline-asm form (which relied on a custom "j" constraint that
upstream GCC does not provide) to a portable inline-asm form that builds
cleanly with the current mips64r5900el-ps2-elf-g++ toolchain, and add a
comprehensive unit-test harness that runs on real hardware and in PCSX2.

What's in here

Inline-asm migration (VU0 path)

  • include/ps2s/vector.h, vector_common.h, matrix.h, matrix_common.h,
    vu.h: rewrite all VU0 inline-asm blocks to standard GCC syntax
    ("r" GPR constraints + explicit qmtc2/qmfc2/lqc2/sqc2/ctc2).
  • Fix ~20 latent asm constraint bugs across matrix.h /
    matrix_common.h (mat_33/mat_43 mult_tilde, get_row*, quaternion
    set, etc.) by replacing the obsolete "j" constraint with "r" +
    explicit transfers.
  • transform_t operators and mat_33::inverse / transform_t::inverse use
    direct lqc2/sqc2 memory paths to bypass a GCC TI-mode split-register
    bug that truncates the upper half of 128-bit values when routed through
    GPRs.
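
As a rough illustration of the direct lqc2/sqc2 memory path described above, the sketch below adds two vectors without ever holding a 128-bit value in a GPR: only the addresses travel through "r" constraints. The names Vec4 and add are hypothetical and are not the ps2stuff types, the _EE guard is an assumption about how an EE build is detected, and the asm branch has not been verified here against the PR's code. On any other target the scalar branch is compiled instead.

```cpp
// Hypothetical 16-byte-aligned vector; NOT the ps2stuff vec_4 type.
struct alignas(16) Vec4 {
    float x, y, z, w;
};

static_assert(alignof(Vec4) == 16, "lqc2/sqc2 require 16-byte alignment");

// Component-wise add. In the VU0 branch the operands are loaded from
// memory straight into VU0 registers and the result is stored straight
// back, so no 128-bit value is routed through a GPR.
inline Vec4 add(const Vec4& a, const Vec4& b) {
    Vec4 out;
#if defined(_EE) && !defined(NO_VU0_VECTORS)  // assumed EE-build guard
    __asm__ __volatile__(
        "lqc2      $vf1, 0(%1)       \n"  // vf1 = *a
        "lqc2      $vf2, 0(%2)       \n"  // vf2 = *b
        "vadd.xyzw $vf1, $vf1, $vf2  \n"  // vf1 = vf1 + vf2
        "sqc2      $vf1, 0(%0)       \n"  // *out = vf1
        :
        : "r"(&out), "r"(&a), "r"(&b)
        : "memory");
#else
    // Scalar fallback with the same result.
    out = Vec4{a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
#endif
    return out;
}
```

Because GCC has no register class for $vfNN, the VU0 registers used inside the block cannot be listed as clobbers; the sketch simply treats $vf1 and $vf2 as scratch.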

CPU compatibility / fast-path additions

  • New include/ps2s/cpu_compat_types.h: complete C++ fallback for
    vec_x/y/z/w/xy/xyz/xyzw, mat_33/34/43/44, transform_t with no VU0
    dependencies. Enables building / testing the higher-level API on plain
    CPU MIPS code.
  • cpu_vec_4 is now alignas(16); +, -, unary -, * operators get
    VU0 macro-mode asm fast paths (gated on !NO_VU0_VECTORS).
  • cpu_mat_44 * cpu_vec_4 and cpu_mat_44 * cpu_mat_44 get VU0 macro-mode
    asm fast paths using vmulax/vmadday/vmaddaz/vmaddw.
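
For reviewers less familiar with the accumulator chain, here is a minimal scalar model of what the vmulax/vmadday/vmaddaz/vmaddw sequence computes for a matrix-times-vector product. It is only a sketch: Vec4, Mat44, scale, madd and mul are hypothetical names, and column-major storage is an assumption rather than something taken from cpu_matrix.h.

```cpp
// Hypothetical types; NOT the ps2stuff cpu_vec_4 / cpu_mat_44.
struct alignas(16) Vec4 {
    float x, y, z, w;
};

// Assumed column-major layout: four column vectors.
struct alignas(16) Mat44 {
    Vec4 col0, col1, col2, col3;
};

// v * s, component-wise (models a broadcast multiply).
inline Vec4 scale(const Vec4& v, float s) {
    return Vec4{v.x * s, v.y * s, v.z * s, v.w * s};
}

// acc + v * s, component-wise (models a broadcast multiply-add).
inline Vec4 madd(const Vec4& acc, const Vec4& v, float s) {
    return Vec4{acc.x + v.x * s, acc.y + v.y * s,
                acc.z + v.z * s, acc.w + v.w * s};
}

// Matrix * vector as a chain of broadcast multiply-accumulates.
inline Vec4 mul(const Mat44& m, const Vec4& v) {
    Vec4 acc = scale(m.col0, v.x);   // vmulax:  ACC  = col0 * v.x
    acc = madd(acc, m.col1, v.y);    // vmadday: ACC += col1 * v.y
    acc = madd(acc, m.col2, v.z);    // vmaddaz: ACC += col2 * v.z
    return madd(acc, m.col3, v.w);   // vmaddw:  out  = ACC + col3 * v.w
}
```

A matrix-times-matrix product can be modelled the same way by applying mul to each column of the right-hand matrix.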

Tests

  • New tests/test_shared.cpp: 140 VU0-build tests (test_vu0.elf) covering
    vec_3 / vec_4, mat_33 / mat_34 / mat_43 / mat_44 arithmetic, transposes,
    rotations, mult_tilde, inverse, transform_t operations, accumulator ops,
    and the new cpu_vec_4 / cpu_mat_44 VU0 fast paths.
  • Same source file produces a 147-test CPU-fallback build (test_cpu_shared.elf)
    via -DUSE_CPU_COMPAT, exercising cpu_compat_types.h and the scalar
    paths of cpu_vector.h / cpu_matrix.h.
  • Includes a helper that distinguishes emulator from hardware using the
    VU 1*X != X multiply quirk
    (see https://fobes.dev/ps2/detecting-emu-vu-floats).
  • tests/README.md documents how to build/run on hardware (ps2client)
    and in PCSX2.
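
The single-source, two-build arrangement can be pictured with a small sketch like the one below. It is not the harness in tests/test_shared.cpp: TestStats, check_near, report and kBuildName are hypothetical names, and only the USE_CPU_COMPAT macro comes from this PR.

```cpp
#include <cmath>
#include <cstdio>

// The same source is compiled twice; -DUSE_CPU_COMPAT selects the
// CPU-fallback flavour. Here the macro only changes the label.
#ifdef USE_CPU_COMPAT
static const char* const kBuildName = "cpu-compat";
#else
static const char* const kBuildName = "vu0";
#endif

// Running pass/total counters for one test binary.
struct TestStats {
    int passed = 0;
    int total = 0;
};

// Records one floating-point comparison with an absolute tolerance.
inline void check_near(TestStats& s, const char* name,
                       float got, float want, float eps = 1e-5f) {
    ++s.total;
    if (std::fabs(got - want) <= eps) {
        ++s.passed;
    } else {
        std::printf("FAIL %s: got %f, want %f\n", name, got, want);
    }
}

// Prints a "passed / total" summary; returns 0 only if every check passed.
inline int report(const TestStats& s) {
    std::printf("[%s] %d / %d passed\n", kBuildName, s.passed, s.total);
    return s.passed == s.total ? 0 : 1;
}
```

A main function would create one TestStats, run every check against the types selected for that build, and return report(stats) so that a non-zero exit code signals a failure.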

Build

  • CMakeLists.txt / tests/CMakeLists.txt: add ENABLE_VU0_VECTORS and
    ENABLE_ASM options; wire up both test_vu0.elf and
    test_cpu_shared.elf targets.
  • CMAKE_BUILD.md: updated build instructions.

Test results

Both test binaries verified in PCSX2:

  • test_vu0.elf: 140 / 140 passed
  • test_cpu_shared.elf: 147 / 147 passed

Notes for reviewers

  • The migrated inline-asm is necessarily more verbose than the original
    SCEA form because upstream GCC has no register class for VU0 $vfNN
    registers and no "j" constraint. Restoring a first-class VU0 backend
    in upstream GCC would let us shrink these blocks back to near-original
    size; this is documented separately (not included in this PR) for a
    future effort.
  • The hardware-only vsuba $ACC, $vf0, $vf0 pattern is used instead of
    vadda $ACC, $vf0, $vf0 because the latter does not reliably sync the
    MAC pipeline in this codebase's contexts.
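
Separately from the pipeline behaviour described in the last note, the two instructions do not leave the same value in the accumulator, because $vf0 is hardwired to (0, 0, 0, 1). The scalar model below is only an illustration; Vec4, kVf0, vadda_model and vsuba_model are hypothetical names and say nothing about timing.

```cpp
// Hypothetical scalar model of a VU0 register; NOT a ps2stuff type.
struct Vec4 {
    float x, y, z, w;
};

// $vf0 is the VU constant register: (0, 0, 0, 1).
constexpr Vec4 kVf0{0.0f, 0.0f, 0.0f, 1.0f};

// Models vadda: ACC = a + b, component-wise.
constexpr Vec4 vadda_model(const Vec4& a, const Vec4& b) {
    return Vec4{a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
}

// Models vsuba: ACC = a - b, component-wise.
constexpr Vec4 vsuba_model(const Vec4& a, const Vec4& b) {
    return Vec4{a.x - b.x, a.y - b.y, a.z - b.z, a.w - b.w};
}
```

In this model vsuba_model(kVf0, kVf0) yields an all-zero accumulator, while vadda_model(kVf0, kVf0) yields (0, 0, 0, 2), so only the vsuba form clears every ACC lane.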

cmake: enable VU0/ASM options and wire test execution

docs(vu0): add emulation caveats and tests README note (fobes.dev)

docs(vu0): add copilot-instructions and consolidate VU0 sources

tests: add VU mul emulator-detection helper and global flag

tmp