CI Test #255

Closed
LaurenzV wants to merge 68 commits into linebender:main from LaurenzV:avx512-yes-really
Conversation

@LaurenzV

Collaborator

No description provided.

Shnatsel added 30 commits May 24, 2026 18:24
…edicated AVX-512 implementations for complex int/float vector operations that benefit the most.

LLM summary of the changes:

Implemented:
- Added `X86::Avx512` in the generator with Ice Lake feature set, `native_width = 512`, `max_block_size = 512`.
- Generated new `fearless_simd/src/generated/avx512.rs`.
- Wired public API: `Avx512`, `x86::Avx512`, `Level::Avx512`, `Level::as_avx512`, dispatch, and `kernel!` support.
- Updated runtime/static detection so Ice Lake AVX-512 is selected before AVX2, while `as_avx2()` and `as_sse4_2()` downgrade correctly.
- Bumped MSRV/docs/CI/check-target metadata to Rust 1.89.
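
The selection order and downgrade behavior described above can be modeled with a small scalar sketch. This is not the crate's code: `CpuFeatures`, this `Level` enum, and the `Option<Level>` return types are hypothetical stand-ins for illustration, and real detection would query the CPU (for example via `std::arch::is_x86_feature_detected!`) rather than take a struct of booleans.

```rust
/// Hypothetical snapshot of detected CPU features (illustration only).
#[derive(Clone, Copy, Debug, Default)]
pub struct CpuFeatures {
    pub sse4_2: bool,
    pub avx2_fma: bool,
    /// True only when the whole Ice Lake AVX-512 feature set is present.
    pub avx512_icelake: bool,
}

/// Hypothetical model of the SIMD levels, ordered from weakest to strongest.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Fallback,
    Sse4_2,
    Avx2,
    Avx512,
}

impl Level {
    /// Pick the strongest available level; AVX-512 is checked before AVX2.
    pub fn detect(f: CpuFeatures) -> Level {
        if f.avx512_icelake {
            Level::Avx512
        } else if f.avx2_fma {
            Level::Avx2
        } else if f.sse4_2 {
            Level::Sse4_2
        } else {
            Level::Fallback
        }
    }

    /// Downgrade: any level at or above AVX2 can also run AVX2 code.
    pub fn as_avx2(self) -> Option<Level> {
        if self >= Level::Avx2 { Some(Level::Avx2) } else { None }
    }

    /// Downgrade: any level at or above SSE4.2 can also run SSE4.2 code.
    pub fn as_sse4_2(self) -> Option<Level> {
        if self >= Level::Sse4_2 { Some(Level::Sse4_2) } else { None }
    }
}
```

In this model a CPU reporting the full AVX-512 set resolves to `Level::Avx512`, yet `as_avx2()` and `as_sse4_2()` still succeed on it, which is the downgrade property the summary refers to.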

Generator/backend behavior:
- 512-bit vectors use native `__m512`, `__m512d`, and `__m512i`.
- AVX-512 masks now use raw compact `__mmask8/16/32/64` storage, with no aligned wrapper.
- Generic `SimdFrom<__mmask*, S>` / `From<mask*, __mmask*>` now route through `from_bitmask` / `to_bitmask`, so they are correct for non-AVX-512 `S` too.
- Added AVX-512 compare/select paths using mask-returning compares and mask blends.
- Added direct conversion paths, including `f32 <-> i32/u32` and `u8 <-> u16`.
- Added AVX-512 slides for vectors only; masks intentionally have no slide support.
- Added dedicated AVX-512 zip/unzip/interleave/deinterleave using `permutex2var`, especially for 256/512-bit widths.
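
To make the mask items above concrete, here is a scalar model of the relationship between a "wide" mask (one all-ones or all-zeros lane per element) and a compact bitmask (one bit per lane), plus a mask blend. The names `WideMask`, `to_bitmask`, `from_bitmask`, and `select` are chosen for illustration and operate on plain arrays; they are assumptions, not the crate's generated code or its actual signatures.

```rust
/// Scalar model of a 16-lane wide mask: each lane is -1 (true) or 0 (false).
pub type WideMask = [i32; 16];

/// Pack a wide mask into 16 bits: bit `i` is set iff lane `i` has its sign bit set.
pub fn to_bitmask(lanes: &WideMask) -> u16 {
    let mut bits = 0u16;
    for (i, &lane) in lanes.iter().enumerate() {
        if lane < 0 {
            bits |= 1u16 << i;
        }
    }
    bits
}

/// Expand 16 bits into a wide mask: lane `i` is -1 iff bit `i` is set.
pub fn from_bitmask(bits: u16) -> WideMask {
    let mut lanes = [0i32; 16];
    for (i, lane) in lanes.iter_mut().enumerate() {
        if ((bits >> i) & 1) == 1 {
            *lane = -1;
        }
    }
    lanes
}

/// Scalar model of a mask blend: take lane `i` from `if_true` when bit `i`
/// of `mask` is set, otherwise from `if_false`.
pub fn select(mask: u16, if_true: &[f32; 16], if_false: &[f32; 16]) -> [f32; 16] {
    let mut out = *if_false;
    for i in 0..16 {
        if ((mask >> i) & 1) == 1 {
            out[i] = if_true[i];
        }
    }
    out
}
```

Routing conversions through a pack/expand pair like this is what lets the same mask value make sense on backends that store masks as full-width lanes as well as on AVX-512, where the compact form is the native one.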

Tests/coverage:
- Extended `#[simd_test]` to include AVX-512.
- Added AVX-512 detection/dispatch coverage.
- Updated mask bitwise tests for canonical boolean mask lanes.
- Added a regression test that AVX-512 mask public types are compact and match `__mmask*` sizes.
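
A size regression check of the kind mentioned in the last item might look roughly like the sketch below. The `CompactMask*` wrappers are hypothetical stand-ins for the public mask types, and the plain integers stand in for `__mmask64`/`__mmask32`/`__mmask16`/`__mmask8`, which `core::arch` defines as aliases for `u64`/`u32`/`u16`/`u8`.

```rust
/// Hypothetical compact mask types: one bit per lane, no aligned wrapper.
#[repr(transparent)]
pub struct CompactMask8x64(pub u64); // 64 lanes of 8-bit elements

#[repr(transparent)]
pub struct CompactMask16x32(pub u32); // 32 lanes of 16-bit elements

#[repr(transparent)]
pub struct CompactMask32x16(pub u16); // 16 lanes of 32-bit elements

#[repr(transparent)]
pub struct CompactMask64x8(pub u8); // 8 lanes of 64-bit elements

/// Sizes in bytes of the four wrappers, widest bitmask first.
pub fn compact_mask_sizes() -> [usize; 4] {
    [
        std::mem::size_of::<CompactMask8x64>(),
        std::mem::size_of::<CompactMask16x32>(),
        std::mem::size_of::<CompactMask32x16>(),
        std::mem::size_of::<CompactMask64x8>(),
    ]
}
```

A test can then compare these sizes against the underlying integer types, so that accidentally reintroducing an aligned or padded wrapper would change a size and fail the check.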
…ackend, and specialize it for AVX-512. Add test coverage that sets every single bit and verifies it was set correctly.
…rage. Only for the 8-bit left shift does LLVM autovectorize the scalar fallback into GFNI instructions on 256-bit halves, which emits more instructions but schedules better and ends up slightly faster according to llvm-mca on Sapphire Rapids. The difference isn't huge, though, and I don't want to rely on autovectorization because of its fragility.
… so they didn't show up earlier when I removed those methods.
…ppy --tests` without a reported location; I've failed to isolate it to a specific crate and suppress it there.
…an't enforce Pod without an external dependency.
# Conflicts:
#	fearless_simd/src/generated/avx2.rs
#	fearless_simd/src/generated/neon.rs
#	fearless_simd/src/generated/sse4_2.rs
#	fearless_simd/src/generated/wasm.rs
#	fearless_simd_gen/src/generic.rs
#	fearless_simd_gen/src/level.rs
…ame name but different semantics from the production code to avoid confusion
Shnatsel and others added 29 commits May 27, 2026 21:50
PR linebender#237 only updates NEON load construction. The AVX-512 branch-specific unsafe load sites were already adapted in the PR linebender#233 follow-up, and a search found no remaining load intrinsics needing the linebender#237 pattern.
Includes the regenerated AVX-512 output from the same generator update.
Includes regenerated AVX-512 slide helpers for the same safety cleanup.
Includes regenerated AVX-512 interleaved load/store output.
@LaurenzV LaurenzV closed this Jun 27, 2026

2 participants