Skip to content

docs(arrow): add SAFETY comments to lance-arrow unsafe blocks#7511

Open
LuciferYang wants to merge 2 commits into
lance-format:mainfrom
LuciferYang:docs/lance-arrow-unsafe-safety-comments
Open

docs(arrow): add SAFETY comments to lance-arrow unsafe blocks#7511
LuciferYang wants to merge 2 commits into
lance-format:mainfrom
LuciferYang:docs/lance-arrow-unsafe-safety-comments

Conversation

@LuciferYang

@LuciferYang LuciferYang commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Why

Project rule requires every unsafe block to carry a // SAFETY: comment.
lance-arrow had five blocks without one.

What

Annotate 5 unsafe blocks with the invariant chain that makes each sound:

File:line Operation
deepcopy.rs:23 NullBuffer::new_unchecked
deepcopy.rs:50 ArrayDataBuilder::build_unchecked
bfloat16.rs:152 build_unchecked (FromIterator)
bfloat16.rs:182 build_unchecked (From<Vec>)
bfloat16.rs:337 slice::from_raw_parts in as_slice

Plus minor hardening so the soundness arguments are self-contained:

  • lib.rs: crate-root compile_error! gating little-endian (required by
    as_slice's byte reinterpretation).
  • bfloat16.rs: debug_assert on the 2-byte alignment precondition of
    as_slice — arrow-rs only guarantees 1-byte alignment for
    FixedSizeBinary(n), so externally-built arrays (FFI/IPC) could violate it.
  • floats.rs + bfloat16.rs: # Panics and # Preconditions rustdoc on
    the trait + impl so the contract is visible at call sites.

Test plan

  • cargo fmt --all -- --check
  • cargo clippy -p lance-arrow --tests -- -D warnings
  • RUSTDOCFLAGS="-D warnings" cargo doc -p lance-arrow --no-deps
  • cargo test -p lance-arrow — 91 unit + 6 doc tests

Per the project's Rust security rule that every unsafe block must
document its invariants, annotate five unsafe blocks in lance-arrow:

- deepcopy.rs deep_copy_nulls (NullBuffer::new_unchecked): null_count is
  taken from the source NullBuffer, which already upheld the invariant
  over its bit slice. NullBuffer::slice only adjusts BooleanBuffer's
  bit_offset/bit_len and never byte-advances the inner Buffer, so the
  deep copy reproduces the same bit pattern at the same offsets and the
  unset-bit count is preserved.

- deepcopy.rs deep_copy_array_data (ArrayDataBuilder::build_unchecked):
  reproduces data structurally — data_type/len/offset forwarded, each
  buffer byte-copied via deep_copy_buffer (MutableBuffer-allocated, at
  least as aligned as the source), nulls deep-copied with the same bit
  offset/length and unset-bit count, child_data recursively cloned. Every
  value-level invariant the source upheld (UTF-8, monotonic offsets,
  in-bounds dictionary indices, run-end monotonicity, struct
  child-length matching) transfers to the copy.

- bfloat16.rs FromIterator<Option<bf16>> build_unchecked: the value
  buffer contains exactly 2 * len bytes (two bytes per iteration of the
  loop above, including null-slot zero-fill); the null bit buffer has
  len bits, satisfying FixedSizeBinary(2)'s layout.

- bfloat16.rs From<Vec<bf16>> build_unchecked: each bf16 writes its two
  little-endian bytes once, so the buffer is exactly 2 * data.len()
  bytes; no null buffer is attached, every element is logically valid.

- bfloat16.rs FloatArray::as_slice slice::from_raw_parts: the assert
  pins value_size == 2; FixedSizeBinaryArray::From<ArrayData> slices the
  value buffer at construction so value_data() is offset-adjusted; bf16
  is #[repr(transparent)] over u16 with every bit pattern valid;
  alignment is the caller's responsibility per the documented
  precondition (a debug_assert above the unsafe block catches violations
  in debug/test builds); the slice borrows from self.

Additional hardening (no semantic change for compliant callers):

- lib.rs: add #[cfg(not(target_endian = "little"))] compile_error! so
  the byte-reinterpretation in as_slice's unsafe block is guaranteed by
  the crate itself rather than transitively through a dependent crate.

- floats.rs: hoist # Panics and # Preconditions sections into the
  FloatArray::as_slice trait method docstring so polymorphic callers see
  the contract documented at the trait level.

- bfloat16.rs: add a debug_assert checking 2-byte alignment with the
  exact modulo form ((ptr as usize) % align_of::<bf16>()) to catch
  externally-built FixedSizeBinaryArrays (FFI, IPC,
  Buffer::from_custom_allocation) that arrow-rs alone does not require
  to be more than 1-byte aligned.

- bfloat16.rs as_slice impl: add rustdoc # Preconditions and
  # Endianness sections so the contract is visible without opening the
  source.

Verification: cargo fmt --all -- --check, cargo clippy -p lance-arrow
--tests -- -D warnings, and cargo test -p lance-arrow (91 unit tests +
6 doc tests) all pass.
@github-actions github-actions Bot added the documentation Improvements or additions to documentation label Jun 29, 2026
@codecov

codecov Bot commented Jun 29, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

`FixedSizeBinary` is a variant of `arrow_schema::DataType`, not a
standalone item, so `[`FixedSizeBinary(2)`]` had no resolvable target.
With rustdoc's broken-intra-doc-links lint promoted to a deny-warning
in CI (the rustdoc job runs with `-D warnings`), this failed the build.

Convert the references to plain code-formatted prose. The lint also
covers the new doc comments added in cfd691d.

Verified locally: RUSTDOCFLAGS="-D warnings" cargo doc -p lance-arrow
--no-deps now passes. cargo clippy -p lance-arrow --tests -- -D warnings
and cargo test -p lance-arrow (91 unit + 6 doc tests) still pass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant