docs(arrow): add SAFETY comments to lance-arrow unsafe blocks#7511
Open
LuciferYang wants to merge 2 commits into
Open
docs(arrow): add SAFETY comments to lance-arrow unsafe blocks#7511LuciferYang wants to merge 2 commits into
LuciferYang wants to merge 2 commits into
Conversation
Per the project's Rust security rule that every unsafe block must document its invariants, annotate five unsafe blocks in lance-arrow: - deepcopy.rs deep_copy_nulls (NullBuffer::new_unchecked): null_count is taken from the source NullBuffer, which already upheld the invariant over its bit slice. NullBuffer::slice only adjusts BooleanBuffer's bit_offset/bit_len and never byte-advances the inner Buffer, so the deep copy reproduces the same bit pattern at the same offsets and the unset-bit count is preserved. - deepcopy.rs deep_copy_array_data (ArrayDataBuilder::build_unchecked): reproduces data structurally — data_type/len/offset forwarded, each buffer byte-copied via deep_copy_buffer (MutableBuffer-allocated, at least as aligned as the source), nulls deep-copied with the same bit offset/length and unset-bit count, child_data recursively cloned. Every value-level invariant the source upheld (UTF-8, monotonic offsets, in-bounds dictionary indices, run-end monotonicity, struct child-length matching) transfers to the copy. - bfloat16.rs FromIterator<Option<bf16>> build_unchecked: the value buffer contains exactly 2 * len bytes (two bytes per iteration of the loop above, including null-slot zero-fill); the null bit buffer has len bits, satisfying FixedSizeBinary(2)'s layout. - bfloat16.rs From<Vec<bf16>> build_unchecked: each bf16 writes its two little-endian bytes once, so the buffer is exactly 2 * data.len() bytes; no null buffer is attached, every element is logically valid. - bfloat16.rs FloatArray::as_slice slice::from_raw_parts: the assert pins value_size == 2; FixedSizeBinaryArray::From<ArrayData> slices the value buffer at construction so value_data() is offset-adjusted; bf16 is #[repr(transparent)] over u16 with every bit pattern valid; alignment is the caller's responsibility per the documented precondition (a debug_assert above the unsafe block catches violations in debug/test builds); the slice borrows from self. Additional hardening (no semantic change for compliant callers): - lib.rs: add #[cfg(not(target_endian = "little"))] compile_error! so the byte-reinterpretation in as_slice's unsafe block is guaranteed by the crate itself rather than transitively through a dependent crate. - floats.rs: hoist # Panics and # Preconditions sections into the FloatArray::as_slice trait method docstring so polymorphic callers see the contract documented at the trait level. - bfloat16.rs: add a debug_assert checking 2-byte alignment with the exact modulo form ((ptr as usize) % align_of::<bf16>()) to catch externally-built FixedSizeBinaryArrays (FFI, IPC, Buffer::from_custom_allocation) that arrow-rs alone does not require to be more than 1-byte aligned. - bfloat16.rs as_slice impl: add rustdoc # Preconditions and # Endianness sections so the contract is visible without opening the source. Verification: cargo fmt --all -- --check, cargo clippy -p lance-arrow --tests -- -D warnings, and cargo test -p lance-arrow (91 unit tests + 6 doc tests) all pass.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
`FixedSizeBinary` is a variant of `arrow_schema::DataType`, not a standalone item, so `[`FixedSizeBinary(2)`]` had no resolvable target. With rustdoc's broken-intra-doc-links lint promoted to a deny-warning in CI (the rustdoc job runs with `-D warnings`), this failed the build. Convert the references to plain code-formatted prose. The lint also covers the new doc comments added in cfd691d. Verified locally: RUSTDOCFLAGS="-D warnings" cargo doc -p lance-arrow --no-deps now passes. cargo clippy -p lance-arrow --tests -- -D warnings and cargo test -p lance-arrow (91 unit + 6 doc tests) still pass.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
Project rule requires every
unsafeblock to carry a// SAFETY:comment.lance-arrowhad five blocks without one.What
Annotate 5
unsafeblocks with the invariant chain that makes each sound:deepcopy.rs:23NullBuffer::new_uncheckeddeepcopy.rs:50ArrayDataBuilder::build_uncheckedbfloat16.rs:152build_unchecked(FromIterator)bfloat16.rs:182build_unchecked(From<Vec>)bfloat16.rs:337slice::from_raw_partsinas_slicePlus minor hardening so the soundness arguments are self-contained:
lib.rs: crate-rootcompile_error!gating little-endian (required byas_slice's byte reinterpretation).bfloat16.rs:debug_asserton the 2-byte alignment precondition ofas_slice— arrow-rs only guarantees 1-byte alignment forFixedSizeBinary(n), so externally-built arrays (FFI/IPC) could violate it.floats.rs+bfloat16.rs:# Panicsand# Preconditionsrustdoc onthe trait + impl so the contract is visible at call sites.
Test plan
cargo fmt --all -- --checkcargo clippy -p lance-arrow --tests -- -D warningsRUSTDOCFLAGS="-D warnings" cargo doc -p lance-arrow --no-depscargo test -p lance-arrow— 91 unit + 6 doc tests