Sampling distribution documentation and tailoring #221

Description

@Dietr1ch

I played with this library this afternoon and noticed a bias towards edge values like 0, ?::MAX, and ?::MIN. I get that they work wonders in fuzzing, but in my case the bias ended up producing overly simple test cases.

I was generating a sequence of push(value)/pop operations on a heap. My approach can be simplified to:

use arbitrary::Arbitrary;

#[derive(Arbitrary)]
struct OperationBatch {
    // Fixes the push/pop sequence. (I biased towards pushing small batches.)
    seed: u64,
    // Values pushed onto the heap, in order.
    numbers_to_push: Vec<u16>,
}

Because of the bias, my operation sequences pushed mostly the same few numbers onto the heap, which doesn't stress the heap much.

I ended up implementing OperationBatch::from_seed(u64) and customising the number distribution myself, but I would appreciate documentation of the default distributions, plus a mention of the helpers for tailoring the distribution of values when the defaults are a bad fit.
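For readers who hit the same problem, here is a minimal sketch of what such a from_seed workaround could look like. It is not the code I actually used: the SplitMix64 generator, the 1..=16 batch length, and the uniform u16 draw are all illustrative assumptions, and the sketch deliberately avoids the arbitrary crate so it only needs the standard library.

```rust
/// Minimal SplitMix64 generator (an assumed stand-in for any seeded PRNG).
struct SplitMix64(u64);

impl SplitMix64 {
    /// Advances the state and returns the next 64-bit output.
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

#[derive(Debug, Clone, PartialEq)]
struct OperationBatch {
    seed: u64,
    numbers_to_push: Vec<u16>,
}

impl OperationBatch {
    /// Builds a batch deterministically from `seed`.
    /// Hypothetical policy: 1..=16 values, each taken from the top 16 bits
    /// of a fresh PRNG output, so values are spread over the whole u16 range.
    fn from_seed(seed: u64) -> Self {
        let mut rng = SplitMix64(seed);
        // The modulo adds a negligible bias here; fine for a sketch.
        let len = 1 + (rng.next_u64() % 16) as usize;
        let numbers_to_push = (0..len)
            .map(|_| (rng.next_u64() >> 48) as u16)
            .collect();
        OperationBatch { seed, numbers_to_push }
    }
}
```

The same seed always yields the same batch, so a failing case can be reproduced from the u64 alone, and the fuzzer only has to supply 8 bytes per batch instead of enough entropy for the whole Vec<u16>.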

At least from the docs on output distributions, it wasn't clear to me that this bias exists or how strong it is. I feel I'd have had an easier time if I had run into the arbitrary_len docs earlier, but from the README I initially thought that sprinkling a few attributes would be all I needed.


Maybe I just ran out of entropy because of a poor size_hint, but in that case I'd expect an error rather than silently generated bad samples (from looking around, this might be related to #219 (comment)). There also seems to be a missing set of attributes for specifying collection sizes, which could use arbitrary_len underneath.
