Yardbird

This chicken lays eggs...

Artifact Evaluation

Quick Start

For a quick assessment of the artifact, follow these steps:

0. Login to the VM

An .ova of the Ubuntu 22.04 image with all dependencies installed is given in yardbird-vm.ova.

username: yardbird password: yardbird

1. Build the Tool

If there are any problems with Z3 header files during the build, be sure to run source ~/.bashrc to put the correct headers in the path.

cd yardbird

# Build yardbird
cargo build -p yardbird --release

# Verify build
./target/release/yardbird --help

2. Run a Single Example

This will automatically use the BMC Cost strategy.

# Run on a simple array copy example (completes in ~1 second)
./target/release/yardbird --filename examples/array/array_copy.vmt --depth 10

3. Run Light Review Benchmark Suite

For a more comprehensive evaluation with depth 10 (completes in less than 5 minutes):

# Build the benchmarking tool
cargo build -p garden --release

# Run light review configuration (depth 10, 20s timeout per benchmark)
./target/release/garden --config garden/benchmark_config.yaml --matrix light-review --output light_review_results.json

This runs all array benchmarks at depth 10 with both BMC Cost and Z3 array theory strategies, generating a JSON file with detailed results.

4. Use The Unified Evaluation Entry Point

main_eval.py is the top-level orchestration script for benchmark runs and reports.

# Local run with a combined workbook report
python3 main_eval.py \
  --env local \
  --benchmark-type deep-abstract \
  --benchmark-type deep-concrete \
  --name paper-smoke

# AWS launch only: records a local run manifest and exits immediately
python3 main_eval.py \
  --env aws \
  --benchmark-type deep-abstract \
  --benchmark-type deep-concrete \
  --name paper-aws

# Later, refresh status for an AWS-backed run
python3 main_eval.py --aws-run-id <run-id> --status

# When the AWS run is complete, download artifacts and build the report
python3 main_eval.py --aws-run-id <run-id> --generate-report

Artifacts are stored under benchmark_results/main_eval/<run-id>/. Raw benchmark JSON files land in per-matrix raw/ directories using MM_DD_YYYY_HH_MM.json names, while generated figures and the Typst workbook live under report/.

Reproducing Paper Results

To fully reproduce the paper's evaluation:

cargo build --release -p yardbird
cargo build --release -p garden

# Run BMC Cost
./target/release/garden \
  --config garden/benchmark_config.yaml \
  --matrix deep-abstract \
  --output paper_results_abstract.json

# Run AST Size
./target/release/garden \
  --config garden/benchmark_config.yaml \
  --matrix deep-abstract-ast \
  --output paper_results_ast.json

# Run Z3 Array theory baseline
./target/release/garden \
  --config garden/benchmark_config.yaml \
  --matrix deep-concrete \
  --output paper_results_z3.json

# Run Z3 MBQI
./target/release/garden \
  --config garden/benchmark_config.yaml \
  --matrix  deep-abstract-with-quantifiers \
  --output paper_results_mbqi.json

Each of these runs will take 1.5 to 2 hours on an AWS EC2 instance. Times may vary locally and by hardware.

These 4 runs will reproduce the main results from the paper. The additional cost functions can be run in a similar way: deep-abstract-prefer-write, deep-abstract-prefer-constants, deep-abstract-prefer-read.

Functional Badge Criteria

1. Documentation and Inventory

Is the artifact documented with an inventory of artifacts and sufficient description to enable exercise?

YES. Complete inventory:

Core Tool: src/ - Yardbird
- src/main.rs - CLI entry point
- src/driver.rs - Verification orchestration
- src/strategies/ - Proof strategies (abstract, concrete)
- src/cost_functions/ - Heuristics for term selection
- src/z3_ext.rs, src/smt_problem.rs - Z3 integration
Parsing Library: smt2parser/ - VMT and SMT2 parsing with array abstraction
Benchmarking Suite: garden/ - Automated benchmark runner
- Configuration-driven execution
- Supports parameter matrices
- JSON output with metadata
Examples: examples/array/ - VMT benchmark files

Is the artifact consistent and relevant to the associated paper, contributing to its main results?

YES. The artifact directly supports the paper's claims:

Main Claim: Yardbird performs bounded model checking with cost-guided abstraction refinement
- Implemented in src/strategies/abstract.rs using egg-based term rewriting
- Cost functions in src/cost_functions/ implement heuristics discussed in paper
Evaluation Results: All paper benchmarks are included in examples/array/
- The garden tool reproduces the exact parameter configurations from the paper
- Configuration deep-abstract matches paper's main evaluation (depth 50, 120s timeout)
- All "deep" configurations in garden/benchmark_config.yaml when run with garden will replicate results given in the paper
Comparison with Baselines:
- Concrete strategy (src/strategies/concrete.rs) implements baseline approach
Performance Claims: Quantifier instantiation counts and runtimes are logged and can be reproduced and verified from the output JSON file.

3. Completeness

Is the artifact complete, with all components relevant to the paper included?

YES. All paper components are included:

All benchmarks from evaluation section
All strategies discussed (abstract, concrete)
All cost functions evaluated (bmc-cost, ast-size, prefer-read, prefer-write, prefer-constants)
Complete source code (no proprietary components)
Benchmark runner to reproduce all experiments

4. Runnability

Can the software be executed successfully and can data be accessed and manipulated?

YES. Multiple execution modes:

Direct CLI: cargo run -- --filename <file> [options]
Benchmark Suite: garden --config <yaml> --matrix <name>

All benchmarks are accessible VMT files in examples/. Results are output as:

Human-readable logs to stdout
Structured JSON for automated processing
Compatible with graphics generation pipeline

Reusable Badge Criteria

1. License for Reuse

Does the artifact have a license allowing reuse and repurposing?

YES. This project is licensed under the MIT License (see LICENSE file), which allows:

Special provision for artifact evaluation: The LICENSE file includes an explicit addendum granting artifact evaluation committees the right to download, execute, modify, and redistribute the artifact for evaluation purposes.

All dependencies are also permissively licensed:

Rust standard library: MIT/Apache-2.0
Z3: MIT
egg (e-graphs): MIT

2. Dependencies Documentation

Are all dependencies and libraries well documented and up to date?

YES. Complete dependency documentation:

System Dependencies:

Rust 1.89.0 (specified in rust-toolchain.toml)
Z3 SMT solver 4.15.2+ (any modern version compatible)
libclang-dev (for Z3 bindings)
Standard build tools (gcc/clang, cmake)

Rust Dependencies: (all pinned in Cargo.lock)

z3 = "0.8" - Z3 solver bindings
egg = "0.9" - E-graph library for term rewriting
Full list in Cargo.toml

3. Usage Beyond the Paper

Does the README explain how the artifact can be used beyond the paper?

YES. Multiple extension points:

1. Verify New Programs:

# Create your own VMT file describing a transition system
# Run yardbird on it
cargo run --release -- --filename my_program.vmt --depth 20

2. Add New Cost Functions:

Implement the egg::CostFunction trait in src/cost_functions/:

pub trait CostFunction {
    fn cost(&self, enode: &ENode) -> Cost;
    fn name(&self) -> &str;
}

Example: src/cost_functions/ast_size.rs shows a simple implementation.

Register in src/main.rs to make it available via --cost-function flag.

3. Extend with New Strategies:

Add to src/strategies/ following the pattern in abstract.rs or concrete.rs. Strategies coordinate:

SMT problem construction
Incremental solving
Abstraction refinement

4. Benchmark New Tool Configurations:

Create custom YAML configs in garden/:

parameter_matrices:
  my_experiment:
    depths: [10, 20, 30]
    strategies: ["my-new-strategy"]
    cost_functions: ["my-cost-function"]
    timeout_seconds: 60

Run with: garden --config my_config.yaml --matrix my_experiment

5. Integration with Other Tools:

Use --print-vmt to output abstracted transition systems
We can dump the solver state to SMTLIB2 at any time during execution, giving us the ability to use other solvers or tools.

4. Documented Interfaces and Open Source

Does the artifact provide documented interfaces for extensions, or is it open source?

YES, both:

Open Source: Complete source code available, no proprietary components
Documented Extension Interfaces:
- Cost Functions: CostFunction trait in src/cost_functions/mod.rs
- Strategies: Pattern established in src/strategies/
- Output Formats: JSON schema in garden/src/main.rs
- VMT Parsing: Public API in smt2parser/

5. Cross-Environment Usage

Can the artifact be used in different environments (different systems, outside VM)?

YES.

Z3 is available on all major platforms and available through a python package with necessary header files
Rust provides consistent cross-platform builds
VMT files are platform-independent text files
No OS-specific system calls or dependencies

Running Example

cargo run -- --filename examples/array/array_copy.vmt

IC3IA

If ic3ia binary is located in your system PATH, you can run

cargo run -- --filename examples/array/array_copy.vmt --invoke-ic3ia --print-vmt

This gives you the IC3IA output on the decorated transition system generated by yardbird.

Performance Sampling

We've found that using samply is a nice way to find out where yardbird is spending time. Unsurprisingly, it spends a lot of time making Z3 calls, but we've still been able to get some speedups by looking at the callgraphs graphs and heatmaps.

If you don't have `samply` installed already

cargo install --locked samply

Running `yardbird` with `samply`

cargo build
samply record ./target/debug/yardbird --filename examples/array/array_copy.vmt

Building on Linux

I encountered the following error when building on a Linux machine:

error: failed to run custom build command for `z3-sys v0.8.1`

Caused by:
  process didn't exit successfully: `/home/cole/Documents/yardbird/target/debug/build/z3-sys-6ba06f331cb40b8a/build-script-build` (exit status: 101)
  --- stdout
  cargo:rerun-if-changed=build.rs
  cargo:rerun-if-env-changed=Z3_SYS_Z3_HEADER
  cargo:rerun-if-changed=wrapper.h
  cargo:rerun-if-env-changed=TARGET
  cargo:rerun-if-env-changed=BINDGEN_EXTRA_CLANG_ARGS_x86_64-unknown-linux-gnu
  cargo:rerun-if-env-changed=BINDGEN_EXTRA_CLANG_ARGS_x86_64_unknown_linux_gnu
  cargo:rerun-if-env-changed=BINDGEN_EXTRA_CLANG_ARGS

  --- stderr

  thread 'main' panicked at /home/cole/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/bindgen-0.66.1/lib.rs:604:31:
  Unable to find libclang: "couldn't find any valid shared libraries matching: ['libclang.so', 'libclang-*.so', 'libclang.so.*', 'libclang-*.so.*'], set the `LIBCLANG_PATH` environment variable to a path where one of these files can be found (invalid: [])"
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

This was resolved by running:

sudo apt install libclang-dev

Name		Name	Last commit message	Last commit date
Latest commit History 591 Commits
.github/workflows		.github/workflows
.hooks		.hooks
benchmark-synth		benchmark-synth
coop		coop
examples		examples
garden		garden
lab		lab
notes		notes
paper-graphics		paper-graphics
plans		plans
scripts		scripts
smt2parser		smt2parser
src		src
terraform		terraform
tests		tests
to_vmt		to_vmt
tools		tools
yardbird_eval		yardbird_eval
.gitignore		.gitignore
AGENTS.md		AGENTS.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md
analyze_core.py		analyze_core.py
artifact-setup.sh		artifact-setup.sh
benchmark-gen.sh		benchmark-gen.sh
install_hooks.sh		install_hooks.sh
main_eval.py		main_eval.py
rust-toolchain.toml		rust-toolchain.toml
terraform_runner.py		terraform_runner.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Yardbird

Artifact Evaluation

Quick Start

0. Login to the VM

1. Build the Tool

2. Run a Single Example

3. Run Light Review Benchmark Suite

4. Use The Unified Evaluation Entry Point

Reproducing Paper Results

Functional Badge Criteria

1. Documentation and Inventory

3. Completeness

4. Runnability

Reusable Badge Criteria

1. License for Reuse

2. Dependencies Documentation

3. Usage Beyond the Paper

4. Documented Interfaces and Open Source

5. Cross-Environment Usage

Running Example

IC3IA

Performance Sampling

If you don't have `samply` installed already

Running `yardbird` with `samply`

Building on Linux

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Yardbird

Artifact Evaluation

Quick Start

0. Login to the VM

1. Build the Tool

2. Run a Single Example

3. Run Light Review Benchmark Suite

4. Use The Unified Evaluation Entry Point

Reproducing Paper Results

Functional Badge Criteria

1. Documentation and Inventory

3. Completeness

4. Runnability

Reusable Badge Criteria

1. License for Reuse

2. Dependencies Documentation

3. Usage Beyond the Paper

4. Documented Interfaces and Open Source

5. Cross-Environment Usage

Running Example

IC3IA

Performance Sampling

If you don't have samply installed already

Running yardbird with samply

Building on Linux

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

If you don't have `samply` installed already

Running `yardbird` with `samply`

Packages