Skip to content

audit: sync code that could be async in fbuild-build (sub-issue of #813) #820

Description

@zackees

Intro

This audits crates/fbuild-build/ for synchronous code that could run on the daemon's tokio runtime instead of std::thread / synchronous-subprocess pools. fbuild-build is the highest-leverage crate for the meta migration in #813 because:

  • It hosts the per-compile hot path (compile_sourceFbuildZccacheService::compile_blockingruntime.block_on(svc.compile(req))). That block_on is the most costly anti-pattern in the daemon today: every TU compile blocks one tokio worker for the full gcc duration plus the embedded ZccacheService::compile round-trip, even though the embedded service is already async-native on the same runtime.
  • It owns all parallel-compile fan-out (parallel.rs + compile_many.rs stage-2 workers) on std::thread::scope, which tokio-console cannot see.
  • It owns the entire per-platform BuildOrchestrator surface — 14 sync orchestrators. Whether the trait method becomes async fn is the single biggest design question for audit: go fully async — whole-app tokio runtime sharing for tokio-console (meta) #813's end-state.

The good news: fbuild-build never reaches for raw std::process::Command or std::thread::spawn directly — both are routed through wrappers (fbuild_core::subprocess::run_command, std::thread::scope) that are easy to replace centrally. The cascade is bounded; converting subprocess::run_command to async plus an async fn build on BuildOrchestrator would mechanically cascade through every finding below.

Findings

Severity File:line Current pattern Proposed async replacement Cascade notes
CRITICAL crates/fbuild-build/src/zccache_embedded.rs:198-224 (FbuildZccacheService::compile_blocking) runtime.block_on(async move { inner.compile(req).await }) from a sync caller. The underlying ZccacheService::compile is already async on the daemon's runtime. Delete compile_blocking; expose pub async fn compile(...) -> Result<EmbeddedCompileOutcome, _> that just .awaits inner.compile(req). Cascades into compiler::compile_source (the sole call site) and from there into the Compiler::compile_one trait. With BuildOrchestrator::build going async too (see below), the whole stack collapses to native .await. Removes the only block_on in fbuild-build and the only one in compile_backend.rs (runtime field becomes dead).
CRITICAL crates/fbuild-build/src/compiler.rs:573-685 (pub fn compile_source) Sync pub fn. Calls svc.compile_blocking(runtime, ...). Runs once per TU on every parallel-compile worker, so on a 100-TU project it holds N tokio workers blocked for the entire build if hosted on the runtime. Currently runs on std::thread::scope workers in parallel.rs, so it doesn't block tokio today — but it can't be moved onto the runtime until this is async. pub async fn compile_source(...) -> Result<CompileResult>. Replace compile_blocking call with .await. std::fs::create_dir_all + std::fs::write (lines 599, 675) → tokio::fs::*. This is the cascade root for the whole pipeline. Every Compiler::compile_one impl in avr/, esp32/, esp8266/, rp2040/, stm32/, teensy/, nrf52/, apollo3/, ch32v/, nxplpc/, renesas/, sam/, silabs/, generic_arm/ must become async fn. Need async-trait (Rust 1.94.1 native AFITs work but with dyn limitations — Compiler is used as &dyn Compiler widely, so async-trait is the pragmatic choice).
CRITICAL crates/fbuild-build/src/parallel.rs:80-143 (compile_sources_parallel) std::thread::scope spawning N workers, each calling compiler.compile(...) sync. Uses std::sync::Mutex<work_iter>, Mutex<Option<String>>, Mutex<Vec<String>>, AtomicUsize. Held across sync compiler.compile (which itself goes through block_on today). Replace thread::scope with futures::stream::FuturesUnordered driven by tokio::task::spawn. Mutex<work_iter> → cheaper to drain into a Vec upfront and index with AtomicUsize. Per-task compiler.compile(...).await. Bound concurrency with a tokio::sync::Semaphore(jobs) instead of fixed worker count. Hot-path fan-out: every pipeline::compile_sources / compile_local_libraries / ESP32 build calls this. Cascades from compile_source async conversion. Once both flip, the entire per-build compile graph is visible to tokio-console.
CRITICAL crates/fbuild-build/src/lib.rs:201-204 (BuildOrchestrator trait) fn build(&self, params: &BuildParams) -> Result<BuildResult> — sync. 14 impls. async fn build(...) via #[async_trait] (used as Box<dyn BuildOrchestrator> in get_orchestrator, so AFITs are awkward). Cascades into all 14 per-platform orchestrators: apollo3/, avr/, ch32v/, esp32/orchestrator/build.rs, esp8266/, nrf52/, nxplpc/, rp2040/, sam/, silabs/, stm32/orchestrator/mod.rs, teensy/, renesas/, plus the trait's caller in fbuild-daemon. See the dedicated subsection below for the design call.
HIGH crates/fbuild-build/src/compile_many.rs:665-736 (run_stage2) std::thread::scope fanning out per-sketch builds across N workers. Each worker calls builder.build(SketchBuildInputs)get_orchestrator(platform).build(&params) (sync chain). Uses AtomicUsize queue + Mutex<Option<SketchResult>> per slot. let _ = h.join(); (line 734). If BuildOrchestrator::build goes async: replace with FuturesUnordered + per-task orchestrator.build(...).await. Mutex<Option<SketchResult>> per slot becomes a plain Vec<Option<SketchResult>> populated by index from a JoinSet<(usize, SketchResult)>. Primary stage-2 fan-out for fbuild compile-many (#238 / #335). Tokio-console gains per-sketch task visibility. Cascades from BuildOrchestrator::build async conversion.
HIGH crates/fbuild-build/src/compiler.rs:438 (compiler_versionsubprocess::run_command(["gcc", "-dumpversion"], …)) Sync subprocess via fbuild_core::subprocess::run_command. Cached in COMPILER_IDENTITY_CACHE: OnceLock<Mutex<HashMap<...>>>. Runs once per distinct compiler path. subprocess::run_command_async(...).await. Cache → tokio::sync::OnceCell per compiler path or keep OnceLock<Mutex<HashMap>> because the lock is only held for the cache lookup (not across await — fine to keep). Setup-ish, but called from inside compile_source (signature derivation), so on the hot path. Cascades into fbuild-core::subprocess (separate audit).
HIGH crates/fbuild-build/src/pipeline/compile.rs:167-178 (log_toolchain_version) Sync subprocess::run_command(["gcc", "-dumpversion"], …). .await. Per-build (not per-TU). Same cascade through fbuild-core::subprocess.
HIGH crates/fbuild-build/src/linker.rs:309, 351, 398, 432 (LinkerBase::archive / report_size / analyze_symbols / objcopy_firmware) Sync subprocess::run_command(args, None, None, None). .await. Linker hot path, one invocation per build per archive/size/objcopy step. Cascades into every per-platform *_linker.rs.
HIGH crates/fbuild-build/src/{avr,esp32,esp8266,ch32v,sam,silabs,teensy,renesas,nrf52,generic_arm}/[*_]linker.rs All share the same shape: use fbuild_core::subprocess::run_command; … run_command(&args_ref, None, None, None)?; for the link driver invocation. ~10 files, each one call. .await on run_command_async. Cascades from subprocess::run_command async conversion. Linker invocations are ~1s+ on ESP32; making them async lets tokio-console show the "linking" task.
HIGH crates/fbuild-build/src/esp32/orchestrator/embed.rs:26-80 (process_embed_files) Sync subprocess::run_command([objcopy, …], project_dir, …) in a for file in embed_files loop. Each file is one objcopy invocation. .await per file, or FuturesUnordered for parallel embed if file count grows. Cascades from subprocess::run_command async. ESP32-only, but real per-build cost on FS / partition / OTA embeds.
HIGH crates/fbuild-build/src/esp32/orchestrator/boot_artifacts.rs:90, 141 Sync subprocess::run_command(...) for esptool elf2image + bootloader merging. .await. Cascades from subprocess::run_command async. ESP32-only, once per build.
HIGH crates/fbuild-build/src/stm32/orchestrator/arduino_mbed.rs:272 Sync subprocess::run_command(...) (memap or Mbed-specific tool). .await. Cascades from subprocess::run_command async. STM32-only.
HIGH crates/fbuild-build/src/script_runtime.rs:111, 297 Sync subprocess::run_command([python, harness, input], …) to evaluate PlatformIO extra_scripts. .await. Cascades from subprocess::run_command async. Per-build (once).
HIGH crates/fbuild-build/src/symbol_analyzer/mod.rs:15, 161, 216, 359 Sync subprocess::run_command_with_stdin + run_command for nm / c++filt. .await. Cascades from subprocess async. Off the hot per-TU path but still per-build.
HIGH crates/fbuild-build/src/shrink/probe.rs:33, 348 Sync subprocess::run_command_with_stdin + run_command for preprocessor probes (#493, #502). .await. Cascades from subprocess async. Per-build (once per shrink decision).
MEDIUM crates/fbuild-build/src/compile_many.rs:121-171 (seed_stage2_core_from_stage1) std::fs::read_dir + std::fs::hard_link / std::fs::copy loop on every stage-2 worker. tokio::fs::read_dir + tokio::fs::hard_link / tokio::fs::copy. Cascades from run_stage2 async conversion. Per stage-2 sketch, so up to N times per compile-many. Hardlink is fast but on cold-cache CI runs the byte-copy fallback is real I/O.
MEDIUM crates/fbuild-build/src/framework_core_cache.rs:124-176 (hydrate / store / copy_artifact_files) std::fs::read, std::fs::create_dir_all, std::fs::read_dir, std::fs::copy, std::fs::remove_file, std::fs::File::options().open(). tokio::fs::*. Per-build (hydrate + store calls bracket the framework compile step). The set_modified call on a tokio::fs::File needs the std handle — into_std().await then sync metadata write is OK to keep sync inside a spawn_blocking.
MEDIUM crates/fbuild-build/src/pipeline/compile.rs:61-114 (compile_local_libraries) std::fs::read_dir(&local_lib_dir) + std::fs::create_dir_all(&lib_build_dir) inside per-library loop. tokio::fs::*. Cascades from the compile-pipeline async conversion. Per-build (once per library).
MEDIUM crates/fbuild-build/src/pipeline/sequential.rs:116, 229 (build_log_mutex) let build_log_mutex = std::sync::Mutex::new(ctx.build_log); shared across parallel compile workers. Mutex::lock calls inside worker bodies (collect_warnings path in compile.rs). Once workers are tokio tasks, switch to tokio::sync::Mutex<BuildLog> — but only if the lock would be held across .await. The current usage holds the lock only for log.push / collect_warnings, both sync. Safer: keep std::sync::Mutex and assert no .await while held. Held only for log.push(...) / BuildLog::collect_warnings(...). If we keep these sync inside the lock scope, std::sync::Mutex is fine in async code per the tokio guidelines. Document the contract.
MEDIUM crates/fbuild-build/src/esp32/orchestrator/build.rs:595 Same std::sync::Mutex<BuildLog> pattern as sequential.rs. Same: keep std::sync::Mutex but document the no-await-while-held contract. Same as above.
MEDIUM crates/fbuild-build/src/build_output.rs:13, 22, 330 and lib.rs:164 (BuildParams::log_sender: Option<std::sync::mpsc::Sender<String>>) std::sync::mpsc::Sender carrying log lines from build workers back to the caller. If the receiver is on the runtime (the daemon's WebSocket task pumps build output), switch to tokio::sync::mpsc to avoid the receiver doing a recv() that blocks a tokio worker. If senders run in spawn_blocking and the receiver is a sync thread, std::sync::mpsc is fine. The receiver side lives in fbuild-daemon (build endpoint streams). Decision belongs to the partner audit but flagged here because BuildParams carries the type.
MEDIUM crates/fbuild-build/src/source_scanner.rs:270, 483, 584, 589 std::fs::read_to_string + std::fs::write during source discovery (per-build, not per-TU). tokio::fs::*. Per-build setup (once); lower priority than compile-time fs ops.
MEDIUM crates/fbuild-build/src/zccache_embedded.rs:282-309 (check_fingerprint_embedded, mark_fingerprint_success_embedded) std::fs::create_dir_all + sync TwoLayerCache::check (rayon under the hood). tokio::fs::create_dir_all. Wrap TwoLayerCache::check in spawn_blocking since it spins rayon workers. Per-build (once before each compile phase).
LOW crates/fbuild-build/src/compiler.rs:416 (COMPILER_IDENTITY_CACHE: OnceLock<Mutex<HashMap<PathBuf, String>>>) Sync Mutex held only for HashMap insert/lookup. Keep as std::sync::Mutex — never held across .await. Document. No change needed; documenting the contract is enough.
LOW crates/fbuild-build/src/parallel.rs:16-25 (default_jobs, effective_jobs) std::thread::available_parallelism(). Keep sync. Pure computation; no I/O.

The BuildOrchestrator trait question

Should BuildOrchestrator::build become async fn build(...)?

Pros (in order of leverage)

  1. One runtime end-to-end. Today fbuild-daemon calls orchestrator.build(&params) from inside an axum HTTP handler. That handler is on the tokio runtime; the call into build is sync and blocks the worker for the entire compile (seconds to minutes). Making build async lets the worker yield naturally at every .await point — the dominant pattern in audit: go fully async — whole-app tokio runtime sharing for tokio-console (meta) #813's motivation.
  2. Tokio-console visibility for the whole build pipeline. Every tokio::spawn inside an async build (parallel compiles, linker, esptool) shows up in the console. Today the console sees the HTTP request task and then a silent block.
  3. Structured cancellation. The daemon's tokio::sync::watch shutdown signal can tokio::select! against the build future and abort all in-flight subprocesses transitively. Sync build can only check a flag between phases.
  4. Per-request deadlines compose (audit: blocking operations with no timeout (meta) #802). The partner audit at audit: blocking operations with no timeout (meta) #802 wants tokio::time::timeout(...) around build calls. That only works on an async future.
  5. Mechanical cascade. Once compile_source and subprocess::run_command are async, every per-platform build impl is mostly .await insertions — no logic changes.

Cons (and mitigations)

  1. 14 orchestrator impls to rewrite. Apollo3, AVR, CH32V, ESP32, ESP8266, NRF52, NXP LPC, RP2040, SAM, Silabs, STM32, Teensy, Renesas, plus the synthetic generic_arm path. Each is hundreds of lines of orchestration. Mitigation: the actual changes are mechanical — async fn build + .await on every compile_* / link_* / subprocess::* call. No business-logic changes. Land it as one PR per orchestrator behind a feature gate.
  2. async fn in traits — dyn support. Rust 1.94.1 supports native async fn in traits, but Box<dyn BuildOrchestrator> doesn't compose cleanly with native AFITs (the returned future isn't Send-bounded by default; you'd need return_type_notation or a typed-future return). Mitigation: use #[async_trait] (the async-trait crate). It already works with &dyn / Box<dyn>. The runtime overhead (boxed future per call) is trivial vs the build time itself.
  3. PlatformSupport::install_deps and Compiler::compile_one need the same treatment. Both are sync traits called from inside build. Mitigation: same async_trait, same mechanical cascade.
  4. compile_many orchestration code lives in fbuild-build and calls get_orchestrator(platform).build(...) from a sync thread-pool worker today. Mitigation: compile_many itself becomes async (see CRITICAL row for run_stage2). The stage-2 fan-out becomes FuturesUnordered<async_compute> instead of thread::scope.

Recommendation

Convert BuildOrchestrator::build to async fn via #[async_trait]. The mechanical cascade is bounded, the wins are load-bearing for #813's end-state, and the only "real" design question — async_trait vs native AFITs — has an obvious near-term answer (async_trait while dyn support for native AFITs is still rough). Revisit native AFITs in a follow-up once return_type_notation stabilises and Rust's dyn AsyncTrait ergonomics catch up.

What was searched

Patterns (via Grep on crates/fbuild-build/src/):

  • std::thread::spawn|thread::spawn — 0 hits (all parallelism goes through std::thread::scope).
  • JoinHandle|\.join\(\) — 2 hits, both thread::scope joins in parallel.rs + compile_many.rs.
  • std::thread::scope|scope.spawn — 2 sites: parallel.rs:80-143, compile_many.rs:665-736.
  • rayon|par_iter|ThreadPool — 0 direct uses (rayon shows up only in upstream zccache::fingerprint, which is documented as intentionally sync-on-CPU-pool in zccache_embedded.rs:274).
  • Command::new|process::Command — 0 direct uses; all subprocess routing goes through fbuild_core::subprocess::run_command(_*).
  • subprocess::run_command (the wrapper) — 28 call sites across compiler.rs, pipeline/compile.rs, linker.rs, every *_linker.rs, esp32/orchestrator/embed.rs, esp32/orchestrator/boot_artifacts.rs, stm32/orchestrator/arduino_mbed.rs, script_runtime.rs, symbol_analyzer/mod.rs, shrink/probe.rs.
  • Mutex::new|RwLock::new — 7 hits. None held across .await today (no .await in the crate). The 3 that would matter post-conversion are listed under MEDIUM findings.
  • std::sync::mpsc|crossbeambuild_output.rs exposes std::sync::mpsc::Sender<String> on BuildParams.log_sender. Cross-crate API; flagged for the partner audit.
  • std::fs:: — 329 occurrences across 50 files. The ones on the per-compile / per-build hot path are listed (compile_many seed/copy, framework_core_cache hydrate/store, pipeline compile_local_libraries, compiler.rs output writes).
  • block_on — 1 real call site: zccache_embedded.rs:216 inside compile_blocking. (Plus four docstring references and one in compile_backend.rs documenting the runtime handle.)
  • impl BuildOrchestrator for|fn build\( — confirmed 14 sync fn build impls across the platform tree, plus the trait itself in lib.rs:201-204.

Files specifically read end-to-end: compile_many.rs, parallel.rs, compiler.rs (compile_source + trait), pipeline/mod.rs, pipeline/compile.rs, pipeline/sequential.rs, zccache_embedded.rs, compile_backend.rs, linker.rs (subprocess sites), avr/avr_linker.rs (representative per-platform linker), esp32/orchestrator/build.rs (representative orchestrator top-level), esp32/orchestrator/embed.rs, build_output.rs, framework_core_cache.rs.

Out of scope (deliberately left alone)

  • zccache::fingerprint's rayon parallelism — documented in zccache_embedded.rs:274 as intentionally sync-on-CPU-pool. Genuinely CPU-bound (BLAKE3 + path walks). Keep on rayon; only wrap callers in spawn_blocking when they're invoked from the runtime.
  • std::thread::available_parallelism() in parallel.rs:17, compile_many.rs:71/83, stage2_jobs_per_worker — pure computation, no I/O.
  • Test-only code — every #[cfg(test)] block in parallel.rs, compile_many.rs, framework_core_cache.rs, compiler_tests.rs uses std::thread::sleep / std::fs::write legitimately and is not on the production runtime.
  • Drop impls — none in fbuild-build reach for sync I/O on a hot path.
  • One-shot startup config reads — e.g. arduino_props.rs parsing platform.txt at orchestrator construction; fine to keep sync.
  • std::sync::Mutex<BuildLog> in pipeline/sequential.rs:116 and esp32/orchestrator/build.rs:595 — flagged MEDIUM, but the lock is only held for sync log.push / collect_warnings. Per tokio's "do not block the runtime" guidance, sync Mutex is legal in async code as long as the critical section doesn't .await. Document the contract rather than swap the type.
  • COMPILER_IDENTITY_CACHE (compiler.rs:63) — same as above; cache lookup never crosses .await.

Sub-issue of #813.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status
    Triage

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions