Skip to content

audit: cross-cutting async/sync architecture (sub-issue of #813) #818

Description

@zackees

Sub-issue of #813.

Intro

The 6 sibling audits cover per-crate findings (fbuild-build, fbuild-daemon, fbuild-serial, fbuild-deploy, fbuild-packages, fbuild-cli+fbuild-python, plus fbuild-core/foundationals). This audit covers the structural / cross-cutting concerns that don't fit cleanly in one crate, plus the two decisions that gate the whole migration:

  1. Whether to adopt async-trait vs native async-fn-in-trait for the dispatch traits.
  2. Which trait to convert first — i.e. the dependency-ordering of the trait migration.

Everything else (per-crate std::thread::spawntokio::spawn, reqwest::blockingreqwest, etc.) is in the sibling audits' scope.


1. The async-trait decision (RECOMMENDATION)

Constraints

  • MSRV is 1.94.1 (rust-toolchain.toml) — supports native async fn in traits (RFC 3185, stabilized 1.75).
  • All four dispatch traits today use Box<dyn Trait>-style dynamic dispatch:
    • Box<dyn BuildOrchestrator> returned by get_orchestrator() (crates/fbuild-build/src/lib.rs:207)
    • Box<dyn PlatformSupport> returned by get_platform_support() (crates/fbuild-build/src/lib.rs:72)
    • Box<dyn Deployer> constructed inside spawn_blocking (crates/fbuild-daemon/src/handlers/operations/deploy.rs:418)
    • Box<dyn Compiler> is the per-orchestrator field type
  • Native async-fn-in-trait is NOT dyn-compatible without boilerplate. Returning impl Future from a trait method requires -> impl Future<Output = ...> which is auto-RPIT and incompatible with object safety without a manual Box::pin(async move { ... }) wrapper in every impl. That's worse than async-trait.
  • async-trait is already a workspace dep (Cargo.toml:69, async-trait = "0.1") and already wired into 4 crates: fbuild-build, fbuild-deploy, fbuild-serial, fbuild-daemon.
  • Existing in-tree precedent: EmulatorRunner in crates/fbuild-daemon/src/handlers/emulator/runners.rs:17 is the established pattern — #[async_trait::async_trait] on the trait + each impl block, with three impls (QemuRunner, Avr8jsRunner, SimavrRunner).

Recommendation: adopt async-trait for all four dispatch traits

BuildOrchestrator, Compiler, Deployer, PlatformSupport should all be migrated to #[async_trait::async_trait] with async fn methods. The cost is one Box::pin per call (negligible against a multi-second build/deploy). The benefit:

  • One macro per trait, copy/paste from EmulatorRunner.
  • No allocator overhead on the call sites that today are already Box<dyn ...>-dispatched.
  • Keeps the existing Box<dyn ...> factory API unchanged (get_orchestrator(platform) -> Result<Box<dyn BuildOrchestrator>>) — the daemon and CLI call sites need zero signature changes outside the .await insertion.

Static-dispatch traits (McuConfig, WatchSetStampCache, BootModeClassifier, DtrRtsControl, etc.) where there's no Box<dyn ...> storage do NOT need async-trait — if they go async, they can use native async-fn-in-trait. Most of these are sync-only and probably stay sync.


2. Trait migration order (RECOMMENDATION)

The dependency graph of "which sync API does this call" determines the conversion order. Convert leaves first, parents last.

Convert in this order:

  1. fbuild_core::subprocess::run_command / run_command_with_stdin / run_command_passthrough (crates/fbuild-core/src/subprocess.rs:81,100,151) → tokio::process::Command-backed async equivalents.

    • Why first: 48 call sites across 32 files. Every compiler, linker, deploy tool, esptool, avrdude, pio CLI delegation, c++filt, clang-format, size, objcopy, and pyocd invocation flows through this. Today the daemon wraps each into spawn_blocking. After conversion, callers stop needing the wrapper and the daemon's tokio worker isn't blocked on running_process::NativeProcess::wait.
    • Cascade: every impl Compiler, impl Linker, impl Deployer, and every install_deps flow.
    • Migration path: dual-track for one sprint. Keep sync run_command as pub fn run_command(...) for non-daemon callers (CLI diagnostic subcommands, fbuild-test-support). Add pub async fn run_command_async(...). Migrate the daemon hot paths first.
  2. Compiler::compile_one (crates/fbuild-build/src/compiler.rs:71) → async fn compile_one(...).

    • Why second: 11 impls (AvrCompiler, Esp32Compiler, Esp8266Compiler, TeensyCompiler, Stm32* via ArmCompiler, Nrf52Compiler, Rp2040* via ArmCompiler, SamCompiler, RenesasCompiler, Ch32vCompiler, SilabsCompiler, plus FakeCompiler in tests). The default compile_c / compile_cpp / compile provided methods cascade automatically. Today the compile_blocking path in zccache_embedded.rs:198 exists specifically because compile_one is sync and needs to call into the async ZccacheService. After this conversion, compile_blocking collapses into a direct .await.
    • Cascade: every pipeline::compile, parallel.rs::compile_many, framework_libs::compile_*.
  3. BuildOrchestrator::build (crates/fbuild-build/src/lib.rs:203) → async fn build(&self, params: &BuildParams) -> Result<BuildResult>.

    • Why third: 14 impls. Once compilers are async, the orchestrators' build() bodies — which internally call compile_one through compile_c/compile_cpp — can .await naturally. The daemon's spawn_blocking(move || orchestrator.build(&params)) at handlers/operations/build.rs:327 collapses into a direct .await.
    • Side-effect: BuildParams.log_sender: Option<std::sync::mpsc::Sender<String>> (crates/fbuild-build/src/lib.rs:164) and BuildLog::sender (crates/fbuild-core/src/build_log.rs:21) should convert to tokio::sync::mpsc::UnboundedSender<String> at the same time — the existing daemon code already converts via a bridge spawn_blocking task at build.rs:201,314-323 whose only purpose is sync→async mpsc. That bridge deletes itself when the field type flips.
  4. PlatformSupport::install_deps (crates/fbuild-build/src/lib.rs:63) → async fn install_deps(...).

    • Why fourth: 14 impls. Today the daemon's install_deps handler wraps the whole call in spawn_blocking (handlers/operations/install_deps.rs:96). All install_deps impls ultimately call into fbuild-packages which has the ridiculous-looking pattern of constructing a new tokio::runtime::Runtime per ensure_installed (24 occurrences). After install_deps is async, those Runtime::new() calls (fbuild-packages/src/toolchain/*.rs, fbuild-packages/src/library/*.rs) collapse — the existing block_on_package_future helper at fbuild-packages/src/lib.rs:56 is the same anti-pattern, written once instead of inlined 24 times, and it disappears entirely.
  5. Deployer::deploy (crates/fbuild-deploy/src/lib.rs:96) → async fn deploy(...).

    • Why last: 6 impls (Esp32Deployer, AvrDeployer, TeensyDeployer, LpcDeployer, plus two test deployers). Smaller blast radius than build, but the deploy handler does the most complex spawn_blocking dance (handlers/operations/deploy.rs:237,395-418,772) including the trusted_hash_update callback pattern. Doing this last gives the migration time to settle the daemon's preemption + serial-recovery contracts before they get rewritten.
  6. Package::ensure_installed, Toolchain / Framework trait methods (fbuild-packages/src/lib.rs:71,83,155) — these are already implemented in terms of staged_install (which IS async), so they can be flipped at the same time as step 4. The migration deletes block_on_package_future and all 24 tokio::runtime::Runtime::new() per-impl scaffolds.

The EmulatorRunner trait (runners.rs:17) is already async and is the reference shape.


3. Inventory of spawn_blocking / block_on

#[tokio::main] / runtime construction (50 sites)

Location Purpose After migration
crates/fbuild-daemon/src/main.rs:33 Daemon #[tokio::main] — the One True Runtime Keep
crates/fbuild-cli/src/main.rs:22 CLI multi-thread runtime for HTTP client Keep (separate process)
crates/fbuild-cli/src/cli/port_scan.rs:120 Dedicated runtime to escape current-thread context Remove (caller becomes async)
crates/fbuild-python/src/serial_monitor.rs:69 PyO3-internal runtime per SerialMonitor Keep (sync→async bridge to Python)
crates/fbuild-python/src/lib.rs:312,330,350 PyO3 sync HTTP wrappers Keep (PyO3 boundary)
crates/fbuild-packages/src/lib.rs:63 (block_on_package_future) Bridge for sync ensure_installed callers Delete after step 4
crates/fbuild-packages/src/toolchain/*.rs × 8 Per-toolchain Runtime::new() inside ensure_installed Delete after step 4
crates/fbuild-packages/src/library/*.rs × 16 Per-library Runtime::new() inside ensure_installed Delete after step 4
crates/fbuild-daemon/src/handlers/emulator/tests_process.rs:195 Test-only N/A
crates/fbuild-serial/src/manager/tests.rs:136 Test-only N/A

spawn_blocking (16 production call sites)

Location Classification Notes
fbuild-daemon/src/handlers/operations/build.rs:314 Removable Sync→async mpsc bridge for log_sender. Deletes after step 3.
fbuild-daemon/src/handlers/operations/build.rs:327 Removable Wraps orchestrator.build(). Deletes after step 3.
fbuild-daemon/src/handlers/operations/build.rs:520 Removable Build-related sync call. Deletes after step 3.
fbuild-daemon/src/handlers/operations/install_deps.rs:96 Removable Wraps install_platform_deps. Deletes after step 4.
fbuild-daemon/src/handlers/operations/deploy.rs:237 Removable Deploy pre-step. Deletes after step 5.
fbuild-daemon/src/handlers/operations/deploy.rs:418 Removable Main Deployer::deploy wrap. Deletes after step 5.
fbuild-daemon/src/handlers/operations/deploy.rs:772 Removable Post-deploy recovery. Deletes after step 5.
fbuild-daemon/src/handlers/operations/reset.rs:44 Removable ESP hard reset. Deletes after esp_reset is async.
fbuild-daemon/src/handlers/cache.rs:11,49 Removable Cache reads. tokio::fs after migration.
fbuild-daemon/src/handlers/emulator/select.rs:310 Forced-sync (today) Emulator probe. May be removable.
fbuild-cli/src/cli/compile_many.rs:156 Forced-sync CLI-side, calls sync compile_many API. Removable when compile_many goes async.
fbuild-serial/src/manager.rs:127,200,406,816,863 Genuinely forced-sync serialport::open and esp_hard_reset_blocking wrap the OS-level serialport crate which has NO async API. These stay as spawn_blocking indefinitely.

block_on (5 production call sites + 24 transitional in fbuild-packages)

Location Classification Notes
fbuild-build/src/zccache_embedded.rs:216 (compile_blocking) Transitional bridge Documented — disappears when Compiler::compile_one goes async (step 2).
fbuild-cli/src/main.rs:26 Genuine CLI runtime entry. Stays.
fbuild-python/src/serial_monitor.rs × 13 Genuine PyO3 sync→async bridge. Stays — that's the whole point of the crate.
fbuild-python/src/lib.rs:313,331,351 Genuine PyO3 bridge. Stays.
fbuild-packages/src/lib.rs:61 (block_on_package_future) Transitional Deletes after step 4. Currently called from 12+ sites in library/ and toolchain/.
fbuild-daemon/src/handlers/emulator/tests_process.rs:197 Test-only N/A

4. Findings table

Severity Concern Location Cascade Proposal
CRITICAL Compiler::compile_one is sync; 11 impls; on the daemon's hottest path. Every compile goes through this. The compile_blocking workaround at zccache_embedded.rs:198 exists ONLY because this is sync. crates/fbuild-build/src/compiler.rs:71 11 impls + compile_c/compile_cpp/compile default methods + every pipeline::compile + parallel.rs + framework_libs.rs #[async_trait] + async fn compile_one; delete compile_blocking.
CRITICAL BuildOrchestrator::build is sync; 14 impls; daemon wraps in spawn_blocking. crates/fbuild-build/src/lib.rs:203 14 impls + daemon build.rs:327 + install_deps.rs cascade #[async_trait] + async fn build; remove daemon's spawn_blocking.
CRITICAL Deployer::deploy is sync; 6 impls; daemon wraps in spawn_blocking with complex callback closures. crates/fbuild-deploy/src/lib.rs:96 6 impls + daemon deploy.rs:418 (the biggest spawn_blocking in the codebase) #[async_trait] + async fn deploy. Watch the trusted_hash_update move-out pattern at deploy.rs:423 — needs an Arc<Mutex<...>> or oneshot::Sender.
CRITICAL PlatformSupport::install_deps is sync; 14 impls; cascades into fbuild-packages where each impl rebuilds a tokio Runtime per call. crates/fbuild-build/src/lib.rs:63 14 impls + 24 Runtime::new() constructions in fbuild-packages #[async_trait] + async fn install_deps; collapse 24 per-impl Runtime::new() blocks; delete block_on_package_future.
HIGH BuildParams.log_sender: Option<std::sync::mpsc::Sender<String>> is a sync mpsc; the daemon spawns a dedicated spawn_blocking bridge task whose entire job is sync→async mpsc relay. crates/fbuild-build/src/lib.rs:164 and daemon bridge crates/fbuild-daemon/src/handlers/operations/build.rs:201,314-323 Every orchestrator's BuildLog plumbing + build_output.rs constructors at lines 13,22,330 Convert to tokio::sync::mpsc::UnboundedSender<String> simultaneously with step 3. Bridge task deletes itself.
HIGH fbuild_core::BuildLog::sender: Option<std::sync::mpsc::Sender<String>> is the field underlying the log_sender plumbing. crates/fbuild-core/src/build_log.rs:11,21 Every BuildLog::push call site in every orchestrator Flip to tokio::sync::mpsc::UnboundedSender<String>. UnboundedSender::send is sync from any context (no .await required), so push call sites don't have to become async.
HIGH Package::ensure_installed / Toolchain / Framework trait methods are sync, forcing all 24 impls to build a per-call tokio Runtime. The block_on_package_future helper centralizes the anti-pattern but doesn't remove it. crates/fbuild-packages/src/lib.rs:56,71,83,155 All 16 library/* + 8 toolchain/* impls Trait → #[async_trait]; delete block_on_package_future; delete 24 Runtime::new() blocks.
MEDIUM Daemon handlers/operations/reset.rs:44 wraps esp_hard_reset_blocking in spawn_blocking. The underlying esp_reset.rs:94 is genuinely synchronous serialport I/O. crates/fbuild-serial/src/esp_reset.rs:94, crates/fbuild-daemon/src/handlers/operations/reset.rs:44 None outside reset path Keep as spawn_blockingserialport crate has no async API. Document the contract.
MEDIUM 5 spawn_blocking calls in fbuild-serial/src/manager.rs wrap serialport::open. crates/fbuild-serial/src/manager.rs:127,200,406,816,863 None Keep as spawn_blocking — same serialport crate limitation. The existing comments at manager/tests.rs:124 already document this as a regression guard.
MEDIUM port_scan.rs:120 (CLI) constructs a dedicated runtime explicitly to escape the surrounding #[tokio::main] context. crates/fbuild-cli/src/cli/port_scan.rs:120 None Refactor to take an explicit runtime handle or restructure as spawn_blocking from the CLI's existing runtime.
LOW EmulatorRunner is already #[async_trait] and is the reference shape. Other dispatch traits should follow the exact same pattern. crates/fbuild-daemon/src/handlers/emulator/runners.rs:16-23,45-46,160,281 N/A — reference doc Cite this trait in the design RFC and copy the macro placement verbatim.
LOW tokio::runtime::Runtime::new() is called 33+ times across fbuild-packages and fbuild-python. Most are inside library/toolchain ensure_installed impls and are the same anti-pattern. crates/fbuild-packages/src/toolchain/*.rs (8 files), crates/fbuild-packages/src/library/*.rs (16 files), crates/fbuild-python/src/lib.rs × 3, crates/fbuild-python/src/serial_monitor.rs × 1 Same as the block_on_package_future cascade After steps 3 + 4 + 6 land, only the PyO3 + CLI runtime constructions remain.

Mutex<T> across .await review (no findings)

Audited every Mutex<...> field in pub structs in the workspace (21 hits). None are held across .await:

  • fbuild-serial/src/manager.rs:39Mutex<VecDeque<String>> for serial buffer; only held briefly inside sync code, never across await.
  • fbuild-daemon/src/status_manager.rs:96Mutex<DaemonStatus>; held briefly for snapshot reads.
  • fbuild-daemon/src/device_manager.rs:167,173,174 — three Mutex<...> for device tracking; held in sync handler code only.
  • fbuild-packages/src/disk_cache/index/mod.rs:83Mutex<Connection> for SQLite; held only inside spawn_blocking-wrapped code.

After the trait migration these need re-auditing because today's spawn_blocking boundaries are what keep these safe. Several may need to flip to tokio::sync::Mutex once the surrounding code becomes async.

Cross-crate std::sync::mpsc channels (1 finding)

The only cross-crate sync mpsc is the BuildLog/BuildParams.log_sender chain (fbuild-corefbuild-buildfbuild-daemon). Tracked above as a HIGH finding. No other cross-crate sync channels exist.


5. What was searched

  • #[tokio::main] — 1 production use (fbuild-daemon/src/main.rs:33) + 3 doc comments.
  • Runtime::new / Builder::new_multi_thread — 33 production sites, 2 test sites.
  • spawn_blocking — 16 production sites + 5 in serial for serialport::open.
  • block_on — 5 production sites + 24 transitional in fbuild-packages + 13 PyO3 + 1 transitional compile_blocking.
  • async-trait / #[async_trait] — 4 trait + 3 impl uses (all in runners.rs); workspace dep already wired in 4 crates.
  • pub trait \w+ — 17 public traits, of which 4 are Box<dyn ...>-dispatched (the migration targets) and 1 (EmulatorRunner) is already async.
  • pub async fn (101 total) vs pub fn (1086 total) — async surface is ~9% today; expected to land near 30-40% post-migration.
  • Mutex<...> / RwLock<...> in pub-struct fields — 21 hits, none held across .await today.
  • std::sync::mpsc::Sender|Receiver — 1 cross-crate chain (log_sender).
  • BuildOrchestrator / Compiler / Deployer / BuildLog / log_sender — confirmed cascade counts: 14 / 11 / 6 / pervasive / pervasive impls respectively.
  • subprocess::run_command* — 48 call sites across 32 files (the dominant cascade target).

6. Out-of-scope notes


TL;DR for the roadmap: adopt async-trait (it's already a workspace dep and EmulatorRunner is the reference). Migrate in 6 ordered steps: subprocess::run_commandCompiler::compile_oneBuildOrchestrator::build + BuildLog/log_senderPlatformSupport::install_depsDeployer::deployPackage/Toolchain/Framework. Five spawn_blocking sites in fbuild-serial/src/manager.rs stay forever (no async serialport crate exists). The compile_blocking bridge and block_on_package_future helper both disappear when their parent traits go async.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status
    Triage

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions