Sub-issue of #813 — fbuild-packages audit lens: convert sync code to async on the daemon's tokio runtime. Partner audit is #805 ("blocking HTTP without timeout"); this issue covers the same call sites under a different lens (async-ification). Per #813's intro: the fixes compose — the end-state is a single shared async reqwest::Client configured with .timeout(...) and .connect_timeout(...), so #805's CRITICAL findings collapse to "set a timeout on the shared client" once this issue's plumbing lands.
The crate is in surprisingly good shape on the network side — almost every HTTP call already uses async reqwest::get(...).await. The dominant async anti-pattern is not reqwest::blocking, it's the block_on(...) sandwich: every Package::ensure_installed is a sync trait method that owns an async staged_install, so each implementation does tokio::runtime::Handle::block_on(...). That's ~30 call sites in this crate alone, hides every install from tokio-console, and prevents axum from doing anything else on the worker during a 30-second toolchain download.
The shared async Client proposal
Today every HTTP call in the crate uses the convenience reqwest::get(url), which constructs a brand-new reqwest::Client per request — no connection pooling, no shared timeout config, no shared connector. The seven call sites that need this are:
downloader.rs:113 (get_with_retry)
downloader.rs:178 (open_with_retry for streaming download)
toolchain/clang.rs:147 (clang manifest fetch)
library/registry.rs:24 / :77 / :189 (PlatformIO registry: 3 calls)
library/arduino_api.rs:51 (the only reqwest::blocking::* in the crate)
Proposed: add crates/fbuild-packages/src/http.rs:
use std::time::Duration;
use tokio::sync::OnceCell;
use reqwest::Client;
static HTTP_CLIENT: OnceCell<Client> = OnceCell::const_new();
pub async fn client() -> &'static Client {
HTTP_CLIENT
.get_or_init(|| async {
Client::builder()
.connect_timeout(Duration::from_secs(10))
.timeout(Duration::from_secs(60 * 10)) // big toolchains
.pool_idle_timeout(Duration::from_secs(90))
.user_agent(concat!("fbuild/", env!("CARGO_PKG_VERSION")))
.build()
.expect("reqwest client builder is infallible with our config")
})
.await
}
Then every call site becomes http::client().await.get(url).send().await — picks up timeouts uniformly (closes #805) and reuses TCP/TLS connections across the daemon's lifetime. The blocking feature on the workspace reqwest = { features = [..., "blocking"] } can be dropped once arduino_api.rs migrates.
OnceCell (tokio) is the right primitive — OnceLock (std) doesn't have async init, and the existing pattern in disk_cache/lease.rs:15 uses OnceLock for non-async state, so the precedent is "tokio for async init, std for sync init".
Findings table
| Severity |
File:line |
Current pattern |
Proposed async replacement |
Notes |
| CRITICAL |
library/arduino_api.rs:30-66 (ensure_arduino_api) |
sync fn; reqwest::blocking::get(ARDUINO_API_URL) + sync std::fs::write + sync extractor::extract |
make ensure_arduino_api async; use http::client().await; await response body; wrap extractor::extract in spawn_blocking |
Only reqwest::blocking left in the crate. Called from arduino_core.rs, attiny_core.rs, arduino_mbed_core.rs framework validators — currently runs inside staged_install's validate closure, which is sync. Either widen the validate signature to FnOnce(&Path) -> BoxFuture<...> or arrange for ArduinoCore-API injection to happen outside the sync validate. |
| CRITICAL |
downloader.rs:113, :178 |
reqwest::get(url).await (creates a new Client per call) |
http::client().await.get(url).send().await |
Composes with #805 (timeout). Each Client allocation costs a TLS connector teardown — currently every retry allocates a fresh connector. |
| CRITICAL |
lib.rs:60-67 (block_on_package_future) |
tokio::task::block_in_place(|| handle.block_on(future)) |
delete; make Package::ensure_installed an async fn (or return BoxFuture) |
Bridge is invoked from ~30 call sites (see below). block_in_place is a yellow-flag — it shifts the current worker into a blocking pool slot just so it can .block_on. Tokio-console sees a gap during the entire install. |
| HIGH |
library/library_manager.rs:290-322 (ensure_libraries_sync) |
handle.block_on(ensure_libraries(...)) from sync entry |
delete the _sync wrapper; caller (build orchestrator) is already async or should be |
Already has an async ensure_libraries. Callers in fbuild-build should .await directly. |
| HIGH |
library/library_compiler.rs:202-261 (compile_library_with_jobs parallel path) |
std::thread::scope + std::sync::Mutex work-stealing queue across N OS threads |
JoinSet<Result<...>> of spawn_blocking tasks (compiler subprocess is sync), or a Semaphore-bounded FuturesUnordered of tokio::process::Command::output().await futures |
Subprocess spawn is sync today via fbuild_core::subprocess::run_command. CPU is dominated by gcc/g++ child process, not by fbuild itself, so this can stay rayon/spawn_blocking — but the std::thread::scope ties N tokio-invisible OS threads to a single library compile. The work-queue mutex must remain std::sync::Mutex (never held across .await) on the spawn_blocking path; tokio's Mutex is wrong here. |
| HIGH |
All 22 toolchain/*.rs + library/*_core.rs + library/*_framework.rs Package::ensure_installed impls |
each one: if let Ok(handle) = Handle::try_current() { handle.block_on(staged_install) } else { Runtime::new().block_on(staged_install) } |
change Package trait: async fn ensure_installed(&self) -> Result<PathBuf> (or fn ensure_installed(&self) -> BoxFuture<...> if dyn-compatibility is needed) |
The entire duplicated Handle::try_current / fresh-runtime fallback dance disappears. Full call-site list in the "What was searched" section. Each one is a 1-2 hour mechanical conversion once the trait flips. |
| HIGH |
lnk/resolver.rs:117 |
sync fetch_lnk_blob calls block_on_package_future(async { download_file(...) }) |
make fetch_lnk_blob (and callers up to Resolver::resolve) async; .await directly |
lnk/resolver.rs is in the link-blob fetch path — runs at link time on the daemon. |
| HIGH |
toolchain/esp32_metadata.rs:74-84 (resolve_toolchain_url_sync) |
crate::block_on_package_future(resolve_toolchain_url(...)) |
delete _sync wrapper; only call site is toolchain/esp32.rs:79 which is already inside an ensure_installed that should be async |
Pure transitive consequence of the Package trait flip. |
| HIGH |
library/esp32_framework/libs.rs:124, :132, :184, :192 |
nested handle.block_on(crate::downloader::download_file(...)) inside ESP32 framework lib download |
make the surrounding function async; await directly |
Currently the block_on is inside another sync function called from ensure_installed. Doubly anti-pattern. |
| MEDIUM |
downloader.rs:299-319 (download_all) |
tokio::spawn per URL + await each JoinHandle sequentially |
use futures::stream::FuturesUnordered or tokio::task::JoinSet for first-error fail-fast |
The current pattern works but doesn't fail-fast: if URL #1 takes 60s and URL #5 errors at 2s, we wait for URL #1 before observing the error. JoinSet::join_next would surface URL #5's error immediately. |
| MEDIUM |
library/library_manager.rs:73-93 parallel download collect loop |
Vec<JoinHandle> collected sequentially via handle.await |
JoinSet with join_next() for early-error reporting |
Same fail-fast argument as download_all. |
| MEDIUM |
downloader.rs:322-339 (verify_checksum, sync) |
sync std::fs::read + CPU hashing |
make callers always use the existing verify_checksum_async at :342 (uses tokio::fs::read) and delete sync version |
The sync version is called from staged_install (lib.rs:405) inside an async fn — i.e. blocking I/O on the runtime. Easy fix. Hashing itself can stay sync (CPU-bound). |
| LOW |
lib.rs:37-54 dir_size (called from record_install_in_disk_cache inside staged_install) |
sync recursive std::fs::read_dir walk on potentially large extracted toolchains |
wrap in tokio::task::spawn_blocking |
Runs once per install (post-extract), not on a hot path. Thousands of stat syscalls on large ESP-IDF tarballs. |
| LOW |
lib.rs:354, :376-380, :421, :427, :435 |
sync std::fs::create_dir_all / remove_dir_all / remove_file / rename / write inside staged_install (which is async) |
tokio::fs::* equivalents |
Each call is microseconds (single inode op), so this is genuine LOW. Still: a recursive remove_dir_all of a partial toolchain extract is unbounded in size. |
What was searched
Patterns (all under crates/fbuild-packages/src/):
reqwest:: — 11 hits, all async except arduino_api.rs:51
Client::(new|default|builder) — 0 hits (no shared client today)
std::thread::spawn / JoinHandle / thread::scope — 1 hit (library_compiler.rs:206)
rayon:: / par_iter — 0 hits
std::process::Command::new / Command::new — 0 hits in this crate (subprocess is via fbuild_core::subprocess::run_command, audited under that crate's sub-issue)
block_on — 30+ hits across all Package::ensure_installed impls + 4 _sync wrappers + lnk/resolver.rs:117
Mutex::new / RwLock::new — only std::sync::Mutex for in-process throttling and rusqlite conn protection; none held across .await
mpsc:: — 0 hits
OnceLock / OnceCell — OnceLock used for nonce/touch throttle (sync init, correct); no OnceCell yet
flate2 / tar / zip / xz2 / bzip2 / zstd — out of scope per spec
Modules audited:
downloader.rs, extractor.rs (extraction excluded per spec)
install_lock.rs — already correct (tokio::time::sleep, no std::thread::sleep)
lib.rs (PackageBase::staged_install, block_on_package_future)
library/library_manager.rs, library/library_downloader.rs, library/library_compiler.rs
library/arduino_api.rs, library/registry.rs
toolchain/{clang,avr,arm,arm_gcc8,esp32,esp32_metadata,esp8266,esp_qemu,riscv,rp2040_pqt,teensy_arm}.rs
- Every
library/*_core.rs and library/*_framework.rs ensure_installed impl
Full block_on ensure_installed call-site list (severity HIGH, one trait-flip fixes all):
toolchain/avr.rs:115, toolchain/arm.rs:106, toolchain/arm_gcc8.rs:74, toolchain/esp32.rs:169, toolchain/esp8266.rs:98, toolchain/esp_qemu.rs:172, toolchain/riscv.rs:183, toolchain/rp2040_pqt.rs:104, toolchain/teensy_arm.rs:97, library/apollo3_core.rs:131, library/arduino_core.rs:129, library/arduino_core_lpc8xx.rs:171, library/arduino_mbed_core.rs:180, library/attiny_core.rs:106, library/avr_framework.rs:177, library/ch32v_core.rs:140, library/cmsis_atmel.rs:107, library/cmsis_framework.rs:93, library/esp32_framework/mod.rs:136, library/esp32_platform.rs:178, library/esp8266_framework.rs:193, library/nrf52_core.rs:137, library/renesas_core.rs:161, library/rp2040_core.rs:161, library/sam_core.rs:116, library/samd_core.rs:120, library/silabs_core.rs:107, library/stm32_core.rs:129, library/teensy_core.rs:151.
Out-of-scope notes
- Archive extraction (
extractor.rs, flate2 / tar / zip / xz2 / bzip2 / zstd): sync crates with no async equivalents fbuild already uses. Per spec, do not rewrite — but each call from async context should be wrapped in tokio::task::spawn_blocking to vacate the tokio worker during a multi-GB ESP-IDF tarball decompress. Currently staged_install calls extractor::extract(&archive_path, &staging_path) directly on the runtime worker (lib.rs:418). That's worth a follow-up issue, but doesn't belong in this audit's scope.
- SHA-256 hashing (
sha2::Sha256 in downloader.rs:325, library_compiler.rs:13): CPU-bound, sync is correct. The verify_checksum_async wrapper does the right thing (async file read + sync hash). If invoked from async hot paths, wrap in spawn_blocking if the file is large.
- Test code:
lib.rs:502 (test-only tokio::spawn), avr.rs:364 (gated #[ignore] integration test), install_lock.rs test module — all out of scope.
Drop impls: install_lock.rs:163 InstallLockGuard::drop does sync std::fs::remove_dir_all — must stay sync (Drop is not async).
- In-process Mutex (
lib.rs:480-483 mark_package_touch_needed): std::sync::Mutex<HashSet<String>> for in-process throttle, never held across .await. Correct as-is.
Relationship to #805
#805 will land timeouts via the same reqwest::Client::builder() change proposed above. The clean fix is for one of these issues to land the shared http::client() module first (this issue does the lift, #805 then adds the timeout config to the builder) — recommend coordinating so they touch the same file once, not twice.
Sub-issue of #813 —
fbuild-packagesaudit lens: convert sync code to async on the daemon's tokio runtime. Partner audit is #805 ("blocking HTTP without timeout"); this issue covers the same call sites under a different lens (async-ification). Per #813's intro: the fixes compose — the end-state is a single shared asyncreqwest::Clientconfigured with.timeout(...)and.connect_timeout(...), so #805's CRITICAL findings collapse to "set a timeout on the shared client" once this issue's plumbing lands.The crate is in surprisingly good shape on the network side — almost every HTTP call already uses async
reqwest::get(...).await. The dominant async anti-pattern is notreqwest::blocking, it's theblock_on(...)sandwich: everyPackage::ensure_installedis a sync trait method that owns an asyncstaged_install, so each implementation doestokio::runtime::Handle::block_on(...). That's ~30 call sites in this crate alone, hides every install from tokio-console, and prevents axum from doing anything else on the worker during a 30-second toolchain download.The shared async
ClientproposalToday every HTTP call in the crate uses the convenience
reqwest::get(url), which constructs a brand-newreqwest::Clientper request — no connection pooling, no shared timeout config, no shared connector. The seven call sites that need this are:downloader.rs:113(get_with_retry)downloader.rs:178(open_with_retryfor streaming download)toolchain/clang.rs:147(clang manifest fetch)library/registry.rs:24/:77/:189(PlatformIO registry: 3 calls)library/arduino_api.rs:51(the onlyreqwest::blocking::*in the crate)Proposed: add
crates/fbuild-packages/src/http.rs:Then every call site becomes
http::client().await.get(url).send().await— picks up timeouts uniformly (closes #805) and reuses TCP/TLS connections across the daemon's lifetime. The blocking feature on the workspacereqwest = { features = [..., "blocking"] }can be dropped oncearduino_api.rsmigrates.OnceCell(tokio) is the right primitive —OnceLock(std) doesn't have async init, and the existing pattern indisk_cache/lease.rs:15usesOnceLockfor non-async state, so the precedent is "tokio for async init, std for sync init".Findings table
library/arduino_api.rs:30-66(ensure_arduino_api)fn;reqwest::blocking::get(ARDUINO_API_URL)+ syncstd::fs::write+ syncextractor::extractensure_arduino_apiasync; usehttp::client().await; await response body; wrapextractor::extractinspawn_blockingreqwest::blockingleft in the crate. Called fromarduino_core.rs,attiny_core.rs,arduino_mbed_core.rsframework validators — currently runs insidestaged_install's validate closure, which is sync. Either widen the validate signature toFnOnce(&Path) -> BoxFuture<...>or arrange for ArduinoCore-API injection to happen outside the sync validate.downloader.rs:113,:178reqwest::get(url).await(creates a newClientper call)http::client().await.get(url).send().awaitClientallocation costs a TLS connector teardown — currently every retry allocates a fresh connector.lib.rs:60-67(block_on_package_future)tokio::task::block_in_place(|| handle.block_on(future))Package::ensure_installedanasync fn(or returnBoxFuture)block_in_placeis a yellow-flag — it shifts the current worker into a blocking pool slot just so it can.block_on. Tokio-console sees a gap during the entire install.library/library_manager.rs:290-322(ensure_libraries_sync)handle.block_on(ensure_libraries(...))from sync entry_syncwrapper; caller (build orchestrator) is already async or should beasync ensure_libraries. Callers infbuild-buildshould.awaitdirectly.library/library_compiler.rs:202-261(compile_library_with_jobsparallel path)std::thread::scope+std::sync::Mutexwork-stealing queue across N OS threadsJoinSet<Result<...>>ofspawn_blockingtasks (compiler subprocess is sync), or aSemaphore-boundedFuturesUnorderedoftokio::process::Command::output().awaitfuturesfbuild_core::subprocess::run_command. CPU is dominated by gcc/g++ child process, not by fbuild itself, so this can stay rayon/spawn_blocking— but thestd::thread::scopeties N tokio-invisible OS threads to a single library compile. The work-queue mutex must remainstd::sync::Mutex(never held across.await) on thespawn_blockingpath; tokio'sMutexis wrong here.toolchain/*.rs+library/*_core.rs+library/*_framework.rsPackage::ensure_installedimplsif let Ok(handle) = Handle::try_current() { handle.block_on(staged_install) } else { Runtime::new().block_on(staged_install) }Packagetrait:async fn ensure_installed(&self) -> Result<PathBuf>(orfn ensure_installed(&self) -> BoxFuture<...>if dyn-compatibility is needed)Handle::try_current/ fresh-runtime fallback dance disappears. Full call-site list in the "What was searched" section. Each one is a 1-2 hour mechanical conversion once the trait flips.lnk/resolver.rs:117fetch_lnk_blobcallsblock_on_package_future(async { download_file(...) })fetch_lnk_blob(and callers up toResolver::resolve)async;.awaitdirectlylnk/resolver.rsis in the link-blob fetch path — runs at link time on the daemon.toolchain/esp32_metadata.rs:74-84(resolve_toolchain_url_sync)crate::block_on_package_future(resolve_toolchain_url(...))_syncwrapper; only call site istoolchain/esp32.rs:79which is already inside anensure_installedthat should be asyncPackagetrait flip.library/esp32_framework/libs.rs:124,:132,:184,:192handle.block_on(crate::downloader::download_file(...))inside ESP32 framework lib downloadasync; await directlyblock_onis inside another sync function called fromensure_installed. Doubly anti-pattern.downloader.rs:299-319(download_all)tokio::spawnper URL +awaiteachJoinHandlesequentiallyfutures::stream::FuturesUnorderedortokio::task::JoinSetfor first-error fail-fastJoinSet::join_nextwould surface URL #5's error immediately.library/library_manager.rs:73-93parallel download collect loopVec<JoinHandle>collected sequentially viahandle.awaitJoinSetwithjoin_next()for early-error reportingdownload_all.downloader.rs:322-339(verify_checksum, sync)std::fs::read+ CPU hashingverify_checksum_asyncat:342(usestokio::fs::read) and delete sync versionstaged_install(lib.rs:405) inside an async fn — i.e. blocking I/O on the runtime. Easy fix. Hashing itself can stay sync (CPU-bound).lib.rs:37-54dir_size(called fromrecord_install_in_disk_cacheinsidestaged_install)std::fs::read_dirwalk on potentially large extracted toolchainstokio::task::spawn_blockinglib.rs:354,:376-380,:421,:427,:435std::fs::create_dir_all/remove_dir_all/remove_file/rename/writeinsidestaged_install(which isasync)tokio::fs::*equivalentsremove_dir_allof a partial toolchain extract is unbounded in size.What was searched
Patterns (all under
crates/fbuild-packages/src/):reqwest::— 11 hits, all async exceptarduino_api.rs:51Client::(new|default|builder)— 0 hits (no shared client today)std::thread::spawn/JoinHandle/thread::scope— 1 hit (library_compiler.rs:206)rayon::/par_iter— 0 hitsstd::process::Command::new/Command::new— 0 hits in this crate (subprocess is viafbuild_core::subprocess::run_command, audited under that crate's sub-issue)block_on— 30+ hits across allPackage::ensure_installedimpls + 4_syncwrappers +lnk/resolver.rs:117Mutex::new/RwLock::new— onlystd::sync::Mutexfor in-process throttling and rusqlite conn protection; none held across.awaitmpsc::— 0 hitsOnceLock/OnceCell—OnceLockused for nonce/touch throttle (sync init, correct); noOnceCellyetflate2/tar/zip/xz2/bzip2/zstd— out of scope per specModules audited:
downloader.rs,extractor.rs(extraction excluded per spec)install_lock.rs— already correct (tokio::time::sleep, nostd::thread::sleep)lib.rs(PackageBase::staged_install,block_on_package_future)library/library_manager.rs,library/library_downloader.rs,library/library_compiler.rslibrary/arduino_api.rs,library/registry.rstoolchain/{clang,avr,arm,arm_gcc8,esp32,esp32_metadata,esp8266,esp_qemu,riscv,rp2040_pqt,teensy_arm}.rslibrary/*_core.rsandlibrary/*_framework.rsensure_installedimplFull
block_onensure_installedcall-site list (severity HIGH, one trait-flip fixes all):toolchain/avr.rs:115,toolchain/arm.rs:106,toolchain/arm_gcc8.rs:74,toolchain/esp32.rs:169,toolchain/esp8266.rs:98,toolchain/esp_qemu.rs:172,toolchain/riscv.rs:183,toolchain/rp2040_pqt.rs:104,toolchain/teensy_arm.rs:97,library/apollo3_core.rs:131,library/arduino_core.rs:129,library/arduino_core_lpc8xx.rs:171,library/arduino_mbed_core.rs:180,library/attiny_core.rs:106,library/avr_framework.rs:177,library/ch32v_core.rs:140,library/cmsis_atmel.rs:107,library/cmsis_framework.rs:93,library/esp32_framework/mod.rs:136,library/esp32_platform.rs:178,library/esp8266_framework.rs:193,library/nrf52_core.rs:137,library/renesas_core.rs:161,library/rp2040_core.rs:161,library/sam_core.rs:116,library/samd_core.rs:120,library/silabs_core.rs:107,library/stm32_core.rs:129,library/teensy_core.rs:151.Out-of-scope notes
extractor.rs,flate2/tar/zip/xz2/bzip2/zstd): sync crates with no async equivalents fbuild already uses. Per spec, do not rewrite — but each call from async context should be wrapped intokio::task::spawn_blockingto vacate the tokio worker during a multi-GB ESP-IDF tarball decompress. Currentlystaged_installcallsextractor::extract(&archive_path, &staging_path)directly on the runtime worker (lib.rs:418). That's worth a follow-up issue, but doesn't belong in this audit's scope.sha2::Sha256indownloader.rs:325,library_compiler.rs:13): CPU-bound, sync is correct. Theverify_checksum_asyncwrapper does the right thing (async file read + sync hash). If invoked from async hot paths, wrap inspawn_blockingif the file is large.lib.rs:502(test-onlytokio::spawn),avr.rs:364(gated#[ignore]integration test),install_lock.rstest module — all out of scope.Dropimpls:install_lock.rs:163InstallLockGuard::dropdoes syncstd::fs::remove_dir_all— must stay sync (Drop is not async).lib.rs:480-483mark_package_touch_needed):std::sync::Mutex<HashSet<String>>for in-process throttle, never held across.await. Correct as-is.Relationship to #805
#805 will land timeouts via the same
reqwest::Client::builder()change proposed above. The clean fix is for one of these issues to land the sharedhttp::client()module first (this issue does the lift, #805 then adds the timeout config to the builder) — recommend coordinating so they touch the same file once, not twice.