
verifier/tpm: share a pooled registrar client and cache bindings #187

Open: jialez0 wants to merge 1 commit into openanolis:main from jialez0:verifier-registrar-binding-cache

jialez0 (Collaborator) commented Jul 2, 2026

Problem

The TPM (tpm) and Hygon TPM (hygontpm) verifiers perform a Keylime
registrar binding check whenever the evidence carries a keylime_agent_uuid:
they confirm that the EK certificate and AK public key in the evidence match
what the registrar recorded for that agent.

The current implementation builds a brand-new reqwest::Client and issues
two serial HTTPS round-trips (/version, then /v{ver}/agents/{uuid}) to
the registrar on every attestation. Under load this makes attestation
throughput hostage to the registrar (a low-concurrency, DB-backed service)
while the AS itself stays mostly idle on CPU.

Measured on a 4-core box with a single-worker registrar answering in ~150 ms
per call, end-to-end /attestation throughput collapses to ~3.3 QPS while the AS
process uses only 0–13% CPU, even though the pure verification path (no
keylime_agent_uuid) sustains 600–900 QPS. This matches the low-QPS, low-CPU
symptoms reported from a Hygon TPM benchmarking run.
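The ~3.3 QPS ceiling is what a serialized registrar predicts. Assuming the single registrar worker handles one request at a time and each attestation needs both ~150 ms calls back-to-back, raising the client's concurrency cannot push throughput past one attestation per 300 ms of registrar time:

```latex
t_{\text{reg}} = 2 \times 150\,\text{ms} = 300\,\text{ms},
\qquad
\text{QPS}_{\max} = \frac{1}{t_{\text{reg}}} = \frac{1}{0.3\,\text{s}} \approx 3.33
```

This is in line with the measured ~3.3 QPS, and it explains why the AS CPU stays low: requests spend almost all their time waiting on the registrar.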

Fix

Introduce a shared verifier::tpm_registrar module used by both TPM-family
verifiers that:

  • reuses a single process-wide reqwest::Client, so TCP connections and TLS
    sessions are pooled instead of re-established on every call;
  • caches the registrar API version per registrar URL;
  • caches the per-UUID registrar results with a TTL
    (KEYLIME_REGISTRAR_CACHE_TTL_SECS, default 300 s).
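The TTL cache in the last bullet can be sketched in std-only Rust. This is a minimal illustration under stated assumptions, not the PR's tpm_registrar code: the names RegistrarBinding, cache_ttl and get_or_fetch are hypothetical, the registrar round-trip is abstracted as a caller-supplied fetch closure, and keying entries by (registrar URL, UUID) is an assumption. Only the environment variable name and the 300 s default come from the description above.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};
use std::time::{Duration, Instant};

/// Hypothetical shape of the EK/AK material the registrar returns for one agent.
#[derive(Clone, Debug, PartialEq)]
pub struct RegistrarBinding {
    pub ek_cert: Vec<u8>,
    pub ak_pub: Vec<u8>,
}

/// Cache key: (registrar URL, agent UUID); value: (binding, time it was fetched).
type BindingCache = HashMap<(String, String), (RegistrarBinding, Instant)>;

const DEFAULT_TTL_SECS: u64 = 300;

/// TTL from KEYLIME_REGISTRAR_CACHE_TTL_SECS, falling back to 300 s if unset or unparsable.
pub fn cache_ttl() -> Duration {
    let secs = std::env::var("KEYLIME_REGISTRAR_CACHE_TTL_SECS")
        .ok()
        .and_then(|v| v.trim().parse::<u64>().ok())
        .unwrap_or(DEFAULT_TTL_SECS);
    Duration::from_secs(secs)
}

/// Process-wide cache, created lazily on first use.
fn binding_cache() -> &'static Mutex<BindingCache> {
    static CACHE: OnceLock<Mutex<BindingCache>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Returns a cached binding younger than `ttl`; otherwise runs `fetch`
/// (the registrar round-trip) and stores the result. Errors are not cached.
pub fn get_or_fetch<E>(
    registrar_url: &str,
    uuid: &str,
    ttl: Duration,
    fetch: impl FnOnce() -> Result<RegistrarBinding, E>,
) -> Result<RegistrarBinding, E> {
    let key = (registrar_url.to_string(), uuid.to_string());
    {
        let cache = binding_cache().lock().unwrap();
        if let Some((binding, fetched_at)) = cache.get(&key) {
            if fetched_at.elapsed() < ttl {
                return Ok(binding.clone());
            }
        }
    } // lock released here, so the network call below does not block other lookups
    let binding = fetch()?;
    binding_cache()
        .lock()
        .unwrap()
        .insert(key, (binding.clone(), Instant::now()));
    Ok(binding)
}
```

In this sketch a failed fetch is not cached, so a registrar outage is retried on the next attestation rather than pinned for the whole TTL. It also does not deduplicate in-flight requests: concurrent misses for the same UUID may each call fetch once.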

Within the TTL, repeated attestations for the same agent therefore no longer
touch the registrar at all. Each verifier still compares the returned EK/AK
material against the evidence being validated, so the binding guarantee is
unchanged; the TTL bounds how long a stale registration can be trusted after an
agent re-registers.
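To make concrete why caching does not bypass the check, here is a hedged sketch of the comparison that still runs on every attestation, whether the registrar data came from the network or the cache. The types and check_binding are hypothetical, and the byte-for-byte equality is a simplification; the real verifiers may compare parsed certificates or key parameters instead.

```rust
/// Hypothetical: EK certificate and AK public key as recorded by the registrar
/// (freshly fetched or served from the TTL cache).
#[derive(Clone, Debug)]
pub struct RegistrarBinding {
    pub ek_cert: Vec<u8>,
    pub ak_pub: Vec<u8>,
}

/// Hypothetical: the same two items as carried in the evidence being verified.
pub struct EvidenceKeys<'a> {
    pub ek_cert: &'a [u8],
    pub ak_pub: &'a [u8],
}

#[derive(Debug, PartialEq)]
pub enum BindingError {
    EkMismatch,
    AkMismatch,
}

/// Runs on every attestation, cache hit or not: the cache only saves the
/// registrar round-trip, never this comparison.
pub fn check_binding(reg: &RegistrarBinding, ev: &EvidenceKeys) -> Result<(), BindingError> {
    if reg.ek_cert.as_slice() != ev.ek_cert {
        return Err(BindingError::EkMismatch);
    }
    if reg.ak_pub.as_slice() != ev.ak_pub {
        return Err(BindingError::AkMismatch);
    }
    Ok(())
}
```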

The change removes the duplicated per-request HTTP logic from both
deps/verifier/src/tpm/mod.rs and deps/verifier/src/hygon_tpm/mod.rs
(net −83 lines there) and adds no new dependencies (uses std::sync::OnceLock);
MSRV is unaffected.
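std::sync::OnceLock (stable since Rust 1.70) provides lazily initialized process-wide statics without a lazy_static or once_cell dependency. The sketch below applies it to the per-URL API-version cache; the function names are hypothetical, fetch_version stands in for the /version request, and unlike the binding cache it assumes version entries never expire.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// The same pattern could hold the shared HTTP client, e.g.
// `static CLIENT: OnceLock<reqwest::Client> = OnceLock::new();` with
// `CLIENT.get_or_init(reqwest::Client::new)`; omitted to keep this std-only.

/// Registrar URL -> API version string reported by its /version endpoint.
fn version_cache() -> &'static Mutex<HashMap<String, String>> {
    static VERSIONS: OnceLock<Mutex<HashMap<String, String>>> = OnceLock::new();
    VERSIONS.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Returns the cached API version for `registrar_url`, calling `fetch_version`
/// (the /version round-trip) only on the first lookup for that URL.
pub fn registrar_api_version<E>(
    registrar_url: &str,
    fetch_version: impl FnOnce() -> Result<String, E>,
) -> Result<String, E> {
    if let Some(v) = version_cache().lock().unwrap().get(registrar_url) {
        return Ok(v.clone());
    } // the guard from the lookup above is dropped here, before re-locking
    let v = fetch_version()?;
    version_cache()
        .lock()
        .unwrap()
        .insert(registrar_url.to_string(), v.clone());
    Ok(v)
}

/// Builds the per-agent path used for the second request.
pub fn agent_path(version: &str, uuid: &str) -> String {
    format!("/v{version}/agents/{uuid}")
}
```

With the version cached, a binding-cache miss costs one registrar request instead of two.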

Results

Same 4-core box, benchmark client at concurrency 10, single-worker registrar at
~150 ms/call. "no-uuid" = pure verification path (registrar not involved).

| Scenario | Before | After |
|---|---|---|
| Hygon TPM, no-uuid (pure verify) | 667 QPS | 571 QPS |
| Intel TPM, no-uuid (pure verify) | 640 QPS | 708 QPS |
| Hygon TPM, with-uuid + registrar (300 req) | 3.31 QPS | 98.88 QPS |
| Intel TPM, with-uuid + registrar (300 req) | 3.31 QPS | 98.88 QPS |
| Intel TPM, with-uuid + registrar (1000 req) | 3.31 QPS | 329.92 QPS |

Before the fix, throughput is independent of request count: every request pays
the ~300 ms registrar cost (two serial ~150 ms calls). After the fix, the
registrar is hit once per UUID (cold start) and every subsequent attestation
within the TTL is served from cache, so throughput scales with batch size toward
the pure-verification ceiling. Under sustained cached load the AS process now
uses ~320% CPU (about 3.2 of the 4 cores, i.e. CPU-bound) instead of sitting
idle at 0–13%.
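A simple amortization model shows why throughput now depends on request count N. Assume a one-off cold-start cost T_cold to populate the cache and a cached-path throughput R; both symbols are introduced here for illustration and were not measured separately in this PR:

```latex
T(N) \approx T_{\text{cold}} + \frac{N}{R},
\qquad
\text{QPS}_{\text{eff}}(N) = \frac{N}{T_{\text{cold}} + N/R}
\;\xrightarrow{\;N \to \infty\;}\; R
```

Before the fix the ~300 ms registrar cost was paid by every request, so throughput stayed flat for any N; after it, T_cold is paid once and its share of the total time shrinks as N grows.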

The no-uuid path is unchanged within run-to-run noise, confirming no regression
to the verification hot path.

shankailun-aliyun (Collaborator):

@jialez0, hello, your request has been received. Please wait for the result.

shankailun-aliyun (Collaborator):

@jialez0, hello, no images requiring a build were detected. To re-run detection, comment /start

jialez0 force-pushed the verifier-registrar-binding-cache branch from 2ae96aa to 1762b95 on July 2, 2026 06:16

The TPM and Hygon TPM verifiers performed the Keylime registrar binding
check by building a fresh reqwest::Client and issuing two serial HTTPS
round-trips (/version, then /v{ver}/agents/{uuid}) on every attestation.
Under load this made attestation throughput hostage to the registrar
(a low-concurrency, DB-backed service) while the AS itself stayed idle on
CPU: a single-worker registrar at ~150 ms/call capped end-to-end throughput
at ~3.3 QPS even though the pure verification path sustains 500–900 QPS.

Introduce a shared tpm_registrar module that:
  * reuses one process-wide reqwest::Client so TCP/TLS is pooled instead
    of re-established per request,
  * caches the registrar API version per registrar URL, and
  * caches the per-UUID registrar results with a TTL
    (KEYLIME_REGISTRAR_CACHE_TTL_SECS, default 300s),

so repeated attestations for the same agent no longer touch the registrar.
Each verifier still compares the returned EK/AK material against the
evidence being validated, so the binding guarantee is unchanged.

No new dependencies (std OnceLock); MSRV unaffected.

Signed-off-by: Jiale Zhang <zhangjiale@linux.alibaba.com>
jialez0 force-pushed the verifier-registrar-binding-cache branch from 1762b95 to 5d14420 on July 2, 2026 06:40
