KZT, refactor: split loader object tracking and patch planning #3

Draft
LaurenIsACoder wants to merge 19 commits into master from kzt-refactor

LaurenIsACoder (Owner) commented Jun 11, 2026
Summary

This draft PR carries the staged KZT loader refactor through stages 1 to 8.
The series documents the overall design, introduces guest object identity,
adds in-memory Dynamic Table parsing, records GOT patch decisions, prefers
guest-owner wrapper targets where possible, stops side-loading ordinary guest
dependencies, isolates lazy binding policy, makes guest dl APIs authoritative,
and narrows the private glibc hook into a fallback loader-event source.

The current local head has passed a debug build and the focused KZT loader
regression script. A full dEQP-EGL.* A/B run from the earlier convergence
point matched the pre-refactor baseline over 4111 EGL CTS cases; the latest
head should still go through a final full CTS A/B comparison before review is
considered complete.

Detailed Description (translated from Chinese)

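Design Intent

The goal of this patch set is not to rewrite the whole loader at once, but to
split the previously coupled KZT loader path into verifiable, replaceable
boundaries.

The old scheme mixed together guest loader notification, ELF file parsing,
maplib re-resolution, wrapper lookup, GOT modification, lazy binding,
dlopen/dlsym/dlclose state, and the private glibc hook synchronization flow.
The core problem is that the guest loader already knows the objects and the
binding results, yet KZT still frequently re-derives them through files, global
symbol ranges, or maplib. As a result, object identity, target provenance, and
reference lifetime were all unclear.

This refactor resolves these problems stage by stage:

  • GuestObjectRegistry records the identity of objects reported by the guest
    loader.
  • The in-memory Dynamic Parser parses runtime ELF metadata from
    link_map/l_ld.
  • The Patch Planner turns every GOT write into a recordable decision.
  • Guest-owner lookup prefers the guest object that owns the current GOT value
    when selecting a wrapper.
  • Ordinary guest dependencies are no longer side-loaded by KZT.
  • Lazy binding decisions are isolated from generic relocation.
  • The results returned by guest dlopen/dlsym/dlclose and related APIs become
    authoritative.
  • The private glibc hook is narrowed into a fallback loader-event source.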

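Current Patch Organization

The series is currently organized into 19 commits at review granularity:

  • Patch 1: design document describing the overall refactor plan.
  • Patch 2: Stage 1, introduces GuestObjectRegistry.
  • Patch 3: Stage 2, introduces the in-memory Dynamic Parser.
  • Patch 4: Stage 3, introduces the Patch Planner.
  • Patch 5: Stage 4, compares guest-owner patch targets.
  • Patch 6: Stage 4, prefers successful guest-owner targets.
  • Patch 7: Stage 5, stops side loading of ordinary guest dependencies.
  • Patch 8: Stage 6, isolates lazy binding decisions.
  • Patches 9-13: Stage 7, refactor the guest dl API boundary.
  • Patches 14-18: Stage 8, establish the loader event source boundary and
    harden the fallback.
  • Patch 19: dl API robustness fixes after Stage 7/8 convergence.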

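Design Conformance

Overall, the result matches expectations.

This patch set moves KZT away from redoing guest loader decisions itself and
toward observing guest loader results and performing wrapper replacement behind
explicit boundaries. Object identity, Dynamic Table parsing, GOT patch
decisions, guest-owner wrapper selection, dependency loading, lazy binding,
dl API semantics, and the loader event source have all been split out behind
explicit boundaries.

Stage 8 is still a transitional form. The private glibc hook has not been
completely removed, but it no longer carries the whole load synchronization
flow and only reports link_map events as a fallback event source. Behind this
boundary it can later be replaced with r_debug, mprotect, QEMU loader/mmap
events, or a smaller version-isolated fallback hook.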

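Defects Addressed

The main structural defects that have been addressed:

  • Guest object identity used to be unclear; it is now recorded explicitly by
    GuestObjectRegistry.
  • Parsing used to depend on files and Section Headers; there is now an
    in-memory parser based on l_ld.
  • GOT modifications used to be unobservable; the Patch Planner now records
    object, relocation, symbol, version, old target, old owner, new bridge, and
    reason.
  • Wrapper selection used to rely heavily on maplib/global symbol
    re-resolution; it now prefers the guest object that owns the current GOT
    value.
  • Ordinary guest DT_NEEDED entries used to be side-loaded by KZT; they are now
    handed back to the guest loader.
  • Lazy binding modifications used to be scattered through generic relocation;
    they are now concentrated in dedicated helpers.
  • dlopen/dlsym/dlclose used to involve synthetic handles, duplicated reference
    counts, and side reloads; the results returned by the guest loader are now
    authoritative.
  • The glibc hook used to drive the synchronization flow directly; it is now
    only responsible for emitting loader events.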

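Recent Hardening

The latest local commit adds robustness fixes after Stage 7/8 convergence:

  • RTLD_DEFAULT asks guest dlsym first and takes the compatibility fallback
    only after a guest miss.
  • For wrapper handles, dlsym/dlvsym let the guest link_map decide whether the
    symbol exists, and only after success is the local native bridge allowed to
    replace the return value.
  • dlvsym stays on the versioned guest lookup path and no longer falls back to
    a plain non-versioned dlsym when metadata is missing.
  • Paths such as missing local metadata, the dladdr1(RTLD_DL_LINKMAP) fallback,
    and handles used after close return ordinary dl errors instead of asserting
    or writing invalid results.
  • An explicit dlopen of a wrapper handle holds a guest loader reference, so
    that a later dlclose is paired with the guest-side reference count.
  • Growing the dlprivate handle table initializes all parallel arrays, so
    unused slots do not retain stale state.
  • The Stage 8 fallback hook installation and event read paths are more
    defensive: an installation failure disables KZT, and an abnormal
    env/hook/reg no longer leads to reading a bad register.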

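New Risks

This change also introduces or exposes risks that need continued attention:

  • Stage 7 changes the authoritative source for the dl API. Corner-case
    semantics such as RTLD_NEXT, RTLD_DEFAULT, RTLD_NOLOAD, dlmopen namespaces,
    versioned symbols, and dladdr1 still need coverage from more real
    applications.
  • Local wrapper metadata still has to stay consistent with the lifetime of
    guest-owned handles. This is clearer than the old scheme but remains a
    high-risk area.
  • Stage 8 still keeps the private glibc hook as a fallback. Bounds checks,
    pattern validation, and missing-source fallback have been added, but
    different glibc versions may still need additional adaptation.
  • The full A/B run so far covers dEQP-EGL.*, but the latest HEAD still needs a
    final overall CTS comparison. GLX, Vulkan, Wine, and real application paths
    still need follow-up coverage.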

Validation Completed

The latest local HEAD has completed:

env CCACHE_DIR=/tmp/latx-ccache ninja -C build64-dbg
sh tests/latx-x86_64/run-kzt-loader-regressions.sh build64-dbg/latx-x86_64

Full EGL CTS A/B comparison from earlier in the convergence process:

tests/latx-x86_64/run-kzt-cts-compare.sh \
  --baseline /tmp/lat-kzt-baseline-artifact/latx-x86_64.before-kzt-refactor-local \
  --current /home/loongson/work/code/lat-opensource/lat/build64-dbg/latx-x86_64 \
  --cts-dir /home/loongson/data_2T/x86_test/kzt/xzy/VK-GL-CTS/builds-x86-egl/external/openglcts/modules \
  --timeout 7200 -- --deqp-visibility=hidden --deqp-watchdog=enable -n 'dEQP-EGL.*'

Result:

baseline/current per-case results matched
Passed:        2128/4111 (51.8%)
Failed:        32/4111 (0.8%)
Not supported: 1949/4111 (47.4%)
Warnings:      2/4111 (0.0%)
Waived:        0/4111 (0.0%)

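Follow-up Work

Follow-up work falls into three groups:

  • Run the final full CTS A/B comparison on the latest HEAD.
  • Extend validation to GLX, Vulkan, Wine, and real application paths.
  • Behind the Stage 8 boundary, experiment with r_debug notification, RELRO
    mprotect, QEMU mmap/loader events, and similar approaches to gradually
    reduce the dependency on the private glibc hook.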

English Detailed Description

Design Intent

This series does not try to rewrite the whole loader in one step. It splits
the existing KZT loader path into explicit, testable, and replaceable
boundaries.

The old path mixed guest loader notification, ELF file parsing, maplib
re-resolution, wrapper lookup, GOT patching, lazy binding, dl API bookkeeping,
and the private glibc hook synchronization path. The central problem was that
the guest loader already knew the object and binding result, while KZT often
derived the target again through files, global symbol ranges, or maplib. That
made object identity, target provenance, and handle lifetime difficult to
reason about.

The staged refactor addresses this by:

  • recording guest object identity in GuestObjectRegistry;
  • parsing runtime ELF metadata from link_map/l_ld;
  • recording GOT writes as Patch Planner decisions;
  • preferring wrapper targets from the guest object that owns the current GOT
    value;
  • returning ordinary guest dependency loading to the guest loader;
  • isolating lazy binding policy from generic relocation;
  • making guest dlopen/dlsym/dlclose results authoritative;
  • narrowing the private glibc hook into a fallback loader-event source.

Current Patch Organization

The current series is organized into 19 reviewable patches:

  • Patch 1 documents the overall refactor plan.
  • Patch 2 covers stage 1 and introduces GuestObjectRegistry.
  • Patch 3 covers stage 2 and adds the in-memory Dynamic Parser.
  • Patch 4 covers stage 3 and introduces the Patch Planner.
  • Patch 5 covers stage 4 shadow comparison for guest-owner patch targets.
  • Patch 6 completes stage 4 by preferring successful guest-owner targets.
  • Patch 7 covers stage 5 and stops ordinary guest dependency side loading.
  • Patch 8 covers stage 6 and isolates lazy binding decisions.
  • Patches 9-13 cover stage 7 and refactor the guest dl API boundary.
  • Patches 14-18 cover stage 8 and establish the loader event source boundary.
  • Patch 19 contains the final stage 7/8 dl API hardening fixes.

Design Conformance

The current implementation matches the refactor direction.

The code now has explicit boundaries for guest object identity, runtime dynamic
metadata parsing, GOT patch decisions, guest-owner wrapper lookup, dependency
loading, lazy binding, guest dl API authority, and loader event sources. This
moves KZT away from redoing guest loader decisions and toward observing guest
loader results before applying wrapper replacement.

Stage 8 is still transitional. The private glibc hook has not been fully
removed, but it has been narrowed to a fallback event source. It reports
link_map events and no longer owns the whole synchronization flow. That
boundary is the point where future r_debug, mprotect, QEMU loader/mmap event,
or smaller version-isolated fallback sources can be plugged in.

Defects Addressed

The series addresses the main structural defects of the old scheme:

  • Guest object identity is now explicit in GuestObjectRegistry.
  • Runtime Dynamic Table parsing no longer depends only on files or Section
    Headers.
  • GOT writes are observable through Patch Planner decisions.
  • Wrapper lookup can prefer the guest object that owns the current GOT target
    instead of redoing global maplib resolution first.
  • Ordinary guest DT_NEEDED dependencies are no longer side-loaded by KZT.
  • Lazy binding policy is isolated from generic relocation code.
  • Guest dl API results are authoritative, removing synthetic handles,
    duplicated reference counting, and side reload logic.
  • The private glibc hook is reduced to an event source instead of driving the
    whole loader sync path.

Recent Hardening

The latest local commit adds final stage 7/8 hardening:

  • RTLD_DEFAULT asks guest dlsym first and uses compatibility fallback only
    after a guest miss.
  • Wrapper-handle dlsym/dlvsym ask the guest link_map to decide symbol
    existence before replacing a successful result with a local native bridge.
  • dlvsym stays on the versioned guest lookup path and no longer falls back to
    a plain non-versioned dlsym when local metadata is missing.
  • Missing local metadata, dladdr1(RTLD_DL_LINKMAP) fallback, and closed-handle
    paths report ordinary dl errors instead of asserting or writing invalid data.
  • Explicit wrapper-handle dlopen retains the guest loader reference so later
    dlclose calls are paired on the guest side.
  • dlprivate handle-table growth initializes all parallel arrays.
  • Stage 8 fallback hook installation and event capture are defensive: install
    failure disables KZT, and invalid env/hook/reg state no longer reads a bad
    guest register.

New Risks

The refactor also introduces or exposes risks that still need attention:

  • Stage 7 changes the authoritative source for dl API behavior. Edge cases such
    as RTLD_NEXT, RTLD_DEFAULT, RTLD_NOLOAD, dlmopen namespaces,
    versioned symbols, and dladdr1 still need more real-application coverage.
  • Local wrapper metadata must stay consistent with guest-owned handle lifetime.
    This is clearer than the old model but remains a sensitive area.
  • Stage 8 still uses a private glibc hook as a fallback source. Bounds checks,
    pattern validation, and missing-source fallback reduce the risk, but new glibc
    layouts may still require additional adaptation.
  • A full dEQP-EGL.* A/B run has matched during convergence, but the latest
    head still needs a final full CTS comparison. GLX, Vulkan, Wine, and real
    application paths also need follow-up coverage.

Validation Completed

Latest local head:

env CCACHE_DIR=/tmp/latx-ccache ninja -C build64-dbg
sh tests/latx-x86_64/run-kzt-loader-regressions.sh build64-dbg/latx-x86_64

Earlier full EGL CTS A/B comparison during convergence:

tests/latx-x86_64/run-kzt-cts-compare.sh \
  --baseline /tmp/lat-kzt-baseline-artifact/latx-x86_64.before-kzt-refactor-local \
  --current /home/loongson/work/code/lat-opensource/lat/build64-dbg/latx-x86_64 \
  --cts-dir /home/loongson/data_2T/x86_test/kzt/xzy/VK-GL-CTS/builds-x86-egl/external/openglcts/modules \
  --timeout 7200 -- --deqp-visibility=hidden --deqp-watchdog=enable -n 'dEQP-EGL.*'

Result:

baseline/current per-case results matched
Passed:        2128/4111 (51.8%)
Failed:        32/4111 (0.8%)
Not supported: 1949/4111 (47.4%)
Warnings:      2/4111 (0.0%)
Waived:        0/4111 (0.0%)

Follow-up Work

Follow-up work is focused on three areas:

  • Run the final full CTS A/B comparison for the latest head.
  • Extend validation to GLX, Vulkan, Wine, and real application paths.
  • Replace the stage 8 event source with r_debug, RELRO mprotect, QEMU
    mmap/loader events, or a smaller version-isolated fallback hook.

Document the KZT loader refactor as an engineering plan instead of a
progress report.

The document explains why the loader path needs to be split, why the
staged rewrite is feasible, and how the work should be organized across
the planned stages.

Series organization:

- Patch 1 documents the overall design and review structure.
- Patch 2 belongs to stage 1 and introduces guest object identity
  tracking.
- Patch 3 belongs to stage 2 and adds in-memory Dynamic Table parsing
  with legacy comparison.
- Patch 4 belongs to stage 3 and introduces patch planner decisions for
  GOT writes.
- Patch 5 belongs to stage 4 and compares guest-owner targets with
  maplib-selected targets.
- Patch 6 belongs to stage 4 and selects successful guest-owner targets
  while preserving maplib fallback.

Progress reports and latest validation numbers are intentionally left
out of this design document because they change as the series evolves.

Stage 1 of the loader refactor records guest ELF object identity in a
dedicated registry instead of relying on scattered loader state.

The loader callback registers object name, base, load range, and
dynamic table information while preserving the legacy processing path
as a compatibility fallback.

This creates an explicit address-to-guest-object boundary that later
stages use for dynamic parsing and guest-owner patch target selection.
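
For reviewers, a minimal sketch of the address-to-guest-object boundary this
stage describes. All names here (guest_object, guest_registry_add,
guest_registry_find, GUEST_OBJ_MAX) are hypothetical and are not taken from the
patch; the fixed-size table and keying on the Dynamic Table address are
assumptions made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define GUEST_OBJ_MAX 64  /* hypothetical fixed capacity for this sketch */

/* One record per object reported by the guest loader (illustrative fields). */
struct guest_object {
    char      name[128];
    uintptr_t base;     /* load bias, e.g. link_map l_addr */
    uintptr_t start;    /* first mapped address */
    uintptr_t end;      /* one past the last mapped address */
    uintptr_t dynamic;  /* Dynamic Table address, e.g. link_map l_ld */
};

struct guest_registry {
    struct guest_object objs[GUEST_OBJ_MAX];
    size_t count;
};

/* Register an object, or refresh it if the same Dynamic Table was seen before. */
static struct guest_object *guest_registry_add(struct guest_registry *reg,
                                               const char *name, uintptr_t base,
                                               uintptr_t start, uintptr_t end,
                                               uintptr_t dynamic)
{
    assert(reg != NULL);
    if (start >= end)
        return NULL;                      /* reject empty load ranges */

    struct guest_object *obj = NULL;
    for (size_t i = 0; i < reg->count; i++) {
        if (reg->objs[i].dynamic == dynamic) {
            obj = &reg->objs[i];
            break;
        }
    }
    if (obj == NULL) {
        if (reg->count == GUEST_OBJ_MAX)
            return NULL;                  /* registry full */
        obj = &reg->objs[reg->count++];
    }
    snprintf(obj->name, sizeof(obj->name), "%s", name ? name : "");
    obj->base = base;
    obj->start = start;
    obj->end = end;
    obj->dynamic = dynamic;
    return obj;
}

/* Map an address (for example a current GOT value) back to its owning object. */
static const struct guest_object *guest_registry_find(
    const struct guest_registry *reg, uintptr_t addr)
{
    assert(reg != NULL);
    for (size_t i = 0; i < reg->count; i++) {
        if (addr >= reg->objs[i].start && addr < reg->objs[i].end)
            return &reg->objs[i];
    }
    return NULL;
}
```

A linear scan is enough to show the boundary; a real registry would likely need
removal on unload and a faster lookup structure.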

Stage 2 of the loader refactor parses guest Dynamic Table data from
guest memory instead of relying only on the legacy file and section
header path.

The new parser materializes dynamic symbols, relocations, version
indices, version needs, version definitions, and string table views from
the runtime dynamic information.

Keep the legacy parser active and add comparison coverage so parser
differences are observable before the old file-based dependency is
removed.
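
As an illustration of parsing from runtime dynamic information, the sketch
below walks an in-memory Elf64_Dyn array up to DT_NULL and collects the tags
that locate symbols, strings, relocations, and version data. The dyn_view
struct and parse_dynamic() are hypothetical names, not the patch's API. Whether
pointer-valued entries still need the load bias added depends on the loader and
is deliberately not handled here.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the tags a runtime parser would need. */
struct dyn_view {
    uint64_t symtab, strtab, strsz;
    uint64_t rela, relasz;
    uint64_t jmprel, pltrelsz;
    uint64_t versym, verneed, verneednum, verdef, verdefnum;
    size_t   needed_count;
};

/*
 * Walk the Dynamic Table in memory. max_entries bounds the walk so a table
 * without a DT_NULL terminator is reported instead of being read past its end.
 * Returns 0 on success, -1 on a missing table or missing terminator.
 */
static int parse_dynamic(const Elf64_Dyn *dyn, size_t max_entries,
                         struct dyn_view *out)
{
    assert(out != NULL);
    *out = (struct dyn_view){0};
    if (dyn == NULL)
        return -1;

    for (size_t i = 0; i < max_entries; i++) {
        switch (dyn[i].d_tag) {
        case DT_NULL:       return 0;
        case DT_NEEDED:     out->needed_count++;                  break;
        case DT_SYMTAB:     out->symtab     = dyn[i].d_un.d_ptr;  break;
        case DT_STRTAB:     out->strtab     = dyn[i].d_un.d_ptr;  break;
        case DT_STRSZ:      out->strsz      = dyn[i].d_un.d_val;  break;
        case DT_RELA:       out->rela       = dyn[i].d_un.d_ptr;  break;
        case DT_RELASZ:     out->relasz     = dyn[i].d_un.d_val;  break;
        case DT_JMPREL:     out->jmprel     = dyn[i].d_un.d_ptr;  break;
        case DT_PLTRELSZ:   out->pltrelsz   = dyn[i].d_un.d_val;  break;
        case DT_VERSYM:     out->versym     = dyn[i].d_un.d_ptr;  break;
        case DT_VERNEED:    out->verneed    = dyn[i].d_un.d_ptr;  break;
        case DT_VERNEEDNUM: out->verneednum = dyn[i].d_un.d_val;  break;
        case DT_VERDEF:     out->verdef     = dyn[i].d_un.d_ptr;  break;
        case DT_VERDEFNUM:  out->verdefnum  = dyn[i].d_un.d_val;  break;
        default:            break;  /* tags this sketch does not need */
        }
    }
    return -1;  /* no DT_NULL within the bound */
}
```

A comparison mode like the one described above could run this next to the
legacy file-based parser and log any field that differs.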

Stage 3 of the loader refactor materializes GOT writes as patch
decisions before applying them.

The planner records the guest object, relocation, symbol version, old
target, old owner, selected bridge, target source, and decision reason.
Immediate relocations, lazy setup, PLT resolver writes, and unresolved
slots all go through the same decision formatting path.

This does not change the selected target yet.  It makes the existing
maplib-based behavior observable so later guest-owner changes can be
compared against a recorded decision.
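
A rough sketch of the "decide first, write second" shape described above. The
field list follows the commit message; the struct, enum, and function names
(patch_decision, patch_source, patch_decision_format, patch_decision_apply) and
the log format are assumptions for illustration only.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Where the selected bridge came from (hypothetical enumeration). */
enum patch_source { PATCH_SRC_NONE, PATCH_SRC_MAPLIB, PATCH_SRC_GUEST_OWNER };

/* One recorded decision per GOT slot, built before any write happens. */
struct patch_decision {
    const char       *object;      /* guest object that owns the slot */
    unsigned          reloc_type;  /* relocation type */
    const char       *symbol;
    const char       *version;
    uint64_t         *slot;        /* GOT slot to patch */
    uint64_t          old_target;  /* value currently in the slot */
    const char       *old_owner;   /* object that owns old_target */
    uint64_t          bridge;      /* selected native bridge, 0 if none */
    enum patch_source source;
    const char       *reason;
};

static const char *patch_source_name(enum patch_source s)
{
    switch (s) {
    case PATCH_SRC_MAPLIB:      return "maplib";
    case PATCH_SRC_GUEST_OWNER: return "guest-owner";
    default:                    return "none";
    }
}

/* Single formatting path shared by every kind of slot write. */
static int patch_decision_format(const struct patch_decision *d,
                                 char *buf, size_t len)
{
    assert(d != NULL && buf != NULL);
    return snprintf(buf, len,
                    "obj=%s reloc=%u sym=%s ver=%s old=0x%" PRIx64
                    " owner=%s new=0x%" PRIx64 " src=%s reason=%s",
                    d->object, d->reloc_type, d->symbol, d->version,
                    d->old_target, d->old_owner, d->bridge,
                    patch_source_name(d->source), d->reason);
}

/* Apply a decision; returns 1 if the slot was written, 0 if left alone. */
static int patch_decision_apply(const struct patch_decision *d)
{
    assert(d != NULL && d->slot != NULL);
    if (d->source == PATCH_SRC_NONE || d->bridge == 0)
        return 0;
    *d->slot = d->bridge;
    return 1;
}
```

Keeping formatting separate from application means the same record can be
logged in a later shadow comparison without touching the slot.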

Stage 4 of the loader refactor starts using the current guest GOT value
to identify the guest object that owns a resolved target.

Add guest-object address lookup, guest-owner wrapper lookup, owner
relation classification, and shadow planner logs.  The shadow mode
records whether the guest-owner target matches the maplib-selected
target and classifies failures such as missing guest owners, self PLT
targets, missing wrappers, missing libraries, and missing symbols.

The selected bridge still comes from maplib in this patch.  Keeping the
old target source authoritative makes the guest-owner path observable
before it is allowed to replace maplib resolution.

Complete stage 4 by allowing a successful guest-owner probe to select
the native wrapper bridge directly.

The relocation resolver now probes the guest object that owns the
current GOT value before repeating the global maplib lookup.  When that
probe resolves a wrapper target, the planner records the decision as a
guest-owner target and skips the maplib reparse for that slot.

Failed probes keep the existing maplib fallback and carry the classified
failure into the patch decision.  Lazy binding remains outside this
stage and keeps its existing resolver path.
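
The selection order above can be sketched as follows. The failure classes
mirror the ones listed for the shadow mode; every identifier (select_target,
owner_probe_status, the callback types, and the demo_* stand-ins) is
hypothetical, and the two callbacks stand in for the real guest-owner probe and
maplib lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Classified outcome of the guest-owner probe (names are illustrative). */
enum owner_probe_status {
    OWNER_OK, OWNER_NO_GUEST_OWNER, OWNER_SELF_PLT,
    OWNER_NO_WRAPPER, OWNER_NO_LIBRARY, OWNER_NO_SYMBOL
};

enum target_source { TARGET_NONE, TARGET_GUEST_OWNER, TARGET_MAPLIB };

struct target_choice {
    uint64_t                bridge;  /* selected bridge, 0 if unresolved */
    enum target_source      source;
    enum owner_probe_status probe;   /* carried into the patch decision */
};

typedef enum owner_probe_status (*owner_probe_fn)(uint64_t got_value,
                                                  const char *sym,
                                                  uint64_t *bridge);
typedef int (*maplib_lookup_fn)(const char *sym, uint64_t *bridge);

/* Probe the owner of the current GOT value first; fall back to maplib. */
static struct target_choice select_target(uint64_t got_value, const char *sym,
                                          owner_probe_fn probe,
                                          maplib_lookup_fn maplib)
{
    assert(probe != NULL && maplib != NULL);
    struct target_choice c = { 0, TARGET_NONE, OWNER_OK };
    uint64_t bridge = 0;

    c.probe = probe(got_value, sym, &bridge);
    if (c.probe == OWNER_OK && bridge != 0) {
        c.bridge = bridge;
        c.source = TARGET_GUEST_OWNER;   /* maplib is skipped for this slot */
        return c;
    }
    if (c.probe == OWNER_OK)
        c.probe = OWNER_NO_WRAPPER;      /* success without a bridge */

    bridge = 0;
    if (maplib(sym, &bridge) && bridge != 0) {
        c.bridge = bridge;
        c.source = TARGET_MAPLIB;        /* failed probe keeps the fallback */
    }
    return c;
}

/* Stand-ins for the real lookups, used only to exercise the sketch. */
static enum owner_probe_status demo_owner_probe(uint64_t got_value,
                                                const char *sym,
                                                uint64_t *bridge)
{
    (void)sym;
    if (got_value < 0x1000 || got_value >= 0x2000)
        return OWNER_NO_GUEST_OWNER;
    *bridge = 0xAAAA;
    return OWNER_OK;
}

static int demo_maplib_lookup(const char *sym, uint64_t *bridge)
{
    if (sym == NULL || sym[0] != 'e')
        return 0;
    *bridge = 0xBBBB;
    return 1;
}
```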

Stage 5 of the loader refactor lets the guest loader own ordinary guest
dependencies.

KZT now registers wrappers only for objects that the guest loader has already
reported.  It no longer expands guest RPATH/RUNPATH, recursively loads guest
DT_NEEDED entries, probes wrappers by constructing temporary libraries, or
side-loads libGL for SDL/CgGL objects.

Remove the old LoadNeededLibs() guest dependency path while keeping
AddNeededLib() available for native wrapper registration and wrapper host
dependencies.

Complete stage 6 by moving lazy JUMP_SLOT policy and resolver state out
of RelocateElfRELA().

Move lazy slot classification, binding metadata, resolver eligibility,
deferred patch decisions, PLT resolver installation, PLT frame handling,
resolver actions, patch planning inputs, and lazy logging behind
guestlazy helpers.

Keep the selected target, patch decision reasons, resolver slot writes,
and first-call KZT resolver behavior unchanged.

This creates the boundary needed to later let guest ld.so bind first and
replace the slot after binding without modifying generic RelocateElfRELA().

Document the stage 7 rule that guest dynamic loader results are
authoritative for dlopen, dlsym, dlclose, dlinfo, dladdr, and dlvsym.

The local wrapper side should keep only the metadata needed to bridge
native wrappers back to guest-owned handles.  It must not synthesize
independent guest handles, duplicate guest reference counts, or reload
guests behind the guest loader.

Split the common guest dl helper calls out of the wrapper entry points.

Keep the behavior unchanged while naming the handle lookup path,
RunFunctionWithState() forwarding, and guest dlclose forwarding points.
These helpers provide the mechanical boundary used by the following
patches to make guest loader results authoritative.

LaurenIsACoder force-pushed the kzt-refactor branch 2 times, most recently from 99f8a9a to 81ab7db on June 16, 2026 06:37

Make dlopen reuse and close handling follow the handles returned by the
guest loader.

Track recycled handles, reopen-after-close, self handles, local maplib
metadata, and closed-handle deactivation without reloading guests behind
the guest loader.

Route dlsym and dlvsym through guest-owned lookup results before using
local wrapper metadata as a compatibility fallback.

Handle RTLD_DEFAULT, RTLD_NEXT, wrapped link_map lookups, missing symbol
errors, versioned dlvsym requests, dlinfo delegation, and symbol sync
failures through named helpers.

Finish stage 7 by pairing local wrapper metadata with guest-owned
handle lifetime.

Close guest handles before local release, clear guest identity on final
close, return guest link maps from dlinfo and dladdr1, forward non-base
dlmopen, preserve dlerror semantics, and validate native wrapper dlsym
and dlvsym symbols before returning them.

Document the stage 8 rule that loader synchronization should consume
loader events instead of depending directly on a private glibc hook.

The glibc hook remains as a fallback event source during this stage, but
it should only report link_map events.  Guest object registration and
legacy compatibility processing live behind the event boundary.

Convert the loader callback path into a loader-event submission path.

Build immutable event snapshots from link_map data, validate missing or
empty link_map records before dispatch, tag event sources in logs, and
keep the glibc callback as an adapter that submits events rather than
running the whole synchronization flow directly.
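
A sketch of the submission path described above: copy the interesting link_map
fields into a by-value snapshot, validate before dispatch, and tag the source.
The loader_event struct, event_source enum, sink callback, and event_log helper
are hypothetical. Using the host's <link.h> layout for a guest link_map, and
treating a NULL l_ld as an "empty" record, are both assumptions.

```c
#include <assert.h>
#include <link.h>
#include <stdint.h>
#include <stdio.h>

/* Which mechanism produced the event (illustrative tags). */
enum event_source { EVENT_SRC_GLIBC_HOOK, EVENT_SRC_R_DEBUG };

/* Snapshot handed to consumers; it holds copies, not link_map pointers. */
struct loader_event {
    enum event_source source;
    uintptr_t         base;     /* copied from l_addr */
    uintptr_t         dynamic;  /* copied from l_ld */
    char              name[256];
};

typedef void (*loader_event_sink)(const struct loader_event *ev, void *ctx);

/* Validate the record, build the snapshot, and dispatch it to the sink. */
static int submit_loader_event(enum event_source src, const struct link_map *lm,
                               loader_event_sink sink, void *ctx)
{
    assert(sink != NULL);
    if (lm == NULL || lm->l_ld == NULL)
        return -1;                       /* missing or empty record */

    struct loader_event ev = {
        .source  = src,
        .base    = (uintptr_t)lm->l_addr,
        .dynamic = (uintptr_t)lm->l_ld,
    };
    snprintf(ev.name, sizeof(ev.name), "%s", lm->l_name ? lm->l_name : "");
    sink(&ev, ctx);
    return 0;
}

/* Example sink: remembers the last event and counts deliveries. */
struct event_log {
    struct loader_event last;
    int                 count;
};

static void record_event(const struct loader_event *ev, void *ctx)
{
    struct event_log *log = ctx;
    log->last = *ev;
    log->count++;
}
```

In this shape the glibc callback only calls submit_loader_event(); any future
source can call the same function with a different tag.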

Move guest object setup and legacy processing behind the loader-event
boundary.

Centralize guest object state updates, debug logging, legacy ELF
preparation, successful finalization, missing ELF state, object key
selection, and observation logging.  This lets future event sources
share the same registration and compatibility path.

Represent the private glibc hook as a loader-event source object.

Name the hook metadata, fixed-candidate and pattern scanning helpers,
source tag, hook resolver, cached hook state, callback bridge helpers,
and bridge install path.  The hook still exists, but the rest of KZT now
sees it through the event-source boundary.

Finish Stage 8 by keeping the private glibc hook behind the loader event
source boundary.

Harden the fallback source that still scans the glibc loader by bounding
fixed hook checks, pattern probes, scan cursors, and backtracking.  Also
validate loader images, reject invalid hook registers, free resolved
loader paths, and disable KZT cleanly when no fallback source is found.

Keep Dynamic parser comparison logs attached to the active event source
instead of hard-coding kzt_tb_callback.  This lets future r_debug,
mprotect, mmap, or smaller fallback sources reuse the same dispatcher
without carrying the old hook name through the compatibility path.
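
To make the bounding of pattern probes concrete, here is a minimal bounded byte
scan. It is only a sketch of the idea: bounded_find() and its max_start limit
are hypothetical, and the real fallback source validates more than this (fixed
candidates, backtracking, loader image checks).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Search img[0..img_len) for pat[0..pat_len). Start offsets are limited to
 * 0..max_start and to positions where the whole pattern still fits, so the
 * scan cannot read past the image. Returns the match or NULL.
 */
static const uint8_t *bounded_find(const uint8_t *img, size_t img_len,
                                   const uint8_t *pat, size_t pat_len,
                                   size_t max_start)
{
    assert(img != NULL && pat != NULL);
    if (pat_len == 0 || pat_len > img_len)
        return NULL;

    size_t last = img_len - pat_len;     /* last offset where pat fits */
    if (max_start < last)
        last = max_start;                /* caller-imposed scan budget */

    for (size_t i = 0; i <= last; i++) {
        if (memcmp(img + i, pat, pat_len) == 0)
            return img + i;
    }
    return NULL;
}
```

When the scan returns NULL, the caller can report that no fallback source was
found and disable KZT rather than guess at a hook location.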

Ask the guest loader before using local wrapper or global-symbol fallback
results for dlsym.

Stage 7 makes guest dl API results authoritative.  RTLD_DEFAULT now uses
the LATX_RELOCATION_SAVE_SYMBOLS fallback only after guest dlsym misses.
Wrapper handles also ask the guest link_map first, and only replace a
successful guest result with the local native bridge address.
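
The two lookup orders in this paragraph can be sketched with plain callbacks.
Nothing here is the real KZT code: lookup_default(), lookup_wrapper_handle(),
and the demo_* stand-ins are hypothetical, and each callback simply returns an
address or NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*lookup_fn)(const char *sym);

/* RTLD_DEFAULT: the guest answers first; saved symbols are only a fallback. */
static void *lookup_default(const char *sym, lookup_fn guest_dlsym,
                            lookup_fn saved_symbols)
{
    assert(guest_dlsym != NULL && saved_symbols != NULL);
    void *addr = guest_dlsym(sym);
    return addr != NULL ? addr : saved_symbols(sym);
}

/*
 * Wrapper handle: the guest link_map decides whether the symbol exists. Only
 * a successful guest result may be replaced by the local native bridge.
 */
static void *lookup_wrapper_handle(const char *sym, lookup_fn guest_link_map,
                                   lookup_fn local_bridge)
{
    assert(guest_link_map != NULL && local_bridge != NULL);
    void *guest = guest_link_map(sym);
    if (guest == NULL)
        return NULL;                 /* guest miss is final for this handle */
    void *bridge = local_bridge(sym);
    return bridge != NULL ? bridge : guest;
}

/* Stand-in symbol sources so the sketch can be exercised. */
static int guest_sym, bridge_sym, saved_sym;

static void *demo_guest(const char *sym)
{
    return strcmp(sym, "glFlush") == 0 ? (void *)&guest_sym : NULL;
}

static void *demo_bridge(const char *sym)
{
    if (strcmp(sym, "glFlush") == 0 || strcmp(sym, "onlyLocal") == 0)
        return &bridge_sym;
    return NULL;
}

static void *demo_saved(const char *sym)
{
    return strcmp(sym, "legacySym") == 0 ? (void *)&saved_sym : NULL;
}
```

Note that "onlyLocal" is known to the local bridge but not to the guest, so the
wrapper-handle lookup reports a miss instead of inventing a symbol.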

Keep local metadata failure paths non-fatal as well.  Missing local
library metadata now reports ordinary dl errors instead of asserting or
calling GetElfIndex() with a NULL library.  dladdr1 LINKMAP fallback also
checks that extra_info and a local link map are available before writing
fallback results.

Keep dlvsym on the versioned guest lookup path for wrapper handles too.
The guest version check now returns the guest result directly unless a
local bridge can replace that successful result.  Missing local metadata
also reports the handle-specific miss instead of falling back to a plain
dlsym lookup.

Keep guest dlopen and dlclose references paired for wrapper handles.
Explicit dlopen now retains the guest loader object even when the link_map
was already known from a dependency load.  If local symbol registration
fails after guest dlopen succeeds, roll back the guest reference and clear
the cached guest identity.

Initialize all parallel handle-table arrays when growing dlprivate state.
This keeps future debug and failure paths from observing stale metadata in
unused slots.
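
A sketch of growing a handle table made of parallel arrays so that every new
slot starts in a known state. The dl_handles layout and the function names are
assumptions for illustration; the actual dlprivate state is not shown in this
PR text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical handle table: one logical entry spread over four arrays. */
struct dl_handles {
    void  **guest_handle;  /* handle returned by the guest loader */
    void  **local_lib;     /* local wrapper metadata, may be NULL */
    size_t *refcount;
    int    *closed;
    size_t  cap;
};

/* Grow every array to new_cap and initialize all newly added slots. */
static int dl_handles_grow(struct dl_handles *t, size_t new_cap)
{
    assert(t != NULL);
    if (new_cap <= t->cap)
        return 0;
    if (new_cap > SIZE_MAX / sizeof(void *))
        return -1;                        /* size computation would overflow */

    /* Store each pointer as soon as realloc succeeds so nothing is lost
     * if a later allocation fails; cap stays unchanged in that case. */
    void **g = realloc(t->guest_handle, new_cap * sizeof(*g));
    if (g == NULL) return -1;
    t->guest_handle = g;

    void **l = realloc(t->local_lib, new_cap * sizeof(*l));
    if (l == NULL) return -1;
    t->local_lib = l;

    size_t *r = realloc(t->refcount, new_cap * sizeof(*r));
    if (r == NULL) return -1;
    t->refcount = r;

    int *c = realloc(t->closed, new_cap * sizeof(*c));
    if (c == NULL) return -1;
    t->closed = c;

    for (size_t i = t->cap; i < new_cap; i++) {
        t->guest_handle[i] = NULL;
        t->local_lib[i]    = NULL;
        t->refcount[i]     = 0;
        t->closed[i]       = 0;
    }
    t->cap = new_cap;
    return 0;
}

static void dl_handles_free(struct dl_handles *t)
{
    assert(t != NULL);
    free(t->guest_handle);
    free(t->local_lib);
    free(t->refcount);
    free(t->closed);
    *t = (struct dl_handles){0};
}
```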