XY-927: [ELF benchmark P1] Add Letta-style core-vs-archival memory comparison #189

Merged: yvette-carlisle merged 10 commits into main from y/elf-xy-927 on Jun 11, 2026
@yvette-carlisle yvette-carlisle commented Jun 11, 2026

Summary

  • Add and preserve core_archival_memory benchmark coverage for core block attachment, scope, provenance, stale-core detection, archival fallback, and project-decision recovery.
  • Keep Letta comparison rows blocked or not_tested until a contained export/readback path exists.
  • Harden external adapter scenario validation so blocked scenarios require blocked outcomes and explicit outcome/position contradictions are rejected.
  • Align benchmark reports, research guidance, and the benchmark spec with the manifest-backed evidence boundaries.
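The blocked-scenario rule in the third bullet can be sketched roughly as follows. This is a minimal illustration, not the validator in `elf-eval`: the `Status` enum, the `ScenarioRow` struct, the `validate_row` function, and the scenario name used in the usage note are all hypothetical names introduced here. The outcome/position contradiction check is left out because the description does not define what a position is.

```rust
// Hypothetical types for illustration only; the real elf-eval definitions may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Pass,
    Fail,
    Blocked,
    NotTested,
}

/// One external-adapter scenario row (assumed shape).
#[derive(Debug)]
pub struct ScenarioRow {
    pub name: &'static str,
    /// Status declared for the scenario itself.
    pub scenario_status: Status,
    /// Outcome the adapter reports for that scenario.
    pub outcome: Status,
}

/// Rejects rows where a blocked scenario reports anything other than a blocked outcome.
pub fn validate_row(row: &ScenarioRow) -> Result<(), String> {
    if row.scenario_status == Status::Blocked && row.outcome != Status::Blocked {
        return Err(format!(
            "scenario `{}` is blocked but reports outcome {:?}",
            row.name, row.outcome
        ));
    }
    Ok(())
}
```

Under these assumptions, a row such as `ScenarioRow { name: "letta_core_block", scenario_status: Status::Blocked, outcome: Status::Pass }` would be rejected, while the same row with `outcome: Status::Blocked` would validate.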

Verification

  • cargo test -p elf-eval --test real_world_job_benchmark --all-features
  • cargo make real-world-memory
  • cargo make fmt
  • cargo make lint-fix
  • cargo make checks
  • Independent fresh-context review checkpoint: clean for 05232fb

… benchmark with main","authority":"XY-927"}

# Conflicts:
#	README.md
#	apps/elf-eval/tests/real_world_job_benchmark.rs
#	docs/guide/benchmarking/2026-06-11-competitor-strength-adoption-report.md
#	docs/guide/benchmarking/2026-06-11-measurement-coverage-audit.md
#	docs/guide/benchmarking/index.md
#	docs/guide/benchmarking/real_world_agent_memory_benchmark.md
#	docs/research/2026-06-11-competitor-strength-adoption-report.json
#	docs/research/2026-06-11-measurement-coverage-audit.json
…h context trajectory main","authority":"XY-927"}
…h first-generation OSS main","authority":"XY-927"}
@yvette-carlisle yvette-carlisle merged commit 3533711 into main Jun 11, 2026
13 checks passed
@yvette-carlisle yvette-carlisle deleted the y/elf-xy-927 branch June 11, 2026 20:21