fix(memory): restore pool session attribution#320
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
✅ Recall Benchmark ResultsFull outputCommit: 1215455 |
heybeaux
left a comment
There was a problem hiding this comment.
Independent PR Review — fix(memory): restore pool session attribution
Reviewed against: correctness, RLS/tenant isolation, LongMemEval/recall impact, API compatibility, and deployment risk.
✅ Root causes are correctly diagnosed and fixed
agentSessionKey ?? sessionId fallback (conversation-observer.service.ts)
The fix is right. OpenClaw hook integrations historically set sessionId before agentSessionKey existed. Resolving agentSessionKey = dto.agentSessionKey ?? dto.sessionId upstream — once, before both the summarization buffer path and the memory write path — means both paths now share the same resolved value. Previously the memory write was using a different key (or none) versus what the summarization path would have used.
Pool membership now awaited instead of fire-and-forget (memory-write.service.ts)
The old .then().catch() pattern for caller-specified poolId writes allowed the HTTP response to complete before the membership was inserted, which could silently lose it if the process was under memory pressure or the connection pool was saturated. The await-inside-try/catch pattern fixes this. Correct.
addToGlobalPoolAndLog detached from request RLS context
rlsContext.run(undefined as any, ...) clears the per-request transactional Prisma client before doing background work. This is the established pattern (used identically in ensemble.service.ts, memory-pipeline.service.ts, cloud-sync.service.ts). Without the detachment, the async work would inherit a closed transaction client — causing Prisma to throw after the HTTP response completed. The explicit application-level scoping (userId, memoryId) preserves tenant isolation even without the db-session variable; the db-level app.current_account_id enforcement is noted as deferred defense-in-depth.
Prisma P2002 check precision
Changed from .includes('P2002') to === 'P2002'. Correct — err.code is a top-level property on Prisma errors, not a substring in a message. This is more precise.
Summarization buffer propagation
poolId and agentSessionKey are now threaded through addTurnsToBuffer → flushBuffer → summarizeAndStore → individual remember() calls. The path was confirmed by reading the full summarization.service.ts flow.
Schema: composite index exists
@@unique([userId, name]) on MemoryPool (schema.prisma line 580) confirms the userId_name upsert key is valid. No migration needed.
/v1/pools controller: authenticated-user-first with legacy fallback
@UserId() returns string | null per the decorator definition. The authenticatedUserId || legacyUserId logic correctly falls through to the legacy query param when the decorator returns null. BadRequestException for the case where both are absent is the right response.
Access log fire-and-forget detachment (memory-access-log.service.ts)
logAccessFireAndForget and logBatchFireAndForget now wrap their work in rlsContext.run(undefined, ...). Same pattern, same rationale. Fixes the case where fire-and-forget telemetry was inheriting a closed transaction client.
Duration field: ISO-8601 → milliseconds
The change from PT45M to numeric milliseconds is intentional and the inline comment explains why (NaN duration cards). The test expectation expect(result.duration).toBe(45 * 60 * 1000) is explicit and passes. The durationEnd = session.endedAt ?? new Date() approach means in-progress sessions get a live elapsed time rather than null.
⚠️ Notes and minor issues (none are blockers)
limit/offset NaN propagation
Number('abc') → NaN. NaN ?? 50 → NaN (nullish coalescing doesn't catch NaN). Math.max(NaN, 1) → NaN. Prisma receives take: NaN, which will likely error or behave unexpectedly. No real-world client sends non-numeric limit, but suggest adding an isNaN guard:
limit: limit && !isNaN(Number(limit)) ? Number(limit) : undefined,or use parseInt(limit, 10) with an isNaN check. Low priority but worth a follow-up.
Breaking response shape changes — document for SDK consumers
GET /v1/poolsnow returns{ pools: [...], total: N }instead of a bare array. Clients expecting an array will break silently (.lengthon the object would returnundefined).SessionSummaryResult.durationchanged from ISO-8601 string ("PT45M") to numeric milliseconds. Any client parsing the old format will get the wrong value.- New fields (
uniqueMemories,topTopics,memberCount,grantCount) are additive and backward-compatible.
These changes are intentional and correct for the dashboard, but they should be noted in the API changelog or a migration guide for external consumers.
uniqueMemories and uniqueMemoriesAccessed hold the same value
Both are uniqueMemoryIds.size, which counts unique memories from non-CREATED access events. The field duplication is intentional (dashboard compatibility) but worth a comment or a note in the interface definition to avoid future confusion.
createdBySession field semantics
createdBySession: pool.createdBy ?? null maps a plain string (the session key at creation time) to a field named createdBySession, which implies it might be a foreign key or structured object. It's not — it's just the createdBy string. No bug, but the naming could confuse consumers.
getById has no tenant scoping — pre-existing
Any authenticated user who knows a pool ID can fetch its members and grants via GET /v1/pools/:id. This is unchanged from before this PR. Not introduced here, but worth a follow-up ticket.
LongMemEval / recall impact
None. The PR deliberately avoids touching vector search, recall ranking, importance scoring, or ingestion pipeline logic. The only meaningful recall effect is that memories now correctly receive createdBySession and get added to the global pool, which improves their accessibility for session-scoped recall. Confirmed: no changes to recall.service.ts, retrieval*.ts, or embedding/scoring code.
CI
synccheck: ✅ pass- Recall Benchmark / Post-Dream-Cycle:
pendingat review time (expected, these take longer)
Verdict
LGTM. The root causes were correctly identified and the fixes are sound. The NaN validation gap and API shape changes are worth addressing but are not blockers for merging. Deployment risk is low: no schema migrations, no recall/vector changes, pool side effects are now more reliable than before, and the existing fire-and-forget telemetry is better protected against post-response transaction errors.
Summary
Fixes the pool/session attribution path reported in the dashboard screenshots: memories could be created successfully while pool membership,
createdBySession, session summary counts, and dashboard pool/member/grant views stayed empty or unusable.Root causes addressed:
POST /v1/observeintegrations historically sentsessionId: event.sessionKey, but memory attribution only looked at the neweragentSessionKeyfield.poolIdmembership writes were fire-and-forget, so successful memory creation could still lose the pool membership side effect.GET /v1/poolsrequired?userId=...; dashboard calls rely on the authenticated/default user instead.createdBySession, counts, flattened members/grants, numeric duration,uniqueMemories).Changes
ObserveDto.sessionIdas the agent-session attribution key whenagentSessionKeyis absent.poolIdandagentSessionKeythrough the summarization buffer path.globalpool, create membership, and write CREATED access logs outside the completed request transaction.poolIdmembership writes inside the request context, preserving authorization and avoiding lost side effects./v1/poolslist for the authenticated user, with legacyuserIdfallback and pagination.Validation
corepack pnpm exec jest src/auto/conversation-observer.service.spec.ts src/summarization/summarization.service.spec.ts src/memory-pool/memory-pool.service.spec.ts src/memory-pool/memory-pool.controller.spec.ts src/memory-access-log/memory-access-log.service.spec.ts src/memory/memory-write.service.spec.ts --forceExit --maxWorkers=2corepack pnpm buildgit diff --checkcorepack pnpm exec eslint <touched files>Risk / merge note
This intentionally avoids recall-ranking/vector/query logic and does not modify LongMemEval ingestion/recall codepaths. It does touch memory attribution/pool side effects, so per the autonomous operator safety policy this should be reviewed carefully rather than blindly auto-merged.