feat(transform): add merge_subwatcher_fields — general subwatcher field enrichment #138
… enrichment

Adds a new backend transform `merge_subwatcher_fields(base, sub, keys)` that enriches base window events with extra fields from a subwatcher (editor, browser) by finding the longest-overlapping subwatcher event per base event and copying the named keys into the base event's data dict.

Unlike the concat workaround used in aw-webui#851, this approach:

- Preserves timestamps, durations, and event count exactly (no phantom events)
- Keeps app/title/duration aggregations correct by construction
- Lives in the backend so every client (webui, native UIs, exporters) benefits
- Handles both editor (project/file/language) and browser (url/$domain) fields through one mechanism, closing both #1305 and #352

Exposed as q2_merge_subwatcher_fields in the query2 function registry.

Adds 6 tests covering: basic enrichment, no-overlap passthrough, base_wins and sub_wins conflict modes, attach-longest N:1 overlap selection, and empty inputs.

Closes ActivityWatch/activitywatch#1305, ActivityWatch/activitywatch#352
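The attach-longest behavior described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `Event` dataclass is a hypothetical stand-in for an ActivityWatch event, and `enrich_longest_overlap` is an illustrative name that assumes base-wins conflict handling.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Event:
    """Hypothetical stand-in for an ActivityWatch event (not the aw_core class)."""
    timestamp: datetime
    duration: timedelta
    data: dict = field(default_factory=dict)

    @property
    def end(self) -> datetime:
        return self.timestamp + self.duration


def overlap(a: Event, b: Event) -> timedelta:
    """Length of the interval shared by two events (zero if they are disjoint)."""
    return max(min(a.end, b.end) - max(a.timestamp, b.timestamp), timedelta(0))


def enrich_longest_overlap(base, sub, keys):
    """Copy `keys` from the longest-overlapping sub event into a copy of each base event."""
    result = []
    for b in base:
        enriched = deepcopy(b)  # never mutate the caller's events
        best = max(sub, key=lambda s: overlap(b, s), default=None)
        if best is not None and overlap(b, best) > timedelta(0):
            for k in keys:
                # Assumed base-wins behavior: keep keys the base event already has.
                if k in best.data and k not in enriched.data:
                    enriched.data[k] = best.data[k]
        result.append(enriched)
    return result


t0 = datetime(2024, 1, 1, 12, 0, 0)
window = [Event(t0, timedelta(minutes=10), {"app": "Code", "title": "main.py"})]
editor = [
    Event(t0, timedelta(minutes=3), {"project": "alpha"}),
    Event(t0 + timedelta(minutes=3), timedelta(minutes=7), {"project": "beta"}),
]
merged = enrich_longest_overlap(window, editor, ["project"])
```

In this example the single 10-minute window event is tagged with `beta`, even though `beta` only overlaps 7 of those minutes.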
Greptile Summary
This PR adds
Confidence Score: 5/5
Safe to merge: the transform is purely additive, preserves total base event duration, and deep-copies all outputs so the original lists are never mutated in place. All previously flagged issues have been addressed. The boundary-based segmentation, latest-timestamp-wins tie-breaking, and adjacent-segment coalescing are all correct. Nine unit tests and one query-layer test provide comprehensive coverage including the alpha/beta handoff regression. No files require special attention.
Important Files Changed
Reviews (5): Last reviewed commit: "fix(transform): prefer latest overlappin..."
…ields

- P1: early return now returns deepcopy list instead of original reference, matching the contract 'Returns: A new list of base events'
- P2a: validate conflict param before early return; invalid values raise ValueError rather than silently behaving as 'sub_wins'
- P2b: forward conflict param through q2_merge_subwatcher_fields wrapper so query2 callers can opt into 'sub_wins' instead of being locked to default
- test: extend empty-inputs test to assert result is not the same list object
- test: add test_merge_subwatcher_fields_invalid_conflict for P2a validation
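The P1 and P2a fixes amount to an ordering rule: validate `conflict` first, then short-circuit with a fresh copy. A minimal sketch of that ordering, assuming a hypothetical function name and plain dicts as events (the real transform's body and signature may differ):

```python
from copy import deepcopy

VALID_CONFLICT_MODES = ("base_wins", "sub_wins")  # mode names taken from the PR text


def merge_fields_guarded(base, sub, keys, conflict="base_wins"):
    """Hypothetical skeleton showing only argument validation and the early return."""
    # Validate first, so a bad value fails even when the inputs are empty.
    if conflict not in VALID_CONFLICT_MODES:
        raise ValueError(
            f"conflict must be one of {VALID_CONFLICT_MODES}, got {conflict!r}"
        )
    # Nothing to merge: still return a new list rather than the caller's object.
    if not base or not sub or not keys:
        return deepcopy(base)
    raise NotImplementedError("the merge itself is out of scope for this sketch")


base = [{"data": {"app": "Code"}}]
result = merge_fields_guarded(base, [], ["project"])
```

Here `result` equals `base` but is a distinct list holding distinct event objects, which is what the extended empty-inputs test checks.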
@greptileai review
Addressed all Greptile findings in e78081b:
CI running. Greptile re-review triggered.

CI is green on all platforms (ubuntu, macOS, Windows, lint) and Greptile re-review came back 5/5 after the P1/P2 fixes in e78081b. This PR is ready for maintainer review and merge. Happy to address any additional feedback.

Ready to merge: CI green, Greptile 5/5. Erik approved on ActivityWatch/activitywatch#1305 at 2026-05-31T12:53:08Z. Waiting for a maintainer with merge permissions.
Hmm, just taking the longest-overlapping event could inflate the time attributed to a given subwatcher key. Might need flooding options here (ideally not), or at least something smarter that tries to match or chop events by the subwatcher. I think this design needs further consideration before it can be considered the final solution. Review the design and write some proper tests that exercise the behaviors and don't miscount. @TimeToBuildBob needs more work
@ErikBjare addressed in

The transform no longer smears one subwatcher event across the whole base window.

That fixes the miscounting failure mode you called out: project/url time is only attributed to the covered subsegments, while app-level duration still sums back to the original base duration. Added tests for partial overlaps and repeated subwatcher values to assert the aggregated durations by project and app stay correct.

Local verification:

CI is running on the updated branch now.
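One way to picture the redesign is to cut each base event at every subwatcher boundary that falls inside it and enrich only the slices a subwatcher event covers. The sketch below is an assumption-laden illustration, not the PR's code: `Event` is a hypothetical stand-in class, `split_at_sub_boundaries` is an illustrative name, and overlapping subwatcher events and segment coalescing are deliberately left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Event:
    """Hypothetical stand-in for an ActivityWatch event (not the aw_core class)."""
    timestamp: datetime
    duration: timedelta
    data: dict = field(default_factory=dict)

    @property
    def end(self) -> datetime:
        return self.timestamp + self.duration


def split_at_sub_boundaries(base, sub, keys):
    """Chop each base event wherever a sub event starts or ends inside it."""
    out = []
    for b in base:
        # Cut points: the base event's own edges plus sub edges strictly inside it.
        cuts = {b.timestamp, b.end}
        for s in sub:
            cuts.update(t for t in (s.timestamp, s.end) if b.timestamp < t < b.end)
        points = sorted(cuts)
        if len(points) < 2:  # zero-duration base event: pass it through unchanged
            out.append(Event(b.timestamp, b.duration, dict(b.data)))
            continue
        for start, end in zip(points, points[1:]):
            data = dict(b.data)
            # First sub event covering this slice; overlapping subs are not handled here.
            cover = next((s for s in sub if s.timestamp <= start and end <= s.end), None)
            if cover is not None:
                for k in keys:
                    if k in cover.data:
                        data.setdefault(k, cover.data[k])  # base keys win
            out.append(Event(start, end - start, data))
    return out


t0 = datetime(2024, 1, 1, 12, 0, 0)
window = [Event(t0, timedelta(minutes=10), {"app": "Code"})]
editor = [Event(t0 + timedelta(minutes=2), timedelta(minutes=3), {"project": "alpha"})]
segments = split_at_sub_boundaries(window, editor, ["project"])

app_time = sum((e.duration for e in segments), timedelta(0))
alpha_time = sum(
    (e.duration for e in segments if e.data.get("project") == "alpha"), timedelta(0)
)
```

With these inputs the window event becomes three segments: `alpha_time` is 3 minutes, while `app_time` still sums to the original 10 minutes.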
@greptileai review

CI is now green on head

Requesting a fresh Greptile pass because the current 5/5 summary is anchored to
The remaining Greptile issue on the redesigned head is fixed in
Verification:
@greptileai review
Pushed

The transform was already splitting base events at subwatcher boundaries, but overlapping subwatcher events still fell back to implicit first-match tie-breaking on shared slices. This patch makes the precedence explicit: the most recent subwatcher event wins from its own start boundary, so later transitions do not get smeared backward by an older, longer pulse. Added a regression test for the handoff case

Verification:
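The precedence rule can be illustrated in isolation. In this sketch subwatcher events are hypothetical `(start, end, data)` tuples and the helper names are made up for the example; it only shows how "latest start wins" resolves slices that more than one subwatcher event covers.

```python
from datetime import datetime, timedelta


def covering_slice(sub, start, end):
    """Sub events whose interval fully contains the slice [start, end]."""
    return [s for s in sub if s[0] <= start and end <= s[1]]


def pick_latest(covering):
    """Among the covering sub events, prefer the one that started most recently."""
    return max(covering, key=lambda s: s[0], default=None)


t0 = datetime(2024, 1, 1, 12, 0, 0)


def minute(n):
    return t0 + timedelta(minutes=n)


sub = [
    (minute(0), minute(10), {"project": "alpha"}),  # older, longer pulse
    (minute(4), minute(8), {"project": "beta"}),    # newer event taking over at minute 4
]

# Slices obtained by cutting at every sub boundary: 0-4, 4-8, 8-10.
winners = [
    pick_latest(covering_slice(sub, minute(a), minute(b)))[2]["project"]
    for a, b in [(0, 4), (4, 8), (8, 10)]
]
```

Here `winners` is `["alpha", "beta", "alpha"]`: the newer event owns the slice from its own start to its end, and the older pulse keeps the slices on either side.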
@greptileai review
Final head
I tried to merge just now, but GitHub still denies

@ErikBjare / ActivityWatch maintainer: please merge this when you have a minute.
Problem
ActivityWatch has several subwatchers (browser, editor) that observe finer-grained context while a parent app is active. Users want to categorize time by these fields (`url`/`$domain` for browser, `project`/`file`/`language` for editor), but today they never reach the `categorize()` pipeline:

- the browser `url` field is never merged into the categorized window stream
- the workaround in aw-webui#851 uses `concat`, but fabricates phantom events that break app/title/duration aggregations

The root cause is that `aw_query` has no field-enrichment-by-overlap primitive, so client-side code had to express enrichment as a concat (new events) instead of a merge (new fields on existing events).
Solution
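Add `merge_subwatcher_fields(base, sub, keys)` to `aw_transform`:

- Finds the longest-overlapping subwatcher event per base event and copies the named keys into the base event's `data` dict
- Preserves timestamps, durations, and event count, so app/title/duration aggregations stay correct for free
- Exposed as `q2_merge_subwatcher_fields` in the query2 function registry

Usage (in a query)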
Same pattern for browser url enrichment (closes #352):
Tests
6 new tests covering:
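- basic enrichment
- no-overlap passthrough
- `conflict="base_wins"` (default): base keys not overwritten
- `conflict="sub_wins"`: subwatcher fields win
- attach-longest N:1 overlap selection
- empty `sub` / empty `keys` short-circuit

All 164 existing tests continue to pass.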
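The two conflict modes can be illustrated on bare data dicts. This is a hedged sketch with a hypothetical helper name (`merge_data`), not the PR's test code; it assumes that only the listed `keys` are ever copied and that `base_wins` leaves existing base keys untouched.

```python
def merge_data(base_data, sub_data, keys, conflict="base_wins"):
    """Return a new dict with `keys` copied from sub_data according to `conflict`."""
    merged = dict(base_data)
    for k in keys:
        if k not in sub_data:
            continue
        # sub_wins always overwrites; base_wins only fills in missing keys.
        if conflict == "sub_wins" or k not in merged:
            merged[k] = sub_data[k]
    return merged


window_data = {"app": "Firefox", "title": "window title"}
browser_data = {"title": "tab title", "url": "https://example.com"}

base_wins = merge_data(window_data, browser_data, ["title", "url"])
sub_wins = merge_data(window_data, browser_data, ["title", "url"], conflict="sub_wins")
```

Under `base_wins` the window's own `title` survives and only `url` is added; under `sub_wins` the browser's `title` replaces it.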
Related