This repository was archived by the owner on Jun 7, 2026. It is now read-only.

knowledge: Co-occurrence clustering for project inference #28

Description

@timkicker

Goal

Implement the fourth membership mechanism described in docs/architecture/project-system.md and foundation/sections/04-event-knowledge.tex §4.7: detect coherent project clusters from co-occurring file accesses, terminal working directories, and (eventually) browser tabs that consistently appear together in sessions.

Background

The signal-based detection in knowledge/src/project/signals.rs covers the first three mechanisms (explicit .project, rootdir heuristic, manual assignment). Co-occurrence clustering is unimplemented (0 grep hits across knowledge/).

The blueprint frames this as: "files that consistently appear together in the same sessions, accessed by the same applications and terminal working directories, are grouped as project candidates by a background inference pass."

Scope

  • Background inference task (separate from the existing projects task — runs at lower frequency, e.g. daily)
  • Reads recent Session + ACCESSED_BY + ACTIVE_IN edges from the graph
  • Identifies clusters via a co-occurrence frequency metric (specific algorithm TBD: simple threshold, Jaccard similarity, or affinity propagation)
  • Creates Project nodes with inferred = true, promoted = false, confidence set from cluster strength
  • Surfaces candidates as "suggestions" — does NOT auto-link files via PART_OF
  • User confirms via Waypointer to promote (see also #95 Save-as-Project action)
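One way the deterministic option could look is sketched below: Jaccard similarity over the sets of sessions each item appears in, followed by a threshold and connected-components grouping. This is only a sketch, since the algorithm is still TBD. `SessionId`, `ItemId`, `jaccard`, `cluster`, and the `threshold` / `min_sessions` parameters are hypothetical names, not existing code in knowledge/. The input map stands in for whatever the Session + ACCESSED_BY + ACTIVE_IN query returns.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical identifiers; the real graph types are not shown in this issue.
pub type SessionId = u64;
pub type ItemId = String;

/// Jaccard similarity of two session sets: |A ∩ B| / |A ∪ B|.
/// Returns 0.0 when both sets are empty.
pub fn jaccard(a: &BTreeSet<SessionId>, b: &BTreeSet<SessionId>) -> f64 {
    let inter = a.intersection(b).count();
    let union = a.len() + b.len() - inter;
    if union == 0 {
        0.0
    } else {
        inter as f64 / union as f64
    }
}

/// Groups items whose pairwise Jaccard similarity reaches `threshold`.
/// Items seen in fewer than `min_sessions` sessions are ignored, and
/// groups with a single member are dropped. Grouping is by connected
/// components (single linkage), implemented with a small union-find.
pub fn cluster(
    sessions_by_item: &BTreeMap<ItemId, BTreeSet<SessionId>>,
    threshold: f64,
    min_sessions: usize,
) -> Vec<Vec<ItemId>> {
    // Find the representative of `x`, shortening paths along the way.
    fn find(parent: &mut [usize], mut x: usize) -> usize {
        while parent[x] != x {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        x
    }

    let items: Vec<(&ItemId, &BTreeSet<SessionId>)> = sessions_by_item
        .iter()
        .filter(|(_, sessions)| sessions.len() >= min_sessions)
        .collect();
    let n = items.len();
    let mut parent: Vec<usize> = (0..n).collect();

    // Link every pair that co-occurs often enough (O(n^2) comparisons).
    for i in 0..n {
        for j in (i + 1)..n {
            if jaccard(items[i].1, items[j].1) >= threshold {
                let ri = find(&mut parent, i);
                let rj = find(&mut parent, j);
                if ri != rj {
                    parent[rj] = ri;
                }
            }
        }
    }

    // Collect members per representative, keeping only real clusters.
    let mut groups: BTreeMap<usize, Vec<ItemId>> = BTreeMap::new();
    for i in 0..n {
        let root = find(&mut parent, i);
        groups.entry(root).or_default().push(items[i].0.clone());
    }
    groups.into_values().filter(|g| g.len() >= 2).collect()
}
```

The pairwise loop is quadratic in the number of items, which is probably acceptable for a daily pass over recent sessions but would need pruning on large graphs. Single linkage can also chain loosely related items together, so a stricter grouping rule may be preferable once real data is available.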

Out of scope

  • Browser tab integration — needs browser extension, defer to post-Phase 8
  • ML-based clustering — start with deterministic frequency heuristic

References

Acceptance criteria

  • New co_occurrence module under knowledge/src/project/
  • Inference task runs on schedule and produces candidate Project nodes
  • Each candidate has confidence set (0-100) and is marked inferred = true, promoted = false
  • Tests cover: empty graph, single session no clusters, multi-session strong cluster, weak cluster below threshold
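To make the confidence and threshold criteria above concrete, here is a rough sketch of how a cluster strength could be turned into a candidate. The `ProjectCandidate` struct, `confidence_from_strength`, `make_candidate`, and the `min_confidence` cutoff are assumptions for illustration. The actual Project node schema lives elsewhere and may differ. Strength is assumed to be a value in [0.0, 1.0], for example a mean pairwise similarity.

```rust
/// Hypothetical in-memory shape of a candidate. The field names mirror
/// the issue text (confidence, inferred, promoted), not an existing type.
#[derive(Debug, Clone, PartialEq)]
pub struct ProjectCandidate {
    pub members: Vec<String>,
    pub confidence: u8,
    pub inferred: bool,
    pub promoted: bool,
}

/// Maps a cluster strength to an integer confidence in 0..=100.
/// Out-of-range inputs are clamped to [0.0, 1.0]; NaN maps to 0.
pub fn confidence_from_strength(strength: f64) -> u8 {
    if strength.is_nan() {
        return 0;
    }
    (strength.clamp(0.0, 1.0) * 100.0).round() as u8
}

/// Builds a suggestion-only candidate, or `None` when the cluster has
/// fewer than two members or its confidence is below `min_confidence`.
/// The result is always marked inferred and not promoted.
pub fn make_candidate(
    members: Vec<String>,
    strength: f64,
    min_confidence: u8,
) -> Option<ProjectCandidate> {
    let confidence = confidence_from_strength(strength);
    if members.len() < 2 || confidence < min_confidence {
        return None;
    }
    Some(ProjectCandidate {
        members,
        confidence,
        inferred: true,
        promoted: false,
    })
}
```

Under these assumptions, the "weak cluster below threshold" test case reduces to checking that `make_candidate` returns `None`, and the "multi-session strong cluster" case reduces to checking the flags and confidence on the returned value.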

Phase

Phase 5 (Foundation cleanup follow-up). Discovered during 5A issue-hygiene pass — this was scoped under #11/#13 but never built, so it gets its own ticket rather than reopening those.
