Add DataModelMaintenance.ReseedProject for re-identifying bulk-loaded chains#68

Open
myieye wants to merge 2 commits into main from reseed-project-api

Conversation

@myieye myieye commented Jun 1, 2026

Contributor

Adds DataModelMaintenance.ReseedProject(model, clientId) — re-identifies a commit chain bulk-loaded from a pre-built source (mints fresh Commit.Ids, sets a uniform ClientId, rehashes), so projects bootstrapped from a shared SQL template get distinct, valid chains.

Guards: refuses empty / multi-author chains and (DateTime, Counter) ties (re-minting random Ids would otherwise reorder them). CommitBase.GenerateHash gains a static overload so the reseed reuses the real hash rather than duplicating it. Full design + rationale in docs/decisions/reseed-project.md.

Consumed by lexbox PR sillsdev/languageforge-lexbox#2281 (template-based project creation).
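
The guards and the re-identification pass described above can be sketched in memory. This is a minimal illustration, not the code in this PR: `CommitRow`, `ReseedStep`, and `ReseedSketch.BuildPlan` are hypothetical names, the `generateHash` delegate stands in for the static `CommitBase.GenerateHash(Guid, string)` overload, and both the `rootParentHash` parameter and the ordering by `(DateTime, Counter)` are assumptions about how the chain is walked.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a stored commit row; not Harmony's Commit type.
public sealed record CommitRow(Guid Id, Guid ClientId, DateTimeOffset DateTime, long Counter);

// One planned replacement: the old Id, its fresh Id, and the recomputed hashes.
public sealed record ReseedStep(Guid OldId, Guid NewId, string Hash, string ParentHash);

public static class ReseedSketch
{
    public static List<ReseedStep> BuildPlan(
        IReadOnlyList<CommitRow> chain,
        Func<Guid, string, string> generateHash, // assumed stand-in for CommitBase.GenerateHash(Guid, string)
        string rootParentHash)                   // assumed parent hash of the first commit
    {
        // Guard 1: nothing to reseed.
        if (chain.Count == 0)
            throw new InvalidOperationException("Cannot reseed an empty commit chain.");

        // Guard 2: the chain must come from a single author.
        if (chain.Select(c => c.ClientId).Distinct().Count() > 1)
            throw new InvalidOperationException("Cannot reseed a multi-author commit chain.");

        // Guard 3: a (DateTime, Counter) tie could be reordered once the Ids are re-minted.
        if (chain.GroupBy(c => (c.DateTime, c.Counter)).Any(g => g.Count() > 1))
            throw new InvalidOperationException("Two commits share an identical (DateTime, Counter).");

        var plan = new List<ReseedStep>(chain.Count);
        var parentHash = rootParentHash;
        foreach (var commit in chain.OrderBy(c => c.DateTime).ThenBy(c => c.Counter))
        {
            var newId = Guid.NewGuid();               // mint a fresh Commit.Id
            var hash = generateHash(newId, parentHash);
            plan.Add(new ReseedStep(commit.Id, newId, hash, parentHash));
            parentHash = hash;                        // the next commit chains onto this hash
        }
        return plan;
    }
}
```

Because every guard runs before any Id is minted, a rejected chain is left untouched, which is the property the rollback test in this PR checks against the real implementation.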

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added project reseeding capability to regenerate commit IDs and recompute hashes while preserving project content and enforcing safety constraints.
  • Tests

    • Comprehensive test suite added to validate reseeding functionality, including hash chain validation and failure scenarios.

coderabbitai Bot commented Jun 1, 2026

Contributor

Walkthrough

This PR implements a complete project reseeding workflow for Harmony that regenerates commit IDs and hashes when importing pre-built commit chains. It includes a refactored static hash helper, repository support methods, a three-phase SQL algorithm with precondition validation, a public API, comprehensive tests, and architectural documentation.

Changes

Project Reseeding Workflow

| Layer / File(s) | Summary |
| --- | --- |
| Hash Generation Refactor<br>src/SIL.Harmony.Core/CommitBase.cs | GenerateHash(string parentHash) delegates to a new public static GenerateHash(Guid id, string parentHash) that computes hashes from commit id and parent hash using XxHash64. |
| Repository Support Methods and Class Partials<br>src/SIL.Harmony/Db/CrdtRepository.cs, src/SIL.Harmony/DataModel.cs | CrdtRepository gains ExecuteSqlAsync for raw SQL execution and CountReferencesToCommits to count foreign key references. DataModel becomes partial to enable separate implementation files. |
| Core Reseeding Algorithm with Precondition Validation<br>src/SIL.Harmony/Maintenance/DataModel.Reseed.cs | ReseedProjectImpl validates preconditions (non-empty chain, single author, no duplicate (DateTime, Counter) timestamps), mints fresh commit IDs with recomputed hashes, then executes three-phase SQL: insert new commit rows, rewrite foreign keys in ChangeEntities and Snapshots, defensively verify no old references remain, and delete original commits, all within a single transaction. |
| Public Maintenance API Entry Point<br>src/SIL.Harmony/Maintenance/DataModelMaintenance.cs | DataModelMaintenance.ReseedProject provides the public caller-facing method with null validation and comprehensive XML documentation of expected semantics, failure modes, and transactional guarantees. |
| Comprehensive Test Suite<br>src/SIL.Harmony.Tests/Maintenance/ReseedProjectTests.cs | 12 tests validate ID freshness (no pre-reseed overlap), client ID updates across commits, hash and parent-hash recomputation and chaining, preservation of change entities by (EntityId, Index) and snapshots except for CommitId, chain ordering by hybrid timestamp, successful post-reseed hash validation, and failure modes including multi-author chains, empty chains, duplicate timestamps, and atomic rollback on precondition violation. |
| Design and Operational Documentation<br>docs/decisions/reseed-project.md | Complete architectural record of the reseed feature covering API contract, precondition rationale, driving use case (template-imported projects with cross-project ID collisions), C# naming and surface decisions, implementation location and structure, high-level algorithm, hash computation sourcing, SQL phase ordering rationale, preservation vs rewrite matrix, tie-guard rationale and open questions, test coverage summary, required lexbox-side caller workflow updates, related artifacts, and terminology glossary. |
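
The phase ordering summarized for DataModel.Reseed.cs can be modeled in memory to show why each step has to come where it does. This is only a sketch under stated assumptions: `ThreePhaseSketch.Apply` is a hypothetical name, a `HashSet<Guid>` stands in for the "Commits" table, and a flat list of commit Ids stands in for the foreign-key columns of ChangeEntities and Snapshots. The real implementation issues SQL inside a transaction.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ThreePhaseSketch
{
    // commits:   stand-in for the Ids in the "Commits" table.
    // childRefs: stand-in for the CommitId column of each child row.
    // oldToNew:  the reseed plan, mapping each original Id to its replacement.
    public static void Apply(
        HashSet<Guid> commits,
        List<Guid> childRefs,
        IReadOnlyDictionary<Guid, Guid> oldToNew)
    {
        // Phase 1: insert the replacement rows first, so that every new Id
        // already exists before any child row is pointed at it.
        foreach (var newId in oldToNew.Values)
        {
            if (!commits.Add(newId))
                throw new InvalidOperationException($"Replacement commit {newId} already exists.");
        }

        // Phase 2: repoint child rows from the old Ids to the new ones.
        for (var i = 0; i < childRefs.Count; i++)
        {
            if (oldToNew.TryGetValue(childRefs[i], out var replacement))
                childRefs[i] = replacement;
        }

        // Defensive check before deleting anything. In this in-memory model it
        // cannot fail; it stands in for the "no old references remain" check.
        var dangling = childRefs.Count(id => oldToNew.ContainsKey(id));
        if (dangling != 0)
            throw new InvalidOperationException($"{dangling} child rows still reference old commits.");

        // Phase 3: delete the originals, failing fast if one is missing
        // (the in-memory analogue of an affected-row count other than 1).
        foreach (var oldId in oldToNew.Keys)
        {
            if (!commits.Remove(oldId))
                throw new InvalidOperationException($"Original commit {oldId} was not found.");
        }
    }
}
```

Inserting before rewriting and deleting last means no child row ever points at a commit that does not exist, which is presumably why the old and new rows coexist for the duration of the transaction.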

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • hahn-kev
  • jasonleenaylor

Poem

🐰 Commits reborn with brand new names,
Hashes rehashed without the blame,
Three SQL phases, crisp and clean,
The finest reseed you've ever seen!
Guards at checkpoints stand so tall,
Catching errors—catching all! 🌱

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately captures the main feature being added: DataModelMaintenance.ReseedProject and its purpose of re-identifying bulk-loaded commit chains. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


… chains

Mints fresh Commit.Ids, sets a uniform ClientId, and rehashes a commit chain bulk-loaded from a pre-built source (e.g. a SQL template), so each bootstrapped project gets a distinct, valid chain. Refuses empty/multi-author chains and (DateTime,Counter) ties that re-minting would reorder. See docs/decisions/reseed-project.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@myieye myieye force-pushed the reseed-project-api branch from 2d82c61 to 0cfec6c on June 1, 2026 13:48

The previous commit saved the file as CRLF, inflating the diff to the
whole file. Only the new static GenerateHash(Guid, string) overload
actually changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (2)
src/SIL.Harmony/Maintenance/DataModelMaintenance.cs (1)

42-50: ⚡ Quick win

Document the null-guard exception in the XML contract.

Line 50 throws ArgumentNullException, but the XML docs only advertise InvalidOperationException. Add the missing <exception cref="ArgumentNullException"> entry so IntelliSense matches the actual public API behavior.

Suggested doc update
     /// <param name="model">The <see cref="DataModel"/> whose commit chain will be reseeded.</param>
     /// <param name="clientId">The ClientId to set on every commit in the chain.</param>
+    /// <exception cref="ArgumentNullException">
+    /// Thrown if <paramref name="model"/> is null.
+    /// </exception>
     /// <exception cref="InvalidOperationException">
     /// Thrown if the commit chain is empty, if its commits have more than one distinct ClientId, or if
     /// two commits share an identical (DateTime, Counter).
     /// </exception>
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/SIL.Harmony/Maintenance/DataModelMaintenance.cs` around lines 42 - 50,
The XML docs for ReseedProject are missing the ArgumentNullException entry;
update the XML comment for the DataModelMaintenance.ReseedProject method to
include an <exception cref="ArgumentNullException"> element describing that an
ArgumentNullException is thrown when the model parameter is null (since
ArgumentNullException.ThrowIfNull(model) is used). Ensure the new exception doc
sits alongside the existing InvalidOperationException entry so IntelliSense
reflects the actual behavior.
src/SIL.Harmony.Tests/Maintenance/ReseedProjectTests.cs (1)

157-204: ⚡ Quick win

Add a test for the public null-guard.

This suite covers the internal reseed failure modes well, but it never asserts the one behavior owned by DataModelMaintenance.ReseedProject itself: rejecting a null model. A dedicated test here would lock down the public API contract added in this PR.

Suggested test
     [Fact]
+    public async Task ReseedProject_ThrowsOnNullModel()
+    {
+        var act = () => DataModelMaintenance.ReseedProject(null!, _newClientId);
+        await act.Should().ThrowAsync<ArgumentNullException>();
+    }
+
+    [Fact]
     public async Task ReseedProject_ThrowsOnMultiAuthorChain()
     {
         var clientA = Guid.NewGuid();
         var clientB = Guid.NewGuid();
         await WriteChange(clientA, NextDate(), SetWord(Guid.NewGuid(), "a"));
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/SIL.Harmony.Tests/Maintenance/ReseedProjectTests.cs` around lines 157 -
204, Add a unit test that verifies DataModelMaintenance.ReseedProject rejects a
null model by throwing the appropriate null-guard exception (e.g.,
ArgumentNullException) when called with model == null; create a new test method
(e.g., ReseedProject_ThrowsOnNullModel) that calls await
DataModelMaintenance.ReseedProject(null, _newClientId) and asserts the expected
exception, mirroring the style of the other tests in this file.

📥 Commits

Reviewing files that changed from the base of the PR and between 31dc78b and b043d87.

📒 Files selected for processing (7)
  • docs/decisions/reseed-project.md
  • src/SIL.Harmony.Core/CommitBase.cs
  • src/SIL.Harmony.Tests/Maintenance/ReseedProjectTests.cs
  • src/SIL.Harmony/DataModel.cs
  • src/SIL.Harmony/Db/CrdtRepository.cs
  • src/SIL.Harmony/Maintenance/DataModel.Reseed.cs
  • src/SIL.Harmony/Maintenance/DataModelMaintenance.cs

>
> Note on *how* the WS leaves the template: excluding it during the FW→CRDT import does **not** work — the importer queries entries, and `EnsureWritingSystemIsPopulated` requires a default vernacular WS, so a WS-less import throws. The generator therefore imports normally, then deletes the vernacular WS's commit (which cascades its change + snapshot), deletes its projected row, and calls `ReseedProject` to rehash the shortened chain before dumping `template.sql`. See `TEMPLATE-FOLLOWUPS.md` in lexbox for the current shape.

**Audience:** the next agent (potentially a different person) who will wire this up on the lexbox side (§11). This doc captures everything needed to implement without ambiguity *and* the decision history so future maintainers understand why each choice was made.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Resolve the lexbox-status contradiction in the audience statement.

Line 9 says this is for the next agent to wire lexbox, but Lines 3-7 say lexbox wiring is already done (except template regeneration). Please align these so the doc doesn’t send mixed signals about remaining work.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/decisions/reseed-project.md` at line 9, The "Audience:" line contradicts
earlier statements about lexbox wiring; decide whether lexbox is already wired
(except template regeneration) or still needs wiring and update the Audience
paragraph accordingly so the doc is consistent. Specifically, either change the
earlier lines that claim lexbox is wired (mentions of lexbox wiring and
"template regeneration") to indicate remaining work for the next agent, or
change the Audience line to state this doc is for future maintenance/templating
only (referencing §11 and "template regeneration"); make the chosen state
explicit and mention the sole remaining task ("template regeneration") if
applicable, and update any phrasing that references "wire this up" or "wire
lexbox" to match.

Comment on lines +171 to +175
```
src/SIL.Harmony/Maintenance/DataModelMaintenance.cs
src/SIL.Harmony/Maintenance/DataModel.Reseed.cs ← internal partial of DataModel
src/SIL.Harmony.Tests/Maintenance/ReseedProjectTests.cs ← test class
```

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language identifiers to fenced code blocks to satisfy markdownlint MD040.

Two fenced blocks are missing language tags. Use text (or bash/csharp where applicable) to keep docs lint-clean.

Suggested patch
-```
+```text
 src/SIL.Harmony/Maintenance/DataModelMaintenance.cs
 src/SIL.Harmony/Maintenance/DataModel.Reseed.cs            ← internal partial of DataModel
 src/SIL.Harmony.Tests/Maintenance/ReseedProjectTests.cs    ← test class

@@
-```
+```text
 src/SIL.Harmony/DataModel.cs

Also applies to: 179-181

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 171-171: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/decisions/reseed-project.md` around lines 171 - 175, Update the two
fenced code blocks that list the filenames (the blocks containing the lines
starting with "src/SIL.Harmony/Maintenance/DataModelMaintenance.cs",
"src/SIL.Harmony/Maintenance/DataModel.Reseed.cs",
"src/SIL.Harmony.Tests/Maintenance/ReseedProjectTests.cs" and the later block
containing "src/SIL.Harmony/DataModel.cs") by adding a language identifier to
the opening backticks (use "text" or a more specific tag like "bash"/"csharp" if
preferred) so the fences become ```text (or ```csharp/```bash) instead of plain
``` to satisfy markdownlint MD040.

Comment on lines +335 to +337
/// <exception cref="InvalidOperationException">
/// Thrown if the commit chain is empty, or if its commits have more than one distinct ClientId.
/// </exception>

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Include the duplicate (DateTime, Counter) failure mode in the XML exception contract.

The doc’s preconditions include a third InvalidOperationException case (duplicate hybrid timestamps), but the proposed XML exceptions only list empty chain and multi-author. Add the third case to keep public docs consistent with behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/decisions/reseed-project.md` around lines 335 - 337, Update the XML docs
for the method's <exception cref="InvalidOperationException"> contract to
include the third failure mode: throw when the commit chain contains duplicate
hybrid timestamps (duplicate (DateTime, Counter) entries). Edit the existing
<exception> block that currently lists "empty chain" and "commits with more than
one distinct ClientId" to also describe the duplicate (DateTime, Counter) case
so the public XML documentation matches the method's preconditions and behavior.

Comment on lines +70 to +76
foreach (var (oldId, newId, hash, newParentHash) in plan)
{
await repo.ExecuteSqlAsync($"""
INSERT INTO "Commits" ("Id", "ClientId", "DateTime", "Counter", "Metadata", "Hash", "ParentHash")
SELECT {newId}, {clientId}, "DateTime", "Counter", "Metadata", {hash}, {newParentHash}
FROM "Commits" WHERE "Id" = {oldId}
""");

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Assert the commit-row statements each affect exactly one row.

Line 72 and Line 98 ignore ExecuteSqlAsync’s affected-row count. If INSERT ... SELECT ... WHERE "Id" = {oldId} or the later DELETE ever matches 0 rows, the dangling-FK check still won’t catch a missed commit replacement when that commit has no child rows. Fail fast on != 1 here so the transaction rolls back instead of silently shortening the chain.

Suggested guard
         foreach (var (oldId, newId, hash, newParentHash) in plan)
         {
-            await repo.ExecuteSqlAsync($"""
+            var inserted = await repo.ExecuteSqlAsync($"""
                 INSERT INTO "Commits" ("Id", "ClientId", "DateTime", "Counter", "Metadata", "Hash", "ParentHash")
                 SELECT {newId}, {clientId}, "DateTime", "Counter", "Metadata", {hash}, {newParentHash}
                 FROM "Commits" WHERE "Id" = {oldId}
                 """);
+            if (inserted != 1)
+                throw new InvalidOperationException(
+                    $"ReseedProject expected to insert exactly one replacement commit for {oldId}, but inserted {inserted}.");
         }
@@
         foreach (var (oldId, _, _, _) in plan)
         {
-            await repo.ExecuteSqlAsync($"""DELETE FROM "Commits" WHERE "Id" = {oldId}""");
+            var deleted = await repo.ExecuteSqlAsync($"""DELETE FROM "Commits" WHERE "Id" = {oldId}""");
+            if (deleted != 1)
+                throw new InvalidOperationException(
+                    $"ReseedProject expected to delete exactly one original commit {oldId}, but deleted {deleted}.");
         }

Also applies to: 96-98

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/SIL.Harmony/Maintenance/DataModel.Reseed.cs` around lines 70 - 76, The
INSERT/DELETE SQL operations using repo.ExecuteSqlAsync do not check the
returned affected-row count; update the code around the loop that calls
repo.ExecuteSqlAsync for the INSERT INTO "Commits" (and the corresponding DELETE
later) to capture the integer result, assert it equals 1, and throw a
descriptive exception (including oldId/newId or operation context) if it is not
1 so the surrounding transaction will roll back; specifically modify the calls
to repo.ExecuteSqlAsync(...) in the reseed logic (the INSERT INTO "Commits"
SELECT ... WHERE "Id" = {oldId} and the later DELETE that removes the old
commit) to check the return value and fail fast on != 1.
