Add DataModelMaintenance.ReseedProject for re-identifying bulk-loaded chains#68
Add DataModelMaintenance.ReseedProject for re-identifying bulk-loaded chains#68myieye wants to merge 2 commits into
Conversation
📝 WalkthroughWalkthroughThis PR implements a complete project reseeding workflow for Harmony that regenerates commit IDs and hashes when importing pre-built commit chains. It includes a refactored static hash helper, repository support methods, a three-phase SQL algorithm with precondition validation, a public API, comprehensive tests, and architectural documentation. ChangesProject Reseeding Workflow
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
… chains Mints fresh Commit.Ids, sets a uniform ClientId, and rehashes a commit chain bulk-loaded from a pre-built source (e.g. a SQL template), so each bootstrapped project gets a distinct, valid chain. Refuses empty/multi-author chains and (DateTime,Counter) ties that re-minting would reorder. See docs/decisions/reseed-project.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2d82c61 to
0cfec6c
Compare
The previous commit saved the file as CRLF, inflating the diff to the whole file. Only the new static GenerateHash(Guid, string) overload actually changed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (2)
src/SIL.Harmony/Maintenance/DataModelMaintenance.cs (1)
42-50: ⚡ Quick winDocument the null-guard exception in the XML contract.
Line 50 throws
ArgumentNullException, but the XML docs only advertiseInvalidOperationException. Add the missing<exception cref="ArgumentNullException">entry so IntelliSense matches the actual public API behavior.Suggested doc update
/// <param name="model">The <see cref="DataModel"/> whose commit chain will be reseeded.</param> /// <param name="clientId">The ClientId to set on every commit in the chain.</param> + /// <exception cref="ArgumentNullException"> + /// Thrown if <paramref name="model"/> is null. + /// </exception> /// <exception cref="InvalidOperationException"> /// Thrown if the commit chain is empty, if its commits have more than one distinct ClientId, or if /// two commits share an identical (DateTime, Counter). /// </exception>🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/SIL.Harmony/Maintenance/DataModelMaintenance.cs` around lines 42 - 50, The XML docs for ReseedProject are missing the ArgumentNullException entry; update the XML comment for the DataModelMaintenance.ReseedProject method to include an <exception cref="ArgumentNullException"> element describing that an ArgumentNullException is thrown when the model parameter is null (since ArgumentNullException.ThrowIfNull(model) is used). Ensure the new exception doc sits alongside the existing InvalidOperationException entry so IntelliSense reflects the actual behavior.src/SIL.Harmony.Tests/Maintenance/ReseedProjectTests.cs (1)
157-204: ⚡ Quick winAdd a test for the public null-guard.
This suite covers the internal reseed failure modes well, but it never asserts the one behavior owned by
DataModelMaintenance.ReseedProjectitself: rejecting a nullmodel. A dedicated test here would lock down the public API contract added in this PR.Suggested test
[Fact] + public async Task ReseedProject_ThrowsOnNullModel() + { + var act = () => DataModelMaintenance.ReseedProject(null!, _newClientId); + await act.Should().ThrowAsync<ArgumentNullException>(); + } + + [Fact] public async Task ReseedProject_ThrowsOnMultiAuthorChain() { var clientA = Guid.NewGuid(); var clientB = Guid.NewGuid(); await WriteChange(clientA, NextDate(), SetWord(Guid.NewGuid(), "a"));🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/SIL.Harmony.Tests/Maintenance/ReseedProjectTests.cs` around lines 157 - 204, Add a unit test that verifies DataModelMaintenance.ReseedProject rejects a null model by throwing the appropriate null-guard exception (e.g., ArgumentNullException) when called with model == null; create a new test method (e.g., ReseedProject_ThrowsOnNullModel) that calls await DataModelMaintenance.ReseedProject(null, _newClientId) and asserts the expected exception, mirroring the style of the other tests in this file.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/decisions/reseed-project.md`:
- Around line 171-175: Update the two fenced code blocks that list the filenames
(the blocks containing the lines starting with
"src/SIL.Harmony/Maintenance/DataModelMaintenance.cs",
"src/SIL.Harmony/Maintenance/DataModel.Reseed.cs",
"src/SIL.Harmony.Tests/Maintenance/ReseedProjectTests.cs" and the later block
containing "src/SIL.Harmony/DataModel.cs") by adding a language identifier to
the opening backticks (use "text" or a more specific tag like "bash"/"csharp" if
preferred) so the fences become ```text (or ```csharp/```bash) instead of plain
``` to satisfy markdownlint MD040.
- Around line 335-337: Update the XML docs for the method's <exception
cref="InvalidOperationException"> contract to include the third failure mode:
throw when the commit chain contains duplicate hybrid timestamps (duplicate
(DateTime, Counter) entries). Edit the existing <exception> block that currently
lists "empty chain" and "commits with more than one distinct ClientId" to also
describe the duplicate (DateTime, Counter) case so the public XML documentation
matches the method's preconditions and behavior.
- Line 9: The "Audience:" line contradicts earlier statements about lexbox
wiring; decide whether lexbox is already wired (except template regeneration) or
still needs wiring and update the Audience paragraph accordingly so the doc is
consistent. Specifically, either change the earlier lines that claim lexbox is
wired (mentions of lexbox wiring and "template regeneration") to indicate
remaining work for the next agent, or change the Audience line to state this doc
is for future maintenance/templating only (referencing §11 and "template
regeneration"); make the chosen state explicit and mention the sole remaining
task ("template regeneration") if applicable, and update any phrasing that
references "wire this up" or "wire lexbox" to match.
In `@src/SIL.Harmony/Maintenance/DataModel.Reseed.cs`:
- Around line 70-76: The INSERT/DELETE SQL operations using repo.ExecuteSqlAsync
do not check the returned affected-row count; update the code around the loop
that calls repo.ExecuteSqlAsync for the INSERT INTO "Commits" (and the
corresponding DELETE later) to capture the integer result, assert it equals 1,
and throw a descriptive exception (including oldId/newId or operation context)
if it is not 1 so the surrounding transaction will roll back; specifically
modify the calls to repo.ExecuteSqlAsync(...) in the reseed logic (the INSERT
INTO "Commits" SELECT ... WHERE "Id" = {oldId} and the later DELETE that removes
the old commit) to check the return value and fail fast on != 1.
---
Nitpick comments:
In `@src/SIL.Harmony.Tests/Maintenance/ReseedProjectTests.cs`:
- Around line 157-204: Add a unit test that verifies
DataModelMaintenance.ReseedProject rejects a null model by throwing the
appropriate null-guard exception (e.g., ArgumentNullException) when called with
model == null; create a new test method (e.g., ReseedProject_ThrowsOnNullModel)
that calls await DataModelMaintenance.ReseedProject(null, _newClientId) and
asserts the expected exception, mirroring the style of the other tests in this
file.
In `@src/SIL.Harmony/Maintenance/DataModelMaintenance.cs`:
- Around line 42-50: The XML docs for ReseedProject are missing the
ArgumentNullException entry; update the XML comment for the
DataModelMaintenance.ReseedProject method to include an <exception
cref="ArgumentNullException"> element describing that an ArgumentNullException
is thrown when the model parameter is null (since
ArgumentNullException.ThrowIfNull(model) is used). Ensure the new exception doc
sits alongside the existing InvalidOperationException entry so IntelliSense
reflects the actual behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ee6ca52f-ce2b-4aad-8239-9b20c98f5236
📒 Files selected for processing (7)
docs/decisions/reseed-project.mdsrc/SIL.Harmony.Core/CommitBase.cssrc/SIL.Harmony.Tests/Maintenance/ReseedProjectTests.cssrc/SIL.Harmony/DataModel.cssrc/SIL.Harmony/Db/CrdtRepository.cssrc/SIL.Harmony/Maintenance/DataModel.Reseed.cssrc/SIL.Harmony/Maintenance/DataModelMaintenance.cs
| > | ||
| > Note on *how* the WS leaves the template: excluding it during the FW→CRDT import does **not** work — the importer queries entries, and `EnsureWritingSystemIsPopulated` requires a default vernacular WS, so a WS-less import throws. The generator therefore imports normally, then deletes the vernacular WS's commit (which cascades its change + snapshot), deletes its projected row, and calls `ReseedProject` to rehash the shortened chain before dumping `template.sql`. See `TEMPLATE-FOLLOWUPS.md` in lexbox for the current shape. | ||
|
|
||
| **Audience:** the next agent (potentially a different person) who will wire this up on the lexbox side (§11). This doc captures everything needed to implement without ambiguity *and* the decision history so future maintainers understand why each choice was made. |
There was a problem hiding this comment.
Resolve the lexbox-status contradiction in the audience statement.
Line 9 says this is for the next agent to wire lexbox, but Lines 3-7 say lexbox wiring is already done (except template regeneration). Please align these so the doc doesn’t send mixed signals about remaining work.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/decisions/reseed-project.md` at line 9, The "Audience:" line contradicts
earlier statements about lexbox wiring; decide whether lexbox is already wired
(except template regeneration) or still needs wiring and update the Audience
paragraph accordingly so the doc is consistent. Specifically, either change the
earlier lines that claim lexbox is wired (mentions of lexbox wiring and
"template regeneration") to indicate remaining work for the next agent, or
change the Audience line to state this doc is for future maintenance/templating
only (referencing §11 and "template regeneration"); make the chosen state
explicit and mention the sole remaining task ("template regeneration") if
applicable, and update any phrasing that references "wire this up" or "wire
lexbox" to match.
| ``` | ||
| src/SIL.Harmony/Maintenance/DataModelMaintenance.cs | ||
| src/SIL.Harmony/Maintenance/DataModel.Reseed.cs ← internal partial of DataModel | ||
| src/SIL.Harmony.Tests/Maintenance/ReseedProjectTests.cs ← test class | ||
| ``` |
There was a problem hiding this comment.
Add language identifiers to fenced code blocks to satisfy markdownlint MD040.
Two fenced blocks are missing language tags. Use text (or bash/csharp where applicable) to keep docs lint-clean.
Suggested patch
-```
+```text
src/SIL.Harmony/Maintenance/DataModelMaintenance.cs
src/SIL.Harmony/Maintenance/DataModel.Reseed.cs ← internal partial of DataModel
src/SIL.Harmony.Tests/Maintenance/ReseedProjectTests.cs ← test class@@
- +text
src/SIL.Harmony/DataModel.cs
Also applies to: 179-181
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 171-171: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/decisions/reseed-project.md` around lines 171 - 175, Update the two
fenced code blocks that list the filenames (the blocks containing the lines
starting with "src/SIL.Harmony/Maintenance/DataModelMaintenance.cs",
"src/SIL.Harmony/Maintenance/DataModel.Reseed.cs",
"src/SIL.Harmony.Tests/Maintenance/ReseedProjectTests.cs" and the later block
containing "src/SIL.Harmony/DataModel.cs") by adding a language identifier to
the opening backticks (use "text" or a more specific tag like "bash"/"csharp" if
preferred) so the fences become ```text (or ```csharp/```bash) instead of plain
``` to satisfy markdownlint MD040.
| /// <exception cref="InvalidOperationException"> | ||
| /// Thrown if the commit chain is empty, or if its commits have more than one distinct ClientId. | ||
| /// </exception> |
There was a problem hiding this comment.
Include the duplicate (DateTime, Counter) failure mode in the XML exception contract.
The doc’s preconditions include a third InvalidOperationException case (duplicate hybrid timestamps), but the proposed XML exceptions only list empty chain and multi-author. Add the third case to keep public docs consistent with behavior.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/decisions/reseed-project.md` around lines 335 - 337, Update the XML docs
for the method's <exception cref="InvalidOperationException"> contract to
include the third failure mode: throw when the commit chain contains duplicate
hybrid timestamps (duplicate (DateTime, Counter) entries). Edit the existing
<exception> block that currently lists "empty chain" and "commits with more than
one distinct ClientId" to also describe the duplicate (DateTime, Counter) case
so the public XML documentation matches the method's preconditions and behavior.
| foreach (var (oldId, newId, hash, newParentHash) in plan) | ||
| { | ||
| await repo.ExecuteSqlAsync($""" | ||
| INSERT INTO "Commits" ("Id", "ClientId", "DateTime", "Counter", "Metadata", "Hash", "ParentHash") | ||
| SELECT {newId}, {clientId}, "DateTime", "Counter", "Metadata", {hash}, {newParentHash} | ||
| FROM "Commits" WHERE "Id" = {oldId} | ||
| """); |
There was a problem hiding this comment.
Assert the commit-row statements each affect exactly one row.
Line 72 and Line 98 ignore ExecuteSqlAsync’s affected-row count. If INSERT ... SELECT ... WHERE "Id" = {oldId} or the later DELETE ever matches 0 rows, the dangling-FK check still won’t catch a missed commit replacement when that commit has no child rows. Fail fast on != 1 here so the transaction rolls back instead of silently shortening the chain.
Suggested guard
foreach (var (oldId, newId, hash, newParentHash) in plan)
{
- await repo.ExecuteSqlAsync($"""
+ var inserted = await repo.ExecuteSqlAsync($"""
INSERT INTO "Commits" ("Id", "ClientId", "DateTime", "Counter", "Metadata", "Hash", "ParentHash")
SELECT {newId}, {clientId}, "DateTime", "Counter", "Metadata", {hash}, {newParentHash}
FROM "Commits" WHERE "Id" = {oldId}
""");
+ if (inserted != 1)
+ throw new InvalidOperationException(
+ $"ReseedProject expected to insert exactly one replacement commit for {oldId}, but inserted {inserted}.");
}
@@
foreach (var (oldId, _, _, _) in plan)
{
- await repo.ExecuteSqlAsync($"""DELETE FROM "Commits" WHERE "Id" = {oldId}""");
+ var deleted = await repo.ExecuteSqlAsync($"""DELETE FROM "Commits" WHERE "Id" = {oldId}""");
+ if (deleted != 1)
+ throw new InvalidOperationException(
+ $"ReseedProject expected to delete exactly one original commit {oldId}, but deleted {deleted}.");
}Also applies to: 96-98
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/SIL.Harmony/Maintenance/DataModel.Reseed.cs` around lines 70 - 76, The
INSERT/DELETE SQL operations using repo.ExecuteSqlAsync do not check the
returned affected-row count; update the code around the loop that calls
repo.ExecuteSqlAsync for the INSERT INTO "Commits" (and the corresponding DELETE
later) to capture the integer result, assert it equals 1, and throw a
descriptive exception (including oldId/newId or operation context) if it is not
1 so the surrounding transaction will roll back; specifically modify the calls
to repo.ExecuteSqlAsync(...) in the reseed logic (the INSERT INTO "Commits"
SELECT ... WHERE "Id" = {oldId} and the later DELETE that removes the old
commit) to check the return value and fail fast on != 1.
Adds
DataModelMaintenance.ReseedProject(model, clientId)— re-identifies a commit chain bulk-loaded from a pre-built source (mints freshCommit.Ids, sets a uniformClientId, rehashes), so projects bootstrapped from a shared SQL template get distinct, valid chains.Guards: refuses empty / multi-author chains and
(DateTime, Counter)ties (re-minting random Ids would otherwise reorder them).CommitBase.GenerateHashgains a static overload so the reseed reuses the real hash rather than duplicating it. Full design + rationale indocs/decisions/reseed-project.md.Consumed by lexbox PR sillsdev/languageforge-lexbox#2281 (template-based project creation).
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Tests