impl learnings and add configuration options#23
Conversation
…nd tasks when no references remain
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughRefactors multiple actor loops into spawned, cancellation-aware tasks; introduces a builder-based, strongly-typed Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client
participant Topic
participant CancelToken
participant ActorTask
participant GossipNet
participant DHT
Client->>Topic: Topic::new(..., Config)
Topic->>CancelToken: create & share token
Topic->>ActorTask: Handle::spawn_with(actor.run(rx), cancel_token.clone())
Note right of ActorTask: select! { rx.recv_async(), cancel_token.cancelled(), ticker.tick() }
GossipNet->>ActorTask: incoming events / RPC responses
ActorTask->>GossipNet: broadcast / join_peers (uses TimeoutConfig)
ActorTask->>DHT: get/put_mutable (uses DhtConfig timeouts & retries)
Client->>CancelToken: cancel()
CancelToken->>ActorTask: cancelled signal
ActorTask-->>Topic: exit run() and cleanup
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/gossip/receiver.rs (1)
128-149:⚠️ Potential issue | 🟠 MajorRecord the message hash before the event can be popped for a waiter.
On Lines 128-149, if
waitersis non-empty andmsg_queuewas empty, the newly received event is immediately popped and sent at Lines 130-133, so thefront()-based hash update never runs. In practice,last_message_hashesmisses the common “consumer is already awaitingnext()” path, which breaks overlap detection for live-delivered messages.Possible fix
+ if let Some(Ok(iroh_gossip::api::Event::Received(msg))) = raw_event.as_ref() { + let mut hash = sha2::Sha512::new(); + hash.update(&msg.content); + if let Ok(lmh) = hash.finalize()[..32].try_into() { + if self.last_message_hashes.len() == 5 { + self.last_message_hashes.pop_front(); + } + self.last_message_hashes.push_back(lmh); + } + } self.msg_queue.push_front(raw_event); if let Some(waiter) = self.waiters.pop_back() { if let Some(event) = self.msg_queue.pop_back() { let _ = waiter.send(event); } else { let _ = waiter.send(None); // this should never happen } } - if let Some(Some(Ok(event))) = self.msg_queue.front() { + if let Some(Some(Ok(event))) = self.msg_queue.front() { match event { iroh_gossip::api::Event::Received(msg) => { tracing::debug!("GossipReceiver: received message from {:?}", msg.delivered_from); - let mut hash = sha2::Sha512::new(); - hash.update(msg.content.clone()); - if let Ok(lmh) = hash.finalize()[..32].try_into() { - if self.last_message_hashes.len() == 5 { - self.last_message_hashes.pop_front(); - } - self.last_message_hashes.push_back(lmh); - } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gossip/receiver.rs` around lines 128 - 149, The hash update must run immediately after pushing raw_event so delivered messages aren't missed when a waiter is popped; move the front()-based Received message hashing block to just after self.msg_queue.push_front(raw_event) (before the waiter pop/send logic), checking the newly pushed front with the same pattern (if let Some(Some(Ok(iroh_gossip::api::Event::Received(msg)))) = self.msg_queue.front()) and then computing sha2::Sha512 over msg.content, truncating to 32 bytes, and maintaining self.last_message_hashes capacity (pop_front when len==5) before any waiter may pop the event.
🧹 Nitpick comments (1)
src/gossip/merge/bubble.rs (1)
66-80: Consider handling channel close explicitly.The pattern
Ok(action) = rx.recv_async()silently ignoresErrfrom a closed channel. If the channel closes before thecancel_tokenis cancelled, theselect!will spin on the other branches. Consider handling the channel closure explicitly to ensure clean shutdown:♻️ Suggested improvement
tokio::select! { - Ok(action) = rx.recv_async() => { - action(self).await; - } + result = rx.recv_async() => { + match result { + Ok(action) => action(self).await, + Err(_) => break Ok(()), // Channel closed + } + } _ = self.ticker.tick() => { // ... } _ = self.cancel_token.cancelled() => { break Ok(()); } - else => break Ok(()), }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gossip/merge/bubble.rs` around lines 66 - 80, The recv_async branch currently ignores a closed channel error by matching only Ok(action) = rx.recv_async(), so if the sender closes the channel the loop can spin; change the select branch to explicitly match the Result from rx.recv_async() (e.g., `res = rx.recv_async()`) and then handle Err(_) by breaking the loop or returning Ok(()) to perform a clean shutdown; keep existing behavior for Ok(action) by awaiting action(self).await and leave the ticker and cancel_token branches unchanged (refer to rx.recv_async(), self.ticker, self.merge(), and self.cancel_token.cancelled()).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Cargo.toml`:
- Line 27: The Cargo.toml entry for tokio-util currently disables default
features which prevents CancellationToken from being available; update the
tokio-util dependency declaration to explicitly enable the runtime feature (keep
default-features = false) by adding features = ["rt"] so the crate provides
CancellationToken used across topic, publisher, bootstrap, receiver, and merge
modules.
In `@src/gossip/topic/topic.rs`:
- Around line 219-272: The tests module (mod tests containing async fn
test_topic_full_shutdown_on_drop) is not guarded and is compiled into non-test
builds; add the #[cfg(test)] attribute above the mod tests declaration so the
module (and the test function test_topic_full_shutdown_on_drop) is only included
during test builds, ensuring test-only code like
Topic::new/gossip_receiver/split and the cancel_token probe are excluded from
release binaries.
---
Outside diff comments:
In `@src/gossip/receiver.rs`:
- Around line 128-149: The hash update must run immediately after pushing
raw_event so delivered messages aren't missed when a waiter is popped; move the
front()-based Received message hashing block to just after
self.msg_queue.push_front(raw_event) (before the waiter pop/send logic),
checking the newly pushed front with the same pattern (if let
Some(Some(Ok(iroh_gossip::api::Event::Received(msg)))) = self.msg_queue.front())
and then computing sha2::Sha512 over msg.content, truncating to 32 bytes, and
maintaining self.last_message_hashes capacity (pop_front when len==5) before any
waiter may pop the event.
---
Nitpick comments:
In `@src/gossip/merge/bubble.rs`:
- Around line 66-80: The recv_async branch currently ignores a closed channel
error by matching only Ok(action) = rx.recv_async(), so if the sender closes the
channel the loop can spin; change the select branch to explicitly match the
Result from rx.recv_async() (e.g., `res = rx.recv_async()`) and then handle
Err(_) by breaking the loop or returning Ok(()) to perform a clean shutdown;
keep existing behavior for Ok(action) by awaiting action(self).await and leave
the ticker and cancel_token branches unchanged (refer to rx.recv_async(),
self.ticker, self.merge(), and self.cancel_token.cancelled()).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 40781e9c-14a1-445e-b451-a914d23360a0
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (10)
.github/workflows/test.ymlCargo.tomlsrc/dht.rssrc/gossip/merge/bubble.rssrc/gossip/merge/message_overlap.rssrc/gossip/receiver.rssrc/gossip/sender.rssrc/gossip/topic/bootstrap.rssrc/gossip/topic/publisher.rssrc/gossip/topic/topic.rs
…publisher record_content failing try_into if not exactly 5 + topic test only compiled for tests + fixed receiver pop before hashing
…gossip mailbox (only 64) + fixed circular reference in topic split code path (if topic => new => split is one call without keeping topic around specifically) + cleanup + cargo test phase added to cicd
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/gossip/topic/bootstrap.rs`:
- Around line 91-95: The spawned async task currently always calls
sender.send(()) when it exits, which reports success even if it exited due to
cancellation; modify the task closures in bootstrap (the async move using
cancel_token and calling sender.send(())) to only send completion when the
bootstrap actually finished (i.e., on the success path), and avoid sending when
cancel_token.is_cancelled() caused the loop to break — e.g., check
cancel_token.is_cancelled() (or set a local `completed` flag on the real-success
path) before invoking sender.send(()); apply the same change for the other spawn
sites that use cancel_token and sender.send(()) so cancellation and real
completion are distinguishable.
In `@src/gossip/topic/topic.rs`:
- Around line 262-264: The test currently uses a single
tokio::task::yield_now().await then asserts
assert!(cancel_token.is_cancelled()), which is racy; instead await the
cancellation future on the CancelToken so the test blocks until shutdown
completes (e.g. await cancel_token.cancelled() or use tokio::time::timeout
around cancel_token.cancelled() for a bounded wait). Replace the
yield_now()/is_cancelled() check with awaiting the token's cancelled() future
(optionally wrapped in a timeout) to deterministically wait for shutdown.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0b1e9a63-3a21-48c0-9454-a42f15b3ca39
📒 Files selected for processing (8)
.github/workflows/test.ymlsrc/gossip/merge/bubble.rssrc/gossip/merge/message_overlap.rssrc/gossip/receiver.rssrc/gossip/sender.rssrc/gossip/topic/bootstrap.rssrc/gossip/topic/publisher.rssrc/gossip/topic/topic.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- .github/workflows/test.yml
- src/gossip/merge/message_overlap.rs
…_request from cicd
…lled instead of yield
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/gossip/receiver.rs (1)
64-70:⚠️ Potential issue | 🟠 MajorDon't collapse dead-actor failures into healthy-looking state.
After an unexpected receiver-task exit, these methods report
{},false, and[], so downstream code likesrc/gossip/topic/publisher.rs:93-100andsrc/gossip/merge/message_overlap.rs:86-101keeps making decisions on fabricated state. At minimum, log the call error before defaulting; ideally, surface shutdown explicitly instead of synthesizing defaults.Also applies to: 74-78, 98-102
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gossip/receiver.rs` around lines 64 - 70, The neighbors() method currently swallows actor-call failures by .await.unwrap_or_default(), yielding fabricated empty-state; change it to detect Err from self.api.call(...) and log the error (include the error details) before returning a non-ambiguous value or propagating the failure (prefer returning Result or an explicit Shutdown/Unavailable sentinel instead of empty HashSet). Locate the call site using neighbors() and the inner act_ok! closure referencing actor.gossip_receiver.neighbors(), replace the unwrap_or_default pattern with explicit error handling that logs via the crate's logger/tracing and either returns a Result<HashSet<EndpointId>, Error> or a distinct enum variant indicating the receiver task has exited; apply the same fix to the other methods with the same pattern (the other self.api.call(...).await.unwrap_or_default() sites mentioned).
♻️ Duplicate comments (1)
src/gossip/topic/bootstrap.rs (1)
91-95:⚠️ Potential issue | 🟠 MajorOnly signal bootstrap completion on actual join.
All of the cancellation exits still fall through to
sender.send(()), sobootstrap()resolves even if the topic never joined. That makes cancellation indistinguishable from success and breaks the method contract.🔧 Minimal fix
tokio::spawn({ let mut last_published_unix_minute = 0; let (gossip_sender, gossip_receiver) = (self.gossip_sender.clone(), self.gossip_receiver.clone()); let record_publisher = self.record_publisher.clone(); let cancel_token = self.cancel_token.clone(); async move { tracing::debug!("Bootstrap: starting bootstrap process"); + let mut joined = false; while !cancel_token.is_cancelled() { // Check if we are connected to at least one node if gossip_receiver.is_joined().await { tracing::debug!("Bootstrap: already joined, exiting bootstrap loop"); + joined = true; break; } @@ if gossip_receiver.is_joined().await { tracing::debug!( "Bootstrap: successfully joined via peer {}", node_id ); + joined = true; break; } @@ if gossip_receiver.is_joined().await { tracing::debug!("Bootstrap: successfully joined after final wait"); + joined = true; break; } else { @@ } - tracing::debug!("Bootstrap: exited"); - let _ = sender.send(()); + if joined { + tracing::debug!("Bootstrap: completed successfully"); + let _ = sender.send(()); + } else { + tracing::debug!("Bootstrap: cancelled before join completed"); + } } });Also applies to: 144-147, 190-193, 219-222, 254-257, 261-262
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gossip/topic/bootstrap.rs` around lines 91 - 95, The bootstrap task currently always calls sender.send(()) on any exit (including cancellations), making cancellation indistinguishable from a successful join; change the logic in bootstrap() so sender.send(()) is only invoked after an actual successful join event: introduce or use a local "joined" flag (or detect the successful join path in the async loop where the topic join completes), set joined = true when join succeeds, and on cancellation simply return/stop without calling sender.send(()). Apply the same pattern wherever sender.send(()) is currently unconditionally called (e.g., within the closures using cancel_token.clone() and async move blocks) so that all early/cancel exits skip sending and only the real join path sends the completion signal.
🧹 Nitpick comments (3)
.github/workflows/test.yml (1)
21-21: Pinactions/checkout@v6to a full commit SHA for supply-chain security.Use the pattern
actions/checkout@<commit-sha> # v6.0.2instead of the major tag. GitHub's official security guidance recommends commit SHA pinning as the only immutable approach; major version tags can be moved or recreated, introducing risk if the underlying code changes.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/test.yml at line 21, Replace the floating major tag for the checkout action to a pinned commit SHA: locate the uses: actions/checkout@v6 reference in the GitHub Actions workflow and change it to uses: actions/checkout@<full-commit-sha> and append a comment indicating the original tag (e.g., "# v6.0.2") so the action is immutably pinned while preserving human-readable version context.src/gossip/receiver.rs (2)
139-141: Avoid cloning the payload on the receive hot path.
msg.content.clone()adds a full buffer copy for every received message while the event is already retained inmsg_queue. Hashing borrowed bytes avoids that extra memory traffic.♻️ Proposed fix
- hash.update(msg.content.clone()); + hash.update(msg.content.as_ref());🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gossip/receiver.rs` around lines 139 - 141, Avoid cloning message payload on the receive hot path: in the GossipReceiver code where you create sha2::Sha512 and call hash.update(msg.content.clone()), change it to hash.update on a borrowed reference (e.g., &msg.content or msg.content.as_ref()) so the content is hashed by reference without allocating a new buffer; update the call site that currently does hash.update(msg.content.clone()) to use a borrowed slice instead.
126-134: Log the terminal gossip error before treating it as close.This branch intentionally normalizes
ApiErrorinto shutdown, but it drops the concrete failure when no waiter is parked. A log here is the only remaining breadcrumb for diagnosing receiver failures.🪵 Minimal tweak
- Some(Err(_)) => { + Some(Err(err)) => { + tracing::warn!(?err, "GossipReceiver: gossip stream closed with error"); // if Err => stream error, treat as closed if let Some(waiter) = self.waiters.pop_back() { - let _ = waiter.send(raw_event); + let _ = waiter.send(Some(Err(err))); } while let Some(waiter) = self.waiters.pop_back() { let _ = waiter.send(None); } break Ok(());🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gossip/receiver.rs` around lines 126 - 134, Capture the concrete error in the Some(Err(_)) arm (e.g., change Some(Err(_)) to Some(Err(err))) and emit a log entry with that error before you treat it as a close; then continue sending raw_event to the last waiter and None to remaining waiters as currently done (referencing the Some(Err(err)) match arm, raw_event, self.waiters.pop_back(), and waiter.send). Use the project’s logging facility (e.g., tracing::error! or the crate’s logger) to include the error value and brief context so failures are observable even when no waiter remains.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@src/gossip/receiver.rs`:
- Around line 64-70: The neighbors() method currently swallows actor-call
failures by .await.unwrap_or_default(), yielding fabricated empty-state; change
it to detect Err from self.api.call(...) and log the error (include the error
details) before returning a non-ambiguous value or propagating the failure
(prefer returning Result or an explicit Shutdown/Unavailable sentinel instead of
empty HashSet). Locate the call site using neighbors() and the inner act_ok!
closure referencing actor.gossip_receiver.neighbors(), replace the
unwrap_or_default pattern with explicit error handling that logs via the crate's
logger/tracing and either returns a Result<HashSet<EndpointId>, Error> or a
distinct enum variant indicating the receiver task has exited; apply the same
fix to the other methods with the same pattern (the other
self.api.call(...).await.unwrap_or_default() sites mentioned).
---
Duplicate comments:
In `@src/gossip/topic/bootstrap.rs`:
- Around line 91-95: The bootstrap task currently always calls sender.send(())
on any exit (including cancellations), making cancellation indistinguishable
from a successful join; change the logic in bootstrap() so sender.send(()) is
only invoked after an actual successful join event: introduce or use a local
"joined" flag (or detect the successful join path in the async loop where the
topic join completes), set joined = true when join succeeds, and on cancellation
simply return/stop without calling sender.send(()). Apply the same pattern
wherever sender.send(()) is currently unconditionally called (e.g., within the
closures using cancel_token.clone() and async move blocks) so that all
early/cancel exits skip sending and only the real join path sends the completion
signal.
---
Nitpick comments:
In @.github/workflows/test.yml:
- Line 21: Replace the floating major tag for the checkout action to a pinned
commit SHA: locate the uses: actions/checkout@v6 reference in the GitHub Actions
workflow and change it to uses: actions/checkout@<full-commit-sha> and append a
comment indicating the original tag (e.g., "# v6.0.2") so the action is
immutably pinned while preserving human-readable version context.
In `@src/gossip/receiver.rs`:
- Around line 139-141: Avoid cloning message payload on the receive hot path: in
the GossipReceiver code where you create sha2::Sha512 and call
hash.update(msg.content.clone()), change it to hash.update on a borrowed
reference (e.g., &msg.content or msg.content.as_ref()) so the content is hashed
by reference without allocating a new buffer; update the call site that
currently does hash.update(msg.content.clone()) to use a borrowed slice instead.
- Around line 126-134: Capture the concrete error in the Some(Err(_)) arm (e.g.,
change Some(Err(_)) to Some(Err(err))) and emit a log entry with that error
before you treat it as a close; then continue sending raw_event to the last
waiter and None to remaining waiters as currently done (referencing the
Some(Err(err)) match arm, raw_event, self.waiters.pop_back(), and waiter.send).
Use the project’s logging facility (e.g., tracing::error! or the crate’s logger)
to include the error value and brief context so failures are observable even
when no waiter remains.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 67f0f1b9-42ae-4c89-8088-d43274c9dac1
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (10)
.github/workflows/test.ymlCargo.tomlsrc/dht.rssrc/gossip/merge/bubble.rssrc/gossip/merge/message_overlap.rssrc/gossip/receiver.rssrc/gossip/sender.rssrc/gossip/topic/bootstrap.rssrc/gossip/topic/publisher.rssrc/gossip/topic/topic.rs
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/gossip/topic/publisher.rs (1)
84-91: Choose the capped neighbor set intentionally.
neighbors()returns aHashSet, soiter().take(5)depends on hash iteration order. Once a node has more than five neighbors, otherwise equivalent topologies can publish differentactive_peersarrays and churn the DHT record for no semantic reason.🧭 One deterministic option
- let active_peers = self + let mut active_peers = self .gossip_receiver .neighbors() .await? - .iter() - .filter_map(|pub_key| pub_key.as_bytes().as_ref().try_into().ok()) - .take(5) - .collect::<Vec<_>>(); + .into_iter() + .map(|pub_key| *pub_key.as_bytes()) + .collect::<Vec<[u8; 32]>>(); + active_peers.sort_unstable(); + active_peers.truncate(5);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gossip/topic/publisher.rs` around lines 84 - 91, The current construction of active_peers uses neighbors() (a HashSet) and then iter().take(5), which produces a nondeterministic selection; instead, make the selection deterministic by collecting the neighbors into a Vec, map each neighbor to its canonical key (e.g., pub_key.as_bytes() or hex string), sort the Vec by that canonical representation, and only then convert and take the first N (the existing 5) to build active_peers. Update the logic around active_peers in publisher.rs (the block using self.gossip_receiver.neighbors(), active_peers variable) to perform collect -> sort_by_key -> take(5) -> filter_map/try_into so the same peer set is chosen deterministically across runs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/gossip/receiver.rs`:
- Around line 115-131: The receiver currently exits on the stream being closed
(match arm for None) and on stream errors (match arm for Some(Err(_))) without
signaling shutdown, so sibling actors wait forever; modify the two terminal
branches in recv loop (the None branch and the Some(Err(_)) branch inside the
match over raw_event) to call self.cancel_token.cancel() before performing the
waiter cleanup and breaking out (i.e., call self.cancel_token.cancel(); then run
the existing while let Some(waiter) = self.waiters.pop_back() { let _ =
waiter.send(...); } and break Ok(()) in both branches).
In `@src/gossip/topic/publisher.rs`:
- Around line 36-37: The interval currently created with
tokio::time::interval(Duration::from_secs(10)) produces an immediate first tick
so the publisher publishes on startup; change the setup to use
tokio::time::interval_at with a start Instant delayed by Duration::from_secs(10)
(e.g., Instant::now() + Duration::from_secs(10)) and keep the existing
ticker.set_missed_tick_behavior(tokio::time::MissedTickBehavior::Skip) so the
first tick happens after 10s; update the code that constructs the ticker
variable in publisher.rs (the ticker binding/creation) to use interval_at(...)
instead of interval(...).
---
Nitpick comments:
In `@src/gossip/topic/publisher.rs`:
- Around line 84-91: The current construction of active_peers uses neighbors()
(a HashSet) and then iter().take(5), which produces a nondeterministic
selection; instead, make the selection deterministic by collecting the neighbors
into a Vec, map each neighbor to its canonical key (e.g., pub_key.as_bytes() or
hex string), sort the Vec by that canonical representation, and only then
convert and take the first N (the existing 5) to build active_peers. Update the
logic around active_peers in publisher.rs (the block using
self.gossip_receiver.neighbors(), active_peers variable) to perform collect ->
sort_by_key -> take(5) -> filter_map/try_into so the same peer set is chosen
deterministically across runs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d7ffc5fe-0e5e-4c6b-beff-e1f32b75a500
📒 Files selected for processing (5)
src/gossip/merge/bubble.rssrc/gossip/merge/message_overlap.rssrc/gossip/receiver.rssrc/gossip/topic/bootstrap.rssrc/gossip/topic/publisher.rs
🚧 Files skipped from review as they are similar to previous changes (3)
- src/gossip/merge/bubble.rs
- src/gossip/merge/message_overlap.rs
- src/gossip/topic/bootstrap.rs
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (2)
src/gossip/receiver.rs (1)
159-170:⚠️ Potential issue | 🟠 MajorCancel the shared token when the gossip stream terminates.
These two terminal branches stop the receiver actor, but they never call
cancel_token.cancel(). That leaves sibling workers alive even though the underlying gossip stream is already gone.Suggested fix
None => { tracing::debug!("GossipReceiver: gossip receiver closed"); + self.cancel_token.cancel(); self.join_channel_sender.send(None).ok(); self.next_channel_sender.send(None).ok(); break Ok(()); } Some(Err(err)) => { tracing::warn!("GossipReceiver: error receiving gossip event: {err}"); + self.cancel_token.cancel(); self.next_channel_sender.send(None).ok(); self.join_channel_sender.send(None).ok(); break Ok(()); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gossip/receiver.rs` around lines 159 - 170, The receiver match's terminal branches for raw_event (None and Some(Err(err))) never call cancel_token.cancel(), leaving sibling workers running; update those two branches in the match handling inside the GossipReceiver actor to call cancel_token.cancel() (or cancel_token.cancelled().cancel() if using a wrapper) before sending None on join_channel_sender and next_channel_sender and before breaking Ok(()), referencing the existing cancel_token variable so the cancellation is triggered whenever the gossip stream ends or errors.src/gossip/topic/publisher.rs (1)
41-42:⚠️ Potential issue | 🟠 MajorHonor
initial_delayand keep the configured duration precision.
tokio::time::interval(initial_delay)still publishes on the first tick immediately, so the startup delay is skipped. The lateras_secs()/from_secs()path also truncates any sub-second config to0s, which can turn validDurationsettings into an unintended hot loop. Build the ticker with a delayed start instant and keepnext_intervalas aDurationend-to-end.Also applies to: 74-76
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gossip/topic/publisher.rs` around lines 41 - 42, The ticker currently uses tokio::time::interval(initial_delay) which fires immediately and also converts durations via as_secs()/from_secs() losing sub-second precision; change to create the ticker with a delayed start using tokio::time::interval_at(std::time::Instant::now() + initial_delay, initial_delay) so the first tick waits the full initial_delay, and keep next_interval as a Duration end-to-end (remove any as_secs()/from_secs() conversions) wherever next_interval and ticker are used (e.g., the ticker creation and the code around the next_interval usage at the later block referenced).
🧹 Nitpick comments (3)
src/gossip/mod.rs (1)
26-29: Update struct field docs to avoid hard-coded “5”.The docs still mention
5, but the fields now useMAX_RECORD_PEERS/MAX_MESSAGE_HASHES. Updating the wording will prevent future drift.✏️ Suggested doc-only diff
- /// Fixed array of 5 peer node IDs (as 32-byte arrays, empty slots are zero-filled) + /// Fixed array of peer node IDs (as 32-byte arrays, empty slots are zero-filled) + /// Length is `MAX_RECORD_PEERS`. pub active_peers: [[u8; 32]; MAX_RECORD_PEERS], - /// Fixed array of 5 recent message hashes for overlap detection (empty slots are zero-filled) + /// Fixed array of recent message hashes for overlap detection (empty slots are zero-filled) + /// Length is `MAX_MESSAGE_HASHES`. pub last_message_hashes: [[u8; 32]; MAX_MESSAGE_HASHES],🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gossip/mod.rs` around lines 26 - 29, Update the documentation comments for the struct fields active_peers and last_message_hashes to remove the hard-coded "5" and instead reference the constants used: describe active_peers as a fixed-size array of MAX_RECORD_PEERS peer node IDs and last_message_hashes as a fixed-size array of MAX_MESSAGE_HASHES recent message hashes (note empty slots are zero-filled); locate these comments adjacent to the active_peers and last_message_hashes fields in mod.rs and replace the numeric literal with the constant names and generalized wording.tests/tests.rs (1)
212-213: Minor inconsistency: unnecessarycrate::prefix.Lines 205-206 call
encryption_keypair(...)directly (which is imported at the top), but lines 212-213 usecrate::encryption_keypair(...). Consider using the imported version for consistency.let key3 = - crate::encryption_keypair(&topic_id, &rotation, initial_secret_hash, unix_minute + 1); + encryption_keypair(&topic_id, &rotation, initial_secret_hash, unix_minute + 1);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/tests.rs` around lines 212 - 213, Replace the inconsistent fully-qualified call to crate::encryption_keypair with the imported function name encryption_keypair to match the other calls; locate the call to crate::encryption_keypair(&topic_id, &rotation, initial_secret_hash, unix_minute + 1) and change it to encryption_keypair(...) so usage is consistent with earlier invocations of encryption_keypair.src/gossip/merge/bubble.rs (1)
1-4: Documentation references hardcoded value that is now configurable.The module doc comment says "If local peer count < 4" but this threshold is now configurable via
min_neighbors. Consider updating the documentation.//! Bubble detection: merge isolated peer groups in the same topic. //! -//! If local peer count < 4, extract suggested peers from DHT records and join them. +//! If local peer count is below the configured minimum, extract suggested peers from DHT records and join them.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gossip/merge/bubble.rs` around lines 1 - 4, Update the module doc comment in bubble.rs to remove the hardcoded "If local peer count < 4" text and instead refer to the configurable threshold (min_neighbors); mention that bubble detection uses the configured min_neighbors value from the gossip configuration (or the min_neighbors field) to decide when to extract suggested peers from DHT records. Ensure the doc comment references the min_neighbors symbol so it stays accurate when the default changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/full_config.rs`:
- Line 8: Fix the minor typo in the comment string "Imports from
distrubuted-topic-tracker" by changing "distrubuted" to "distributed" in
examples/full_config.rs (update the comment that contains "Imports from
distrubuted-topic-tracker" to read "Imports from distributed-topic-tracker").
In `@examples/without_mergers.rs`:
- Line 6: Fix the typo "distrubuted-topic-tracker" in the source comments by
replacing it with "distributed-topic-tracker"; locate the offending comment
string (e.g., the comment containing "distrubuted-topic-tracker" in
examples/without_mergers.rs and the similar occurrence in full_config.rs) and
update both occurrences so the comment reads "distributed-topic-tracker".
In `@src/gossip/topic/topic.rs`:
- Around line 138-164: The code currently ignores errors from starting workers
(start_publishing, start_bubble_merge, start_message_overlap_merge) by using
`let _ = ...`, which can silently leave features disabled; change each
`api.call(act!(actor => actor.start_*())).await` invocation to handle and
propagate failures: unwrap or `?` the Result (or map the error into Topic
construction failure) so Topic construction fails when an enabled worker fails
to start, or at minimum log an error and return Err; locate these calls on
`record_publisher` checks and update them to return or propagate the error
instead of discarding it.
---
Duplicate comments:
In `@src/gossip/receiver.rs`:
- Around line 159-170: The receiver match's terminal branches for raw_event
(None and Some(Err(err))) never call cancel_token.cancel(), leaving sibling
workers running; update those two branches in the match handling inside the
GossipReceiver actor to call cancel_token.cancel() (or
cancel_token.cancelled().cancel() if using a wrapper) before sending None on
join_channel_sender and next_channel_sender and before breaking Ok(()),
referencing the existing cancel_token variable so the cancellation is triggered
whenever the gossip stream ends or errors.
In `@src/gossip/topic/publisher.rs`:
- Around line 41-42: The ticker currently uses
tokio::time::interval(initial_delay) which fires immediately and also converts
durations via as_secs()/from_secs() losing sub-second precision; change to
create the ticker with a delayed start using
tokio::time::interval_at(std::time::Instant::now() + initial_delay,
initial_delay) so the first tick waits the full initial_delay, and keep
next_interval as a Duration end-to-end (remove any as_secs()/from_secs()
conversions) wherever next_interval and ticker are used (e.g., the ticker
creation and the code around the next_interval usage at the later block
referenced).
---
Nitpick comments:
In `@src/gossip/merge/bubble.rs`:
- Around line 1-4: Update the module doc comment in bubble.rs to remove the
hardcoded "If local peer count < 4" text and instead refer to the configurable
threshold (min_neighbors); mention that bubble detection uses the configured
min_neighbors value from the gossip configuration (or the min_neighbors field)
to decide when to extract suggested peers from DHT records. Ensure the doc
comment references the min_neighbors symbol so it stays accurate when the
default changes.
In `@src/gossip/mod.rs`:
- Around line 26-29: Update the documentation comments for the struct fields
active_peers and last_message_hashes to remove the hard-coded "5" and instead
reference the constants used: describe active_peers as a fixed-size array of
MAX_RECORD_PEERS peer node IDs and last_message_hashes as a fixed-size array of
MAX_MESSAGE_HASHES recent message hashes (note empty slots are zero-filled);
locate these comments adjacent to the active_peers and last_message_hashes
fields in mod.rs and replace the numeric literal with the constant names and
generalized wording.
In `@tests/tests.rs`:
- Around line 212-213: Replace the inconsistent fully-qualified call to
crate::encryption_keypair with the imported function name encryption_keypair to
match the other calls; locate the call to crate::encryption_keypair(&topic_id,
&rotation, initial_secret_hash, unix_minute + 1) and change it to
encryption_keypair(...) so usage is consistent with earlier invocations of
encryption_keypair.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 724a05cb-26ad-4049-b502-0334c1d7c100
📒 Files selected for processing (26)
Cargo.tomlREADME.mdexamples/chat.rsexamples/chat_no_wait.rsexamples/e2e_test.rsexamples/full_config.rsexamples/secret_rotation.rsexamples/simple.rsexamples/without_mergers.rssrc/config.rssrc/crypto/keys.rssrc/crypto/mod.rssrc/crypto/record.rssrc/dht.rssrc/gossip/merge/bubble.rssrc/gossip/merge/message_overlap.rssrc/gossip/merge/mod.rssrc/gossip/mod.rssrc/gossip/receiver.rssrc/gossip/sender.rssrc/gossip/topic/bootstrap.rssrc/gossip/topic/publisher.rssrc/gossip/topic/topic.rssrc/lib.rstests/test_gossip.rstests/tests.rs
💤 Files with no reviewable changes (1)
- tests/test_gossip.rs
✅ Files skipped from review due to trivial changes (1)
- README.md
🚧 Files skipped from review as they are similar to previous changes (1)
- Cargo.toml
…and reduced to sec -> ms + docs string fixes + receiver cancel on failure + some correctness fixes
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
README.md (1)
24-67:⚠️ Potential issue | 🟠 MajorREADME example now hides a stale API call.
Line 24 switched the block to
ignore, and Lines 56-62 still use the oldRecordPublisher::newsignature. This makes the Quick Start snippet inaccurate and avoids catching it in doc validation.📌 Proposed doc fix
-```rust,no_run,ignore +```rust,no_run @@ -use distributed_topic_tracker::{TopicId, AutoDiscoveryGossip, RecordPublisher}; +use distributed_topic_tracker::{TopicId, AutoDiscoveryGossip, Config, RecordPublisher}; @@ let record_publisher = RecordPublisher::new( topic_id.clone(), - signing_key.verifying_key(), signing_key.clone(), None, initial_secret, + Config::default(), );🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 24 - 67, The README example uses an outdated RecordPublisher::new signature; update the import list to include Config (add Config to the use distributed_topic_tracker import) and change the RecordPublisher::new call by removing the signing_key.verifying_key() argument and appending Config::default() as the final parameter (i.e., call RecordPublisher::new(topic_id.clone(), signing_key.clone(), None, initial_secret, Config::default())). Ensure the code block remains rust,no_run so docs validate.src/dht.rs (1)
45-74:⚠️ Potential issue | 🟡 MinorUpdate the rustdoc to match the new method signatures.
getno longer takestimeout, andput_mutableno longer takesretry_countortimeout, but the public docs still describe those removed parameters. That leaves the generated API docs misleading.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/dht.rs` around lines 45 - 74, Update the rustdoc for the DHT methods to match their current signatures: remove any mention of `timeout` from the `get` docs and remove `retry_count` and `timeout` from the `put_mutable` docs, and ensure the parameter lists for `get(pub_key: VerifyingKey, salt: Option<Vec<u8>>, more_recent_than: Option<i64>)` and `put_mutable(&self, signing_key: ..., pub_key: ..., salt: Option<Vec<u8>>, data: ...)` accurately describe the existing arguments (`signing_key`, `pub_key`, `salt`, `data`, etc.) and their semantics so the generated API docs are not misleading; update the prose above `pub async fn get` and `pub async fn put_mutable` accordingly.src/gossip/topic/bootstrap.rs (1)
139-148:⚠️ Potential issue | 🟠 MajorThe first bootstrap scan skips the current minute and looks back two minutes.
When
last_published_unix_minute == 0,unix_minuteis alreadycrate::unix_minute(-1). Lines 147-148 then fetchunix_minute - 1andunix_minute, which meansM-2andM-1, notM-1andM. Fresh records from the current minute are ignored on the initial pass, which can delay discovery unnecessarily.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gossip/topic/bootstrap.rs` around lines 139 - 148, The code computes unix_minute via crate::unix_minute with a -1 sentinel when last_published_unix_minute == 0, but then calls record_publisher.get_records(unix_minute - 1) and get_records(unix_minute), which yields M-2 and M-1; change the fetch window to unix_minute and unix_minute + 1 so the initial pass retrieves the previous minute and the current minute (i.e., replace record_publisher.get_records(unix_minute - 1) and record_publisher.get_records(unix_minute) with record_publisher.get_records(unix_minute) and record_publisher.get_records(unix_minute + 1)); keep references to last_published_unix_minute, crate::unix_minute, and record_publisher.get_records when editing.
♻️ Duplicate comments (1)
src/gossip/topic/publisher.rs (1)
33-42:⚠️ Potential issue | 🟠 Major
initial_delayis not honored by this ticker setup.The doc comment says the first publish happens after
initial_delay, buttokio::time::interval(initial_delay)yields an immediate first tick. If you want a delayed first publish, this needsinterval_at(...)with a future start instant instead. (docs.rs)Suggested fix
- let mut ticker = tokio::time::interval(initial_delay); + let start = tokio::time::Instant::now() + initial_delay; + let mut ticker = tokio::time::interval_at(start, initial_delay); ticker.set_missed_tick_behavior(tokio::time::MissedTickBehavior::Skip);Does `tokio::time::interval(Duration)` yield immediately on the first `tick().await`, and is `tokio::time::interval_at(Instant::now() + initial_delay, period)` the right way to delay the first tick?🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gossip/topic/publisher.rs` around lines 33 - 42, The constructor new currently creates ticker with tokio::time::interval(initial_delay) which produces an immediate first tick and thus ignores initial_delay; change to create the ticker with tokio::time::interval_at(start, base_interval) where start = tokio::time::Instant::now() + initial_delay so the first publish occurs after initial_delay, keep the existing missed-tick behavior call (ticker.set_missed_tick_behavior) and ensure you reference the same ticker variable and the parameters initial_delay and base_interval.
🧹 Nitpick comments (3)
src/crypto/mod.rs (1)
13-13: Call outRecordTopicremoval as a migration note.Line 13 changes the public export surface. Please document this in release notes (or add a temporary deprecated compatibility alias) to reduce downstream breakage surprises.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/crypto/mod.rs` at line 13, The public API removed RecordTopic from the crypto exports; either document this breaking change in the release notes/migration guide calling out removal of RecordTopic, or restore a temporary deprecated compatibility alias (e.g., re-export or type alias named RecordTopic with a #[deprecated] notice) in src/crypto/mod.rs so downstream users have time to migrate; reference the removed symbol RecordTopic and the existing exports EncryptedRecord, Record, RecordPublisher when adding the note or alias.src/lib.rs (1)
40-43: Consider adding overflow protection for negative results.The integer division change is correct, but casting a negative result (from a large negative
minute_offset) tou64will wrap. Since this is#[doc(hidden)]and typically used with small offsets, this is low risk but worth noting.Optional defensive fix
#[doc(hidden)] pub fn unix_minute(minute_offset: i64) -> u64 { - ((chrono::Utc::now().timestamp() / 60) + minute_offset) as u64 + ((chrono::Utc::now().timestamp() / 60) + minute_offset) + .try_into() + .expect("unix_minute result must be non-negative") }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib.rs` around lines 40 - 43, The unix_minute function can produce a negative i64 when minute_offset is large negative, and casting that to u64 will wrap; update unix_minute to compute the base minute (chrono::Utc::now().timestamp() / 60) + minute_offset into a signed integer, check for negativity (or use checked_add/checked_sub) and if the result is negative return 0 (or otherwise handle as an error), otherwise convert the non-negative value to u64 (e.g., via TryFrom or an explicit if res >= 0 { res as u64 } ) to avoid unsigned wrap-around.src/config.rs (1)
196-196: Clarify the jitter formula comment to match current types.The comment references
usize % max_jitter, but config fields areDuration. This is misleading for future maintainers.Proposed comment update
-/// total_interval = base_interval + rand::random::<usize>() % max_jitter; +/// Effective interval is `base_interval + jitter`, where `jitter` is sampled in `[0, max_jitter)`.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/config.rs` at line 196, Update the comment to reflect that base_interval and max_jitter are Durations and that the jitter is a randomly generated Duration added to base_interval; e.g., state the formula as total_interval = base_interval + random_jitter where random_jitter is a Duration in the range [0, max_jitter), and reference the config fields base_interval, max_jitter and the resulting total_interval so future readers understand the types and how the random value is derived.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/full_config.rs`:
- Around line 88-93: The inline comment "Split into sink (sending) and stream
(receiving)" is placed above RecordPublisher::builder but refers to the
topic.split() call later; move or reattach that comment to immediately precede
the topic.split() invocation (the split method on the topic variable) so the
comment correctly documents topic.split(), or alternatively update the comment
text to describe RecordPublisher::builder if you intended it there; ensure the
comment is adjacent to the unique symbol topic.split() (or to
RecordPublisher::builder) to avoid confusion.
In `@examples/without_mergers.rs`:
- Around line 32-33: The comment "Split into sink (sending) and stream
(receiving)" is misplaced; move or remove it so it sits next to the actual split
invocation. Update the comment location to directly precede the topic.split()
call (the split method on the topic variable) in examples/without_mergers.rs, or
remove the comment entirely if redundant, ensuring the comment clearly documents
that topic.split() yields a sink and a stream.
In `@src/config.rs`:
- Around line 265-269: The setter max_bootstrap_records currently accepts 0
contrary to the documented minimum of 1; update max_bootstrap_records to enforce
the minimum by validating or clamping the input (e.g., if max_records < 1 set
self.config.max_bootstrap_records = 1 or return an Err) so the config cannot be
built with 0, and ensure any callers/tests that expect the old behavior are
adjusted accordingly.
- Around line 447-451: The setter max_join_peer_count currently accepts 0
contrary to the "(min 1)" contract; update the max_join_peer_count method to
enforce the minimum by validating the incoming max_peers and ensuring
self.config.max_join_peer_count is set to at least 1 (e.g., if max_peers == 0
then set to 1 or use usize::max(1, max_peers)); keep the method signature but
add this check so downstream logic (e.g., sender code that truncates join
candidates) never sees a zero limit.
In `@src/crypto/keys.rs`:
- Around line 99-101: The doc examples call signing_keypair and
encryption_keypair with TopicId by value but the public API now expects
&TopicId; update the examples to pass a reference (e.g., &topic) to both
signing_keypair and encryption_keypair in the shown snippet and the other
occurrence (the example around lines referencing 120-122) so the sample code
matches the function signatures.
In `@src/crypto/record.rs`:
- Around line 393-395: The length check in Record deserialization incorrectly
requires at least one content byte, rejecting valid empty RecordContent; update
the guard in record.rs (the check used in the Record deserialization logic) to
require buf.len() >= 32 + 8 + 32 + 64 (remove the trailing + 1) so zero-length
payloads are accepted, and adjust the error message if desired to reflect the
corrected minimum size; ensure this aligns with RecordContent and Record::sign
which permit empty payloads.
In `@src/dht.rs`:
- Around line 156-170: The retry calculation currently forces a 1s cap via
.min(1000) which ignores configured base_retry_interval()/max_retry_jitter();
remove the hard-coded .min(1000) so the computed retry_interval uses the
configured durations, ensure you convert the resulting value safely to u64 for
Duration::from_millis (e.g., via as u64 or a saturating/clamping convert) in the
same block where retry_interval is computed and used before the tracing::debug!
and tokio::time::sleep calls in src/dht.rs (the retry_interval variable and
surrounding put_mutable retry loop).
In `@src/gossip/merge/bubble.rs`:
- Around line 83-89: The schedule computation clamps the computed interval to
1000ms causing the actor to wake every second; in the tick branch where you
compute next_interval (currently using base_interval.as_millis() + ...
.min(1000)), remove the hard .min(1000) clamp and compute the next delay as a
Duration by adding base_interval and a jitter Duration derived from max_jitter
(e.g., convert the random millis into Duration), then call
self.ticker.reset_after(...) with that Duration; update the logic around
self.ticker.tick(), merge(), next_interval, and self.ticker.reset_after to use
Duration arithmetic instead of integer millis so base_interval and max_jitter
are honored.
In `@src/gossip/merge/message_overlap.rs`:
- Around line 74-80: The tick handler in MessageOverlapMerge computes
next_interval using base_interval and max_jitter but then forces a 1_000ms cap
via .min(1000), negating configuration; remove or replace the hard-coded
.min(1000) so next_interval respects base_interval and max_jitter (or use a
configurable cap constant), and ensure the value passed to
ticker.reset_after(Duration::from_millis(...)) is safely converted to u64;
update the block around self.ticker.tick() / let next_interval /
self.ticker.reset_after(...) so it schedules according to the configured
intervals rather than always <= 1s.
In `@src/gossip/topic/publisher.rs`:
- Around line 71-77: The code forces next_interval to at most 1000ms via
.min(1000), collapsing configurable base_interval/max_jitter to 1s; remove the
.min(1000) so next_interval = base_interval + jitter respects the configured
values (update the computation that sets next_interval and the subsequent call
to self.ticker.reset_after(Duration::from_millis(next_interval as u64))
accordingly), keeping references to self.ticker.tick(), base_interval,
max_jitter, next_interval, and self.ticker.reset_after.
---
Outside diff comments:
In `@README.md`:
- Around line 24-67: The README example uses an outdated RecordPublisher::new
signature; update the import list to include Config (add Config to the use
distributed_topic_tracker import) and change the RecordPublisher::new call by
removing the signing_key.verifying_key() argument and appending
Config::default() as the final parameter (i.e., call
RecordPublisher::new(topic_id.clone(), signing_key.clone(), None,
initial_secret, Config::default())). Ensure the code block remains rust,no_run
so docs validate.
In `@src/dht.rs`:
- Around line 45-74: Update the rustdoc for the DHT methods to match their
current signatures: remove any mention of `timeout` from the `get` docs and
remove `retry_count` and `timeout` from the `put_mutable` docs, and ensure the
parameter lists for `get(pub_key: VerifyingKey, salt: Option<Vec<u8>>,
more_recent_than: Option<i64>)` and `put_mutable(&self, signing_key: ...,
pub_key: ..., salt: Option<Vec<u8>>, data: ...)` accurately describe the
existing arguments (`signing_key`, `pub_key`, `salt`, `data`, etc.) and their
semantics so the generated API docs are not misleading; update the prose above
`pub async fn get` and `pub async fn put_mutable` accordingly.
In `@src/gossip/topic/bootstrap.rs`:
- Around line 139-148: The code computes unix_minute via crate::unix_minute with
a -1 sentinel when last_published_unix_minute == 0, but then calls
record_publisher.get_records(unix_minute - 1) and get_records(unix_minute),
which yields M-2 and M-1; change the fetch window to unix_minute and unix_minute
+ 1 so the initial pass retrieves the previous minute and the current minute
(i.e., replace record_publisher.get_records(unix_minute - 1) and
record_publisher.get_records(unix_minute) with
record_publisher.get_records(unix_minute) and
record_publisher.get_records(unix_minute + 1)); keep references to
last_published_unix_minute, crate::unix_minute, and record_publisher.get_records
when editing.
---
Duplicate comments:
In `@src/gossip/topic/publisher.rs`:
- Around line 33-42: The constructor new currently creates ticker with
tokio::time::interval(initial_delay) which produces an immediate first tick and
thus ignores initial_delay; change to create the ticker with
tokio::time::interval_at(start, base_interval) where start =
tokio::time::Instant::now() + initial_delay so the first publish occurs after
initial_delay, keep the existing missed-tick behavior call
(ticker.set_missed_tick_behavior) and ensure you reference the same ticker
variable and the parameters initial_delay and base_interval.
---
Nitpick comments:
In `@src/config.rs`:
- Line 196: Update the comment to reflect that base_interval and max_jitter are
Durations and that the jitter is a randomly generated Duration added to
base_interval; e.g., state the formula as total_interval = base_interval +
random_jitter where random_jitter is a Duration in the range [0, max_jitter),
and reference the config fields base_interval, max_jitter and the resulting
total_interval so future readers understand the types and how the random value
is derived.
In `@src/crypto/mod.rs`:
- Line 13: The public API removed RecordTopic from the crypto exports; either
document this breaking change in the release notes/migration guide calling out
removal of RecordTopic, or restore a temporary deprecated compatibility alias
(e.g., re-export or type alias named RecordTopic with a #[deprecated] notice) in
src/crypto/mod.rs so downstream users have time to migrate; reference the
removed symbol RecordTopic and the existing exports EncryptedRecord, Record,
RecordPublisher when adding the note or alias.
In `@src/lib.rs`:
- Around line 40-43: The unix_minute function can produce a negative i64 when
minute_offset is large negative, and casting that to u64 will wrap; update
unix_minute to compute the base minute (chrono::Utc::now().timestamp() / 60) +
minute_offset into a signed integer, check for negativity (or use
checked_add/checked_sub) and if the result is negative return 0 (or otherwise
handle as an error), otherwise convert the non-negative value to u64 (e.g., via
TryFrom or an explicit if res >= 0 { res as u64 } ) to avoid unsigned
wrap-around.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 83955c07-79db-487e-84d9-3738ce5f3b99
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (27)
.github/workflows/test.ymlCargo.tomlREADME.mdexamples/chat.rsexamples/chat_no_wait.rsexamples/e2e_test.rsexamples/full_config.rsexamples/secret_rotation.rsexamples/simple.rsexamples/without_mergers.rssrc/config.rssrc/crypto/keys.rssrc/crypto/mod.rssrc/crypto/record.rssrc/dht.rssrc/gossip/merge/bubble.rssrc/gossip/merge/message_overlap.rssrc/gossip/merge/mod.rssrc/gossip/mod.rssrc/gossip/receiver.rssrc/gossip/sender.rssrc/gossip/topic/bootstrap.rssrc/gossip/topic/publisher.rssrc/gossip/topic/topic.rssrc/lib.rstests/test_gossip.rstests/tests.rs
💤 Files with no reviewable changes (1)
- tests/test_gossip.rs
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@README.md`:
- Around line 261-270: The README example uses a nonexistent
MergeConfig::new(...) constructor; update the snippet to use the actual builder
API by calling MergeConfig::builder(), chaining the appropriate per-field
setters to set BubbleMergeConfig::Disabled and
MessageOverlapMergeConfig::Disabled, then .build(), so the Config::builder()
call composes with MergeConfig::builder() instead of MergeConfig::new(...).
In `@src/config.rs`:
- Around line 416-419: Doc says "first non-zero wins" but base_interval
currently overwrites unconditionally; update the impl of base_interval in the
builder so it matches the siblings: if the provided interval is Duration::ZERO,
ignore it; otherwise only set self.config.base_interval when the existing
self.config.base_interval is Duration::ZERO (i.e., preserve first non-zero).
Reference the method base_interval and make behavior consistent with
DhtConfigBuilder::base_retry_interval, BubbleMergeConfigBuilder::base_interval,
PublisherConfigBuilder::base_interval and the clamp in message_overlap.rs so
docs and implementation agree.
In `@src/gossip/topic/topic.rs`:
- Around line 170-176: The current sync path in
wait_for_bootstrap_and_spawn_workers/Topic::new only awaits
gossip_receiver().joined(), so it doesn't wait for the detached worker startup
task and worker-startup failures can race/tear down the Topic via
cancel_token.cancel(); change the implementation so the spawn task that starts
workers signals completion (e.g., via a oneshot or by returning a JoinHandle the
caller awaits) and have the sync branch await that signal/handle in addition to
(or instead of) gossip_receiver().joined() so worker-startup errors propagate
out of Topic::new; update references to gossip_receiver().joined(), the detached
spawn task that creates publisher/bubble merge/message overlap workers, and
cancel_token.cancel() accordingly if you choose the oneshot or join approach.
- Around line 30-83: TopicId currently only derives Debug and Clone which
prevents equality and hashing use; update the TopicId declaration to also derive
PartialEq, Eq, and Hash so it can be compared and used as keys in
HashMap/HashSet (i.e., change the derive on the TopicId struct to include
PartialEq, Eq, Hash) and run tests that compare TopicId values (tests that
currently compare .hash() can be updated to compare TopicId instances directly).
- Around line 53-57: The From<Vec<u8>> implementation for TopicId silently
hashes the input via Self::new(bytes), which can mislead callers who pass a
pre-computed 32-byte hash; either add a clear doc comment on the impl that
states "Bytes are treated as a topic name and will be SHA-512 hashed (use
TopicId::from_hash(&[u8;32]) for pre-computed hashes)" or remove the impl
entirely to force explicit use of TopicId::from_hash; update the file's TopicId
From<Vec<u8>> block accordingly and ensure the doc text mentions both the
hashing behavior and the alternative function name (TopicId::from_hash) so
callers are not surprised.
In `@tests/tests.rs`:
- Around line 249-395: The test_multiple_receivers_all_get_events integration
test currently allocates a formatted String on the happy path by calling
expect(&format!("receiver {i} timed out")) after tokio::time::timeout(...,
handle). Replace that expect call with unwrap_or_else(|_| panic!("receiver {i}
timed out")) on the timeout result so you avoid the allocation; locate the
tokio::time::timeout(..., handle).await.expect(...) expression and swap the
outer expect for unwrap_or_else as described.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 59f4ae7c-ab6c-42bc-af9c-cdbb7b8cab59
📒 Files selected for processing (15)
ARCHITECTURE.mdPROTOCOL.mdREADME.mdexamples/chat.rsexamples/full_config.rsexamples/without_mergers.rssrc/config.rssrc/crypto/record.rssrc/dht.rssrc/gossip/merge/bubble.rssrc/gossip/merge/message_overlap.rssrc/gossip/topic/bootstrap.rssrc/gossip/topic/publisher.rssrc/gossip/topic/topic.rstests/tests.rs
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (2)
src/gossip/topic/topic.rs (2)
152-180:⚠️ Potential issue | 🟠 MajorPropagate sync bootstrap/worker startup failures.
For
async_bootstrap == false,spawn_workers(...)can cancel the topic on a required worker failure, but Lines 176-180 only log the error andTopic::newstill returnsOk(Self { ... }). The sync path should await the same bootstrap completion signal and return the worker startup error.🛠️ Proposed control-flow fix
- tokio::spawn({ - let api = Arc::downgrade(&api); - let config = config.clone(); - let cancel_token = cancel_token.clone(); - async move { - if let Err(err) = wait_for_bootstrap(bootstrap_done, cancel_token.clone()).await { - tracing::warn!("bootstrap failed: {}", err); - return; - } - - if async_bootstrap { - tracing::debug!("Bootstrap completed, now spawning workers"); - if let Err(err) = spawn_workers(api, config, cancel_token.clone()).await { - tracing::warn!("failed to spawn workers: {}", err); - } - } - } - }); - if !async_bootstrap { tracing::debug!("Topic: waiting for bootstrap to complete"); - bootstrap.gossip_receiver().await?.joined().await?; - if let Err(err) = - spawn_workers(Arc::downgrade(&api), config, cancel_token.clone()).await - { - tracing::warn!("failed to spawn workers: {}", err); - } + wait_for_bootstrap(bootstrap_done, cancel_token.clone()).await?; + spawn_workers(Arc::downgrade(&api), config, cancel_token.clone()).await?; tracing::debug!("Topic: bootstrap completed"); } else { + tokio::spawn({ + let api = Arc::downgrade(&api); + let config = config.clone(); + let cancel_token = cancel_token.clone(); + async move { + if let Err(err) = wait_for_bootstrap(bootstrap_done, cancel_token.clone()).await + { + tracing::warn!("bootstrap failed: {}", err); + return; + } + + tracing::debug!("Bootstrap completed, now spawning workers"); + if let Err(err) = spawn_workers(api, config, cancel_token.clone()).await { + tracing::warn!("failed to spawn workers: {}", err); + } + } + }); tracing::debug!("Topic: bootstrap started asynchronously"); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gossip/topic/topic.rs` around lines 152 - 180, For the synchronous bootstrap path (when async_bootstrap == false) the code currently logs spawn_workers(...) failures but still returns Ok(...) from Topic::new; change this so the sync path awaits the bootstrap completion exactly like the async path and returns the spawn_workers error instead of swallowing it: after bootstrap.gossip_receiver().await?.joined().await? call, call spawn_workers(Arc::downgrade(&api), config, cancel_token.clone()).await and propagate any Err by returning it (e.g. using ? or mapping into the function's error type) rather than only tracing::warn; reference the symbols async_bootstrap, bootstrap.gossip_receiver().await?.joined().await?, spawn_workers, and Topic::new when making the change.
53-56:⚠️ Potential issue | 🟠 MajorPreserve raw TopicId hashes instead of rehashing
Vec<u8>.
From<Vec<u8>>hashes every byte vector viaTopicId::new, so a serialized/pre-computed 32-byte topic hash passed through.try_into()/.into()becomes a different topic id. Make raw bytes fallible and length-checked, and keep name hashing explicit throughTopicId::new.🛠️ Proposed fix
-impl From<Vec<u8>> for TopicId { - fn from(bytes: Vec<u8>) -> Self { - Self::new(bytes) - } -} +impl TryFrom<Vec<u8>> for TopicId { + type Error = anyhow::Error; + + fn try_from(bytes: Vec<u8>) -> std::result::Result<Self, Self::Error> { + anyhow::ensure!( + bytes.len() == 32, + "TopicId byte representation must be exactly 32 bytes" + ); + + let mut hash = [0u8; 32]; + hash.copy_from_slice(&bytes); + Ok(Self::from_hash(&hash)) + } +}Also update README wording that currently advertises
Vec<u8>conversion.Verify raw-hash deserialization/call sites before applying:
#!/bin/bash # Description: Inspect Vec<u8> -> TopicId conversion and topic-hash deserialization call sites. # Expectation: serialized 32-byte topic hashes should use TopicId::try_from(...) or TopicId::from_hash(...), # not the name-hashing TopicId::new(...) path. rg -n -C3 --type=rust 'impl From<Vec<u8>> for TopicId|impl TryFrom<Vec<u8>> for TopicId|TopicId::from_hash|TopicId::new\(|try_into\(\)' rg -n -C4 --type=rust 'topic.*try_into\(\)|hash\(\)\.to_vec\(\)|\[u8; 32\].*TopicId'🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gossip/topic/topic.rs` around lines 53 - 56, The current impl From<Vec<u8>> for TopicId re-hashes any byte vector via TopicId::new which corrupts precomputed 32-byte topic hashes; change this by removing or replacing impl From<Vec<u8>> for TopicId with a fallible, length-checked TryFrom<Vec<u8>> (or impl TryFrom<&[u8]>) that only accepts exactly 32 bytes and constructs a TopicId from the raw hash (use a new TopicId::from_hash or equivalent constructor), keep TopicId::new as the explicit name-hashing path, update call sites that relied on .into()/try_into() to use TopicId::from_hash(...) or .try_into() accordingly, and update README wording that advertised Vec<u8> conversion to document the fallible raw-hash path and explicit name hashing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@README.md`:
- Line 117: Typo in the README upgrade notes: replace the misspelled word
"unneccessary" with the correct spelling "unnecessary" in the 0.3 release
paragraph (the sentence mentioning "Reduced unneccessary DHT writes and reads"),
updating the README.md content so the line reads "Reduced unnecessary DHT writes
and reads" while preserving the rest of the sentence and punctuation.
- Around line 153-157: The README example uses a plain integer for
Config::builder().max_join_peer_count which expects a std::num::NonZeroU32;
update the example to pass a NonZeroU32 instance created with
std::num::NonZeroU32::new(…) and unwrap or handle the Option before calling
max_join_peer_count so the builder call compiles (refer to
RecordPublisher::builder, Config::builder, and
ConfigBuilder::max_join_peer_count).
- Around line 265-285: The example config block is missing a terminating
semicolon and the heading spacing is wrong; update the Config example created
with Config::builder / MergeConfig::builder (and nested
BubbleMergeConfig::Disabled and MessageOverlapMergeConfig::Disabled) to end with
a semicolon after .build(); and ensure there is a blank line after the closing
triple-backtick fence before the **`RecordPublisher::builder()`...** heading to
restore proper markdown separation.
In `@src/config.rs`:
- Around line 714-718: The doc comment that begins "Whether to check
`unix_minute` and `unix_minute-1` or `unix_minute-1` and `unix_minute-2` on
startup." contains a sentence that mistakenly repeats `unix_minute-1`; update
the sentence to state that when enabled we first fetch `unix_minute-1` and
`unix_minute-2` (i.e., replace the repeated `unix_minute-1` with
`unix_minute-2`), and apply the exact same wording fix to the getter docs that
contain the same description (the other doc block starting with the same
"Whether to check `unix_minute`..." text).
---
Duplicate comments:
In `@src/gossip/topic/topic.rs`:
- Around line 152-180: For the synchronous bootstrap path (when async_bootstrap
== false) the code currently logs spawn_workers(...) failures but still returns
Ok(...) from Topic::new; change this so the sync path awaits the bootstrap
completion exactly like the async path and returns the spawn_workers error
instead of swallowing it: after
bootstrap.gossip_receiver().await?.joined().await? call, call
spawn_workers(Arc::downgrade(&api), config, cancel_token.clone()).await and
propagate any Err by returning it (e.g. using ? or mapping into the function's
error type) rather than only tracing::warn; reference the symbols
async_bootstrap, bootstrap.gossip_receiver().await?.joined().await?,
spawn_workers, and Topic::new when making the change.
- Around line 53-56: The current impl From<Vec<u8>> for TopicId re-hashes any
byte vector via TopicId::new which corrupts precomputed 32-byte topic hashes;
change this by removing or replacing impl From<Vec<u8>> for TopicId with a
fallible, length-checked TryFrom<Vec<u8>> (or impl TryFrom<&[u8]>) that only
accepts exactly 32 bytes and constructs a TopicId from the raw hash (use a new
TopicId::from_hash or equivalent constructor), keep TopicId::new as the explicit
name-hashing path, update call sites that relied on .into()/try_into() to use
TopicId::from_hash(...) or .try_into() accordingly, and update README wording
that advertised Vec<u8> conversion to document the fallible raw-hash path and
explicit name hashing.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: b753efc1-b5a8-45b8-8c89-d51bf72dbd10
📒 Files selected for processing (3)
README.mdsrc/config.rssrc/gossip/topic/topic.rs
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 13
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
examples/chat_no_wait.rs (1)
49-60:⚠️ Potential issue | 🟡 MinorLog receiver errors instead of silently ending the listener.
while let Ok(event)exits the spawned listener on the firstErrwith no signal to the user. Surface the error before breaking, and mirror the same pattern in the other updated examples.Proposed fix
tokio::spawn(async move { - while let Ok(event) = gossip_receiver.next().await { + loop { + let event = match gossip_receiver.next().await { + Ok(event) => event, + Err(err) => { + eprintln!("gossip receiver stopped: {err}"); + break; + } + }; + if let Event::Received(msg) = event { println!( "\nMessage from {}: {}", &msg.delivered_from.to_string()[0..8], String::from_utf8(msg.content.to_vec()).unwrap() @@ } else if let Event::NeighborUp(peer) = event { println!("\nJoined by {}", &peer.to_string()[0..8]); } } });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/chat_no_wait.rs` around lines 49 - 60, The listener spawned with tokio::spawn currently uses while let Ok(event) = gossip_receiver.next().await which silently stops on the first Err; change the loop to explicitly match the result from gossip_receiver.next().await (e.g., loop { match gossip_receiver.next().await { Ok(event) => /* handle Event::Received and Event::NeighborUp as before using msg and peer */ }, Err(e) => { log or eprintln the error (including e) so the user sees why the listener ended, then break or decide to continue/retry } } ) so errors are surfaced; apply the same pattern to other updated examples using gossip_receiver and the Event enum.src/gossip/merge/bubble.rs (1)
136-145: 🧹 Nitpick | 🔵 TrivialSimplify neighbor-check pattern for consistency and clarity.
Line 138 checks a raw
[u8; 32]againstHashSet<EndpointId>while line 151–153 converts first. Adopt the clearer pattern throughout: convertactive_peertoEndpointIdfirst, then check membership.Suggested improvement
.filter_map(|&active_peer| { if active_peer == [0; 32] - || neighbors.contains(&active_peer) || active_peer == record.pub_key() || active_peer.eq(self_pub_key.as_bytes()) { - None - } else { - iroh::EndpointId::from_bytes(&active_peer).ok() + return None; } + + let pub_key = EndpointId::from_bytes(&active_peer).ok()?; + (!neighbors.contains(&pub_key)).then_some(pub_key) })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gossip/merge/bubble.rs` around lines 136 - 145, Convert the raw active_peer bytes to an EndpointId up front inside the filter_map closure (use iroh::EndpointId::from_bytes(&active_peer).ok()); if conversion fails return None, then perform membership checks against neighbors.contains(&peer) and compare peer to record.pub_key() and to self_pub_key (convert those to EndpointId once and reuse) and also filter out the all-zero ID by comparing to EndpointId from [0;32]; this makes the logic consistent and clearer in the filter_map that currently references active_peer, neighbors, record.pub_key(), self_pub_key, and iroh::EndpointId::from_bytes.PROTOCOL.md (1)
398-403:⚠️ Potential issue | 🟡 MinorEncoding Format mislabels the fixed-size key field as "variable".
encrypted_decryption_keyis exactly 88 bytes (enforced byEncryptedRecord::from_bytesviaENCRYPTED_KEY_LENGTH = 88), so thevariable:prefix is misleading. Match the spec to the implementation:📝 Proposed doc fix
[4 bytes: encrypted_record_length (little-endian u32)] [variable: encrypted_record data] -[variable: encrypted_decryption_key data (88 bytes)] +[88 bytes: encrypted_decryption_key]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@PROTOCOL.md` around lines 398 - 403, The spec incorrectly labels encrypted_decryption_key as "variable" even though the implementation enforces a fixed length; update the PROTOCOL.md EncryptedRecord Serialization section to reflect that encrypted_decryption_key is a fixed 88-byte field (match EncryptedRecord::from_bytes and ENCRYPTED_KEY_LENGTH = 88) instead of using the "variable:" prefix so the doc matches the code.src/crypto/record.rs (1)
261-275:⚠️ Potential issue | 🟠 MajorAdd a comment explaining the
i64::MAXsequence number choice, or implement monotonically-increasing sequences per minute.The current hardcoded
i64::MAXsequence number (1) creates an effective lock on the DHT slot for the entireunix_minutewindow—blocking subsequent publishes with the same signing key and salt—and (2) risksConflictRiskerrors when multiple nodes attempt concurrent publishes with identical sequence numbers and different payloads. This contradicts the dedup logic inget_records(), which expects to accumulate multiple records per minute slot from different publishers. Either document whyi64::MAXis intentional and acceptable, or switch to a per-minute monotonically-increasing sequence (e.g.,unix_minute * 1000 + counter) to permit re-publishes and handle concurrent updates within the same minute.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/crypto/record.rs` around lines 261 - 275, The hardcoded next_seq_num = i64::MAX in RecordPublisher (where you call self.dht.put_mutable(... next_seq_num)) effectively locks the DHT slot and causes conflicts; either add a clear comment next to that constant explaining why using MAX is intentional and acceptable for this design, or replace it with a per-minute monotonically-increasing sequence derived from record.unix_minute() (e.g., seq = unix_minute * 1000 + per-minute counter) and persist/track the counter in RecordPublisher (e.g., a HashMap keyed by (sign_key, salt, unix_minute) or an atomic counter reset per minute) so put_mutable receives increasing sequence numbers and concurrent publishes within the same minute succeed; ensure get_records() behavior remains compatible with the new sequence scheme.
♻️ Duplicate comments (5)
src/gossip/topic/bootstrap.rs (1)
143-164:⚠️ Potential issue | 🟠 MajorDo not skip current-minute records during first-pass lookup.
When the startup flag selects
crate::unix_minute(-1), Lines 156-164 fetch current-2/current-1 and skip current. The no-record path can then publish a stale previous-minute record even after startup published current. Keep the lookup window separate from the minute used for publishing/caching.🔧 Proposed direction
- let mut use_cached_next = true; - let unix_minute = crate::unix_minute( - if last_published_unix_minute == 0 - && bootstrap_config.check_last_minute_record_first_on_startup() - { - use_cached_next = false; - -1 - } else { - 0 - }, - ); + let unix_minute = crate::unix_minute(0); + let include_previous = last_published_unix_minute == 0 + && bootstrap_config.check_last_minute_record_first_on_startup(); // Unique, verified records for the unix minute - let mut records = record_publisher - .get_records(unix_minute - 1) - .await - .unwrap_or_default(); + let mut records = if include_previous { + record_publisher + .get_records(unix_minute.saturating_sub(1)) + .await + .unwrap_or_default() + } else { + HashSet::new() + }; let current_records = record_publisher .get_records(unix_minute) .await .unwrap_or_default();Then pass
Some(current_records.clone())for current-minute cached publishes instead of disabling the cache based on the previous-minute lookup.Also applies to: 192-200, 343-351
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gossip/topic/bootstrap.rs` around lines 143 - 164, The code currently flips use_cached_next to false and calls crate::unix_minute(-1) which causes the lookup window to skip the current-minute records: adjust the logic in the bootstrap flow around use_cached_next, unix_minute, last_published_unix_minute, and bootstrap_config.check_last_minute_record_first_on_startup() so the "lookup window" (records gathered via record_publisher.get_records(unix_minute - 1) and get_records(unix_minute)) is independent from the minute used for publishing/caching; always fetch current_records (call record_publisher.get_records(unix_minute) regardless of the -1 startup path), extend records with current_records.clone(), and when disabling the cached-next-path still pass Some(current_records.clone()) as the cached current-minute publishes (instead of skipping current-minute data) so the first-pass lookup never omits the current-minute records referenced by functions like build_publish_list or whatever uses use_cached_next.src/dht.rs (1)
156-169:⚠️ Potential issue | 🟡 MinorFinish removing lossy retry-jitter arithmetic.
The retry clamp was removed, but Line 157 still casts
as_micros()tou64; sub-microsecond jitter becomes% 0, and extreme durations truncate. UseDuration/nanosecond arithmetic with a non-zero bound.🛡️ Proposed fix
- let jitter = if self.config.max_retry_jitter() > Duration::ZERO { - Duration::from_micros(rand::random::<u64>() % self.config.max_retry_jitter().as_micros() as u64) - } else { + let jitter = if self.config.max_retry_jitter().is_zero() { Duration::ZERO + } else { + let max_jitter_nanos = self + .config + .max_retry_jitter() + .as_nanos() + .min(u64::MAX as u128) as u64; + Duration::from_nanos(rand::random::<u64>() % max_jitter_nanos) };#!/bin/bash rg -n -C2 'from_micros\(rand::random::<u64>\(\) % .*as_micros\(\) as u64\)' --type rust🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/dht.rs` around lines 156 - 169, The current jitter calculation casts max_retry_jitter().as_micros() to u64 causing a modulo-by-zero and truncation issues; update the jitter logic in the put_mutable retry block (where variables jitter and retry_interval are computed and used) to derive a non-zero bound in nanoseconds and use rand::Rng::gen_range to produce a random number of nanoseconds, e.g. compute let max_jitter_ns = self.config.max_retry_jitter().as_nanos(); if max_jitter_ns > 0 { let n = thread_rng().gen_range(0..max_jitter_ns as u64); jitter = Duration::from_nanos(n) } else { jitter = Duration::ZERO } and then compute retry_interval = self.config.base_retry_interval() + jitter so you avoid the as_micros() cast and % 0 errors.PROTOCOL.md (2)
149-153:⚠️ Potential issue | 🟡 MinorStale "DHT timeouts return empty record sets" claim.
Line 43 in the same document already correctly states timeouts return errors after retries are exhausted. This bullet here, plus the doc line at 42-43 already updated, are now inconsistent. Update this section to match the caller-handled error propagation semantics used by
DhtActor::get()/RecordPublisher::get_records()(which now propagates with?).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@PROTOCOL.md` around lines 149 - 153, Update the "DHT timeouts return empty record sets (non-fatal)" bullet to reflect that DHT timeouts and related failures now propagate errors to the caller (not silently return empty sets) — align this bullet with the earlier statement about timeouts returning errors after retries are exhausted and note that callers such as DhtActor::get() and RecordPublisher::get_records() propagate those errors (via ?), so callers must handle them.
110-113:⚠️ Potential issue | 🟡 MinorTime Window Selection still internally contradictory.
Bullets 1-2 distinguish two branches (
unix_minute-2 + unix_minute-1vsunix_minute + unix_minute-1), but bullet 3 ("Bothunix_minuteandunix_minute-1records are always fetched") flatly contradicts the alternate branch. Either drop bullet 3 or rewrite it as "two adjacent minute slots are fetched; which pair depends oncheck_last_minute_record_first_on_startup."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@PROTOCOL.md` around lines 110 - 113, The "Time Window Selection" section is internally contradictory: bullets describe two alternate branches (when check_last_minute_record_first_on_startup is true use unix_minute-2 + unix_minute-1; otherwise use unix_minute + unix_minute-1) but then claim "Both unix_minute and unix_minute-1 records are always fetched." Update the text to remove the contradiction by replacing the third bullet with a single clear statement such as "Two adjacent minute slots are fetched; the pair is either (unix_minute-2, unix_minute-1) on first startup when check_last_minute_record_first_on_startup is enabled, or (unix_minute, unix_minute-1) otherwise," and ensure the terms check_last_minute_record_first_on_startup, unix_minute, unix_minute-1, and unix_minute-2 are used to disambiguate the branches.src/config.rs (1)
786-797:⚠️ Potential issue | 🟡 MinorSame doc still says
unix_minute-1twice — only the builder copy was fixed.The companion fix landed on the builder at lines 714-718 (now correctly reads
unix_minute-1andunix_minute-2), but this getter's doc still has the duplicateunix_minute-1. Mirror the builder text here.📝 Proposed doc fix
- /// If this is enabled than we first fetch `unix_minute-1` and `unix_minute-1`. + /// If this is enabled, we first fetch `unix_minute-1` and `unix_minute-2`.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/config.rs` around lines 786 - 797, The docstring for the getter check_last_minute_record_first_on_startup still repeats `unix_minute-1` twice; update the comment above the pub fn check_last_minute_record_first_on_startup(&self) -> bool to mirror the builder docs and read `unix_minute-1` and `unix_minute-2` (same wording as the builder copy), leaving the rest of the explanation intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ARCHITECTURE.md`:
- Around line 123-124: The pseudocode has an off-by-one when minute is set to
-1: using recs = get_records(unix_minute(minute) - 1) +
get_records(unix_minute(minute)) ends up fetching current-2 and current-1
instead of current and previous; change the logic so that when first_attempt &&
check_last_minute_first is true the offset yields 0 (current) and -1 (previous)
for the two get_records calls (i.e., compute a proper minute offset variable
rather than reusing minute as an index), update the calls to
get_records(unix_minute(offset)) and get_records(unix_minute(offset - 1))
accordingly, and reflect this corrected current-plus-previous lookup in the
surrounding bootstrap flow description; refer to minute, first_attempt,
check_last_minute_first, get_records, unix_minute, and recs when making the
change.
In `@README.md`:
- Around line 289-291: The markdown task list item uses the wrong checkbox
syntax; update the line with "- [] Network degradation testing" to use the
GitHub task-list format "- [ ] Network degradation testing" so it renders as an
interactive checkbox (you can find this under the "## Todo's" section in
README.md).
In `@src/config.rs`:
- Around line 939-947: The Config::build() silently overrides an explicitly set
publisher_config to PublisherConfig::Disabled when
bootstrap_config.max_bootstrap_records == 0; update build (the Config::builder()
-> build() path) to either emit a tracing::warn! that includes the original
publisher_config value and the reason for the override (mention
bootstrap_config.max_bootstrap_records == 0) or change build() to return
Result<Config, ConfigError> and return a typed error instead of mutating
publisher_config; locate the logic in the build method that checks
bootstrap_config.max_bootstrap_records, and add the warning log or convert the
function signature to return a Result so callers are informed of the override
rather than silently losing their PublisherConfig.
- Around line 580-583: Add the missing Debug derive to the MergeConfigBuilder
struct so it matches the other builders and allows Debug propagation;
specifically, add #[derive(Debug)] above the declaration of MergeConfigBuilder
(the struct named MergeConfigBuilder that contains config: MergeConfig) so
callers holding this builder can be debug-printed like the other builders.
In `@src/crypto/keys.rs`:
- Line 5: The crypto module imports TopicId unconditionally but TopicId is
feature-gated; update the code so it compiles without the iroh-gossip feature:
either remove the unconditional use crate::TopicId from src/crypto/keys.rs and
gate the import and all APIs that reference TopicId with #[cfg(feature =
"iroh-gossip")], or re-export/move TopicId to an unconditional module so the
existing functions signing_keypair, encryption_keypair, and salt keep working;
specifically, pick one approach and apply it consistently—if gating, add
#[cfg(feature = "iroh-gossip")] to the use statement and to the function
signatures/impls that take &TopicId; if un-gating TopicId, move/re-export
TopicId from the crate root so the use crate::TopicId remains valid.
In `@src/crypto/record.rs`:
- Around line 226-235: The else branch needlessly re-iterates and clones the
HashSet returned by get_records(record.unix_minute()).await?; replace the
expression
self.get_records(record.unix_minute()).await?.iter().cloned().collect::<HashSet<_>>()
with the HashSet value directly by using the return of get_records(...) (e.g.,
assign the awaited result to records) so cached_records and get_records both
produce a HashSet<Record> without the redundant iter/cloned/collect; update any
variable binding (records) accordingly in the surrounding function.
In `@src/gossip/merge/bubble.rs`:
- Around line 89-96: The jitter calculation can panic because checking
self.max_jitter > Duration::ZERO doesn't prevent as_micros()/as_nanos() from
being zero; update the jitter logic inside the BubbleMerge code path (the block
computing jitter and next_interval in bubble.rs) to first compute a usable
non-zero bound (e.g., let max_nanos = self.max_jitter.as_nanos() as u64; if
max_nanos > 0 { Duration::from_nanos(rand::random::<u64>() % max_nanos) } else {
Duration::ZERO }) or explicitly check as_micros()/as_nanos() > 0 before taking
the modulus; apply the same pattern to the other three occurrences referenced
(functions in dht.rs, gossip/merge/message_overlap.rs, and
gossip/topic/publisher.rs) so modulus never receives zero.
In `@src/gossip/merge/message_overlap.rs`:
- Around line 138-177: peers_to_join may produce an empty pub_keys set after
filtering (all candidates are self or already neighbors), so avoid calling
self.gossip_sender.join_peers with an empty list; after computing pub_keys (from
peers_to_join, using EndpointId::from_bytes, filtering against self_pub_key and
active_neighbors from self.gossip_receiver.neighbors()), check
pub_keys.is_empty() and only call
self.gossip_sender.join_peers(pub_keys.iter().cloned().collect::<Vec<_>>(),
Some(self.max_join_peers)).await? when not empty to prevent sending an empty
join request.
- Around line 80-87: The jitter calculation can panic when self.max_jitter >
Duration::ZERO but self.max_jitter.as_micros() == 0 (sub-microsecond durations)
because you end up doing rand % 0; fix it by computing a safe integer bound
first (e.g., let max_micros = self.max_jitter.as_micros().max(1) as u64) and
then use that bound when generating the random value for jitter, convert the
sampled microseconds back into a Duration for jitter, then keep the rest of the
logic (next_interval = self.base_interval + jitter; tracing::debug!...;
self.ticker.reset_after(next_interval)) the same so that symbols like
self.max_jitter, jitter, next_interval, and self.ticker.reset_after remain
unchanged.
In `@src/gossip/topic/bootstrap.rs`:
- Around line 213-230: The code currently always includes record.pub_key() into
bootstrap_nodes, allowing the local node to be added to its own bootstrap
candidates; update the closure in the flat_map that builds bootstrap_nodes so it
only pushes record.pub_key() when it passes the same self-filter used for
active_peers (i.e., skip if equal to record_publisher.pub_key().as_bytes() or
the zeroed key), keeping the existing logic that also filters active_peers;
reference the bootstrap_nodes construction, record.pub_key(),
record_content.active_peers, and record_publisher.pub_key() and ensure
EndpointId::from_bytes(&pub_key).ok() still performs the final validation.
In `@src/gossip/topic/publisher.rs`:
- Around line 78-85: The jitter calculation can panic when self.max_jitter < 1
microsecond because as_micros() yields 0 and the rand::random::<u64>() % 0
causes a divide-by-zero; change the logic in the publisher jitter block (where
jitter is computed and next_interval set) to use a non-zero modulus by deriving
the jitter range from a safe value (e.g., compute max_micros =
self.max_jitter.as_micros().max(1) and use that for the modulo) or compute
jitter in a smaller unit (as_nanos) and convert back, then construct
Duration::from_micros or Duration::from_nanos accordingly before adding to
self.base_interval and calling self.ticker.reset_after(next_interval).
In `@src/gossip/topic/topic.rs`:
- Around line 462-501: The comments and expect() messages claim next()/joined()
should "return None" (or that WeakSender upgrade returns None) but the test
actually asserts result.is_err() for survivor.next(), survivor.joined(), and
late_clone.next()/joined(); reconcile by either changing the assertions to
expect None (if the API returns Option) or update the comments and all
expect()/assert messages to state that these methods return Err after shutdown
(if the API returns Result). Specifically audit the shutdown test around
topic.cancel_token(), receiver.clone(), survivor.next(), survivor.joined(), and
late_clone.clone() and update the inline comments and expect(...) strings to
accurately reflect the inner return type (Err vs None) or adjust the asserts to
match the documented contract.
- Around line 308-313: The else branch that logs "Topic: cancelled before
workers could be spawned" is entered because cancel_token.is_cancelled() is
true, so calling cancel_token.cancel() is redundant; remove the
cancel_token.cancel() call from the else branch (the block containing
tracing::warn! and Err(anyhow::anyhow!(...))) and keep just the warning and
error return (or alternatively invert the conditional to avoid the
cancellation/no-op), leaving cancel_token.cancel() only where an actual cancel
action is needed.
---
Outside diff comments:
In `@examples/chat_no_wait.rs`:
- Around line 49-60: The listener spawned with tokio::spawn currently uses while
let Ok(event) = gossip_receiver.next().await which silently stops on the first
Err; change the loop to explicitly match the result from
gossip_receiver.next().await (e.g., loop { match gossip_receiver.next().await {
Ok(event) => /* handle Event::Received and Event::NeighborUp as before using msg
and peer */ }, Err(e) => { log or eprintln the error (including e) so the user
sees why the listener ended, then break or decide to continue/retry } } ) so
errors are surfaced; apply the same pattern to other updated examples using
gossip_receiver and the Event enum.
In `@PROTOCOL.md`:
- Around line 398-403: The spec incorrectly labels encrypted_decryption_key as
"variable" even though the implementation enforces a fixed length; update the
PROTOCOL.md EncryptedRecord Serialization section to reflect that
encrypted_decryption_key is a fixed 88-byte field (match
EncryptedRecord::from_bytes and ENCRYPTED_KEY_LENGTH = 88) instead of using the
"variable:" prefix so the doc matches the code.
In `@src/crypto/record.rs`:
- Around line 261-275: The hardcoded next_seq_num = i64::MAX in RecordPublisher
(where you call self.dht.put_mutable(... next_seq_num)) effectively locks the
DHT slot and causes conflicts; either add a clear comment next to that constant
explaining why using MAX is intentional and acceptable for this design, or
replace it with a per-minute monotonically-increasing sequence derived from
record.unix_minute() (e.g., seq = unix_minute * 1000 + per-minute counter) and
persist/track the counter in RecordPublisher (e.g., a HashMap keyed by
(sign_key, salt, unix_minute) or an atomic counter reset per minute) so
put_mutable receives increasing sequence numbers and concurrent publishes within
the same minute succeed; ensure get_records() behavior remains compatible with
the new sequence scheme.
In `@src/gossip/merge/bubble.rs`:
- Around line 136-145: Convert the raw active_peer bytes to an EndpointId up
front inside the filter_map closure (use
iroh::EndpointId::from_bytes(&active_peer).ok()); if conversion fails return
None, then perform membership checks against neighbors.contains(&peer) and
compare peer to record.pub_key() and to self_pub_key (convert those to
EndpointId once and reuse) and also filter out the all-zero ID by comparing to
EndpointId from [0;32]; this makes the logic consistent and clearer in the
filter_map that currently references active_peer, neighbors, record.pub_key(),
self_pub_key, and iroh::EndpointId::from_bytes.
---
Duplicate comments:
In `@PROTOCOL.md`:
- Around line 149-153: Update the "DHT timeouts return empty record sets
(non-fatal)" bullet to reflect that DHT timeouts and related failures now
propagate errors to the caller (not silently return empty sets) — align this
bullet with the earlier statement about timeouts returning errors after retries
are exhausted and note that callers such as DhtActor::get() and
RecordPublisher::get_records() propagate those errors (via ?), so callers must
handle them.
- Around line 110-113: The "Time Window Selection" section is internally
contradictory: bullets describe two alternate branches (when
check_last_minute_record_first_on_startup is true use unix_minute-2 +
unix_minute-1; otherwise use unix_minute + unix_minute-1) but then claim "Both
unix_minute and unix_minute-1 records are always fetched." Update the text to
remove the contradiction by replacing the third bullet with a single clear
statement such as "Two adjacent minute slots are fetched; the pair is either
(unix_minute-2, unix_minute-1) on first startup when
check_last_minute_record_first_on_startup is enabled, or (unix_minute,
unix_minute-1) otherwise," and ensure the terms
check_last_minute_record_first_on_startup, unix_minute, unix_minute-1, and
unix_minute-2 are used to disambiguate the branches.
In `@src/config.rs`:
- Around line 786-797: The docstring for the getter
check_last_minute_record_first_on_startup still repeats `unix_minute-1` twice;
update the comment above the pub fn
check_last_minute_record_first_on_startup(&self) -> bool to mirror the builder
docs and read `unix_minute-1` and `unix_minute-2` (same wording as the builder
copy), leaving the rest of the explanation intact.
In `@src/dht.rs`:
- Around line 156-169: The current jitter calculation casts
max_retry_jitter().as_micros() to u64 causing a modulo-by-zero and truncation
issues; update the jitter logic in the put_mutable retry block (where variables
jitter and retry_interval are computed and used) to derive a non-zero bound in
nanoseconds and use rand::Rng::gen_range to produce a random number of
nanoseconds, e.g. compute let max_jitter_ns =
self.config.max_retry_jitter().as_nanos(); if max_jitter_ns > 0 { let n =
thread_rng().gen_range(0..max_jitter_ns as u64); jitter =
Duration::from_nanos(n) } else { jitter = Duration::ZERO } and then compute
retry_interval = self.config.base_retry_interval() + jitter so you avoid the
as_micros() cast and % 0 errors.
In `@src/gossip/topic/bootstrap.rs`:
- Around line 143-164: The code currently flips use_cached_next to false and
calls crate::unix_minute(-1) which causes the lookup window to skip the
current-minute records: adjust the logic in the bootstrap flow around
use_cached_next, unix_minute, last_published_unix_minute, and
bootstrap_config.check_last_minute_record_first_on_startup() so the "lookup
window" (records gathered via record_publisher.get_records(unix_minute - 1) and
get_records(unix_minute)) is independent from the minute used for
publishing/caching; always fetch current_records (call
record_publisher.get_records(unix_minute) regardless of the -1 startup path),
extend records with current_records.clone(), and when disabling the
cached-next-path still pass Some(current_records.clone()) as the cached
current-minute publishes (instead of skipping current-minute data) so the
first-pass lookup never omits the current-minute records referenced by functions
like build_publish_list or whatever uses use_cached_next.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 80033009-849d-45eb-8db5-4521b5ae72fa
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (29)
.github/workflows/test.ymlARCHITECTURE.mdCargo.tomlPROTOCOL.mdREADME.mdexamples/chat.rsexamples/chat_no_wait.rsexamples/e2e_test.rsexamples/full_config.rsexamples/secret_rotation.rsexamples/simple.rsexamples/without_mergers.rssrc/config.rssrc/crypto/keys.rssrc/crypto/mod.rssrc/crypto/record.rssrc/dht.rssrc/gossip/merge/bubble.rssrc/gossip/merge/message_overlap.rssrc/gossip/merge/mod.rssrc/gossip/mod.rssrc/gossip/receiver.rssrc/gossip/sender.rssrc/gossip/topic/bootstrap.rssrc/gossip/topic/publisher.rssrc/gossip/topic/topic.rssrc/lib.rstests/test_gossip.rstests/tests.rs
💤 Files with no reviewable changes (1)
- tests/test_gossip.rs
…t_minute_record_first_on_startup -> check_older_records_first_on_startup + publishing only to current unix_minute + docs fixes
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
examples/chat_no_wait.rs (1)
78-87:⚠️ Potential issue | 🟡 MinorCRLF handling differs from
examples/chat.rs.
examples/chat.rswas updated to trim both\rand\nviabuffer.trim_end_matches(&['\r', '\n'][..]), but this example still usesbuffer.clone().replace("\n", ""), which leaves a stray\rin the broadcast payload on Windows (CRLF input). Align with the other example for consistency.🛠️ Proposed fix
gossip_sender - .broadcast(buffer.clone().replace("\n", "").into()) + .broadcast( + buffer + .trim_end_matches(&['\r', '\n'][..]) + .as_bytes() + .to_vec(), + ) .await .unwrap();🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/chat_no_wait.rs` around lines 78 - 87, The broadcast currently uses buffer.clone().replace("\n", "") which leaves a carriage return on CRLF input; update the loop to trim both '\r' and '\n' like the other example by calling buffer.trim_end_matches(&['\r', '\n'][..]) (or equivalent) before converting to into() for gossip_sender.broadcast, referencing the same buffer variable and the gossip_sender.broadcast call so the sent payload matches examples/chat.rs.PROTOCOL.md (1)
97-168:⚠️ Potential issue | 🟡 MinorUse consistent config flag name throughout PROTOCOL.md.
The canonical config flag in
BootstrapConfigischeck_older_records_first_on_startup, but PROTOCOL.md references it inconsistently:
- Line 97: Uses descriptive phrase "Check last minute record first on startup" instead of the flag name
- Line 111: Correctly uses
check_older_records_first_on_startup- Line 168: Uses shortened version
check_older_records_first(missing_on_startupsuffix)Update all references to match the canonical name.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@PROTOCOL.md` around lines 97 - 168, Replace all variants and descriptive phrases with the canonical BootstrapConfig flag name check_older_records_first_on_startup in PROTOCOL.md; specifically, change the descriptive "Check last minute record first on startup" and the shortened check_older_records_first to check_older_records_first_on_startup so every reference (including the sentence at the top of the Bootstrap Loop and the conditional in the Time Window Selection) uses the exact symbol BootstrapConfig.check_older_records_first_on_startup.src/gossip/merge/bubble.rs (1)
128-159:⚠️ Potential issue | 🔴 Critical
neighbors.contains(&active_peer)will not compile due to type mismatch.
neighborsisHashSet<EndpointId>(fromGossipReceiver::neighbors()), butactive_peeris[u8; 32]. According to iroh 0.97.0 documentation,EndpointIddoes not implementBorrow<[u8; 32]>— only standard blanketBorrow<T> for Timpls exist. The code elsewhere usesEndpointId::from_bytes(&active_peer)to constructEndpointIdinstances from[u8; 32]bytes, confirming the expected pattern.Line 138 should convert
active_peertoEndpointIdbefore the membership check, e.g.:.flat_map(|&active_peer| { if active_peer == [0; 32] || EndpointId::from_bytes(&active_peer).ok().map_or(false, |id| neighbors.contains(&id)) || active_peer == record.pub_key() || active_peer.eq(self_pub_key.as_bytes()) { None } else { EndpointId::from_bytes(&active_peer).ok() } })Alternatively, extract the
EndpointIdonce before the conditional checks to avoid repeated conversions.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gossip/merge/bubble.rs` around lines 128 - 159, The membership check fails because neighbors is a HashSet<EndpointId> while active_peer is [u8;32]; convert active_peer to an EndpointId once before checking membership and reuse it in the conditions inside the flat_map over records (the closure handling active_peer), e.g., attempt EndpointId::from_bytes(&active_peer).ok() into a local variable and use that for the neighbors.contains(...) test and for pushing into pub_keys; also ensure you still skip [0;32], skip when conversion fails, and keep the existing checks against record.pub_key() and self_pub_key when comparing bytes.src/crypto/record.rs (1)
324-338:⚠️ Potential issue | 🟠 MajorVerify
next_seq_num = i64::MAXis the intended design for cluster-wide publishing.Because
sign_keyandsaltare both deterministically derived fromtopic_id + unix_minute(lines 315-316), every cluster node writes to the same DHT target slotSHA-1(derived_pub_key || salt)for a given minute. Per BEP44 (and as confirmed in the previousNextDhtSeqthread), storage nodes reject a mutable put whoseseqis less than or equal to the stored seq when the value differs (ConcurrencyError::ConflictRisk/ error 302).With
next_seq_num = i64::MAX:
- The first cluster node to publish in a given minute-slot wins; its
Recordis stored atseq = i64::MAX.- Any subsequent publisher in the same minute-slot carries a different value (different
pub_key-signedRecord) atseq = i64::MAX→ rejected by the storage nodes.put_mutablesurfaces that rejection asErr, which is propagated through?at line 338 and logged viapublish_record_fire_and_forgetin bootstrap.Net effect: only one node's record survives per (topic, unix_minute). New peers bootstrapping off the DHT will still find that one record, which may be sufficient, but it's a notable departure from the previous per-slot seq tracking and from
publish_record's apparent "publish if undermax_bootstrap_records" intent (the check at line 306 is pre-put-local; global uniqueness is enforced by seq collision after the fact).If this is the intended "first-publisher-wins" design, consider a doc comment on
publish_record_cached_recordsexplaining the global single-winner semantics and thatmax_bootstrap_recordsis a local-observation cap rather than a global one. Otherwise, the previous discussion'sunix_minute-seeded counter would preserve multi-publisher semantics within a minute-slot.mainline rust crate put_mutable seq i64::MAX conflict behavior same target different value🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/crypto/record.rs` around lines 324 - 338, The current use of a constant next_seq_num = i64::MAX causes first-publisher-wins for the DHT slot (sign_key + salt); either document that global single-winner semantic on publish_record_cached_records (explaining max_bootstrap_records is local only) or restore multi-publisher behavior by replacing the constant with the prior unix_minute-seeded counter/NextDhtSeq logic: compute a per-(topic_id, unix_minute) sequence (e.g., call NextDhtSeq::next or a persisted counter keyed by topic_id+unix_minute) and pass that value to self.dht.put_mutable instead of i64::MAX so concurrent publishers can advance the seq instead of all attempting the max value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/config.rs`:
- Around line 139-150: The setters put_timeout and get_timeout currently accept
Duration::ZERO and should mirror the base_retry_interval guard; modify both
functions (put_timeout and get_timeout) to only assign self.config.put_timeout /
self.config.get_timeout when timeout > Duration::ZERO (otherwise leave the
existing value) so zero durations are ignored and do not cause immediate
timeouts.
In `@src/crypto/record.rs`:
- Around line 374-389: The loop currently does a full clone of records_iter;
since records_iter is a Vec<MutableItem> and not used after the loop, iterate by
value instead of cloning: replace `for item in records_iter.clone()` with `for
item in records_iter` (or `for item in records_iter.into_iter()`), leaving the
inner logic (EncryptedRecord::from_bytes, encrypted_record.decrypt,
record.verify, record.pub_key, dedubed_records) unchanged so items are moved
rather than cloned.
In `@src/gossip/mod.rs`:
- Around line 24-30: GossipRecordContent embeds fixed-size arrays whose lengths
are wire-format constants (MAX_RECORD_PEERS and MAX_MESSAGE_HASHES) that must
not be exposed as user-tunable public API; change their visibility or document
their invariance: either make MAX_RECORD_PEERS and MAX_MESSAGE_HASHES non-public
(pub(crate)) and move them into the gossip module alongside GossipRecordContent,
or keep them public but add clear documentation on the constants (adjacent to
the pub const re-exports) stating they are protocol-level constants that must be
identical across all peers and only change with a protocol version bump; update
references to the constants accordingly (GossipRecordContent, MAX_RECORD_PEERS,
MAX_MESSAGE_HASHES).
In `@src/gossip/topic/topic.rs`:
- Around line 106-119: The sync branch currently awaits
bootstrap.gossip_receiver().await?.joined().await? and can both proceed before
the actual bootstrap completion and return errors without cancelling; change it
to await the same bootstrap_done future used by the spawned task (the
bootstrap_done consumed by wait_for_bootstrap) instead of joined(), and ensure
any early error from gossip_receiver() or awaiting bootstrap_done triggers
cancel_token.cancel() before returning; update the sync branch where
spawn_workers is called (and where you currently call joined()) to mirror the
spawned task's logic (use bootstrap_done, call cancel_token.cancel() on any Err,
then return the error), referencing bootstrap_done, bootstrap.gossip_receiver(),
joined(), wait_for_bootstrap, spawn_workers, and cancel_token.cancel().
- Line 442: Fix the typo in the expect message: change the string passed to the
.expect call that currently reads "failed to bind edpoint" to "failed to bind
endpoint" so the error message is spelled correctly.
In `@src/lib.rs`:
- Around line 15-19: The public re-exports in src/lib.rs are missing the builder
types DhtConfigBuilder, BootstrapConfigBuilder, and MergeConfigBuilder; update
the pub use config::{ ... } list to include DhtConfigBuilder,
BootstrapConfigBuilder, and MergeConfigBuilder so callers can use
DhtConfig::builder(), BootstrapConfig::builder(), and MergeConfig::builder()
types (e.g., returning DhtConfigBuilder from functions) consistently with the
other builders like ConfigBuilder and PublisherConfigBuilder.
---
Outside diff comments:
In `@examples/chat_no_wait.rs`:
- Around line 78-87: The broadcast currently uses buffer.clone().replace("\n",
"") which leaves a carriage return on CRLF input; update the loop to trim both
'\r' and '\n' like the other example by calling buffer.trim_end_matches(&['\r',
'\n'][..]) (or equivalent) before converting to into() for
gossip_sender.broadcast, referencing the same buffer variable and the
gossip_sender.broadcast call so the sent payload matches examples/chat.rs.
In `@PROTOCOL.md`:
- Around line 97-168: Replace all variants and descriptive phrases with the
canonical BootstrapConfig flag name check_older_records_first_on_startup in
PROTOCOL.md; specifically, change the descriptive "Check last minute record
first on startup" and the shortened check_older_records_first to
check_older_records_first_on_startup so every reference (including the sentence
at the top of the Bootstrap Loop and the conditional in the Time Window
Selection) uses the exact symbol
BootstrapConfig.check_older_records_first_on_startup.
In `@src/crypto/record.rs`:
- Around line 324-338: The current use of a constant next_seq_num = i64::MAX
causes first-publisher-wins for the DHT slot (sign_key + salt); either document
that global single-winner semantic on publish_record_cached_records (explaining
max_bootstrap_records is local only) or restore multi-publisher behavior by
replacing the constant with the prior unix_minute-seeded counter/NextDhtSeq
logic: compute a per-(topic_id, unix_minute) sequence (e.g., call
NextDhtSeq::next or a persisted counter keyed by topic_id+unix_minute) and pass
that value to self.dht.put_mutable instead of i64::MAX so concurrent publishers
can advance the seq instead of all attempting the max value.
In `@src/gossip/merge/bubble.rs`:
- Around line 128-159: The membership check fails because neighbors is a
HashSet<EndpointId> while active_peer is [u8;32]; convert active_peer to an
EndpointId once before checking membership and reuse it in the conditions inside
the flat_map over records (the closure handling active_peer), e.g., attempt
EndpointId::from_bytes(&active_peer).ok() into a local variable and use that for
the neighbors.contains(...) test and for pushing into pub_keys; also ensure you
still skip [0;32], skip when conversion fails, and keep the existing checks
against record.pub_key() and self_pub_key when comparing bytes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: ba07a0f3-57cf-4719-bae4-a06e5ebf03c9
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (30)
.github/workflows/test.ymlARCHITECTURE.mdCargo.tomlPROTOCOL.mdREADME.mdexamples/chat.rsexamples/chat_no_wait.rsexamples/e2e_test.rsexamples/full_config.rsexamples/secret_rotation.rsexamples/simple.rsexamples/without_mergers.rssrc/config.rssrc/crypto/keys.rssrc/crypto/mod.rssrc/crypto/record.rssrc/dht.rssrc/gossip/merge/bubble.rssrc/gossip/merge/message_overlap.rssrc/gossip/merge/mod.rssrc/gossip/mod.rssrc/gossip/receiver.rssrc/gossip/sender.rssrc/gossip/topic/bootstrap.rssrc/gossip/topic/mod.rssrc/gossip/topic/publisher.rssrc/gossip/topic/topic.rssrc/lib.rstests/test_gossip.rstests/tests.rs
💤 Files with no reviewable changes (1)
- tests/test_gossip.rs
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
examples/chat.rs (2)
72-78:⚠️ Potential issue | 🟡 MinorAvoid panicking on non-UTF-8 gossip payloads.
Gossip messages are raw bytes, so a peer can send invalid UTF-8 and terminate the listener task through
unwrap(). Use lossy decoding for this chat example.🛡️ Proposed fix
while let Ok(event) = gossip_receiver.next().await { if let Event::Received(msg) = event { + let content = msg.content.to_vec(); println!( "\nMessage from {}: {}", &msg.delivered_from.to_string()[0..8], - String::from_utf8(msg.content.to_vec()).unwrap() + String::from_utf8_lossy(&content) );🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/chat.rs` around lines 72 - 78, The code currently calls String::from_utf8(msg.content.to_vec()).unwrap() inside the gossip_receiver loop (in the Event::Received branch), which can panic on non-UTF-8 payloads; replace the unwrap path with a lossy decode such as String::from_utf8_lossy(&msg.content) or std::str::from_utf8_lossy(&msg.content) so invalid bytes are replaced rather than crashing the listener task, updating the println to print the resulting Cow<str>.
89-104:⚠️ Potential issue | 🟡 MinorBreak the input loop on EOF instead of broadcasting forever.
read_linereturnsOk(0)at EOF. The current loop keeps running and broadcasts an empty message repeatedly when stdin closes.🛠️ Proposed fix
loop { print!("\n> "); - stdin.read_line(&mut buffer).unwrap(); + let bytes_read = stdin.read_line(&mut buffer)?; + if bytes_read == 0 { + break; + } gossip_sender .broadcast( buffer @@ println!(" - (sent)"); buffer.clear(); } + + Ok(()) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/chat.rs` around lines 89 - 104, The input loop currently calls stdin.read_line(&mut buffer).unwrap() and will broadcast an empty payload forever at EOF; change it to capture the Result from stdin.read_line into a variable, check if the returned usize is 0 (EOF) and break the loop instead of calling gossip_sender.broadcast, and handle any Err by logging or breaking; specifically update the loop around stdin.read_line, buffer, and gossip_sender.broadcast so you only call gossip_sender.broadcast when read_line returns Ok(n) with n > 0 and avoid broadcasting empty messages.src/crypto/record.rs (1)
324-338: 🧹 Nitpick | 🔵 TrivialDocument the
next_seq_num = i64::MAXdesign choice.All cluster nodes derive the same
(sign_key, salt)from(topic_id, unix_minute)and therefore write to the same BEP44 mutable slot. Usingi64::MAXas the seq means:
- The first value a given DHT storage node accepts for this minute-slot is effectively pinned, because every subsequent put with seq ≤ current is rejected (and
seq == i64::MAXwith a differentvspecifically hitsConflictRisk/ error 302).- Different storage nodes can pin different cluster publishers, which is what lets
get_recordsreturn multiple distinct records even though they all target the same slot.- There is no path to ever update this slot within the same unix minute; the choice is intentional but irreversible.
This is a subtle, non-obvious protocol decision that interacts with the
max_bootstrap_recordsrate limit and the dedup-by-pub_key logic inget_records(). Please add a short inline comment here (and ideally mirror it inPROTOCOL.md) explaining whyi64::MAXis used and the consequence that cluster-wide only the first-arriving record at each storage node is retained per minute.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/crypto/record.rs` around lines 324 - 338, Add an inline comment above the next_seq_num = i64::MAX assignment explaining the design: note that all nodes derive the same (sign_key, salt) from (topic_id, unix_minute) and thus target the same BEP44 mutable slot, that using i64::MAX pins the first value accepted by each DHT storage node for that minute (subsequent puts with seq ≤ current are rejected and seq==i64::MAX with different v causes ConflictRisk), that different storage nodes can therefore pin different publishers so get_records() may return multiple distinct records, and that this choice intentionally makes the slot immutable for that unix minute; also add the same explanation to PROTOCOL.md referencing next_seq_num, put_mutable, sign_key, salt, and get_records so readers understand the irreversible per-minute pinning behavior.
♻️ Duplicate comments (3)
README.md (1)
123-138:⚠️ Potential issue | 🟡 MinorClarify that
Vec<u8>conversion is fallible.Line 125 still lists
Vec<u8>alongside infallible conversions, which implies.into()works. If the implementation decodes bytes as UTF-8, document this asTryFrom<Vec<u8>>and show callers handling theResult.📚 Proposed wording fix
-`RecordTopic` has been removed. `TopicId` now serves as the unified topic identifier across the entire API and supports conversion from `&str`, `String`, `Vec<u8>`, and `FromStr`. +`RecordTopic` has been removed. `TopicId` now serves as the unified topic identifier across the entire API and supports conversion from `&str`, `String`, `FromStr`, and fallible `TryFrom<Vec<u8>>`.Run this read-only check to confirm the current conversion implementations:
#!/bin/bash # Verify TopicId conversion traits and whether Vec<u8> is fallible. rg -n -C3 'impl\s+(From|TryFrom)<Vec<u8>>\s+for\s+TopicId|impl\s+FromStr\s+for\s+TopicId|impl\s+From<&str>\s+for\s+TopicId|impl\s+From<String>\s+for\s+TopicId'🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 123 - 138, The README incorrectly suggests Vec<u8> conversion into TopicId is infallible; update docs to state that conversion from Vec<u8> is fallible (implementing TryFrom<Vec<u8>> rather than From<Vec<u8>>), show usage handling the Result (e.g., TopicId::try_from(bytes)? or match), and adjust the examples list to list From<&str>, From<String>, FromStr (infallible/inferred) and TryFrom<Vec<u8>> (fallible) while referencing the TopicId type and its conversion trait names (TryFrom<Vec<u8>>, FromStr, From<String>, From<&str>) so readers know to handle possible errors.ARCHITECTURE.md (1)
112-124:⚠️ Potential issue | 🟡 MinorFix the bootstrap minute-selection pseudocode.
Line 123 sets
minute = -1, then Line 124 fetchesunix_minute(-1) - 1andunix_minute(-1), i.e. current-2 and previous, while Lines 112-113 say the flow checks current and previous. Make the pseudocode match the intended lookup order, and use the current config name consistently.📝 Proposed doc fix
-- First iteration: optionally also check previous unix minute (`check_last_minute_record_first_on_startup`). +- First iteration: optionally check older records first (`check_older_records_first_on_startup`). - Both `unix_minute` and `unix_minute - 1` records are always fetched. @@ - minute = first_attempt && check_last_minute_first ? -1 : 0 - recs = get_records(unix_minute(minute) - 1) + get_records(unix_minute(minute)) + minute = unix_minute(0) + recs = get_records(minute) + get_records(minute - 1)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ARCHITECTURE.md` around lines 112 - 124, The pseudocode for bootstrap minute selection miscomputes minute when first_attempt and check_last_minute_record_first_on_startup are true: it sets minute = -1 which causes calls to unix_minute(minute) - 1 and unix_minute(minute) to fetch current-2 and current-1 instead of current-1 and current; fix by computing minute as 0 normally and -1 only to indicate "look at previous minute first", then call get_records(unix_minute(minute)) and get_records(unix_minute(minute) - 1) (or swap call order) so that when minute == -1 you fetch previous and current-1/current as intended; update variable and function references in the pseudocode (minute, first_attempt, check_last_minute_record_first_on_startup, unix_minute, get_records) so wording and config name are consistent.src/gossip/topic/topic.rs (1)
84-119:⚠️ Potential issue | 🟡 MinorAwait
bootstrap_donein the synchronous path.The sync path still waits for
joined()while a spawned task consumesbootstrap_done. This can spawn workers before the bootstrap actor reports completion, and bootstrap failure can cancel the topic afterTopic::new(..., false)has already returnedOk.🔧 Suggested control-flow shape
- tokio::spawn({ - let api = Arc::downgrade(&api); - let config = config.clone(); - let cancel_token = cancel_token.clone(); - async move { - if let Err(err) = wait_for_bootstrap(bootstrap_done, cancel_token.clone()).await { - tracing::warn!("bootstrap failed: {}", err); - return; - } - - if async_bootstrap { - tracing::debug!("Bootstrap completed, now spawning workers"); - if let Err(err) = spawn_workers(api, config, cancel_token.clone()).await { - cancel_token.cancel(); - tracing::warn!("failed to spawn workers: {}", err); - } - } - } - }); - if !async_bootstrap { tracing::debug!("Topic: waiting for bootstrap to complete"); - bootstrap.gossip_receiver().await?.joined().await?; + wait_for_bootstrap(bootstrap_done, cancel_token.clone()).await?; if let Err(err) = spawn_workers(Arc::downgrade(&api), config, cancel_token.clone()).await { @@ tracing::debug!("Topic: bootstrap completed"); } else { + tokio::spawn({ + let api = Arc::downgrade(&api); + let config = config.clone(); + let cancel_token = cancel_token.clone(); + async move { + if let Err(err) = wait_for_bootstrap(bootstrap_done, cancel_token.clone()).await { + tracing::warn!("bootstrap failed: {}", err); + return; + } + + tracing::debug!("Bootstrap completed, now spawning workers"); + if let Err(err) = spawn_workers(api, config, cancel_token.clone()).await { + cancel_token.cancel(); + tracing::warn!("failed to spawn workers: {}", err); + } + } + }); tracing::debug!("Topic: bootstrap started asynchronously"); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gossip/topic/topic.rs` around lines 84 - 119, The synchronous branch is not awaiting bootstrap_done (the result of bootstrap.bootstrap()) because the spawned task consumes it via wait_for_bootstrap, so workers may start before bootstrap actually completes; fix by awaiting wait_for_bootstrap(bootstrap_done, cancel_token.clone()).await in the sync path instead of relying on bootstrap.gossip_receiver().await?.joined().await?, then only call spawn_workers(Arc::downgrade(&api), config, cancel_token.clone()).await after wait_for_bootstrap returns Ok, and propagate errors the same way the async branch does (cancel_token.cancel() and return Err) to ensure bootstrap failures are handled before Topic::new returns.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/chat_no_wait.rs`:
- Around line 80-88: The call to gossip_sender.broadcast(...).await.unwrap()
will panic on transient send errors; change error handling in the send loop
around gossip_sender.broadcast so it doesn't crash: replace the .unwrap() with
proper handling (either propagate the error with ? from the surrounding async
function or handle the Err case by logging the error and continuing the loop),
keeping the existing trimming logic
(buffer.trim_end_matches(...).as_bytes().to_vec()) and referencing the
gossip_sender.broadcast call and the buffer trimming expression when making the
change.
In `@PROTOCOL.md`:
- Around line 115-120: The protocol doc currently hard-codes fetching
`unix_minute - 1` and `unix_minute` in the Record Discovery section and the
mermaid label, which contradicts the earlier description and the implementation
that uses `check_older_records_first_on_startup` / `unix_minute_offset`; update
the text and diagram so the `get_records()` calls use the selected pair
determined by that flag (i.e., call for `unix_minute - 2` and `unix_minute - 1`
when `check_older_records_first_on_startup` is true, otherwise `unix_minute - 1`
and `unix_minute`), and mirror the same conditional wording in the mermaid label
so it matches the `unix_minute_offset` logic used in bootstrap.rs and the
`get_records()` behavior.
- Line 97: Replace the human-readable flag name with the actual config field
name used in code: change the description to reference
check_older_records_first_on_startup (the field on BootstrapConfig) and keep the
default/description text intact so operators see the real config key to use when
configuring startup behavior.
In `@src/gossip/merge/mod.rs`:
- Around line 4-6: Update the "Message Overlap" doc comment in
src/gossip/merge/mod.rs to clarify the detection mechanism: replace the line
that currently reads "Detects when isolated clusters have no common message
hashes" with wording that explains the node compares recent DHT-advertised
message hashes from other peers against its own locally seen hashes (e.g.,
"Detects when recent DHT‑advertised message hashes from other peers don't
overlap with locally seen hashes, indicating a likely partition"), so readers
understand the check performed by the Message Overlap logic in the Bubble Merge
documentation.
In `@src/gossip/topic/bootstrap.rs`:
- Around line 378-391: The detached spawn in publish_record_fire_and_forget lets
publish_record_cached_records run after shutdown; modify the function to accept
a CancellationToken (or a reference to the actor's cancel_token) and inside the
spawned task wrap the await in a tokio::select! that returns early when
cancel_token.cancelled() fires (or alternatively push tasks into a
JoinSet/TaskTracker owned by the actor and abort them on shutdown);
specifically, update publish_record_fire_and_forget signature to take a cancel
token, and inside the tokio::spawn use tokio::select! { _ =
cancel_token.cancelled() => {}, res =
record_publisher.publish_record_cached_records(record, cached_records) => if let
Err(err) = res { tracing::warn!(...) } } so the publish is aborted when
self.cancel_token is cancelled.
In `@src/gossip/topic/topic.rs`:
- Around line 179-218: The pattern matches for the tuple variants are using
struct syntax; change the checks that match Enabled { .. } to tuple-pattern
Enabled(_) for PublisherConfig, BubbleMergeConfig, and MessageOverlapMergeConfig
(i.e., update matches!(config.publisher_config(), PublisherConfig::Enabled { ..
}) and the similar matches! calls for config.merge_config().bubble_merge() and
config.merge_config().message_overlap_merge() to use Enabled(_)); retain the
existing logic that calls api.upgrade() and invokes actor.start_publishing() and
actor.start_bubble_merge() on the upgraded api.
---
Outside diff comments:
In `@examples/chat.rs`:
- Around line 72-78: The code currently calls
String::from_utf8(msg.content.to_vec()).unwrap() inside the gossip_receiver loop
(in the Event::Received branch), which can panic on non-UTF-8 payloads; replace
the unwrap path with a lossy decode such as
String::from_utf8_lossy(&msg.content) or std::str::from_utf8_lossy(&msg.content)
so invalid bytes are replaced rather than crashing the listener task, updating
the println to print the resulting Cow<str>.
- Around line 89-104: The input loop currently calls stdin.read_line(&mut
buffer).unwrap() and will broadcast an empty payload forever at EOF; change it
to capture the Result from stdin.read_line into a variable, check if the
returned usize is 0 (EOF) and break the loop instead of calling
gossip_sender.broadcast, and handle any Err by logging or breaking; specifically
update the loop around stdin.read_line, buffer, and gossip_sender.broadcast so
you only call gossip_sender.broadcast when read_line returns Ok(n) with n > 0
and avoid broadcasting empty messages.
In `@src/crypto/record.rs`:
- Around line 324-338: Add an inline comment above the next_seq_num = i64::MAX
assignment explaining the design: note that all nodes derive the same (sign_key,
salt) from (topic_id, unix_minute) and thus target the same BEP44 mutable slot,
that using i64::MAX pins the first value accepted by each DHT storage node for
that minute (subsequent puts with seq ≤ current are rejected and seq==i64::MAX
with different v causes ConflictRisk), that different storage nodes can
therefore pin different publishers so get_records() may return multiple distinct
records, and that this choice intentionally makes the slot immutable for that
unix minute; also add the same explanation to PROTOCOL.md referencing
next_seq_num, put_mutable, sign_key, salt, and get_records so readers understand
the irreversible per-minute pinning behavior.
---
Duplicate comments:
In `@ARCHITECTURE.md`:
- Around line 112-124: The pseudocode for bootstrap minute selection miscomputes
minute when first_attempt and check_last_minute_record_first_on_startup are
true: it sets minute = -1 which causes calls to unix_minute(minute) - 1 and
unix_minute(minute) to fetch current-2 and current-1 instead of current-1 and
current; fix by computing minute as 0 normally and -1 only to indicate "look at
previous minute first", then call get_records(unix_minute(minute)) and
get_records(unix_minute(minute) - 1) (or swap call order) so that when minute ==
-1 you fetch previous and current-1/current as intended; update variable and
function references in the pseudocode (minute, first_attempt,
check_last_minute_record_first_on_startup, unix_minute, get_records) so wording
and config name are consistent.
In `@README.md`:
- Around line 123-138: The README incorrectly suggests Vec<u8> conversion into
TopicId is infallible; update docs to state that conversion from Vec<u8> is
fallible (implementing TryFrom<Vec<u8>> rather than From<Vec<u8>>), show usage
handling the Result (e.g., TopicId::try_from(bytes)? or match), and adjust the
examples list to list From<&str>, From<String>, FromStr (infallible/inferred)
and TryFrom<Vec<u8>> (fallible) while referencing the TopicId type and its
conversion trait names (TryFrom<Vec<u8>>, FromStr, From<String>, From<&str>) so
readers know to handle possible errors.
In `@src/gossip/topic/topic.rs`:
- Around line 84-119: The synchronous branch is not awaiting bootstrap_done (the
result of bootstrap.bootstrap()) because the spawned task consumes it via
wait_for_bootstrap, so workers may start before bootstrap actually completes;
fix by awaiting wait_for_bootstrap(bootstrap_done, cancel_token.clone()).await
in the sync path instead of relying on
bootstrap.gossip_receiver().await?.joined().await?, then only call
spawn_workers(Arc::downgrade(&api), config, cancel_token.clone()).await after
wait_for_bootstrap returns Ok, and propagate errors the same way the async
branch does (cancel_token.cancel() and return Err) to ensure bootstrap failures
are handled before Topic::new returns.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 431023fb-d228-42f2-9334-30000318827e
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (30)
.github/workflows/test.ymlARCHITECTURE.mdCargo.tomlPROTOCOL.mdREADME.mdexamples/chat.rsexamples/chat_no_wait.rsexamples/e2e_test.rsexamples/full_config.rsexamples/secret_rotation.rsexamples/simple.rsexamples/without_mergers.rssrc/config.rssrc/crypto/keys.rssrc/crypto/mod.rssrc/crypto/record.rssrc/dht.rssrc/gossip/merge/bubble.rssrc/gossip/merge/message_overlap.rssrc/gossip/merge/mod.rssrc/gossip/mod.rssrc/gossip/receiver.rssrc/gossip/sender.rssrc/gossip/topic/bootstrap.rssrc/gossip/topic/mod.rssrc/gossip/topic/publisher.rssrc/gossip/topic/topic.rssrc/lib.rstests/test_gossip.rstests/tests.rs
💤 Files with no reviewable changes (1)
- tests/test_gossip.rs
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/crypto/keys.rs (1)
139-148:⚠️ Potential issue | 🟡 MinorProtocol-level incompatibility is intentional — call it out in release notes.
Prepending
b"salt"to thesalt()SHA-512 input changes the DHT slot derivation, so 0.3 nodes won't discover or overwrite records stored by 0.2 nodes on the same topic. This is consistent with the major version bump and is mentioned implicitly in the upgrade guide, but you may want an explicit "DHT wire incompatibility" bullet in the 0.2→0.3 migration section so mixed-version deployments aren't surprised.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/crypto/keys.rs` around lines 139 - 148, The salt() function now prepends the constant b"salt" to the SHA-512 input (salt(topic_id: &TopicId, unix_minute: u64)), which intentionally changes DHT slot derivation and makes 0.3 nodes incompatible with 0.2; add an explicit "DHT wire incompatibility" bullet to the 0.2→0.3 migration/release notes (and changelog/upgrade guide) stating that records derived by salt() will not be discovered/overwritten across versions and advise operators about mixed-version deployments and required migration/restart steps.ARCHITECTURE.md (1)
111-124:⚠️ Potential issue | 🟡 MinorClarify the flag-dependent bootstrap record window.
Line 113 says current and previous records are always fetched, but the pseudocode fetches
current-2/current-1when older-first startup is enabled. Document the selected adjacent pair explicitly.📝 Proposed doc fix
-- First iteration: optionally check older records first (`check_older_records_first_on_startup`). -- Both `unix_minute` and `unix_minute - 1` records are always fetched. +- Each iteration fetches two adjacent record windows. +- On the first iteration, `check_older_records_first_on_startup` selects + `unix_minute - 2` and `unix_minute - 1`; otherwise the selected pair is + `unix_minute - 1` and `unix_minute`. ... - minute = first_attempt && check_last_minute_first ? -1 : 0 - recs = get_records(unix_minute(minute) - 1) + get_records(unix_minute(minute)) + offset = first_attempt && check_older_records_first_on_startup ? 1 : 0 + recs = get_records(unix_minute(0) - offset - 1) + get_records(unix_minute(0) - offset)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ARCHITECTURE.md` around lines 111 - 124, Clarify that when check_older_records_first_on_startup (the flag) is true the bootstrap window shifts one minute earlier: on the first attempt the code sets minute = -1 so get_records is called for unix_minute(minute) - 1 and unix_minute(minute) (i.e., the pair current-2 and current-1), otherwise it fetches the pair current-1 and current (use the symbols minute, unix_minute(...) and get_records(...) in the text), and update the prose to explicitly state which adjacent-minute pair is fetched in each branch so the doc matches the pseudocode.
♻️ Duplicate comments (7)
tests/tests.rs (1)
383-386: 🧹 Nitpick | 🔵 TrivialAvoid formatting the panic message on the success path.
expect(&format!(...))allocates even when the timeout/join succeeds. Useunwrap_or_elseso formatting only happens on failure.♻️ Proposed cleanup
let received = tokio::time::timeout(std::time::Duration::from_secs(60), handle) .await - .expect(&format!("receiver {i} timed out")) - .expect(&format!("receiver {i} panicked")); + .unwrap_or_else(|_| panic!("receiver {i} timed out")) + .unwrap_or_else(|_| panic!("receiver {i} panicked"));🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/tests.rs` around lines 383 - 386, The current code uses expect(&format!(...)) after awaiting tokio::time::timeout(..., handle) which allocates the formatted panic string on the success path; change the call sequence so that the two expect calls are replaced with unwrap_or_else (or map_err/unwrap_or_else) to defer formatting until an error occurs — specifically update the handling around tokio::time::timeout(...) and the join result from handle so that you call unwrap_or_else or similar on the Result to format the "receiver {i} timed out" and "receiver {i} panicked" messages only in the error branches, referencing the existing variables/functions tokio::time::timeout, handle, received, and the current expect invocations.src/gossip/merge/bubble.rs (1)
89-96:⚠️ Potential issue | 🟡 MinorKeep the jitter divisor non-zero after narrowing.
as_nanos()isu128;as u64can truncate large configured durations, including to0, which makesrand % boundpanic. Clamp/check the narrowed bound before taking the modulo.🛡️ Proposed fix
- let jitter = if self.max_jitter > Duration::ZERO { - Duration::from_nanos(rand::random::<u64>() % self.max_jitter.as_nanos() as u64) - } else { + let max_jitter_nanos = + self.max_jitter.as_nanos().min(u64::MAX as u128) as u64; + let jitter = if max_jitter_nanos == 0 { Duration::ZERO + } else { + Duration::from_nanos(rand::random::<u64>() % max_jitter_nanos) };Verify the same narrowing pattern elsewhere:
#!/bin/bash # Description: Find jitter calculations that narrow Duration::as_nanos() before modulo. rg -n -C3 --type rust 'rand::random::<u64>\(\)\s*%\s*[^;\n]*\.as_nanos\(\)\s*as\s+u64'🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gossip/merge/bubble.rs` around lines 89 - 96, The jitter calculation can truncate self.max_jitter.as_nanos() (u128 -> u64) to zero and cause a modulo-by-zero panic; update the jitter logic in the BubbleMerge code (the block referencing self.max_jitter, self.base_interval, next_interval, and self.ticker.reset_after) to ensure the modulus bound is never zero by either clamping the narrowed value to at least 1 before using it in rand::random::<u64>() % bound or by switching to a safe API (e.g. generate a u128 random and reduce, or use a ranged generator) so the divisor is non-zero; apply the same fix to any other occurrences found by the suggested ripgrep pattern.PROTOCOL.md (2)
17-19:⚠️ Potential issue | 🟡 MinorSpecify the minute byte encoding in key/salt derivation.
The implementation hashes the little-endian minute bytes;
topic_hash + unix_minuteis ambiguous in a protocol spec. Spell outunix_minute.to_le_bytes()for signing, encryption input, and salt derivation.📝 Proposed doc fix
- - Derive signing keypair: `keypair_seed = SHA512(topic_hash + unix_minute)[..32]` - - Derive encryption keypair: `enc_keypair_seed = secret_rotation_function.get_unix_minute_secret(topic_hash, unix_minute, initial_secret_hash)` - - Calculate salt: `salt = SHA512("salt" + topic_hash + unix_minute)[..32]` + - Derive signing keypair: `keypair_seed = SHA512(topic_hash + unix_minute.to_le_bytes())[..32]` + - Derive encryption keypair: `enc_keypair_seed = secret_rotation_function.get_unix_minute_secret(topic_hash, unix_minute.to_le_bytes(), initial_secret_hash)` + - Calculate salt: `salt = SHA512("salt" + topic_hash + unix_minute.to_le_bytes())[..32]` ... -- Derivation: `signing_keypair_seed = SHA512(topic_hash + unix_minute)[..32]` +- Derivation: `signing_keypair_seed = SHA512(topic_hash + unix_minute.to_le_bytes())[..32]` ... -- Derivation: `encryption_keypair_seed = secret_rotation_function.get_unix_minute_secret(topic_hash, unix_minute, initial_secret_hash)` +- Derivation: `encryption_keypair_seed = secret_rotation_function.get_unix_minute_secret(topic_hash, unix_minute.to_le_bytes(), initial_secret_hash)` ... -- Derivation: `salt = SHA512("salt" + topic_hash + unix_minute)[..32]` +- Derivation: `salt = SHA512("salt" + topic_hash + unix_minute.to_le_bytes())[..32]`Also applies to: 343-357
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@PROTOCOL.md` around lines 17 - 19, The protocol currently ambiguously states concatenation like `topic_hash + unix_minute` for key/salt derivation; update the spec text for `keypair_seed = SHA512(topic_hash + unix_minute)[..32]`, `enc_keypair_seed = secret_rotation_function.get_unix_minute_secret(topic_hash, unix_minute, initial_secret_hash)`, and `salt = SHA512("salt" + topic_hash + unix_minute)[..32]` to explicitly state that the minute is encoded as little-endian bytes (e.g. `unix_minute.to_le_bytes()`) before concatenation (i.e. `topic_hash || unix_minute.to_le_bytes()` and `"salt" || topic_hash || unix_minute.to_le_bytes()`), and apply the same clarification to the corresponding lines 343-357.
110-120:⚠️ Potential issue | 🟡 MinorKeep record discovery aligned with the selected minute pair.
Lines 111-112 describe two possible fetch windows, but line 116 and the diagram still hard-code
unix_minute - 1plusunix_minute. Use the pair selected by the older-first flag.📝 Proposed doc fix
- - Call `get_records()` for both `unix_minute - 1` and `unix_minute`: + - Call `get_records()` for the selected pair: + - older-first startup: `unix_minute - 2` and `unix_minute - 1` + - default: `unix_minute - 1` and `unix_minute` ... - G --> I[Get Records for unix_minute-1 AND unix_minute] - H --> I + G --> I[Get Records for unix_minute-2 AND unix_minute-1] + H --> I2[Get Records for unix_minute-1 AND unix_minute]Also applies to: 168-173
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@PROTOCOL.md` around lines 110 - 120, The protocol text hard-codes the minute pair as unix_minute-1 and unix_minute in the Record Discovery steps, which contradicts the Time Window Selection option check_older_records_first_on_startup; update the wording so get_records() iterates over the selected minute pair (either [unix_minute-1, unix_minute-2] when check_older_records_first_on_startup is enabled on first attempt, or [unix_minute, unix_minute-1] otherwise) and make it explicit that keypair_seed = SHA512(topic_hash + <selected_minute>)[..32], the salt = SHA512("salt" + topic_hash + <selected_minute>)[..32], and the DHT call get_mutable(signing_pubkey, salt) (with 10s timeout) are performed for each minute in that chosen pair; apply the same correction to the later Record Discovery paragraph as well.src/gossip/topic/bootstrap.rs (1)
225-244:⚠️ Potential issue | 🟡 MinorApply the self-filter to
record.pub_key()too.The active peer path excludes the local publisher key, but
record.pub_key()is added unconditionally. Keep the guard here instead of relying on upstream filtering, so an own DHT record cannot become a self-join candidate.🛡️ Proposed fix
let bootstrap_nodes = records .iter() .flat_map(|record| { - // records are already filtered by self entry - let mut v = vec![record.pub_key()]; + let self_pub_key = *record_publisher.pub_key().as_bytes(); + let mut v = if record.pub_key() != [0; 32] + && record.pub_key() != self_pub_key + { + vec![record.pub_key()] + } else { + Vec::new() + }; if let Ok(record_content) = record.content::<GossipRecordContent>() { for peer in record_content.active_peers { if peer != [0; 32] - && !peer.eq(record_publisher.pub_key().as_bytes()) + && peer != self_pub_key { v.push(peer); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gossip/topic/bootstrap.rs` around lines 225 - 244, When building bootstrap_nodes in the records.iter().flat_map block, avoid unconditionally adding record.pub_key(); only push record.pub_key() into v if it is not the local publisher key (same check used for active_peers, e.g. compare record.pub_key() to record_publisher.pub_key().as_bytes()). Update the initial v population in the flat_map (and use the same byte-equality guard used for active_peers) so an own DHT record cannot be included as a self-join candidate before the EndpointId::from_bytes conversion.src/dht.rs (1)
156-169:⚠️ Potential issue | 🟡 MinorAvoid modulo with a truncated jitter bound.
max_retry_jitter().as_nanos() as u64can truncate largeDurations; if it narrows to0, the modulo panics. Clamp/check the bound before using it.🛡️ Proposed fix
- let jitter = if self.config.max_retry_jitter() > Duration::ZERO { - Duration::from_nanos(rand::random::<u64>() % self.config.max_retry_jitter().as_nanos() as u64) - } else { + let max_jitter_nanos = self + .config + .max_retry_jitter() + .as_nanos() + .min(u64::MAX as u128) as u64; + let jitter = if max_jitter_nanos == 0 { Duration::ZERO + } else { + Duration::from_nanos(rand::random::<u64>() % max_jitter_nanos) };Verify there are no other retry/jitter modulo sites with the same narrowing:
#!/bin/bash # Description: Find jitter calculations that narrow Duration::as_nanos() before modulo. rg -n -C3 --type rust 'rand::random::<u64>\(\)\s*%\s*[^;\n]*\.as_nanos\(\)\s*as\s+u64'🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/dht.rs` around lines 156 - 169, The jitter calculation in the retry loop (involving self.config.max_retry_jitter(), jitter, and retry_interval used in the put_mutable retry) performs a modulo against self.config.max_retry_jitter().as_nanos() as u64 which can truncate large Durations or become zero and panic; fix by computing a safe non-zero bound before doing the modulo (e.g., check/clamp max_retry_jitter() to a reasonable max, convert to a wide integer type or use rand::Rng::gen_range(0..bound) / Uniform to avoid narrowing, and only perform the random modulo when bound > 0), update the jitter assignment and related logging in the put_mutable retry block (variables: jitter, retry_interval) and scan for other occurrences of rand::random::<u64>() % ... .as_nanos() as u64 to apply the same safe-bound pattern.src/gossip/topic/topic.rs (1)
84-115:⚠️ Potential issue | 🟡 MinorAwait
bootstrap_donein the sync path instead ofjoined().The sync path currently spawns a task to consume
bootstrap_done, then waits onjoined()separately. That can start workers before the bootstrap completion contract resolves, and error cleanup is split across two paths.🔧 Proposed control-flow fix
let bootstrap_done = bootstrap.bootstrap().await?; let config = record_publisher.config().clone(); - tokio::spawn({ - let api = Arc::downgrade(&api); - let config = config.clone(); - let cancel_token = cancel_token.clone(); - async move { - if let Err(err) = wait_for_bootstrap(bootstrap_done, cancel_token.clone()).await { - tracing::warn!("bootstrap failed: {}", err); - return; - } - - if async_bootstrap { - tracing::debug!("Bootstrap completed, now spawning workers"); - if let Err(err) = spawn_workers(api, config, cancel_token.clone()).await { - cancel_token.cancel(); - tracing::warn!("failed to spawn workers: {}", err); - } - } - } - }); - if !async_bootstrap { + if async_bootstrap { + tokio::spawn({ + let api = Arc::downgrade(&api); + let config = config.clone(); + let cancel_token = cancel_token.clone(); + async move { + if let Err(err) = wait_for_bootstrap(bootstrap_done, cancel_token.clone()).await { + tracing::warn!("bootstrap failed: {}", err); + return; + } + + tracing::debug!("Bootstrap completed, now spawning workers"); + if let Err(err) = spawn_workers(api, config, cancel_token.clone()).await { + cancel_token.cancel(); + tracing::warn!("failed to spawn workers: {}", err); + } + } + }); + tracing::debug!("Topic: bootstrap started asynchronously"); + } else { tracing::debug!("Topic: waiting for bootstrap to complete"); - bootstrap.gossip_receiver().await?.joined().await?; + wait_for_bootstrap(bootstrap_done, cancel_token.clone()).await?; if let Err(err) = spawn_workers(Arc::downgrade(&api), config, cancel_token.clone()).await { tracing::warn!("failed to spawn workers: {}", err); cancel_token.cancel(); return Err(anyhow::anyhow!("failed to spawn workers: {}", err)); } tracing::debug!("Topic: bootstrap completed"); - } else { - tracing::debug!("Topic: bootstrap started asynchronously"); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gossip/topic/topic.rs` around lines 84 - 115, The sync path should wait for the same bootstrap completion future/logic used by the spawned task instead of calling gossip_receiver().joined(); replace the wait in the !async_bootstrap branch with awaiting wait_for_bootstrap(bootstrap_done, cancel_token.clone()).await (or otherwise awaiting the bootstrap_done via the same helper used in the spawned task), handle Err the same way (log, cancel_token.cancel(), and return Err), and then call spawn_workers(Arc::downgrade(&api), config, cancel_token.clone()).await only after successful bootstrap; remove the separate bootstrap.gossip_receiver().await?.joined().await? usage so error cleanup and control flow are unified (referencing bootstrap_done, wait_for_bootstrap, spawn_workers, and cancel_token).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/full_config.rs`:
- Around line 15-74: Add a unit test that fails if the handwritten
config_builder() drifts from the canonical defaults by asserting
config_builder() == Config::default(); create a #[cfg(test)] mod with a #[test]
function (e.g., fn defaults_match_config_builder()) that calls
assert_eq!(config_builder(), Config::default()) so CI will catch any future
changes to Config::default() (ensure the test is placed where Config and
config_builder() are in scope).
In `@src/config.rs`:
- Around line 933-941: The builder method max_join_peer_count currently accepts
NonZeroU32 (causing awkward caller ergonomics and a u32→usize cast); change its
signature to take a plain usize (pub fn max_join_peer_count(mut self, max_peers:
usize) -> Self) and clamp/validate inside (e.g. ensure at least 1) before
assigning to self.config.max_join_peer_count to match the rest of the builder
APIs like max_bootstrap_records and retries; alternatively, if you prefer
type-level non-zero enforcement, switch to NonZeroUsize across the related "min
1" setters (e.g. max_bootstrap_records, min_neighbors) to avoid casting, but do
not leave this single setter as NonZeroU32 with a cast.
- Around line 954-967: ConfigBuilder::build currently silently overrides
PublisherConfig::Enabled(_) to PublisherConfig::Disabled when
BootstrapConfig.max_bootstrap_records == 0 and only emits a tracing::warn, which
can be missed; either make this a hard error by changing ConfigBuilder::build to
return Result<Config, ConfigError> and return an appropriate error type when
max_bootstrap_records == 0 and publisher is Enabled, or keep the soft-override
but add a unit test that exercises ConfigBuilder::build to assert that when
BootstrapConfig.max_bootstrap_records == 0 and publisher was
PublisherConfig::Enabled(_), the returned Config.publisher_config is
PublisherConfig::Disabled (and the test can also capture/log that a warning was
emitted), referencing ConfigBuilder::build,
BootstrapConfig.max_bootstrap_records, and PublisherConfig::Enabled/Disabled to
locate the logic to change or cover with tests.
In `@src/crypto/record.rs`:
- Around line 291-355: publish_record_cached_records currently propagates errors
from self.get_records(...) via .await? which aborts publishing on transient DHT
read failures; change it to treat a failed get as "no records observed" by
replacing the direct .await? with a match (or .await.map_or_else) that on
Ok(records) uses them and on Err(e) logs the error (trace/warn) and uses
HashSet::new() so publishing proceeds; update the code paths around
self.get_records(record.unix_minute(), cancel_token.clone()) in
publish_record_cached_records to implement this fallback (and optionally add a
short doc comment on publish_record_cached_records noting it treats read
failures as empty results).
- Around line 317-341: The code sets next_seq_num = i64::MAX before calling
self.dht.put_mutable(...), which pins the BEP-44 sequence and makes the DHT slot
write-once for that unix_minute; replace this with a real monotonic per-call
sequence value (not i64::MAX). Implement a counter (e.g., a struct field like
publish_seq_counter or a map keyed by record.unix_minute) that is seeded
sufficiently high on startup and incremented on each publish attempt for that
(topic_id, unix_minute) so each put_mutable(sign_key, Some(salt), ..., seq) uses
an increasing seq; ensure the code that constructs MutableItem/put_mutable
(referenced via put_mutable and MutableItem::new in src/dht.rs) consumes this
incremented seq rather than the constant i64::MAX and persists or seeds the
counter to be restart-safe.
In `@src/gossip/receiver.rs`:
- Around line 157-168: The current joined() collapses any error from
self.is_joined().await into ChannelError::Closed; change the error propagation
so upstream errors are preserved — either add a
ChannelError::Internal(anyhow::Error) variant and map the anyhow error into
that, or change joined() to return anyhow::Result and propagate the original
error from is_joined(); update the mapping at the is_joined() call (inside
joined()) to return the new Internal variant (or use ? to bubble up) instead of
ChannelError::Closed. Also simplify the match arm for recv() to make the binding
clearer by replacing Some(event) => Ok(event) (where event is ()) with an
explicit Some(()) => Ok(()) pattern. Ensure references to joined(), is_joined(),
join_channel_receiver.recv(), and ChannelError are updated accordingly.
- Around line 211-218: The code in receiver.rs uses `if let Ok(lmh) =
hash.finalize()[..32].try_into()` which is dead because converting a 32-byte
slice to `[u8; 32]` cannot fail; replace the conditional with a direct
conversion that surfaces errors (e.g., using `try_into().expect(...)` or an
explicit unwrap) so failures don't get silently ignored, keeping the logic that
pops from `self.last_message_hashes` when `self.last_message_hashes.len() ==
MAX_MESSAGE_HASHES` and then pushes the resulting `lmh` onto
`self.last_message_hashes`.
- Around line 191-237: The Event::Lagged arm is only logged but still falls
through and gets forwarded via next_channel_sender.send(Some(event)), so change
the handling in the match over iroh_gossip::api::Event to suppress forwarding
for Lagged: inside the iroh_gossip::api::Event::Lagged branch (and/or the
enclosing Some(Ok(ref event)) branch) return early or continue the outer loop
(so you do not reach event.clone() and the final
self.next_channel_sender.send(Some(event)).ok()), ensuring
NeighborUp/NeighborDown/Received still push to join_channel_sender and get
forwarded but Lagged remains internal to the GossipReceiver actor.
In `@src/gossip/topic/publisher.rs`:
- Around line 33-60: Publisher::new currently returns Result<Self> but contains
no fallible operations; change it to a non-fallible constructor and remove the
Result wrapper (i.e. make the signature pub fn new(...) -> Self and return Self
{ _api: api }) and eliminate the Ok(...) wrapping; additionally, address the
silent clamp of base_interval: either move the
base_interval.max(Duration::from_secs(1)) logic into the config builders
(BubbleMergeConfigBuilder::base_interval and
PublisherConfigBuilder::base_interval) so docs and behavior match, or keep the
clamp here but add a tracing::warn! when base_interval < 1s to surface the
override; update references to Publisher::new and any call sites accordingly.
- Around line 100-116: The silent use of filter_map(... .try_into().ok()) when
building active_peers and last_message_hashes can hide size mismatches; replace
the .filter_map(...try_into().ok()) calls in the active_peers and
last_message_hashes construction so they unwrap with explicit panics—e.g., map
the result and call expect("EndpointId is 32 bytes") for the neighbors
conversion and expect("SHA-512 truncated hash is 32 bytes") for the
last_message_hashes conversion—so any unexpected length/type change in
gossip_receiver.neighbors() or gossip_receiver.last_message_hashes() fails
loudly; keep the rest of the chaining (.iter(), .take(MAX_...),
.collect::<Vec<_>>()) the same and reference the symbols gossip_receiver,
neighbors(), last_message_hashes(), EndpointId::as_bytes(), and try_into() to
locate the changes.
---
Outside diff comments:
In `@ARCHITECTURE.md`:
- Around line 111-124: Clarify that when check_older_records_first_on_startup
(the flag) is true the bootstrap window shifts one minute earlier: on the first
attempt the code sets minute = -1 so get_records is called for
unix_minute(minute) - 1 and unix_minute(minute) (i.e., the pair current-2 and
current-1), otherwise it fetches the pair current-1 and current (use the symbols
minute, unix_minute(...) and get_records(...) in the text), and update the prose
to explicitly state which adjacent-minute pair is fetched in each branch so the
doc matches the pseudocode.
In `@src/crypto/keys.rs`:
- Around line 139-148: The salt() function now prepends the constant b"salt" to
the SHA-512 input (salt(topic_id: &TopicId, unix_minute: u64)), which
intentionally changes DHT slot derivation and makes 0.3 nodes incompatible with
0.2; add an explicit "DHT wire incompatibility" bullet to the 0.2→0.3
migration/release notes (and changelog/upgrade guide) stating that records
derived by salt() will not be discovered/overwritten across versions and advise
operators about mixed-version deployments and required migration/restart steps.
---
Duplicate comments:
In `@PROTOCOL.md`:
- Around line 17-19: The protocol currently ambiguously states concatenation
like `topic_hash + unix_minute` for key/salt derivation; update the spec text
for `keypair_seed = SHA512(topic_hash + unix_minute)[..32]`, `enc_keypair_seed =
secret_rotation_function.get_unix_minute_secret(topic_hash, unix_minute,
initial_secret_hash)`, and `salt = SHA512("salt" + topic_hash +
unix_minute)[..32]` to explicitly state that the minute is encoded as
little-endian bytes (e.g. `unix_minute.to_le_bytes()`) before concatenation
(i.e. `topic_hash || unix_minute.to_le_bytes()` and `"salt" || topic_hash ||
unix_minute.to_le_bytes()`), and apply the same clarification to the
corresponding lines 343-357.
- Around line 110-120: The protocol text hard-codes the minute pair as
unix_minute-1 and unix_minute in the Record Discovery steps, which contradicts
the Time Window Selection option check_older_records_first_on_startup; update
the wording so get_records() iterates over the selected minute pair (either
[unix_minute-1, unix_minute-2] when check_older_records_first_on_startup is
enabled on first attempt, or [unix_minute, unix_minute-1] otherwise) and make it
explicit that keypair_seed = SHA512(topic_hash + <selected_minute>)[..32], the
salt = SHA512("salt" + topic_hash + <selected_minute>)[..32], and the DHT call
get_mutable(signing_pubkey, salt) (with 10s timeout) are performed for each
minute in that chosen pair; apply the same correction to the later Record
Discovery paragraph as well.
In `@src/dht.rs`:
- Around line 156-169: The jitter calculation in the retry loop (involving
self.config.max_retry_jitter(), jitter, and retry_interval used in the
put_mutable retry) performs a modulo against
self.config.max_retry_jitter().as_nanos() as u64 which can truncate large
Durations or become zero and panic; fix by computing a safe non-zero bound
before doing the modulo (e.g., check/clamp max_retry_jitter() to a reasonable
max, convert to a wide integer type or use rand::Rng::gen_range(0..bound) /
Uniform to avoid narrowing, and only perform the random modulo when bound > 0),
update the jitter assignment and related logging in the put_mutable retry block
(variables: jitter, retry_interval) and scan for other occurrences of
rand::random::<u64>() % ... .as_nanos() as u64 to apply the same safe-bound
pattern.
In `@src/gossip/merge/bubble.rs`:
- Around line 89-96: The jitter calculation can truncate
self.max_jitter.as_nanos() (u128 -> u64) to zero and cause a modulo-by-zero
panic; update the jitter logic in the BubbleMerge code (the block referencing
self.max_jitter, self.base_interval, next_interval, and self.ticker.reset_after)
to ensure the modulus bound is never zero by either clamping the narrowed value
to at least 1 before using it in rand::random::<u64>() % bound or by switching
to a safe API (e.g. generate a u128 random and reduce, or use a ranged
generator) so the divisor is non-zero; apply the same fix to any other
occurrences found by the suggested ripgrep pattern.
In `@src/gossip/topic/bootstrap.rs`:
- Around line 225-244: When building bootstrap_nodes in the
records.iter().flat_map block, avoid unconditionally adding record.pub_key();
only push record.pub_key() into v if it is not the local publisher key (same
check used for active_peers, e.g. compare record.pub_key() to
record_publisher.pub_key().as_bytes()). Update the initial v population in the
flat_map (and use the same byte-equality guard used for active_peers) so an own
DHT record cannot be included as a self-join candidate before the
EndpointId::from_bytes conversion.
In `@src/gossip/topic/topic.rs`:
- Around line 84-115: The sync path should wait for the same bootstrap
completion future/logic used by the spawned task instead of calling
gossip_receiver().joined(); replace the wait in the !async_bootstrap branch with
awaiting wait_for_bootstrap(bootstrap_done, cancel_token.clone()).await (or
otherwise awaiting the bootstrap_done via the same helper used in the spawned
task), handle Err the same way (log, cancel_token.cancel(), and return Err), and
then call spawn_workers(Arc::downgrade(&api), config,
cancel_token.clone()).await only after successful bootstrap; remove the separate
bootstrap.gossip_receiver().await?.joined().await? usage so error cleanup and
control flow are unified (referencing bootstrap_done, wait_for_bootstrap,
spawn_workers, and cancel_token).
In `@tests/tests.rs`:
- Around line 383-386: The current code uses expect(&format!(...)) after
awaiting tokio::time::timeout(..., handle) which allocates the formatted panic
string on the success path; change the call sequence so that the two expect
calls are replaced with unwrap_or_else (or map_err/unwrap_or_else) to defer
formatting until an error occurs — specifically update the handling around
tokio::time::timeout(...) and the join result from handle so that you call
unwrap_or_else or similar on the Result to format the "receiver {i} timed out"
and "receiver {i} panicked" messages only in the error branches, referencing the
existing variables/functions tokio::time::timeout, handle, received, and the
current expect invocations.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a211bb06-8ee6-4f4f-acd5-dc124ee18e05
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (30)
.github/workflows/test.ymlARCHITECTURE.mdCargo.tomlPROTOCOL.mdREADME.mdexamples/chat.rsexamples/chat_no_wait.rsexamples/e2e_test.rsexamples/full_config.rsexamples/secret_rotation.rsexamples/simple.rsexamples/without_mergers.rssrc/config.rssrc/crypto/keys.rssrc/crypto/mod.rssrc/crypto/record.rssrc/dht.rssrc/gossip/merge/bubble.rssrc/gossip/merge/message_overlap.rssrc/gossip/merge/mod.rssrc/gossip/mod.rssrc/gossip/receiver.rssrc/gossip/sender.rssrc/gossip/topic/bootstrap.rssrc/gossip/topic/mod.rssrc/gossip/topic/publisher.rssrc/gossip/topic/topic.rssrc/lib.rstests/test_gossip.rstests/tests.rs
💤 Files with no reviewable changes (1)
- tests/test_gossip.rs
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
examples/secret_rotation.rs (1)
59-81: 🧹 Nitpick | 🔵 TrivialHandle the
Laggederror from broadcast backpressure explicitly.The
while let Ok(event) = gossip_receiver.next().awaitpattern correctly exits on all errors, but silently treatsChannelError::Laggedthe same asChannelError::Closed. Since the broadcast receiver can lag under backpressure, consider matching on the error to log or surface theLaggedvariant before terminating—this helps distinguish backpressure issues from normal closure during development. Not blocking.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/secret_rotation.rs` around lines 59 - 81, The gossip receiver loop in the tokio::spawn closure currently treats all receiver errors the same; change the loop that uses gossip_receiver.next().await inside the spawned task to explicitly match on the Result and handle Err variants: when Err(ChannelError::Lagged) (from the broadcast receiver) is returned, log or warn with the lag details (including the error) and continue or decide to recover, and for other errors (e.g., Closed) log and break/exit the loop; keep the existing handling of Ok(Event::Received) and Ok(Event::NeighborUp). Target the closure that spawns with tokio::spawn and the gossip_receiver variable/next() call and match on ChannelError::Lagged specifically to surface backpressure vs normal closure.src/gossip/merge/bubble.rs (1)
143-152: 🧹 Nitpick | 🔵 TrivialConsider converting
active_peertoEndpointIdonce for consistency.While
neighbors.contains(&active_peer)works due toEndpointIdimplementingBorrow<[u8; 32]>, convertingactive_peertoEndpointIdonce and reusing it throughout the closure improves clarity and reduces reliance on implicit borrow coercions.Suggested refactoring
- .filter_map(|&active_peer| { - if active_peer == [0; 32] - || neighbors.contains(&active_peer) - || active_peer == record.pub_key() - || active_peer.eq(self_pub_key.as_bytes()) - { - None - } else { - iroh::EndpointId::from_bytes(&active_peer).ok() - } - }) + .filter_map(|&active_peer| { + if active_peer == [0; 32] + || active_peer == record.pub_key() + || active_peer.eq(self_pub_key.as_bytes()) + { + return None; + } + + let endpoint = iroh::EndpointId::from_bytes(&active_peer).ok()?; + if neighbors.contains(&endpoint) { + None + } else { + Some(endpoint) + } + })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gossip/merge/bubble.rs` around lines 143 - 152, Convert the raw active_peer byte array to an iroh::EndpointId once at the start of the closure (using iroh::EndpointId::from_bytes(&active_peer) and returning None if that fails or if active_peer == [0; 32]), then reuse that EndpointId for all subsequent checks instead of mixing byte-array comparisons and borrow coercions; update the conditions that currently use neighbors.contains(&active_peer), active_peer == record.pub_key(), and active_peer.eq(self_pub_key.as_bytes()) to use the newly created EndpointId (or its .as_bytes() where appropriate) so the closure consistently works with a single EndpointId value and avoids implicit borrow coercion.
♻️ Duplicate comments (1)
src/gossip/topic/topic.rs (1)
84-110:⚠️ Potential issue | 🟡 MinorAwait
bootstrap_donein the synchronous path.The sync path still waits on
joined()while a spawned task consumesbootstrap_done. That can start workers before the bootstrap actor reports completion, and earlyjoined()errors bypass the explicit cancellation path.Refactor so
async_bootstrap == falseawaitswait_for_bootstrap(bootstrap_done, ...)directly, then starts workers; only spawn the coordinator for the async path.🔧 Suggested control flow
- tokio::spawn({ - let api = Arc::downgrade(&api); - let config = config.clone(); - let cancel_token = cancel_token.clone(); - async move { - if let Err(err) = wait_for_bootstrap(bootstrap_done, cancel_token.clone()).await { - tracing::warn!("bootstrap failed: {}", err); - return; - } - - if async_bootstrap { - tracing::debug!("Bootstrap completed, now spawning workers"); - if let Err(err) = spawn_workers(api, config, cancel_token.clone()).await { - cancel_token.cancel(); - tracing::warn!("failed to spawn workers: {}", err); - } - } - } - }); - if !async_bootstrap { tracing::debug!("Topic: waiting for bootstrap to complete"); - bootstrap.gossip_receiver().await?.joined().await?; + wait_for_bootstrap(bootstrap_done, cancel_token.clone()).await?; if let Err(err) = spawn_workers(Arc::downgrade(&api), config, cancel_token.clone()).await { tracing::warn!("failed to spawn workers: {}", err); cancel_token.cancel(); return Err(anyhow::anyhow!("failed to spawn workers: {}", err)); } tracing::debug!("Topic: bootstrap completed"); } else { + tokio::spawn({ + let api = Arc::downgrade(&api); + let config = config.clone(); + let cancel_token = cancel_token.clone(); + async move { + if let Err(err) = wait_for_bootstrap(bootstrap_done, cancel_token.clone()).await { + tracing::warn!("bootstrap failed: {}", err); + return; + } + + tracing::debug!("Bootstrap completed, now spawning workers"); + if let Err(err) = spawn_workers(api, config, cancel_token.clone()).await { + cancel_token.cancel(); + tracing::warn!("failed to spawn workers: {}", err); + } + } + }); tracing::debug!("Topic: bootstrap started asynchronously"); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gossip/topic/topic.rs` around lines 84 - 110, The synchronous path currently loses ownership of bootstrap_done to the spawned task so joined() can race with bootstrap completion; fix by only spawning the task when async_bootstrap is true and by awaiting wait_for_bootstrap(bootstrap_done, cancel_token.clone()).await in the async_bootstrap == false branch before calling spawn_workers(Arc::downgrade(&api), config, cancel_token.clone()).await; ensure the spawned task uses bootstrap_done only for the async path and that error handling/cancellation (cancel_token.cancel()) remains the same when spawn_workers returns Err.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ARCHITECTURE.md`:
- Line 179: The docs use the parameter name max_join_peer_count but the code
(BubbleMerge::new and the BubbleMergeActor struct) uses max_join_peers; update
them to match by choosing one canonical name and making both the ARCHITECTURE.md
entry (and any other doc occurrences) and the code signatures/field names
consistent — e.g., rename the doc entry to max_join_peers or refactor the code
to max_join_peer_count and update BubbleMerge::new and the BubbleMergeActor
field accordingly, plus any tests or usages that reference the old name.
- Around line 123-124: The pseudocode and docs currently state that both
unix_minute and unix_minute - 1 are always fetched but the implementation uses a
conditional window; update the pseudocode around the minute calculation
(variables minute, first_attempt) and the conditional flag
(check_older_records_first_on_startup / check_last_minute_first) so it
explicitly shows both cases: when check_older_records_first_on_startup is true
fetch get_records(unix_minute(minute) - 2) and get_records(unix_minute(minute) -
1) (i.e., current-2 and current-1), and when false fetch
get_records(unix_minute(minute) - 1) and get_records(unix_minute(minute)) (i.e.,
current-1 and current); ensure unix_minute and get_records are referenced
exactly and update the invariant text to match this conditional behavior.
In `@src/config.rs`:
- Around line 57-86: The three setters on TimeoutConfigBuilder
(join_peer_timeout, broadcast_timeout, broadcast_neighbors_timeout) currently
accept Duration::ZERO which will cause tokio::time::timeout-wrapped operations
to immediately elapse; update each setter to guard against Duration::ZERO (same
no-op pattern used by DhtConfigBuilder's put_timeout/get_timeout) by returning
self unchanged when timeout == Duration::ZERO, otherwise set the field and
return self, ensuring callers that pass zero don't overwrite sensible defaults.
---
Outside diff comments:
In `@examples/secret_rotation.rs`:
- Around line 59-81: The gossip receiver loop in the tokio::spawn closure
currently treats all receiver errors the same; change the loop that uses
gossip_receiver.next().await inside the spawned task to explicitly match on the
Result and handle Err variants: when Err(ChannelError::Lagged) (from the
broadcast receiver) is returned, log or warn with the lag details (including the
error) and continue or decide to recover, and for other errors (e.g., Closed)
log and break/exit the loop; keep the existing handling of Ok(Event::Received)
and Ok(Event::NeighborUp). Target the closure that spawns with tokio::spawn and
the gossip_receiver variable/next() call and match on ChannelError::Lagged
specifically to surface backpressure vs normal closure.
In `@src/gossip/merge/bubble.rs`:
- Around line 143-152: Convert the raw active_peer byte array to an
iroh::EndpointId once at the start of the closure (using
iroh::EndpointId::from_bytes(&active_peer) and returning None if that fails or
if active_peer == [0; 32]), then reuse that EndpointId for all subsequent checks
instead of mixing byte-array comparisons and borrow coercions; update the
conditions that currently use neighbors.contains(&active_peer), active_peer ==
record.pub_key(), and active_peer.eq(self_pub_key.as_bytes()) to use the newly
created EndpointId (or its .as_bytes() where appropriate) so the closure
consistently works with a single EndpointId value and avoids implicit borrow
coercion.
---
Duplicate comments:
In `@src/gossip/topic/topic.rs`:
- Around line 84-110: The synchronous path currently loses ownership of
bootstrap_done to the spawned task so joined() can race with bootstrap
completion; fix by only spawning the task when async_bootstrap is true and by
awaiting wait_for_bootstrap(bootstrap_done, cancel_token.clone()).await in the
async_bootstrap == false branch before calling
spawn_workers(Arc::downgrade(&api), config, cancel_token.clone()).await; ensure
the spawned task uses bootstrap_done only for the async path and that error
handling/cancellation (cancel_token.cancel()) remains the same when
spawn_workers returns Err.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 63949b19-e50e-45ea-af73-37328b31267b
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (30)
.github/workflows/test.ymlARCHITECTURE.mdCargo.tomlPROTOCOL.mdREADME.mdexamples/chat.rsexamples/chat_no_wait.rsexamples/e2e_test.rsexamples/full_config.rsexamples/secret_rotation.rsexamples/simple.rsexamples/without_mergers.rssrc/config.rssrc/crypto/keys.rssrc/crypto/mod.rssrc/crypto/record.rssrc/dht.rssrc/gossip/merge/bubble.rssrc/gossip/merge/message_overlap.rssrc/gossip/merge/mod.rssrc/gossip/mod.rssrc/gossip/receiver.rssrc/gossip/sender.rssrc/gossip/topic/bootstrap.rssrc/gossip/topic/mod.rssrc/gossip/topic/publisher.rssrc/gossip/topic/topic.rssrc/lib.rstests/test_gossip.rstests/tests.rs
💤 Files with no reviewable changes (1)
- tests/test_gossip.rs
| minute = first_attempt && check_last_minute_first ? -1 : 0 | ||
| recs = get_records(unix_minute(minute) - 1) + get_records(unix_minute(minute)) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -nP --type=rust -C5 'check_older_records_first_on_startup|unix_minute'Repository: rustonbsd/distributed-topic-tracker
Length of output: 50393
🏁 Script executed:
find . -name "ARCHITECTURE.md" -o -name "architecture.md" | head -20Repository: rustonbsd/distributed-topic-tracker
Length of output: 96
🏁 Script executed:
head -n 160 ARCHITECTURE.md | tail -n +100Repository: rustonbsd/distributed-topic-tracker
Length of output: 1717
🏁 Script executed:
sed -n '110,130p' ARCHITECTURE.mdRepository: rustonbsd/distributed-topic-tracker
Length of output: 651
🏁 Script executed:
sed -n '113,125p' ARCHITECTURE.mdRepository: rustonbsd/distributed-topic-tracker
Length of output: 428
🏁 Script executed:
sed -n '1,130p' ARCHITECTURE.md | tail -n 40Repository: rustonbsd/distributed-topic-tracker
Length of output: 1344
🏁 Script executed:
sed -n '150,175p' src/gossip/topic/bootstrap.rsRepository: rustonbsd/distributed-topic-tracker
Length of output: 1285
Update pseudocode and documentation to clarify conditional fetch windows based on check_older_records_first_on_startup.
The documented invariant "Both unix_minute and unix_minute - 1 records are always fetched" contradicts the actual behavior. When check_older_records_first_on_startup is true, the implementation fetches current-2 and current-1; when false, it fetches current-1 and current. The pseudocode should reflect this conditional logic explicitly, or the documentation should clarify that the fetch window changes based on configuration.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ARCHITECTURE.md` around lines 123 - 124, The pseudocode and docs currently
state that both unix_minute and unix_minute - 1 are always fetched but the
implementation uses a conditional window; update the pseudocode around the
minute calculation (variables minute, first_attempt) and the conditional flag
(check_older_records_first_on_startup / check_last_minute_first) so it
explicitly shows both cases: when check_older_records_first_on_startup is true
fetch get_records(unix_minute(minute) - 2) and get_records(unix_minute(minute) -
1) (i.e., current-2 and current-1), and when false fetch
get_records(unix_minute(minute) - 1) and get_records(unix_minute(minute)) (i.e.,
current-1 and current); ensure unix_minute and get_records are referenced
exactly and update the invariant text to match this conditional behavior.
| - Extract peer ids from discovered records. | ||
| - Exclude zeros, self, current neighbors. | ||
| - Join up to MAX_JOIN_PEERS_COUNT. | ||
| - Join up to max_join_peer_count (default 4). |
There was a problem hiding this comment.
Documented parameter name doesn't match the code.
The docs call the bubble-merge join cap max_join_peer_count, but the actual field/parameter in src/gossip/merge/bubble.rs is max_join_peers (see the BubbleMerge::new signature and BubbleMergeActor struct). Readers trying to map the ARCHITECTURE.md parameters to Config/constructor fields will get confused. Please unify the name.
📝 Proposed fix
-- Join up to max_join_peer_count (default 4).
+- Join up to `max_join_peers` (default 4).
@@
-- `max_join_peer_count` (default 4)
+- `max_join_peers` (default 4)Also applies to: 273-280
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ARCHITECTURE.md` at line 179, The docs use the parameter name
max_join_peer_count but the code (BubbleMerge::new and the BubbleMergeActor
struct) uses max_join_peers; update them to match by choosing one canonical name
and making both the ARCHITECTURE.md entry (and any other doc occurrences) and
the code signatures/field names consistent — e.g., rename the doc entry to
max_join_peers or refactor the code to max_join_peer_count and update
BubbleMerge::new and the BubbleMergeActor field accordingly, plus any tests or
usages that reference the old name.
| impl TimeoutConfigBuilder { | ||
| /// How long to wait when joining a peer. | ||
| /// | ||
| /// Default: 5s. | ||
| pub fn join_peer_timeout(mut self, timeout: Duration) -> Self { | ||
| self.timeouts.join_peer_timeout = timeout; | ||
| self | ||
| } | ||
|
|
||
| /// How long to wait when broadcasting messages. | ||
| /// | ||
| /// Default: 5s. | ||
| pub fn broadcast_timeout(mut self, timeout: Duration) -> Self { | ||
| self.timeouts.broadcast_timeout = timeout; | ||
| self | ||
| } | ||
|
|
||
| /// How long to wait when broadcasting to neighbors. | ||
| /// | ||
| /// Default: 5s. | ||
| pub fn broadcast_neighbors_timeout(mut self, timeout: Duration) -> Self { | ||
| self.timeouts.broadcast_neighbors_timeout = timeout; | ||
| self | ||
| } | ||
|
|
||
| /// Build the `TimeoutConfig`. | ||
| pub fn build(self) -> TimeoutConfig { | ||
| self.timeouts | ||
| } | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -nP -C3 '\b(timeout|timeouts|timeout_config)\b.*\b(join_peer_timeout|broadcast_timeout|broadcast_neighbors_timeout)\b' --type=rust
echo '---'
rg -nP -C5 'tokio::time::timeout\s*\(' src/gossip/sender.rsRepository: rustonbsd/distributed-topic-tracker
Length of output: 4067
Add zero-duration guards to TimeoutConfigBuilder setters for consistency and to prevent silent timeout failures.
The join_peer_timeout, broadcast_timeout, and broadcast_neighbors_timeout setters unconditionally accept Duration::ZERO, unlike DhtConfigBuilder's put_timeout and get_timeout which guard against it. These values are passed directly to tokio::time::timeout() in src/gossip/sender.rs (lines 47, 66, 94), so a zero duration will cause every wrapped operation to elapse immediately and fail silently.
Apply the same no-op pattern on Duration::ZERO used by the DHT builder:
Proposed fix
pub fn join_peer_timeout(mut self, timeout: Duration) -> Self {
+ if timeout > Duration::ZERO {
self.timeouts.join_peer_timeout = timeout;
+ }
self
}
pub fn broadcast_timeout(mut self, timeout: Duration) -> Self {
+ if timeout > Duration::ZERO {
self.timeouts.broadcast_timeout = timeout;
+ }
self
}
pub fn broadcast_neighbors_timeout(mut self, timeout: Duration) -> Self {
+ if timeout > Duration::ZERO {
self.timeouts.broadcast_neighbors_timeout = timeout;
+ }
self
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/config.rs` around lines 57 - 86, The three setters on
TimeoutConfigBuilder (join_peer_timeout, broadcast_timeout,
broadcast_neighbors_timeout) currently accept Duration::ZERO which will cause
tokio::time::timeout-wrapped operations to immediately elapse; update each
setter to guard against Duration::ZERO (same no-op pattern used by
DhtConfigBuilder's put_timeout/get_timeout) by returning self unchanged when
timeout == Duration::ZERO, otherwise set the field and return self, ensuring
callers that pass zero don't overwrite sensible defaults.
Summary by CodeRabbit
Chores
New Features
Refactor
Bug Fixes
Tests
Documentation