ServiceConnect v7: clean-architecture rewrite — DI-first, async-first, STJ-on-the-wire, operator-grade telemetry#67
Open
twatson83 wants to merge 1514 commits into
Open
ServiceConnect v7: clean-architecture rewrite — DI-first, async-first, STJ-on-the-wire, operator-grade telemetry#67twatson83 wants to merge 1514 commits into
twatson83 wants to merge 1514 commits into
Conversation
Previously only handlerType.BaseType was inspected; multi-level subclass hierarchies (e.g. Concrete : Abstract : Aggregator<T>) were silently dropped from the registry — no error, no log, handler never fired. Walk the chain to typeof(object) so any depth of subclassing resolves the closed-generic base correctly.
IBus and BusHostedService permanently latch _stopped after Stop, and a subsequent Start throws. Pin the behaviour with a test and document the contract on the interface and the hosted-service xmldoc so container/orchestrator paths reusing the instance get a clear signal.
The xmldoc remarks for IBus.StartConsumingAsync state the stopped flag is latched permanently after StopConsumingAsync. The hosted-service pinning test uses a mock to simulate that throw; this test asserts the actual Bus implementation itself surfaces the InvalidOperationException so a refactor that removed the latch would not slip through.
Publish and Send truncate the destination first and append the operation suffix; Consume concatenated first then truncated, so " process" got sliced off when destination was at MaxTagValueLength. Match the Publish/Send order so the operation suffix is always preserved on the DisplayName.
The remarks block previously claimed the framework invokes ConfigureMapper twice per delivery. The mapper instance is built once in ProcessManagerProcessor and reused for both the initial saga lookup and the post-handler persistence find. The purity advice remains correct but the framing is wrong.
CLAUDE.md prohibits ticket/phase/task identifiers and 'before/after the fix' framing in source comments. Preserve the technical content; strip the meta-references. Affects 13 files across production and test code.
Two leftover doc gaps surfaced by the final full review: 1. The telemetry reference page still documented ServiceConnectActivitySource.Shutdown() as a public API and instructed callers to invoke it from ALC unloading code. The method became internal in the follow-up bundle (test-only reset helper). Remove the entry and the corresponding line from the public-surface stub. 2. The observability log-event table claimed "TopologyRecoveryEnabled = false; the application owns topology" — accurate for the original v-major bundle but reversed by the follow-up. Update the ConnectionRecovered row to describe the current behaviour: library auto-recovery replays the topology on the new channel.
…ckKey SagaLockKey.Equals(KeyValue, other.KeyValue) falls back to object.Equals when the mapped correlation property is a byte[] or reference type that doesn't override Equals(object). Two deliveries of the same business key from different deserialisation passes produce different boxed instances, miss the per-key SemaphoreSlim, and run handlers concurrently — defeating the per-saga serialisation the lock is meant to provide. Validate at first dispatch per (DataType, MessageType) pair, cache the verdict, and throw InvalidOperationException naming the offending type when the key falls back to reference equality. Registry-build-time validation isn't viable because ConfigureMapper is invoked per-delivery with the live handler instance and may legally read per-instance state.
…e, MessageType) The mapper expression is Expression<Func<TMessage, object>>, so the runtime type of the resolved value can legitimately vary across messages of the same MessageType (e.g. an IFoo-typed property returning Foo1 then Foo2). Keying the cache on (DataType, MessageType) pinned the first-seen verdict for all subsequent values — either silently allowing a later reference-equality type through, or rejecting a later value-equal type. The verdict is purely a function of the value's runtime type; key the cache by valueType alone. Tighten the surrounding catch so the validation's InvalidOperationException no longer needs a rethrow stair to escape the mapping-misconfiguration catch. Backfill the missing edge cases: enum, nullable value type, TimeSpan, generic reference type, and a class that inherits Equals from its base.
Bus.RouteAsync stamped HeaderKeys.RoutingSlipHopsCompleted into the header dictionary BEFORE the send middleware ran. ISendMessageMiddleware has full mutation access to context.Headers, so a buggy or hostile middleware could reset the counter to 0/1 and silently disable the cross-service amplification cap (MaxRoutingSlipHops) that the per-hop validator relies on. Carry the outbound count on SendContext.RoutingSlipHopsCompleted (init-only) and stamp it into headers inside OutboundHeaderBuilder, after middleware runs. Add RoutingSlipHopsCompleted to OutboundHeaderBuilder.OverwrittenHeaderKeys so caller-supplied entries are dropped with a warning like the other reserved framework headers. Thread the value from SendContext through SendMessagePipeline's terminal function via a new IProducer.SendAsync(string, Type, ReadOnlyMemory<byte>, int?, ...) DIM overload, which Producer overrides to forward routingSlipHopsCompleted into BuildHeaders. Third-party IProducer implementations fall back to the existing 4-arg SendAsync path via the DIM, matching the same pattern used for the routing-key overload.
The default-method overload of IProducer.SendAsync(..., int? routingSlipHopsCompleted, ...) forwarded to the 4-arg path and discarded the hop counter. Third-party transports that hadn't recompiled against the new overload would silently drop the cross-service amplification cap — the first-party RabbitMQ Producer overrides the DIM, but the safety net was missing for everyone else. Inject HeaderKeys.RoutingSlipHopsCompleted into a copy of the caller headers when the DIM body runs and a hop counter is supplied, so third-party transports preserve the wire stamp even without recompiling. Pin the cross-pipeline invariant with an integration test: a SendMessagePipeline running an ISendMessageMiddleware that mutates Headers[RoutingSlipHopsCompleted] results in the framework value (not the middleware's value) reaching the producer. Also bump StampedHeaderCount to 12 (RoutingSlipHopsCompleted is conditional but can push past 11), and reword SendContext.RoutingSlipHopsCompleted xmldoc to accurately describe the middleware contract instead of overstating the invariant.
StreamProcessor.InvokeHandlerAsync caught every OperationCanceledException and rethrew. An OCE bound to the user's own linked CTS (a handler that times out an internal subcall) would propagate up with that handler-CT attached; downstream metrics gate cancelled-classification on cancellationToken.IsCancellationRequested, so an unrelated OCE escaping a stream handler produced a misleading 'error.type=cancelled' tag on the consume span — a graceful-shutdown false alarm in dashboards. Mirror the HandlerProcessor fix: split the catch into a caller-CT branch (ThrowIfCancellationRequested with the caller's token) and an unrelated-OCE branch that logs, clears the dispatch flag, and rethrows with the caller's CT so the dispatch pipeline's when-filter evaluates correctly and classifies it as a handler error.
…anion The original Task-3 test asserted only Assert.NotEqual(unrelatedCts.Token, ex.CancellationToken) — the negative form of the contract. Tighten to pin the positive invariants: the escaping OCE carries the caller's CT, the inner exception is the original handler-thrown OCE with the unrelated CT, and the caller CT is not in a cancelled state on the no-cancel path. Add a companion test for the first-catch branch (caller IS cancelled + handler throws OCE with an unrelated token): the escaping OCE must carry the caller's token, not the handler's.
… window StopAsync awaited bus.StopConsumingAsync(cancellationToken) without enforcing ITransportConfiguration.GracefulShutdownTimeoutMilliseconds. A non-cooperative transport (third-party IConsumer, or a RabbitMQ host with a wedged channel) could block host shutdown until the host's outer kill-CT fired, leaving the grace window silently ignored at the host boundary. Race the stop against Task.Delay(GracefulShutdownTimeoutMs, cancellationToken); if the grace window elapses first, cancel the linked CTS, log a warning, and return so the host can continue shutting down.
…xhaustion The grace-bound StopAsync logged "did not complete within GracefulShutdownTimeoutMilliseconds" even when the real reason for the cancellation was the host's outer CT firing (operator Ctrl+C, container kill, host shutdown grace expired). Production diagnostics would attribute every outer-CT-driven shutdown to grace exhaustion incorrectly. Check cancellationToken.IsCancellationRequested inside the grace-task-winner branch; if the outer CT fired, fall through to await stopTask so the OCE propagates naturally without the misleading warning. Strengthen the original regression test to capture the linked CTS state, and add companion tests for outer-CT cancellation, GracefulShutdownTimeoutMilliseconds=0, and a cooperative consumer finishing inside the grace window.
DisposeAsync caught only OperationCanceledException and TimeoutException. CloseAsync awaits IProducer.SendBytesAsync for the close packet, which can throw AlreadyClosedException / BrokerUnreachableException / channel-shutdown exceptions; those leaked out of 'await using' and defeated the documented best-effort dispose contract. Broaden the catch to Exception and swallow. The broker has no state we need to release on a close-packet failure, and the receiver's eviction sweep reclaims the stream after StreamTimeout — there is no observability cost to swallowing at dispose time.
…rface + emit Activity event The Task 5 broaden-the-catch left two gaps: callers reading the IMessageBusWriteStream interface couldn't see the documented best-effort dispose contract, and a swallowed transport exception left no production diagnostic signal at all. Add an interface-level DisposeAsync override with xmldoc spelling out the contract (no transport/timeout/cancellation surfaces; receiver eviction sweep reclaims orphaned read state). In the implementation catch, emit an Activity event with exception type, message, and SequenceId so OTel exporters surface the close failure for operators. The Activity.Current null-check keeps the path zero-overhead when no tracing is configured. Also clean the dead producer/stream scaffolding in the new regression test.
…potent services.Insert(0, ServiceDescriptor.Singleton<IHostedService, MongoDbProcessManagerIndexInitializer>()) is position-deliberate — the initializer must run BEFORE BusHostedService to close the cold-start race — but non-idempotent. Two feature modules each calling UseMongoDbPersistence within one AddServiceConnect produced two initializer instances racing the same index-creation work. Guard the insert with a single-existence check on ImplementationType. TryAddEnumerable would append rather than position-0 insert and defeat the pre-BusHostedService ordering, so an explicit Any-check is the right shape.
…ctionary view The property returned the live Dictionary<,> typed as IReadOnlyDictionary; a caller resolving ITransportConfiguration from DI could downcast to Dictionary<,> and mutate after Freeze(), defeating SetClientSetting's ThrowIfFrozen() check. Wrap with a cached ReadOnlyDictionary<,> view, mirroring PipelineConfiguration's pattern. The view is live so SetClientSetting writes still surface to readers — only the downcast escape hatch is closed.
…s TLS SslConfigurationBuilder passed any non-zero AcceptablePolicyErrors straight through to RabbitMQ.Client with no observability. A dev-time workaround for a self-signed cert (RemoteCertificateChainErrors | NameMismatch) would silently weaken TLS in any environment the configuration was deployed to, and a custom CertificateValidationCallback that returns true would bypass validation entirely — neither produced a log signal. Thread the existing connection-factory logger into BuildSslOptions; warn once per build when AcceptablePolicyErrors != None or a custom validation callback is set, naming the bits/state so operators can audit.
OutgoingEventArgs was a public, non-abstract, non-sealed class with two sealed derivations (PublishEventArgs, SendEventArgs). A third-party subclass would have been broken by any future minor version that added a required member to the base — a foot-gun on a public surface that we lock for the v-major release. Make the base abstract with a protected ctor so consumers can only construct the two derivations. v7 is an explicit major-version break, so the API change is in-scope. Update the one direct-construction test to use SendEventArgs (which exercises the same Headers null-check path).
…havior
RabbitMqMessagingSystemAttributes split transport.Host on ',' OR ';'.
The RabbitMQ transport (Connection.cs / ProducerConnection.cs) splits on
',' only — a Host value "rabbit1;rabbit2" is one literal hostname for the
connection factory (which fails DNS), but telemetry reported
server.address=rabbit1. The span attribute lied about which broker the
client was addressing.
Drop the semicolon branch so telemetry matches transport. The
InlineData("rabbit1;rabbit2", "rabbit1") test was pinning the wrong
contract; update it to reflect the corrected behavior.
…en superseded by recovery
…o-cycle chaos A scatter-gather flow needs BOTH replies. Each reply travels through the handler-dispatch chain that can stack one publisher retry budget (~30s ack-wait + ~10s inter-attempt delay) plus the consumer-side retry-queue delay. A single ill-aligned chaos cycle is absorbed by the publish retry; two consecutive cycles bracketing the same request-reply window can push a single reply past the per-flow timeout. The framework's RequestReplyManager then evicts the correlation entry and any late-arriving reply is silently dropped (no entry to match against). Doubling the request-reply timeout gives the second reply enough wall- clock to ride out a two-cycle alignment. Observed in a 30-minute chaos soak with 25 kills: 4 of 284,536 flows hit the alignment; doubling the budget removes the alignment window entirely.
verify-all.sh runs unit tests, E2E tests, a 5-minute harness soak, and a 5-minute chaos soak in sequence. Each stage is skippable via SKIP_* env vars. Uses docker compose up --wait against the harness's existing docker-compose.yml so the broker is reachable before the harness starts. out/ and docs/ are now ignored: out/ is regenerated on every harness run, and docs/ holds working specs/plans that are committed deliberately during agent sessions (not auto-staged from the working tree). Tracked files in docs/ remain tracked.
Roslyn's IDE0031 (null propagation) rewrite for `if (x is not null) x.E += h` is `x?.E += h`, which requires C# 14 null-conditional compound assignment. ServiceConnect.Client.RabbitMQ multi-targets net8.0 (LangVersion=12) and net10.0 (LangVersion=14), so the suggested rewrite is invalid under the net8.0 compile. Combined with TreatWarningsAsErrors + EnforceCodeStyleInBuild, the warning escalated to a build error in the net10.0 pass on CI. Wrap the four event subscribe/unsubscribe blocks in `#pragma warning disable IDE0031` with a comment explaining the TFM split, rather than duplicating the bodies behind `#if NET10_0_OR_GREATER`.
A tag can point at any commit, and published NuGet packages can only be delisted, not unpublished, so the prior flow (restore -> build -> pack -> push) had no safety net if a non-master commit was tagged or master had regressed since CI last ran. Re-run both test projects against the exact tagged tree before pack, and upload the trx files as a 90-day audit artifact alongside the nupkgs.
New page under learn/operations covering RabbitMQ cluster connectivity:
host-list format (including the no-trim and same-port-per-host
constraints from the naive Host.Split(',')), automatic-recovery
behaviour cross-linked to the observability log table, recovery-tuning
knobs, and a quorum-queues recipe that wires x-queue-type=quorum
through Arguments / RetryQueueArguments / UtilityQueueArguments. Also
adds a sidebar entry under Operations and a cross-link from the
Transport section of the Configuration page so existing readers find
the new content.
Anchors the .gitignore "docs/" rule to "/docs/" so it only matches the
repo-root agent-spec folder, not website/src/content/docs/. Without
that, every new docs page would be silently ignored.
These files were tracked before /docs/ was added to .gitignore. Remove from the repo so the gitignore rule actually takes effect — keep them locally.
Repo Pages was previously in legacy mode serving /docs (which has been removed). Switched Pages to GitHub Actions build_type via API; update the workflow trigger and deploy gate to match the active branch.
Astro 6 requires Node >=22.12.0; runner was on 20.20.2.
ILeaseAwareTimeoutStore, IServiceConnectConnection, and IRegistryInitializer are in the sidebar but don't exist as interfaces in the codebase and have no corresponding content pages, so the links 404. Drop the nav entries.
… deadline helper CancelHelperPublishesAtDeadlineAsync raced the drain-deadline catch in DisposeAsync: both paths cancel shutdownPublishCts when the grace window expires, but the helper omitted the Volatile.Write that the drain-catch performs first. With FakeTimeProvider the helper's Task.Delay fires first, so the stalled retry-publish unblocked with OCE while _shutdownTimedOut was still 0, and AckOrNackAsync fell through to BasicNackAsync(requeue:true) instead of leaving the message unacked. The audit-stall sibling test happened to survive because the OCE is swallowed and ProcessAsync's later `return !_shutdownTimedOut()` had enough scheduling slack for the drain-catch to set the flag; the retry path rethrows so there is no slack.
releases.mdx: replace the deleted v7 sections with a single audience- action structure (Breaking / What's new / Hardening / Removed / Migration), summary depth with grouped highlights. ~150 lines vs the ~700-line scaffold; ticket IDs / phase labels / commit hashes stripped per the project's comment-style rule. migrating-v6-to-v7.mdx: new focused guide covering the mechanical conversions (DI, handlers, pipeline, fan-out, async) and the silent runtime bear traps (STJ strictness, Mongo aggregator + saga renames, TLS default, publisher confirms, exchange-name hash shift, reserved- header trust boundary, OTel semconv 1.x attribute rename). Linked from the top-level sidebar.
Cross-checked all 64 mdx pages against the v7 solution and patched the findings — 12 BLOCKERs (broken code, dead links, wrong signatures), 13 MAJORs (API drift, missing defaults, missing overloads), and 14 MINORs (wording, missing nuance). Highlights: - migrating-v6-to-v7: fix IFilter signature, IProcessHandler.HandleAsync signature, IStreamHandler.ExecuteAsync signature, SendToManyAsync ct param drop, SendRequestMultiAsync no-endpoints-array, branch URL, ActivitySource name. - learn/operations/clustering: HeartbeatTime default 60s -> 120s, fix GitHub org link. - learn/operations/error-handling: RequestSendCancelledException does NOT derive from ServiceConnectException; catch it separately. - learn/operations/hosting: BusHostedService races StopConsumingAsync against Task.Delay(GracefulShutdownTimeoutMilliseconds), not passes the timeout in. - learn/messaging-patterns/aggregator: no persistor = warning + drop per message, not registration failure. - learn/messaging-patterns/filters: ISendMessageMiddleware MUST be Singleton; IMessageProcessingMiddleware has no lifetime restriction. - learn/messaging-patterns/process-manager: RequestTimeoutAsync(ct) cancels the insert, not the delivery — pass CancellationToken.None when the schedule must survive shutdown. - reference/bus/ibus: RequestOptions.Timeout is int; sentinel is Timeout.Infinite (-1), not Timeout.InfiniteTimeSpan. RequestTimeoutAsync only needs ITimeoutStore, not EnableProcessManagerTimeouts=true. - reference/messages/options: add missing ReplyOptions section. - reference/handlers/event-args: OutgoingEventArgs is abstract with a protected constructor. - reference/healthchecks: document the 4th AddServiceConnectBus and AddServiceConnectConsumer overloads (configurable grace + TimeProvider); ProducerConnectionHealthCheck calls GetHealthSnapshot, not direct IsHealthy/HasAttemptedConnection reads. - reference/extension-points/transport/iconsumer: BrokerConsumer and KafkaConsumer skeletons now implement IsCancelledByBroker (abstract, no DIM default — examples wouldn't compile without it). - reference/extension-points/transport/iproducer: BrokerProducer and KafkaProducer skeletons reference 'body'/'packet' (param names), not the undefined 'message'; KafkaProducer.BuildMessage accepts ReadOnlyMemory<byte> instead of byte[]. - reference/extension-points/persistence/iprocessmanagerfinder: IIdentified lives in ServiceConnect.Interfaces, not ServiceConnect.Interfaces.Persistence. Plus MINORs across getting-started, request-reply, streaming, ibusconfiguration, itransportconfiguration, ifilter, ipropertymapper, telemetry, imessagedispatcher, imessageprocessor. Astro build verified locally (all 64 pages emit).
Cross-doc link-integrity sweep found four anchors that don't resolve against the slugs Starlight emits for the target headings: - pub-sub -> messages/#separate-commands-from-events: no such heading. The commands-vs-events discussion lives as a bolded paragraph inside "Designing contracts that age well"; repoint to that section. - request-reply -> ibus/#sendrequestasynct-treply (and the multi variant): github-slugger strips angle brackets and commas but preserves the space between type-parameter words, so the heading `SendRequestAsync<TRequest, TReply>` slugifies to `sendrequestasynctrequest-treply`, not `sendrequestasynct-treply`. - process-manager -> iprocessmanagerfinder/#findDataAsync: CamelCase link to a lowercase anchor, and missing the <T> typeparam letter. Correct slug is `finddataasynct`. Verified against the rendered HTML in website/dist/ and via a slug-aware link checker over every internal /ServiceConnect-CSharp/ link in the docs (67 unique links, 0 failures after this fix).
- ci: run on all branches and PRs (push/pull_request branches: ['**']) instead of only [master, 'v*'], so feature branches get CI. The '**' glob matches branches only, so CI does not double-fire on release tags. - docs: build/deploy only on master (was the v7-clean-architecture feature branch); deploy gate updated to refs/heads/master to match. - release: keep the v* tag trigger but gate the publish on the tagged commit being reachable from origin/master, so a tag pushed on a feature branch is rejected before anything builds or pushes to NuGet.org. Checkout fetches full history (fetch-depth: 0) so the ancestry walk has the commit graph.
- Add CONTRIBUTING.md covering project layout, build/test (verify-all.sh), coding conventions (multi-target C# 12/14, nullable, async/ConfigureAwait, naming, source-gen logging, exception hierarchy, comment style), test conventions, docs, the commit/PR workflow, and the maintainer release process. - README: link to CONTRIBUTING.md. - release.yml: compare the tag's base version (stripping any SemVer pre-release suffix) against Directory.Build.props <Version>, so a cut can go v7.1.0-rc.1 -> v7.1.0 without editing the props file between pre-releases. The on-master guard and tag-derived package version are unchanged.
…ceWindow When the close stalls and the grace-window deadline wins, the consumer host returns without awaiting the channel-host close, so the channel's DisposeAsync runs on an abandoned background task. The test verified that DisposeAsync was invoked immediately after the dispose completed, separated only by a single Task.Yield — which doesn't deterministically drain that background continuation, so the assertion raced thread-pool scheduling and failed intermittently (observed on both ubuntu and windows runners). Signal a TaskCompletionSource from the mock channel's DisposeAsync callback and await it (plus the dispose task) on a real-time budget before verifying, so the test waits for the exact event it asserts on. Production behaviour is unchanged; the grace-window "not completed before the deadline" assertions are retained. Verified: full RabbitMqConsumerHostTests class green (42/42) and the previously flaky test 40/40 across a stress loop.
Full review of the v7 docs site (64 pages) and README against the solution, fixing every confirmed issue (each verified against src/ before changing). BLOCKERs: - getting-started + core-concepts/handlers: the explicit-registration examples registered the HandlerReference list but omitted AddTransient<IMessageHandler<T>,THandler>(); with scanning disabled the dispatcher resolves nothing and silently drops every message. Add the registration and correct the handler-lifetime wording. - operations/clustering: opts.Host does not exist on RabbitMqOptions (the typed overload maps to ClientSettings); set Host via .ConfigureTransport(...). - migrating-v6-to-v7: three "v7" snippets did not compile (CancellationToken passed in the options-struct position; non-existent SendAsync(string,...) overload). Broken internal links (26): 24 under-stepped relative links across the extension-points reference subtree (resolved one path segment too shallow in URL space), the timeout #lease-semantics anchor, and a landing-page FeatureGrid card pointing at the removed MessageDeduplication feature (repointed to the idempotency page). MAJOR: RequestedHeartbeat -> HeartbeatTime client-setting key; ErrorQueueName/ AuditQueueName have fixed defaults, not QueueName-derived; PublishOptions/ SendOptions have no priority field; AddServiceConnect throws on a second call; exchange names are a sanitised type name + hash, not the plain full name; RequestTimeoutAsync documents NotSupportedException. MINOR: healthchecks default-overload comments (GetRequiredService, not ActivatorUtilities); README HealthChecks install line; releases v7.0.0-tag caveat; ConsumerCount/poll-interval defaults; options immutability wording; HandlerReference link target; Health Checks sidebar entry (was an orphan page); configuration knob-count consistency. Source comments: OutgoingFiltersBlockedException XML-doc now matches Bus.cs (every outgoing path throws on FilterAction.Stop); Directory.Build.props net10.0 is the current LTS, not the latest STS. Verified: site rebuilds (65 pages), 0 broken links/anchors via a rendered-HTML id check; ServiceConnect.Interfaces builds clean.
… fan-out)
Phase 4 of the cross-runtime interop work: make a v7-clean-architecture
service interoperate with the deployed C# master and the Node.js
implementation on the same RabbitMQ bus. master is the immutable reference.
- MessageTypeExchangeName.From: drop the "_<8 hex>" SHA-256 suffix and return
the bare FullName with dots removed, exactly matching master's
`type.FullName.Replace(".", string.Empty)` (Producer.cs / Bus.cs). The suffix
disambiguated the pathological "A.BC"/"AB.C" -> "ABC" collision, but master
accepts that collision and Node/master do not carry the suffix, so a v7
publisher/consumer previously used a different exchange and could not exchange
messages. Both call sites (producer publish, consumer bind) share this helper,
so the single change realigns both.
- Bus.PublishAsync: publish a derived message to each ancestor type's exchange
(walking BaseType up to, but excluding, Message), re-stamping TypeName/
FullTypeName per hop via SendContext.MessageType. This reproduces master's
recursive Publish<TBase> fan-out so a subscriber bound to a base-type exchange
receives derived messages. The prepared headers are snapshotted and cloned per
hop so each carries the one Bus-minted MessageId, as master does.
Audit confirmed the remaining wire surface (PascalCase headers incl. TypeName/
FullTypeName/MessageType=operation, body serialization with integer enums and no
$type, retry/error/audit topology, inbound header validation) already matches
master, so no other changes are needed.
Verified: UnitTests 1704/1704, SerializationCompatTests 48/48, EndToEndTests
136/136 (real broker via Testcontainers, incl. the polymorphic base-handler test).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The pub/sub exchange name is `type.FullName.Replace(".", string.Empty)` with no
hash suffix or assembly metadata — matching the deployed C# master wire format
so v7, master, and Node share exchanges. Update the docs that still described
the old "sanitised type name + short hash suffix" / "FullName + assembly.Name
hash" scheme: endpoints, pub-sub, polymorphic-messages, the v6->v7 migration
note, and the release notes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A base-type subscriber now registers a HandlerReference for each CONCRETE subtype only — not for the base type. The producer fans every derived publish out to its own exchange and every ancestor exchange, so binding only the concrete exchanges delivers each event exactly once; the dispatcher's type-hierarchy walk routes the concrete delivery to the base-type handler, which receives the concrete instance. Previously the audit subscriber also registered the base type, which bound the base exchange too: under the fan-out it received each event twice, and the base-exchange copy (re-stamped to the base type) could not be deserialised when the base is abstract. The hierarchy walk also invokes every matching handler up the chain with no de-dup, so registering one handler against both base and derived double-invoked it even for a single delivery. No engine change — the producer fan-out and consumer dispatch are unchanged; this is the correct consumer-side wiring for them. - EndToEndTests: rewrite the polymorphic test to the concrete-only wiring and assert exactly-once delivery (catches double-bind / double-walk regressions). - examples/PolymorphicMessages: drop the base HandlerReference from the audit subscriber; verified it audits each event exactly once with no dead-letters. - docs/polymorphic-messages: rewrite "dispatch vs subscription" to teach concrete-subtype binding + the hierarchy walk, why not to bind the base, and the concrete-base alternative for auto-catching new subtypes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Overview
This PR merges
v7-clean-architectureintomasteras the v7 line of ServiceConnect.v7 is a clean rewrite, not an upgrade: the static
Busis gone, every public path is async withCancellationToken, the wire format is System.Text.Json, the public surface is shrunk to extension contracts only, and the solution is consolidated from 17 production projects to seven shipping NuGet packages. Reliability, security, and observability work landed in the same drop — header trust boundaries, cardinality and amplification caps, persistence-lease correctness, OTel semconv 1.x messaging attributes, and aServiceConnect.HealthCheckspackage.This is not source- or binary-compatible with v6. See the v7 entry in
website/src/content/docs/releases.mdxfor the full audience-oriented changelog (breaking changes, new features, hardening, removals, migration steps). A short summary follows.Highlights
Architecture and API
AddServiceConnect(...)DI extension + fluentServiceConnectBuilderreplace the staticBus.IBusisIAsyncDisposableand single-use (latched on stop/dispose).IBus,IProducer,IConsumer,IConsumeContext,IProcessManagerFinder,ITimeoutStore,IAggregatorPersistor, middleware, and filters are all async withCancellationTokenend-to-end. All sync-over-async paths removed from the RabbitMQ transport.HandleAsync(T msg, IConsumeContext ctx, CancellationToken ct)); the unsafe settableContext/Streamproperties are gone.Bus,MessageDispatcher, concrete persistors, concrete RabbitMQ classes, every*Configurationtype, all hosted services are nowinternal. Public API = interfaces inServiceConnect.Interfaces+ the builder/extension methods on the bus configuration root.AddOutgoingFilter<T>,AddBeforeConsumingFilter<T>,AddOnConsumedSuccessfullyFilter<T>(new),AddAfterConsumingFilter<T>,AddSendMessageMiddleware<T>,AddMessageProcessingMiddleware<T>.HandlerProcessor,ReplyProcessor,ProcessManagerProcessor,AggregatorProcessor,StreamProcessor) with pluggable registries — no per-message reflection.bus.CreateStream<T>(endpoint)→IMessageBusWriteStream.SendToManyAsync/PublishRequestAsyncreplaceSendOptions.EndPoints/RequestOptions.EndPoints.SendRequestMultiAsyncthrowsRequestTimeoutException(withPartialReplies) on under-delivery instead of returning silently.Transport (RabbitMQ)
Producerconnection, safe dispose, nack on retry failure, asyncReply/Route.Clientfacade +RabbitMqConsumerHost; audit/retry concerns extracted (MessageAuditPublisher,MessageRetryHandler,HeaderHelpers).SslEnabled = true, port 5671); plaintext outside loopback emits aWarning.PublisherAcknowledgements=falsewith a finitePublishTimeoutis a startup error.PublishOptions.RoutingKeyhonoured (IProducer.SupportsRoutingKeycapability flag for third-party transports).MessageIdsurvives publish retries;TimeoutExceptionis retriable under the at-least-once contract (channel reset + sameBasicProperties).RabbitMqOptions.MaxPublishWaitTimecaps the retry-loop budget.mandatory:true— topology gaps surface asPublishExceptionrather than silent drops.MessageTypeExchangeNameis now version-stable (type.FullName + assembly.GetName().Name); first v7 deploy on a broker shared with v6 will see new exchange names.Persistence
ServiceConnect.Persistence.MongoDb: bumped toMongoDB.Driver3.8.0; bundled persistor handlesGuidRepresentationMode = V3andRenderArgs<T>. Startup guards onWriteConcern.Unacknowledged, conflicting Guid-serializer registrations, and the legacy(Locked, Time)index.IProcessManagerTypeRegistryenables pre-startupCorrelationIdunique index creation. AggregatorNamepartition value changed (Aggregator<T>→ concrete type); generic-saga collection names sanitized — both require rename scripts before deploy.ServiceConnect.Persistence.InMemory: no more 2-day TTL drop; deep-clones on read; types areIDisposable.IAggregatorPersistorparameter/return types tightened toIHasCorrelationId/IReadOnlyList<IHasCorrelationId>. AddedCountResolvedAsyncandReleaseSnapshotAsync(additive default-interface-methods).IKeyValueStore/ICacheProviderswitch fromGet<>toTryGet<>.ServiceConnect.Persistence.SqlServerandServiceConnect.Persistence.Redisdropped — no replacement.Serialization
Newtonsoft.Jsonis no longer transitive.IMessageSerializerreduced to four methods (Serialize<T>(T, IBufferWriter<byte>),Deserialize<T>(ReadOnlyMemory<byte>),Deserialize(ReadOnlyMemory<byte>, Type), defaultReadOnlySequence<byte>overload).IProducerbody isReadOnlyMemory<byte>; streamingWriteAsynccascades toReadOnlyMemory<byte>.ServiceConnect.SerializationCompatTestsenforces v6↔v7 round-trip on every PR.Telemetry and operations
ServiceConnect.HealthCheckspackage:BusConsumingHealthCheck,ConsumerConnectionHealthCheck,ProducerConnectionHealthCheck. ConfigurablerecoveryGraceWindow(default 30s) absorbs transient broker flap; permanent broker-cancel bypasses grace.ActivitySource("ServiceConnect.Bus"); telemetry static state removed (options + attributes are method parameters now). Per-direction toggles.messaging.*instruments + ServiceConnect-specific counters (retry.attempts,retry.drops,publish.confirm_timeouts,audit.drops,outgoing_filters.blocked) + in-flight UpDownCounter.messaging.operation.type/messaging.operation.name;messaging.destination.namenow reflects broker exchange/routing key (not CLRFullName). Dashboards filtering on the legacy attributes must update.traceparent/tracestateinjected on outbound.MaxTagValueLength(default 256) +ExceptionMessageSanitiserfor cardinality / PII safety.ConnectionOpened/ProducerConnectionOpened/ConnectionRecovered/ConnectionLost) carry the connection'sMessageId.Security
DestinationAddress,MessageId,MessageType,TypeName,FullTypeName); reply routing resolves through registered handlers only (noType.GetTypeon caller-supplied wire data).DeepCloneswitched offNewtonsoft.JsonTypeNameHandling.Auto— removes the deserialisation-gadget surface andMongoDB.Bsoncoupling.Random.Sharedfor jitter; exponential backoff with overflow cap.Reliability
RetryCountheaders route to error queue. Unresolved message types are terminal. Reply-shaped messages route safely withoutIRequestReplyManagerregistered.Id; cancellation betweenUpdateManyandFindAsyncperforms best-effort lease cleanup;$facetaggregation eliminates the second round-trip.RemoveDataAsyncdistinguishesKeyNotFoundExceptionfromConcurrencyException; batch-size flush gates on resolved-count;OnTimerFiredregister-before-recheck closes disposal-snapshot race.UnregisteredAsyncevents ignored; queues bound beforeBasicConsume; parallel host disposal.Busno longer disposes transport singletons;DisposeAsyncis bounded byDisposeTimeout(default 30s).Build, packaging, license
net8.0andnet10.0on every package.net8.0will be dropped in the first major after Microsoft's EoL (2026-11-10).ServiceConnect,ServiceConnect.Interfaces,ServiceConnect.Client.RabbitMQ,ServiceConnect.Persistence.MongoDb,ServiceConnect.Persistence.InMemory,ServiceConnect.Telemetry,ServiceConnect.HealthChecks(new).dotnet pack; metadata centralised insrc/Directory.Build.props. SourceLink +.snupkgsidecar on every package.TreatWarningsAsErrors=true,EnforceCodeStyleInBuild=true, NetAnalyzers + VSTHRD + Meziantou repo-wide.Removed
ServiceConnect.Filters.MessageDeduplication(silently dropped legitimate redeliveries; replacement pattern inexamples/CustomFilterAndMiddlewareuses the newOnConsumedSuccessfullystage).ServiceConnect.Persistence.SqlServer,ServiceConnect.Persistence.Redis(no replacement).Bus,IProducer.DisconnectAsync,IProcessMessageMiddleware,SendOptions.EndPoints/RequestOptions.EndPoints,SendEventArgs.EndPoints(plural),byte[]/ReadOnlySpan<byte>overloads onIMessageSerializer,Get<TKey,TValue>onIKeyValueStore/ICacheProvider, pre-1.xmessaging.operationattribute, three separateActivitySources, the seven.nuspecfiles, the DocFX site.Docs and examples
website/, deployed to GitHub Pages from this branch. Replaces the prior DocFX site.examples/: Aggregator, CompetingConsumers, ContentBasedRouting, CustomFilterAndMiddleware, Filters, PointToPoint, PolymorphicMessages, ProcessManager, PublishSubscribe, RequestReply, RoutingSlip, ScatterGather, Streaming, StressHarness, Telemetry. Each shipsdocker-compose+run.sh/.ps1.