Skip to content

SCA-5930 Fix intermittent UA UPDATE "Read timed out" by disabling stale connection reuse #127

Open · ChenLuigi wants to merge 1 commit into master from SCA-5930-fix-ua-stale-connection-reuse

ChenLuigi (Member) commented Jun 29, 2026

Summary

Fixes the intermittent "Failed to send request to WhiteSource server: Unexpected error. Response data is: Read timed out" error, in which the Unified Agent's first Sending Update request hangs for the full connection timeout (default 60 minutes) before the built-in retry recovers on a fresh connection within seconds.

Jira: SCA-5930 · Customer evidence: TKA-10393 (IBM ibmets, Cloud Software Group / Citrix SaaS-US — both strategic).

Root cause

WssServiceClientImpl builds its HTTP client with new DefaultHttpClient() and reuses persistent (keep-alive) connections across every call (orgFlags → checkPolicies → UPDATE → getRequestState), with no stale-connection check and no idle/TTL eviction.

When a pooled connection has been silently dropped by an upstream load balancer / proxy / firewall (half-open — no FIN/RST reaches the client), the next request — the large UPDATE POST, issued after the minutes-long dependency-resolution phase during which the connection sat idle — is written into a dead socket. No response ever arrives, so the socket read blocks for the entire wss.connectionTimeoutMinutes (default 60). The automatic retry opens a new connection and the identical payload succeeds in 2–10 s.

This is consistent across all three captured cases:

| Case | Version | UPDATE issued | "Read timed out" logged | Gap | Retry |
|---|---|---|---|---|---|
| IBM | 26.3.2 / 2.9.9.100 | 11:40:08 UTC | 12:40:10 UTC | 60m02s | success ~8s |
| Citrix | 26.5.1 | 12:46:50 UTC | 13:46:50 UTC | 60m00s | success in 2s |
| TensorFlow repro | 26.5.1 | | | (laptop-sleep artifact) | success ~10s |
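The Gap column is simply the difference between the two timestamps. A small java.time sketch (TimeoutGap and gap are illustrative names; it assumes both times fall on the same UTC day) shows that both customer cases land within two seconds of the 60-minute default:

```java
import java.time.Duration;
import java.time.LocalTime;

public class TimeoutGap {

    /** Gap between the UPDATE being issued and the "Read timed out" log line (same-day UTC times). */
    static Duration gap(String issuedUtc, String timedOutUtc) {
        return Duration.between(LocalTime.parse(issuedUtc), LocalTime.parse(timedOutUtc));
    }

    public static void main(String[] args) {
        Duration timeout = Duration.ofMinutes(60); // default wss.connectionTimeoutMinutes
        String[][] cases = {
            {"IBM", "11:40:08", "12:40:10"},
            {"Citrix", "12:46:50", "13:46:50"},
        };
        for (String[] c : cases) {
            Duration g = gap(c[1], c[2]);
            // How far the observed hang is from the configured timeout.
            Duration overshoot = g.minus(timeout);
            System.out.printf("%s: gap=%s, gap - timeout=%ds%n", c[0], g, overshoot.getSeconds());
        }
    }
}
```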

The stack trace confirms the client is blocked in receiveResponseHeader with the request already sent. It is not a backend-performance issue — the backend processes the same payload in seconds, the failed request never reaches the server (no API_CALL, no CTX), and because the UA exits SUCCESS(0) after the retry, the failure is invisible to server telemetry.
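The client-side symptom can be reproduced without any WhiteSource code. The following is a minimal JDK-only sketch; the class name, request path and 300 ms timeout are illustrative assumptions. A loopback server socket that completes the TCP handshake but never accepts, reads or replies stands in for the dropped upstream. This only approximates a true half-open path, where packets are dropped rather than buffered, but the client sees the same sequence: the write returns, no response arrives, and read() blocks until SO_TIMEOUT and then throws java.net.SocketTimeoutException (the JDK's message is "Read timed out").

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

public class SilentPeerDemo {

    // Hypothetical request; any bytes would do, since the peer never reads them.
    private static final byte[] REQUEST =
            "POST /update HTTP/1.1\r\nHost: localhost\r\nContent-Length: 0\r\n\r\n"
                    .getBytes(StandardCharsets.US_ASCII);

    /** Writes a request to a peer that never answers; returns ms spent blocked in read(). */
    static long timeUntilReadTimeout(int soTimeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Backlog 1 and no accept(): the kernel still completes the TCP handshake,
        // so the client connects, but nothing will ever read the request or reply.
        try (ServerSocket silentPeer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, silentPeer.getLocalPort())) {
            client.setSoTimeout(soTimeoutMillis); // plays the role of the connection timeout
            OutputStream out = client.getOutputStream();
            out.write(REQUEST); // "succeeds": the bytes only reach a kernel buffer
            out.flush();
            long start = System.nanoTime();
            try {
                client.getInputStream().read(); // waits for a response that never arrives
                throw new IllegalStateException("silent peer unexpectedly answered");
            } catch (SocketTimeoutException e) {
                return (System.nanoTime() - start) / 1_000_000;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 300 ms stands in for wss.connectionTimeoutMinutes (60 min by default).
        System.out.println("Read timed out after ~" + timeUntilReadTimeout(300) + " ms");
    }
}
```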

Fix

Disable HTTP connection reuse via NoConnectionReuseStrategy on every client-creation path so each request uses a fresh connection and a stale/half-open socket can never be reused:

  • default constructor (new DefaultHttpClient())
  • ignore-certificate constructor
  • both setProxy() builder paths (reached via findDefaultProxy())

No timeout semantics are changed, so legitimately slow large updates are unaffected. The only cost is one extra connection setup (TLS handshake, ~tens of ms) per request — negligible for the agent's handful of requests per scan.

Why not just raise wss.connectionTimeoutMinutes?

The retry already succeeds in seconds, so 60 min is ample; a larger value only makes the failed attempt hang longer. The correct fix is to never reuse the dead connection in the first place.
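The trade-off above is back-of-the-envelope arithmetic. The sketch below rests on three assumptions: four requests per scan (the orgFlags, checkPolicies, UPDATE, getRequestState chain from the root cause), 50 ms per fresh connection (the source only says "tens of ms"), and a 10 s retry (the top of the observed 2–10 s range). FailureCost and its methods are illustrative names.

```java
import java.time.Duration;

public class FailureCost {

    /** Wall-clock cost of one stale-connection hit: the full read timeout plus the retry. */
    static Duration staleHitCost(Duration readTimeout, Duration retry) {
        return readTimeout.plus(retry);
    }

    /** Extra cost of never reusing connections: one connection setup per request. */
    static Duration noReuseOverhead(int requests, Duration setupPerRequest) {
        return setupPerRequest.multipliedBy(requests);
    }

    public static void main(String[] args) {
        Duration retry = Duration.ofSeconds(10); // upper end of the observed retries
        Duration setup = Duration.ofMillis(50);  // assumed connection/TLS setup cost
        for (int minutes : new int[] {60, 120}) {
            // Raising the timeout only lengthens the failed attempt.
            System.out.println("timeout " + minutes + " min -> stale hit costs "
                    + staleHitCost(Duration.ofMinutes(minutes), retry));
        }
        System.out.println("no-reuse overhead for 4 requests: " + noReuseOverhead(4, setup));
    }
}
```

Under these assumptions a stale hit costs about 60 min 10 s at the default timeout and about 120 min 10 s at double it, while never reusing connections adds roughly 200 ms per scan.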

Testing

  • mvn -pl wss-agent-client -am compile → BUILD SUCCESS
  • wss-agent-client unit tests pass (plus wss-agent-api, wss-agent-utils).
  • The only failing tests are pre-existing SHA-1/MD5 hash assertions in the unrelated wss-agent-hash-calculator module, caused by local git CRLF→LF normalization of test fixtures — not touched by and cannot be affected by this change.

Rollout

This artifact (wss-agent-api-client) is consumed by the Unified Agent as a pinned dependency. After release, bump agent.api.version in the unified-agent pom.xml (currently 2.9.9.100) to the new version and rebuild the Fat JAR.

Follow-up (separate, not in this PR)

In setProxy() the proxy clients are rebuilt after setConnectionTimeout() runs, so proxied clients currently receive no socket timeout at all. Best fixed by moving timeouts to a per-request RequestConfig (works for all client types).
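The ordering problem and the suggested direction can be shown with a toy model. Everything here is hypothetical: Client, RequestSettings, and the setConnectionTimeout/setProxy stand-ins only mimic the order described above, not WssServiceClientImpl's real code. In HttpClient 4.3+ the per-request counterpart would be a RequestConfig built with RequestConfig.custom().setSocketTimeout(...).build() and attached via setConfig(...) on the request.

```java
import java.util.Optional;

/** Toy model of the setProxy() ordering problem; all names are hypothetical. */
public class TimeoutOrderingModel {

    /** A client whose socket timeout is fixed when it is built (null means "no timeout"). */
    static final class Client {
        final String proxy;
        final Integer socketTimeoutMs;

        Client(String proxy, Integer socketTimeoutMs) {
            this.proxy = proxy;
            this.socketTimeoutMs = socketTimeoutMs;
        }
    }

    /** Per-request settings, in the spirit of a per-request RequestConfig. */
    static final class RequestSettings {
        final int socketTimeoutMs;

        RequestSettings(int socketTimeoutMs) {
            this.socketTimeoutMs = socketTimeoutMs;
        }
    }

    private Client client = new Client(null, null);

    void setConnectionTimeout(int ms) {
        client = new Client(client.proxy, ms); // timeout baked into the current client
    }

    void setProxy(String proxy) {
        client = new Client(proxy, null);      // rebuilt from scratch: the timeout is lost
    }

    /** Timeout that would apply to a request; per-request settings take precedence. */
    Optional<Integer> effectiveTimeout(RequestSettings perRequest) {
        if (perRequest != null) {
            return Optional.of(perRequest.socketTimeoutMs);
        }
        return Optional.ofNullable(client.socketTimeoutMs);
    }

    public static void main(String[] args) {
        int oneHourMs = 60 * 60 * 1000;
        TimeoutOrderingModel m = new TimeoutOrderingModel();
        m.setConnectionTimeout(oneHourMs); // timeout configured first...
        m.setProxy("proxy.example:8080");  // ...then the proxy path replaces the client
        System.out.println("client-level timeout: " + m.effectiveTimeout(null));
        System.out.println("per-request timeout:  " + m.effectiveTimeout(new RequestSettings(oneHourMs)));
    }
}
```

Because the per-request value travels with the request, it survives any later rebuild of the client.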

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Improved HTTP connection handling to avoid reusing persistent connections, which can help reduce connectivity issues in some environments.
    • Updated proxy behavior so connection settings are applied consistently, including when authentication is used.

…le connection reuse

WssServiceClientImpl reused persistent keep-alive connections across requests with no
stale-connection check and no idle/TTL eviction. A pooled connection silently dropped by
an upstream load balancer / proxy / firewall (half-open) was reused for the large UPDATE
POST issued after the minutes-long resolution phase; the request was written into a dead
socket, no response arrived, and the read blocked for the full wss.connectionTimeoutMinutes
(default 60 min) before the built-in retry recovered on a fresh connection in seconds.

Disable HTTP connection reuse (NoConnectionReuseStrategy) on every client-creation path
(default constructor, ignore-certificate constructor, and both setProxy builder paths) so
each request uses a fresh connection and a stale/half-open socket can never be reused. No
timeout semantics changed, so legitimately slow large updates are unaffected; the only cost
is one extra connection setup per request, negligible for the agent's request volume.

Reported via TKA-10393 (IBM ibmets, Cloud Software Group / Citrix SaaS-US).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai Bot commented Jun 29, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 32ef7320-9d22-4d62-a102-ffc01055ad78

📥 Commits

Reviewing files that changed from the base of the PR and between 514dbf6 and 9db723f.

📒 Files selected for processing (1)
  • wss-agent-client/src/main/java/org/whitesource/agent/client/WssServiceClientImpl.java

📝 Walkthrough


WssServiceClientImpl is updated to disable HTTP connection reuse (keep-alive/persistent connections) across all client construction paths. NoConnectionReuseStrategy is imported and applied to the DefaultHttpClient in the main constructor, and to both HttpClientBuilder chains in setProxy()—the unauthenticated proxy path and the authenticated-proxy path.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly matches the main change: disabling stale HTTP connection reuse to fix intermittent UA UPDATE read timeouts. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

