Stability and safety improvements #32

Merged: karust merged 4 commits into main from stability-safety-improvements on Jun 12, 2026

Conversation

@karust (Owner) commented Jun 12, 2026

Summary

  • Add a derived per-request timeout for non-/mega/*, non-/extract routes so hung engine requests return 504 request_timeout instead of running open-ended.
  • Harden /extract URL handling with public-network validation, HTTP/HTTPS-only targets, redirect re-validation, raw dial IP pinning, rendered-mode preflight, and extract.allow_private_networks as an explicit opt-out.
  • Stabilize protection paths by caching rate limiters, avoiding circuit-breaker penalties for cancellations/deadlines, and removing request-path panics from browser resource blocking / CLI block_resources parsing.
  • Add focused tests for request deadlines, extraction SSRF blocking, limiter caching, cancellation handling, and circuit-breaker behavior.
  • Fix the integration test timeout issue.
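
The /extract hardening in the second bullet can be illustrated with a minimal Go sketch. This is not the code from this PR: `isPublicAddr`, `checkTargetURL`, `guardDial`, `newGuardedClient`, and `errTargetNotAllowed` are hypothetical names, and the dial-time check uses the standard `net.Dialer.Control` hook as a stand-in for whatever pinning mechanism the PR actually implements. It also assumes "public" means a global unicast address outside the private ranges; a stricter policy would likely add carrier-grade NAT and other reserved prefixes.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"net/netip"
	"net/url"
	"syscall"
	"time"
)

// errTargetNotAllowed is a hypothetical stand-in for the PR's ErrTargetNotAllowed.
var errTargetNotAllowed = errors.New("target not allowed")

// isPublicAddr rejects loopback, link-local, multicast, unspecified, and
// private (RFC 1918 / IPv6 ULA) addresses. Assumption: this is a simplified
// policy; it does not cover ranges such as 100.64.0.0/10.
func isPublicAddr(addr netip.Addr) bool {
	addr = addr.Unmap() // treat ::ffff:10.0.0.1 like 10.0.0.1
	return addr.IsGlobalUnicast() && !addr.IsPrivate()
}

// checkTargetURL enforces the http/https allow-list and rejects IP-literal
// hosts that are not public. Hostnames pass here and are checked at dial time.
func checkTargetURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("%w: %v", errTargetNotAllowed, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("%w: scheme %q", errTargetNotAllowed, u.Scheme)
	}
	host := u.Hostname()
	if host == "" {
		return fmt.Errorf("%w: empty host", errTargetNotAllowed)
	}
	if addr, err := netip.ParseAddr(host); err == nil && !isPublicAddr(addr) {
		return fmt.Errorf("%w: address %s", errTargetNotAllowed, addr)
	}
	return nil
}

// guardDial runs after DNS resolution, just before the socket connects, so it
// sees the IP that is really being dialed rather than the hostname.
func guardDial(_, address string, _ syscall.RawConn) error {
	ap, err := netip.ParseAddrPort(address)
	if err != nil || !isPublicAddr(ap.Addr()) {
		return fmt.Errorf("%w: dial %s", errTargetNotAllowed, address)
	}
	return nil
}

// newGuardedClient re-validates every redirect hop and every dialed IP.
func newGuardedClient() *http.Client {
	dialer := &net.Dialer{Timeout: 5 * time.Second, Control: guardDial}
	return &http.Client{
		Transport: &http.Transport{DialContext: dialer.DialContext},
		CheckRedirect: func(req *http.Request, _ []*http.Request) error {
			return checkTargetURL(req.URL.String())
		},
	}
}

func main() {
	_ = newGuardedClient() // construction only; no network traffic in this demo
	for _, raw := range []string{
		"https://example.com/",
		"http://127.0.0.1:8080/",
		"http://169.254.169.254/latest/meta-data",
		"file:///etc/passwd",
	} {
		fmt.Printf("%-42s allowed=%v\n", raw, checkTargetURL(raw) == nil)
	}
}
```

Checking at dial time matters because a hostname that looks public can resolve to a private address, and a redirect can point somewhere the first URL did not.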

Why

  • Prevent slow or hung search requests from exceeding the configured retry budget.
  • Reduce SSRF/private-network risk in extraction without breaking public bare-host extraction.
  • Keep rate limiting effective across repeated calls and avoid marking healthy engines unhealthy because a client/request deadline fired.
  • Make safety behavior deterministic and covered by unit tests.
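
The third point, not marking healthy engines unhealthy because a client or request deadline fired, comes down to classifying errors before they reach the breaker. The following Go sketch shows one way to do that. The names `countsAsEngineFailure`, `breaker`, `errCircuitOpen`, and `sampleOutcomes` are hypothetical and the consecutive-failure threshold is an assumption; the project's breaker may track state differently.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// errCircuitOpen is a hypothetical sentinel for "breaker rejected the call".
var errCircuitOpen = errors.New("circuit open")

// countsAsEngineFailure reports whether err says something about engine
// health. Cancellations, deadlines, and circuit-open rejections do not.
func countsAsEngineFailure(err error) bool {
	switch {
	case err == nil:
		return false
	case errors.Is(err, context.Canceled), errors.Is(err, context.DeadlineExceeded):
		return false // the caller gave up; the engine may be perfectly healthy
	case errors.Is(err, errCircuitOpen):
		return false // the engine was never called
	}
	return true
}

// breaker is a deliberately tiny consecutive-failure counter.
type breaker struct {
	failures  int
	threshold int
}

// record updates the counter from the outcome of one engine call.
func (b *breaker) record(err error) {
	if err == nil {
		b.failures = 0
		return
	}
	if countsAsEngineFailure(err) {
		b.failures++
	}
}

func (b *breaker) open() bool { return b.failures >= b.threshold }

// sampleOutcomes returns illustrative errors: three that should be ignored
// and one genuine engine failure.
func sampleOutcomes() []error {
	return []error{
		fmt.Errorf("search aborted: %w", context.Canceled),
		fmt.Errorf("search timed out: %w", context.DeadlineExceeded),
		errCircuitOpen,
		errors.New("engine returned HTTP 503"),
	}
}

func main() {
	b := &breaker{threshold: 2}
	for _, err := range sampleOutcomes() {
		b.record(err)
		fmt.Printf("%-45v failures=%d open=%v\n", err, b.failures, b.open())
	}
}
```

Note that `errors.Is` only recognizes cancellations that wrap the context errors. A limiter that returns a plain, unwrapped error on a cancelled wait would need its own case.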

Testing

  • make test
  • make lint
  • make test-integration (only for browser, proxy, captcha, or live-engine changes)

Checklist

  • I linked the related issue or explained why there is none.
  • I updated docs or examples for changed user-facing behavior.
  • I kept unit tests deterministic and free of browser/network dependencies.
  • I removed secrets, proxy credentials, and private logs from examples.

karust added 4 commits June 12, 2026 04:24
A rod panic during SearchImage propagated uncaught into fasthttp and killed the whole process: per-engine recover blocks only covered Search (5 of 6 engines had no recovery on SearchImage), and there was no Fiber recover middleware.

Regression test: a panicking SearchImage returns 502 engine_internal and the server keeps serving.
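
As a rough illustration of that behavior, the Go sketch below wraps an engine call so that a panic becomes an ordinary error that a handler can map to 502. `safeCall`, `statusFor`, and `errEngineInternal` are hypothetical names, not the project's API, and the `[]string` result type is a placeholder.

```go
package main

import (
	"errors"
	"fmt"
)

// errEngineInternal is a hypothetical sentinel mirroring the engine_internal code.
var errEngineInternal = errors.New("engine_internal")

// safeCall runs fn and converts a panic into an error. recover only catches
// panics raised on this goroutine, so goroutines started inside fn would need
// their own guard.
func safeCall(fn func() ([]string, error)) (res []string, err error) {
	defer func() {
		if r := recover(); r != nil {
			res = nil
			err = fmt.Errorf("%w: recovered panic: %v", errEngineInternal, r)
		}
	}()
	return fn()
}

// statusFor maps the outcome to an HTTP status (assumed mapping for the demo).
func statusFor(err error) int {
	switch {
	case err == nil:
		return 200
	case errors.Is(err, errEngineInternal):
		return 502
	default:
		return 500
	}
}

func main() {
	_, err := safeCall(func() ([]string, error) {
		panic("simulated browser automation panic")
	})
	fmt.Println(statusFor(err), err)

	res, err := safeCall(func() ([]string, error) {
		return []string{"https://example.com/a.png"}, nil
	})
	fmt.Println(statusFor(err), res)
}
```

A per-call guard like this complements a framework-level recover middleware: the guard keeps the error attributable to one engine, while the middleware is the last line of defense.
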

Add a `RequestTimeout` config that bounds the wall-clock time of any request that does not manage its own deadline budget. It is derived from the engine timeout and retry budget via `RequestTimeoutForRetries`.
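
The commit message does not spell out the derivation, so the Go sketch below uses an assumed formula: one engine timeout per attempt, plus a backoff between attempts, plus a fixed slack. `deriveRequestTimeout`, `runBounded`, `hungEngine`, and `probeHungEngine` are hypothetical names, and the real `RequestTimeoutForRetries` may take different inputs.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// deriveRequestTimeout is an ASSUMED formula, not the project's:
// (retries+1) attempts of engineTimeout, retries backoffs, plus slack.
func deriveRequestTimeout(engineTimeout time.Duration, retries int, backoff, slack time.Duration) time.Duration {
	if retries < 0 {
		retries = 0
	}
	attempts := time.Duration(retries + 1)
	return attempts*engineTimeout + time.Duration(retries)*backoff + slack
}

// runBounded runs work under a deadline and returns as soon as either the
// work finishes or the deadline fires, whichever comes first.
func runBounded(parent context.Context, limit time.Duration, work func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(parent, limit)
	defer cancel()

	done := make(chan error, 1) // buffered so a late worker never blocks on send
	go func() { done <- work(ctx) }()

	select {
	case err := <-done:
		return err
	case <-ctx.Done():
		return ctx.Err()
	}
}

// isRequestTimeout identifies the case a handler would report as
// 504 request_timeout.
func isRequestTimeout(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

// hungEngine simulates an engine call that never completes on its own.
func hungEngine(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// probeHungEngine runs the simulated hung engine under the given limit.
func probeHungEngine(limit time.Duration) error {
	return runBounded(context.Background(), limit, hungEngine)
}

func main() {
	limit := deriveRequestTimeout(3*time.Second, 2, 500*time.Millisecond, time.Second)
	fmt.Println("derived request timeout:", limit)

	err := probeHungEngine(50 * time.Millisecond)
	fmt.Println("hung engine reported as timeout:", isRequestTimeout(err))
}
```

Deriving the bound from the retry budget, rather than configuring it independently, keeps the outer deadline from cutting off a retry sequence that is still within its own limits.
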
- breaker: stop counting client cancellations/deadlines (incl. bare
  rate-limiter wait errors) and circuit-open as engine failures
- rate limiting: cache limiters in SearchEngineOptions and rawEngine so
  pacing applies on raw/library paths; pool wrapper delegates
- /extract SSRF guard: public-IP-only policy with dial-time IP pinning,
  redirect re-validation, rendered-mode preflight, http/https allow-list,
  ErrTargetNotAllowed -> HTTP 400, extract.allow_private_networks
  escape hatch (default off)
- browser: no rod Must* on the request path; CLI parses block_resources
  without panicking
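
To see why caching limiters matters for the rate-limiting item above, consider the Go sketch below. If a limiter is rebuilt on every call, it never remembers the previous request and pacing silently does nothing. The `pacer` and `pacerCache` types are hypothetical, stdlib-only stand-ins; the project's limiter type and cache location (SearchEngineOptions, rawEngine) are not reproduced here.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// pacer enforces a minimum interval between calls. It is a simplified
// stand-in for a real rate limiter.
type pacer struct {
	mu       sync.Mutex
	interval time.Duration
	next     time.Time // earliest time the next call may proceed
}

// reserve books the next slot and returns how long the caller should wait.
func (p *pacer) reserve() time.Duration {
	p.mu.Lock()
	defer p.mu.Unlock()
	now := time.Now()
	if now.After(p.next) {
		p.next = now
	}
	wait := p.next.Sub(now)
	p.next = p.next.Add(p.interval)
	return wait
}

// pacerCache hands out one shared pacer per key so state survives across calls.
type pacerCache struct {
	mu    sync.Mutex
	byKey map[string]*pacer
}

// get returns the cached pacer for key, creating it on first use. The
// interval of an existing pacer is left unchanged.
func (c *pacerCache) get(key string, interval time.Duration) *pacer {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.byKey == nil {
		c.byKey = make(map[string]*pacer)
	}
	p, ok := c.byKey[key]
	if !ok {
		p = &pacer{interval: interval}
		c.byKey[key] = p
	}
	return p
}

func main() {
	var cache pacerCache

	// Cached: the second call sees the first call's reservation and must wait.
	first := cache.get("engine-a", time.Second).reserve()
	second := cache.get("engine-a", time.Second).reserve()
	fmt.Println("cached waits:", first, second > 0)

	// Uncached: a fresh pacer per call never waits, so pacing is lost.
	fresh := (&pacer{interval: time.Second}).reserve()
	fmt.Println("fresh pacer wait:", fresh)
}
```
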
@karust merged commit ca3143c into main on Jun 12, 2026
5 checks passed