Tiered, self-hosted web fetcher with anti-bot bypass — cheap first, escalate only when blocked.
Most pages don't need a headless browser, and most "blocks" aren't about your User-Agent — they're about your TLS fingerprint or your IP. redfetch walks a cost-ordered ladder and only climbs when it actually hits a wall:
1. curl_cffi — real browser TLS/JA3 + HTTP/2 fingerprint, NO JS (fast, cheap)
2. cloakbrowser — stealth Chromium, renders JS / solves challenges (heavier)
↳ auto-retry through a residential proxy on a detected block (opt-in)
It returns clean Markdown (via trafilatura), guards against SSRF, and
learns per host which strategy works so it stops wasting time on doomed
direct attempts.
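The escalation idea can be sketched in a few lines. This is a minimal illustration, not the code in fetch_tiered.py: the `Result` type, the `Tier` callable signature, and the `looks_blocked` heuristic are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Result:
    tier: str    # which rung produced this result
    status: int  # HTTP status code
    body: str    # response body (may be empty)


# Hypothetical tier signature: takes a URL, returns a Result.
Tier = Callable[[str], Result]


def looks_blocked(result: Result) -> bool:
    """Simplified wall detection: 403/429/503 or an empty body."""
    return result.status in (403, 429, 503) or not result.body.strip()


def fetch_ladder(url: str, tiers: List[Tier]) -> Result:
    """Try tiers cheapest-first and stop at the first one that is not blocked."""
    if not tiers:
        raise ValueError("at least one tier is required")
    result: Optional[Result] = None
    for tier in tiers:
        result = tier(url)
        if not looks_blocked(result):
            break  # good enough, do not pay for a heavier tier
    return result
```

Passing the tiers as an ordered list keeps the cost ordering explicit: the cheap TLS-impersonating client goes first, the browser last, and a caller that never wants a browser simply omits it.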
Built as part of the red* toolchain. Use it for legitimate research, auditing, and data collection; respect each site's Terms and rate limits.
git clone https://github.com/Redloft/redfetch && cd redfetch
bash install.sh # creates a venv + installs curl_cffi, trafilatura
bash install.sh --with-browser # also installs cloakbrowser (deep tier, ~200MB)
bash fetch-doctor.sh --offline # verify the install
Requires: python3, bash, curl, jq. Optional: cloakbrowser (deep tier), op (1Password, for proxy secrets).
# Fetch a page → clean Markdown on stdout, JSON meta on stderr
bash fetch.sh --json "https://example.com/article"
bash fetch.sh --no-deep "<url>" # curl_cffi only (never launch a browser)
bash fetch.sh --deep "<url>" # straight to the stealth browser
# Raw GET for JSON / autocomplete / API endpoints (no extraction)
bash cffi_get.sh "https://suggestqueries.google.com/complete/search?client=chrome&q=test"
fetch.sh meta (--json): {ok, tier, status, bytes, blocked, proxy_applied, autoproxy?, ssrf_blocked?, rate_limited?, nav_failed?}.
Exit codes: 0 ok · 1 blocked/empty · 2 SSRF/url-guard block · 3 deps
missing · 4 hard error · 64 usage.
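A caller scripting around fetch.sh might combine the exit code with the stderr meta roughly like this. It is a sketch under one stated assumption: that the JSON meta object is the last non-empty line on stderr. The `interpret` helper and its `retryable` field are hypothetical, not part of redfetch.

```python
import json

# Exit codes as documented above.
EXIT_MEANINGS = {
    0: "ok",
    1: "blocked/empty",
    2: "SSRF/url-guard block",
    3: "deps missing",
    4: "hard error",
    64: "usage",
}


def interpret(returncode: int, stderr_text: str) -> dict:
    """Combine a fetch.sh exit code with the JSON meta found on stderr.

    Assumption: the meta object is the last non-empty stderr line.
    """
    meta = {}
    lines = [ln for ln in stderr_text.splitlines() if ln.strip()]
    if lines:
        try:
            parsed = json.loads(lines[-1])
            if isinstance(parsed, dict):
                meta = parsed
        except json.JSONDecodeError:
            pass  # stderr held something other than JSON
    return {
        "outcome": EXIT_MEANINGS.get(returncode, f"unknown ({returncode})"),
        # Assumption: only a block (exit 1) is worth retrying another way.
        "retryable": returncode == 1,
        "meta": meta,
    }
```

In practice the inputs would come from something like `subprocess.run(["bash", "fetch.sh", "--json", url], capture_output=True, text=True)`, passing `proc.returncode` and `proc.stderr` to `interpret`.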
curl_cffi beats TLS-fingerprint blocks but not IP-based ones (datacenter
bans, geo-walls, 429). For those, point redfetch at a residential proxy.
The secret is never hardcoded and never put in argv — it travels through the
environment only. Configure ONE:
# A) literal proxy URL in env
CFFI_PROXY='socks5h://user:pass@host:1080' bash fetch.sh --json "<url>"
# B) a 1Password reference (resolved on demand via `op read`)
export CFFI_PROXY_REF='op://Vault/Item/credential'
# convenience wrapper — run anything proxied:
./redproxy.sh # self-test: direct vs proxied IP + geo
./redproxy.sh "<url>" # fetch one URL proxied
./redproxy.sh curl -s https://api.ipify.org
When a direct fetch is blocked (challenge / 403 / 503 / empty / timeout),
redfetch auto-retries once through the proxy (if one is configured) — no
flags needed. Disable with CFFI_AUTOPROXY=0.
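The retry-once behavior can be pictured as follows. This is a hedged sketch rather than redfetch's implementation: the `Fetcher` signature, the `CHALLENGE_MARKERS` strings, and the returned dict are illustrative assumptions, and only the literal `CFFI_PROXY` form is handled (resolving `CFFI_PROXY_REF` is left out).

```python
import os
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical fetcher: (url, proxy_or_None) -> (status, body).
# It is expected to raise TimeoutError when the request times out.
Fetcher = Callable[[str, Optional[str]], Tuple[int, str]]

# Illustrative challenge-page markers; real detection is likely richer.
CHALLENGE_MARKERS = ("just a moment", "captcha")


def is_blocked(status: int, body: str) -> bool:
    """Challenge page, 403, 503, or an empty body counts as a block."""
    text = body.strip().lower()
    return status in (403, 503) or not text or any(m in text for m in CHALLENGE_MARKERS)


def fetch_autoproxy(url: str, fetch: Fetcher, env: Mapping[str, str] = os.environ) -> dict:
    """Fetch directly, then retry once through the proxy if blocked."""
    proxy = env.get("CFFI_PROXY")
    enabled = env.get("CFFI_AUTOPROXY", "1") != "0"

    def attempt(via: Optional[str]):
        try:
            status, body = fetch(url, via)
        except TimeoutError:
            return 0, "", True  # a timeout is treated as a block
        return status, body, is_blocked(status, body)

    status, body, blocked = attempt(None)
    used_proxy = False
    if blocked and proxy and enabled:
        status, body, blocked = attempt(proxy)  # exactly one retry
        used_proxy = True
    return {"status": status, "body": body, "blocked": blocked, "autoproxy": used_proxy}
```

Capping the retry at one attempt keeps the worst-case latency bounded: a host that blocks both paths fails after two requests instead of looping.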
Every outcome is logged to $REDFETCH_STATE/telemetry.jsonl and summarized in
playbook.json. A host that reliably blocks direct is fetched proxy-first
next time — skipping the doomed ~30s direct attempt. Disable with
REDFETCH_NO_LEARN=1.
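One way to turn a JSONL log into a per-host strategy is sketched below. The record shape (`url`, `proxy_applied`, `blocked`), the thresholds, and the `"proxy-first"` / `"direct-first"` labels are assumptions made for illustration; the actual telemetry.jsonl and playbook.json schemas are not specified here.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable
from urllib.parse import urlsplit


def build_playbook(jsonl_lines: Iterable[str],
                   min_samples: int = 3,
                   block_ratio: float = 0.8) -> Dict[str, str]:
    """Summarize telemetry records into a per-host strategy.

    Assumed (hypothetical) record shape:
    {"url": "...", "proxy_applied": bool, "blocked": bool}
    """
    # host -> [direct attempts, direct blocks]
    direct = defaultdict(lambda: [0, 0])
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if rec.get("proxy_applied"):
            continue  # only direct attempts say whether direct works
        host = urlsplit(rec["url"]).hostname
        if host is None:
            continue
        direct[host][0] += 1
        direct[host][1] += 1 if rec.get("blocked") else 0

    playbook = {}
    for host, (attempts, blocks) in direct.items():
        # "Reliably blocks" = enough samples and a high block ratio.
        reliable = attempts >= min_samples and blocks / attempts >= block_ratio
        playbook[host] = "proxy-first" if reliable else "direct-first"
    return playbook
```

Requiring a minimum sample count avoids flipping a host to proxy-first after a single transient failure.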
- SSRF guard (url-guard.sh) runs before every request and re-validates every redirect hop (and in-browser navigation): loopback, RFC-1918, cloud-metadata (169.254.169.254), encoded-host bypasses, internal TLDs. Fails closed.
- Secret hygiene: proxy credentials are passed via env (all_proxy / Python proxies=), never as a CLI arg, so they never appear in ps/argv. Error messages redact user:pass@.
- Output is untrusted DATA, not instructions: scraped pages may contain prompt-injection payloads; handle accordingly downstream.
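To make the SSRF categories concrete, here is a small Python sketch of the kind of check described in the first bullet. It is not a port of url-guard.sh: the `INTERNAL_SUFFIXES` list and the function names are assumptions, it covers only a few encoded-host forms, and it neither resolves DNS nor follows redirects.

```python
import ipaddress
from urllib.parse import urlsplit

# Assumed list of internal-looking suffixes; the real guard may differ.
INTERNAL_SUFFIXES = (".internal", ".local", ".localhost")


def parse_ip(host: str):
    """Return an ip_address for dotted, IPv6, decimal, or hex hosts, else None."""
    try:
        return ipaddress.ip_address(host)
    except ValueError:
        pass
    try:
        # Encoded-host forms such as 2130706433 or 0x7f000001 (both 127.0.0.1).
        return ipaddress.ip_address(int(host, 0))
    except ValueError:
        return None


def is_allowed(url: str) -> bool:
    """Fail closed: anything not clearly public http(s) is rejected."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False
    if parts.scheme not in ("http", "https") or not host:
        return False
    host = host.rstrip(".")
    if not host or host == "localhost" or host.endswith(INTERNAL_SUFFIXES):
        return False
    ip = parse_ip(host)
    if ip is None:
        # Reject numeric-looking hosts we could not parse (e.g. 127.1).
        last = host.rsplit(".", 1)[-1]
        return not (last.isdigit() or last.startswith("0x"))
    # is_global is False for loopback, RFC-1918, and link-local
    # addresses, which includes 169.254.169.254.
    return ip.is_global
```

A check like this has to run again on every redirect target, since a public URL can redirect to an internal one; that is why the guard described above re-validates each hop.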
| Var | Default | Purpose |
|---|---|---|
| PARSING_VENV | ~/.cache/redfetch/venv | Python venv with curl_cffi/trafilatura |
| CFFI_PROXY | (unset) | literal proxy URL |
| CFFI_PROXY_REF | (unset) | secrets-manager ref (e.g. op://…) resolved via op read |
| CFFI_AUTOPROXY | 1 | 0 disables auto-retry-via-proxy on block |
| REDFETCH_STATE | ~/.cache/redfetch | telemetry + playbook location |
| REDFETCH_NO_LEARN | (unset) | 1 disables telemetry/playbook writes |
| URL_GUARD_RESOLVE | 0 | 1 also resolves hostnames to defend against DNS rebinding |
| FETCH_ALLOW_NO_GUARD | 0 | 1 lets fetch.sh run when url-guard.sh is absent (the Python layer still re-checks). Do not use in production. |
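For a caller that wants to mirror these settings in Python, the table maps onto a small settings object. The `Settings` class and `load_settings` function are hypothetical; only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    venv: Path
    proxy: Optional[str]
    proxy_ref: Optional[str]
    autoproxy: bool
    state_dir: Path
    learn: bool
    resolve_dns: bool
    allow_no_guard: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the documented defaults."""

    def flag(name: str) -> bool:
        # Opt-in switches: only the literal value "1" turns them on.
        return env.get(name, "0") == "1"

    return Settings(
        venv=Path(env.get("PARSING_VENV", "~/.cache/redfetch/venv")).expanduser(),
        proxy=env.get("CFFI_PROXY") or None,
        proxy_ref=env.get("CFFI_PROXY_REF") or None,
        # Auto-proxy is on unless explicitly set to "0".
        autoproxy=env.get("CFFI_AUTOPROXY", "1") != "0",
        state_dir=Path(env.get("REDFETCH_STATE", "~/.cache/redfetch")).expanduser(),
        learn=not flag("REDFETCH_NO_LEARN"),
        resolve_dns=flag("URL_GUARD_RESOLVE"),
        allow_no_guard=flag("FETCH_ALLOW_NO_GUARD"),
    )
```

Taking the environment as a parameter, instead of reading `os.environ` deep inside the code, makes the defaults easy to exercise in hermetic tests.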
bash fetch-doctor.sh # checks deps, SSRF guard, proxy, playbook (live)
bash fetch-doctor.sh --offline # skip network
bash test-fetch.sh # 46 hermetic tests (no network)

| File | Role |
|---|---|
| fetch.sh | wrapper: SSRF-guard → venv python → tiered fetch |
| fetch_tiered.py | the ladder: curl_cffi → cloakbrowser, extraction, auto-proxy, playbook |
| cffi_get.sh | raw GET (browser TLS) for JSON/API endpoints |
| url-guard.sh | SSRF validator (standalone, sourceable) |
| redproxy.sh | run any command/URL through a configured proxy |
| fetch-doctor.sh | stack health-check |
| test-fetch.sh | hermetic test suite |

MIT © 2026 Igor Konovalchik