ThreadsAPI is a reusable Python library for collecting public Threads feed data using GraphQL replay. It is designed for maintainability, explicit configuration, and production-oriented error handling.
Disclaimer: This project is provided only for educational, research, or entertainment purposes. Users are responsible for complying with applicable laws, platform terms of service, and data privacy requirements.
- Fetch public logged-out Threads feed data with multi-page pagination.
- Country filtering via standard ISO 3166-1 alpha-2 codes (validated with pycountry).
- Capture runtime replay parameters (tokens, cookies, request metadata).
- Strategy-aware token and doc_id refresh (HTTP-first in auto, controlled fallback to browser).
- Expand same-author thread continuations, including media in child posts.
- Export OpenSearch-friendly post documents.
- Optionally persist local session state to JSON.
This project uses uv and pyproject.toml as the dependency source of truth.
uv sync
uv run scrapling install

If browser assets need to be reinstalled:

uv run scrapling install --force

from threadsapi import ThreadsScraper

async with ThreadsScraper(concurrent=3) as scraper:
    posts = await scraper.crawl_pages(
        pages=3,
        country="ID",
        include_full_thread=True,
    )

from threadsapi import ScraperConfig, ThreadsScraper
config = ScraperConfig(
    concurrent=3,
    bootstrap_strategy="auto",
)

async with ThreadsScraper(config=config) as scraper:
    posts = await scraper.crawl_pages(pages=2)

Copy the example configuration and export or load it into your runtime environment:

cp .env.example .env

from threadsapi import ScraperConfig

config = ScraperConfig.from_env()

See .env.example for all supported environment variable names and default example values. ScraperConfig.from_env() reads variables already available in the process environment; load .env through your application runner or environment management tool when needed.
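Because from_env() only sees variables that are already in the process environment, something has to load .env first. The following is a minimal standard-library sketch of such a loader, not part of ThreadsAPI; the function name load_dotenv_minimal and its override parameter are hypothetical, and it handles only plain KEY=VALUE lines (no multi-line values or variable expansion).

```python
import os
from pathlib import Path


def load_dotenv_minimal(path: str = ".env", override: bool = False) -> dict:
    """Copy KEY=VALUE pairs from `path` into os.environ and return what was parsed.

    Hypothetical helper for illustration; a dedicated dotenv tool handles more cases.
    """
    parsed = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        # Drop surrounding whitespace and one layer of simple quoting.
        value = value.strip().strip('"').strip("'")
        parsed[key] = value
        # By default, variables already set in the environment win.
        if override or key not in os.environ:
            os.environ[key] = value
    return parsed
```

Call load_dotenv_minimal() once at startup, before ScraperConfig.from_env(), so the configuration sees the loaded values.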
Copy the example file before customizing local values:
cp threadsapi.example.yaml threadsapi.yaml

from threadsapi import ScraperConfig

config = ScraperConfig.from_yaml("threadsapi.yaml")

See threadsapi.example.yaml for a complete configuration example.
ThreadsScraper(
    concurrent=3,
    bootstrap_strategy="auto",  # "auto" | "http" | "browser"
    auth_strategy="session",  # "session" | "direct" | "auto"; inferred as "direct" when login credentials are provided
    session_path="threads-session.json",
    persist_session=False,
    base_url="https://www.threads.com",
    graphql_url="https://www.threads.com/graphql/query",
    app_id=None,
    asbd_id=None,
    user_agent="...",
    timeout_seconds=30,
    browser_headless=True,
    login_username=None,
    login_password=None,
    login_two_factor_code=None,
)

You normally do not need to set mode. Without login credentials the scraper starts anonymous; with username/password it switches to authenticated direct login automatically.
app_id and asbd_id are public runtime header values observed from Threads web requests.
They are optional: when omitted, the scraper adopts them from captured runtime request headers
during bootstrap. These values do not need to be static; the library discovers them automatically.
The country parameter accepts ISO 3166-1 alpha-2 codes (e.g. "ID", "US", "JP") — validated via pycountry. Pass "world" or None to disable country filtering.
posts = await scraper.crawl_pages(pages=2, country="US")

Invalid codes raise ValueError immediately. A valid ISO country may still return no public feed content from Threads; treat that as NO CONTENT, not as invalid input.
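The country rules above can be illustrated with a small self-contained sketch. It is not the library's implementation: the three-entry set stands in for the pycountry lookup, the function name normalize_country is hypothetical, and accepting lower-case input is an assumption.

```python
from typing import Optional

# Tiny stand-in for the pycountry lookup; the real check covers every ISO 3166-1 code.
_DEMO_ALPHA2 = {"ID", "US", "JP"}


def normalize_country(country: Optional[str]) -> Optional[str]:
    """Return an upper-case alpha-2 code, or None when filtering is disabled."""
    # None and "world" both mean "no country filter".
    if country is None or country.strip().lower() == "world":
        return None
    code = country.strip().upper()  # assumption: case-insensitive input
    if code not in _DEMO_ALPHA2:
        # Fail fast on unknown codes, before any network request is made.
        raise ValueError(f"Unknown ISO 3166-1 alpha-2 code: {country!r}")
    return code
```

An empty result list for a code that passes this check is the NO CONTENT case: the input was valid, Threads simply returned nothing for it.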
Bootstrap and token/doc_id refresh both respect the configured strategy:
- auto (default): HTTP-first bootstrap and refresh, with browser fallback only when replay state is incomplete or HTTP fails.
- http: HTTP-only; lighter, but raises errors if HTTP bootstrap cannot provide complete replay state.
- browser: always use browser capture; more resource-heavy, but captures the richest replay state.
Refresh is triggered automatically on auth failures (401/403), expired doc_ids, or GraphQL execution errors.
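The three strategies can be sketched as a small dispatcher. This is an illustration under stated assumptions rather than the library's code: the bootstrap and is_complete functions, the two callables, and the completeness rule (an lsd token plus at least one doc_id) are all hypothetical.

```python
from typing import Callable, Optional


def is_complete(state: Optional[dict]) -> bool:
    """Assumed completeness rule: an lsd token and at least one doc_id are present."""
    return bool(state) and bool(state.get("lsd")) and bool(state.get("doc_ids"))


def bootstrap(
    strategy: str,
    http_bootstrap: Callable[[], dict],
    browser_bootstrap: Callable[[], dict],
) -> dict:
    """Pick a bootstrap path according to the configured strategy."""
    if strategy == "browser":
        return browser_bootstrap()
    if strategy not in ("auto", "http"):
        raise ValueError(f"Unknown bootstrap strategy: {strategy!r}")
    try:
        state = http_bootstrap()
    except Exception:
        if strategy == "http":
            raise  # HTTP-only: surface the failure instead of falling back
        state = None
    if is_complete(state):
        return state
    if strategy == "http":
        raise RuntimeError("HTTP bootstrap returned incomplete replay state")
    # auto: fall back to the browser only after HTTP failed or came back incomplete.
    return browser_bootstrap()
```

Keeping the fallback decision in one place means the same function can serve both the initial bootstrap and a later refresh.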
For this GraphQL scraper, “login to Threads” means capturing an authenticated Threads web session and reusing the resulting cookies/runtime replay state for threads.com/graphql/query.
Direct web login posts username/password to Threads web login endpoints. It uses safe password encryption when key material is discoverable from Threads page/JS bundles, handles two_factor_required, and persists only authenticated web cookies/tokens — never the password or 2FA code.
The mobile Instagram Bloks login endpoint returns a mobile bearer token (Bearer IGT:2:<token>). That token is for mobile private API requests and does not provide the web cookies, lsd, fb_dtsg, headers, doc IDs, or captured variables required by this GraphQL replay client.
from threadsapi import ThreadsScraper, TwoFactorRequired
scraper = ThreadsScraper(
    bootstrap_strategy="auto",
    session_path="threads-session.json",
    persist_session=True,
    login_username="your_username",
    login_password="your_password",
    login_two_factor_code=None,  # set to auto-submit or leave None
)

try:
    await scraper.init()
except TwoFactorRequired as exc:
    code = input("2FA code: ")
    await scraper.complete_two_factor(exc.challenge, code)

await scraper.close()

After the session is saved, reuse it normally. The persisted authenticated session is preferred automatically when it is still fresh:
async with ThreadsScraper(
    bootstrap_strategy="auto",
    session_path="threads-session.json",
    persist_session=True,
) as scraper:
    posts = await scraper.crawl_pages(pages=2, include_full_thread=True)

From the TUI:
uv sync --extra tui
uv run python scripts/tui.py

Open Config & Info, set the session path, enter username/password, then click Direct Login. If Threads requires 2FA, enter the code and click Verify 2FA. Use Cancel Login to stop an active login attempt.
To validate login, check Config & Info: authenticated sessions show Mode: authenticated and Auth cookies: yes. You can also use Account Search; search uses logged-in GraphQL variables, while public feed alone is not a login proof because it can work anonymously.
From environment or YAML, credentials are enough to enable direct authenticated login:
THREADSAPI_USERNAME=your_username
THREADSAPI_PASSWORD=your_password

login:
  username: your_username
  password: your_password

Limitations:
- Password encryption depends on runtime key material in Threads web JS; raises PasswordEncryptionUnavailable when keys cannot be found.
- Endpoint shape may change; this is experimental and may break without notice.
- Direct login failures never fall back to anonymous scraping.
ThreadsAPI uses typed exceptions for explicit handling:
- ConfigError: invalid runtime configuration
- BootstrapError: failed HTTP/browser bootstrap
- AuthenticationError: direct web login failed
- InvalidCredentialsError: username/password rejected
- TwoFactorRequired: login requires 2FA (carries a challenge for complete_two_factor())
- TwoFactorError: 2FA verification failed
- PasswordEncryptionUnavailable: safe password encryption could not be performed
- TransportError: terminal GraphQL transport failure
- GraphQLDecodeError: non-JSON or invalid GraphQL response body
- RateLimitError: retry budget exhausted for retryable rate-limit responses
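To make the idea of a retry budget concrete, here is a self-contained sketch of how retryable rate-limit responses might be retried until the budget runs out. The exception classes are local stand-ins (RateLimited is hypothetical), and the attempt count and exponential backoff are assumptions; the library's actual retry policy is not documented here.

```python
import time
from typing import Callable


class RateLimited(Exception):
    """Hypothetical marker for one retryable rate-limit response."""


class RateLimitError(Exception):
    """Local stand-in for the library exception of the same name."""


def call_with_retry_budget(
    request: Callable[[], object],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> object:
    """Retry `request` on RateLimited, doubling the delay after each failed attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except RateLimited as exc:
            if attempt == max_attempts:
                # Budget exhausted: convert to the terminal, typed error.
                raise RateLimitError(
                    f"retry budget exhausted after {attempt} attempts"
                ) from exc
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Callers then only need to catch the terminal RateLimitError, since transient rate-limit responses are absorbed inside the loop.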
Session JSON files contain sensitive cookies/tokens after login:
- Do not commit session files to Git.
- Keep them in a secure local/private environment.
- Do not share token/cookie values in logs, issues, or public channels.
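One way to follow the last point is to mask sensitive fields before anything from a session is logged or pasted into an issue. The sketch below is a generic helper, not part of ThreadsAPI; the key names in SENSITIVE_KEYS are assumptions about what a session file might contain and should be adjusted to the actual file.

```python
from typing import Any

# Assumed key names; check them against the real session JSON.
SENSITIVE_KEYS = {"cookies", "lsd", "fb_dtsg", "password", "token"}


def redact(value: Any) -> Any:
    """Return a copy of `value` with sensitive fields masked, recursing into containers."""
    if isinstance(value, dict):
        return {
            key: "***" if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    # Scalars that are not under a sensitive key pass through unchanged.
    return value
```

Because redact() builds new containers, the in-memory session stays usable while only the masked copy reaches the logger.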
Run example:
uv run python example.py

Run tests:

uv run python -m unittest discover -s tests -p "test_*.py"
uv run python -m py_compile threadsapi/*.py tests/test_*.py example.py scripts/test_countries.py

Test country availability quickly:

uv run python scripts/test_countries.py
uv run python scripts/test_countries.py ID US JP world --per-page 3

threadsapi/
├── __init__.py # public exports
├── client.py # ThreadsScraper orchestration
├── session.py # token/session model + JSON persistence
├── registry.py # doc_id registry/discovery
├── bootstrap.py # HTTP/browser/session bootstrap
├── web_auth.py # direct Threads web login + 2FA
├── transport.py # GraphQL request lifecycle and retry handling
└── parsers.py # parsing and OpenSearch document mapping
| Issue | Resolution |
|---|---|
| ModuleNotFoundError: No module named 'curl_cffi' | Ensure scrapling[fetchers] is installed, then run uv sync |
| Missing browser executable | Run uv run scrapling install |
| Deprecated Use StealthyFetcher.configure() warning | Use StealthyFetcher.configure(...) and StealthyFetcher.async_fetch(...) |