ThreadsAPI

ThreadsAPI is a reusable Python library for collecting public Threads feed data using GraphQL replay. It is designed for maintainability, explicit configuration, and production-oriented error handling.

Disclaimer: This project is provided only for educational, research, or entertainment purposes. Users are responsible for complying with applicable laws, platform terms of service, and data privacy requirements.

Key Capabilities

Fetch public logged-out Threads feed data with multi-page pagination.
Country filtering via standard ISO 3166-1 alpha-2 codes (validated with pycountry).
Capture runtime replay parameters (tokens, cookies, request metadata).
Strategy-aware token and doc_id refresh (HTTP-first in auto, controlled fallback to browser).
Expand same-author thread continuations, including media in child posts.
Export OpenSearch-friendly post documents.
Optionally persist local session state to JSON.

Installation

This project uses uv and pyproject.toml as the dependency source of truth.

uv sync
uv run scrapling install

If browser assets need to be reinstalled:

uv run scrapling install --force

Quick Start

import asyncio

from threadsapi import ThreadsScraper


async def main() -> None:
    async with ThreadsScraper(concurrent=3) as scraper:
        posts = await scraper.crawl_pages(
            pages=3,
            country="ID",
            include_full_thread=True,
        )
    print(f"Fetched {len(posts)} posts")


asyncio.run(main())
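
The crawled posts can then be handed to an OpenSearch bulk request. The following is a minimal sketch, assuming each post is a JSON-serializable dict that may carry an `id` key; the real document shape comes from the library's parsers and may differ. The helper name `to_bulk_ndjson` and the index name `threads-posts` are illustrative and not part of ThreadsAPI.

```python
import json
from typing import Any, Iterable


def to_bulk_ndjson(posts: Iterable[dict[str, Any]], index: str = "threads-posts") -> str:
    """Render posts as an OpenSearch _bulk body: one action line, then one document line."""
    lines: list[str] = []
    for post in posts:
        action: dict[str, Any] = {"index": {"_index": index}}
        # Assumption: posts expose a stable "id"; reusing it makes re-crawls overwrite.
        if "id" in post:
            action["index"]["_id"] = str(post["id"])
        lines.append(json.dumps(action))
        lines.append(json.dumps(post, ensure_ascii=False))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


sample = [{"id": "123", "text": "hello"}, {"text": "no id"}]
print(to_bulk_ndjson(sample), end="")
```

The returned string is meant to be sent as the body of a `_bulk` request with the `application/x-ndjson` content type.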

Configuration

Using `ScraperConfig`

from threadsapi import ScraperConfig, ThreadsScraper

config = ScraperConfig(
    concurrent=3,
    bootstrap_strategy="auto",
)

async with ThreadsScraper(config=config) as scraper:
    posts = await scraper.crawl_pages(pages=2)

From environment variables

Copy the example configuration and export or load it into your runtime environment:

cp .env.example .env

from threadsapi import ScraperConfig

config = ScraperConfig.from_env()

See .env.example for all supported environment variable names and default example values. ScraperConfig.from_env() reads variables already available in the process environment; load .env through your application runner or environment management tool when needed.
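
Because from_env() only sees variables that are already in the process environment, something has to load .env first. If no runner or environment tool is available, a small standard-library loader is enough for simple files. This is a sketch under the assumption that .env contains plain KEY=VALUE lines; the function name `load_env_file` is hypothetical and not part of ThreadsAPI.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env", override: bool = False) -> dict[str, str]:
    """Copy KEY=VALUE pairs from a dotenv-style file into os.environ."""
    loaded: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop simple surrounding quotes
        # Existing environment values win unless override is requested.
        if override or key not in os.environ:
            os.environ[key] = value
            loaded[key] = value
    return loaded
```

Call `load_env_file()` before `ScraperConfig.from_env()`. The sketch does not handle `export` prefixes, multi-line values, or inline comments; a dedicated dotenv loader covers those cases.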

From YAML

Copy the example file before customizing local values:

cp threadsapi.example.yaml threadsapi.yaml

from threadsapi import ScraperConfig

config = ScraperConfig.from_yaml("threadsapi.yaml")

See threadsapi.example.yaml for a complete configuration example.

`ThreadsScraper` constructor options

ThreadsScraper(
    concurrent=3,
    bootstrap_strategy="auto",        # "auto" | "http" | "browser"
    auth_strategy="session",          # "session" | "direct" | "auto"; inferred as "direct" when login credentials are provided
    session_path="threads-session.json",
    persist_session=False,
    base_url="https://www.threads.com",
    graphql_url="https://www.threads.com/graphql/query",
    app_id=None,
    asbd_id=None,
    user_agent="...",
    timeout_seconds=30,
    browser_headless=True,
    login_username=None,
    login_password=None,
    login_two_factor_code=None,
)

You normally do not need to set auth_strategy. Without login credentials the scraper starts anonymous; with a username and password it switches to authenticated direct login automatically.

app_id and asbd_id are public runtime header values observed in Threads web requests. Both are optional: when omitted, the scraper adopts them from the request headers captured during bootstrap, so they do not need to be hard-coded.

Country Filtering

The country parameter accepts ISO 3166-1 alpha-2 codes (e.g. "ID", "US", "JP") — validated via pycountry. Pass "world" or None to disable country filtering.

posts = await scraper.crawl_pages(pages=2, country="US")

Invalid codes raise ValueError immediately. A valid ISO country may still return no public feed content from Threads; treat that as NO CONTENT, not as invalid input.
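
The rules above can be sketched as a small normalizer. This is not the library's implementation: the function name `normalize_country`, the case-insensitive handling, and the three-entry demo set are assumptions made so the example runs without third-party packages.

```python
from typing import Callable, Optional

# Stand-in for a real ISO 3166-1 lookup; deliberately tiny for the example.
_DEMO_CODES = frozenset({"ID", "US", "JP"})


def normalize_country(
    value: Optional[str],
    is_known: Callable[[str], bool] = _DEMO_CODES.__contains__,
) -> Optional[str]:
    """Return an upper-case alpha-2 code, or None when filtering is disabled."""
    # None and "world" both mean "no country filter".
    if value is None or value.strip().lower() == "world":
        return None
    code = value.strip().upper()
    if len(code) != 2 or not code.isalpha() or not is_known(code):
        raise ValueError(f"not an ISO 3166-1 alpha-2 code: {value!r}")
    return code


print(normalize_country("id"), normalize_country("world"))
```

With pycountry installed, the lookup could be swapped for something like `lambda c: pycountry.countries.get(alpha_2=c) is not None`. A code that passes this check is only syntactically valid; whether Threads returns content for it is a separate question.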

Bootstrap and Refresh Strategies

Bootstrap and token/doc_id refresh both respect the configured strategy:

auto (default): HTTP-first bootstrap and refresh, with browser fallback only when replay state is incomplete or HTTP fails.
http: HTTP-only — lighter, but will raise errors if HTTP bootstrap cannot provide complete replay state.
browser: always use browser capture — more resource-heavy but captures the richest replay state.
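
The decision logic behind the three strategies can be sketched as follows. All names here (`bootstrap`, `is_complete`, `REQUIRED_KEYS`) are hypothetical, the two required keys are an assumed minimum, and `RuntimeError` stands in for whatever typed error the library raises.

```python
from typing import Callable, Mapping

ReplayState = Mapping[str, str]
REQUIRED_KEYS = ("lsd", "doc_id")  # assumed minimum replay state for the sketch


def is_complete(state: ReplayState) -> bool:
    """True when every required key is present and non-empty."""
    return all(state.get(key) for key in REQUIRED_KEYS)


def bootstrap(
    strategy: str,
    via_http: Callable[[], ReplayState],
    via_browser: Callable[[], ReplayState],
) -> ReplayState:
    if strategy == "browser":
        return via_browser()
    if strategy == "http":
        state = via_http()
        if not is_complete(state):
            raise RuntimeError("HTTP bootstrap returned incomplete replay state")
        return state
    if strategy == "auto":
        try:
            state = via_http()
            if is_complete(state):
                return state
        except Exception:
            pass  # HTTP failed; fall through to the browser
        return via_browser()
    raise ValueError(f"unknown strategy: {strategy!r}")
```

The point of the auto branch is that the browser is only started when the cheaper HTTP path fails or comes back incomplete.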

Refresh is triggered automatically on auth failures (401/403), expired doc_ids, or GraphQL execution errors.
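
A refresh-and-retry loop of this kind might look like the sketch below. The names `send_with_refresh`, `send`, and `refresh` are hypothetical, and the sketch assumes that GraphQL execution errors (including an expired doc_id) surface as a non-empty `errors` array in the response body, as in conventional GraphQL responses.

```python
from typing import Callable

REFRESH_STATUSES = frozenset({401, 403})


def send_with_refresh(
    send: Callable[[], tuple[int, dict]],
    refresh: Callable[[], None],
    max_refreshes: int = 1,
) -> dict:
    """Call send(); on an auth or GraphQL failure, refresh replay state and try again."""
    for attempt in range(max_refreshes + 1):
        status, body = send()
        needs_refresh = status in REFRESH_STATUSES or bool(body.get("errors"))
        if not needs_refresh:
            return body
        # Only refresh if another attempt is still allowed.
        if attempt < max_refreshes:
            refresh()
    raise RuntimeError(f"request still failing after {max_refreshes} refresh(es)")
```

Bounding the number of refreshes keeps a permanently rejected session from looping forever.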

Direct Web Login and Session Persistence

For this GraphQL scraper, “login to Threads” means capturing an authenticated Threads web session and reusing the resulting cookies/runtime replay state for threads.com/graphql/query.

Direct web login posts username/password to Threads web login endpoints. It uses safe password encryption when key material is discoverable from Threads page/JS bundles, handles two_factor_required, and persists only authenticated web cookies/tokens — never the password or 2FA code.

The mobile Instagram Bloks login endpoint returns a mobile bearer token (Bearer IGT:2:<token>). That token is for mobile private API requests and does not provide the web cookies, lsd, fb_dtsg, headers, doc IDs, or captured variables required by this GraphQL replay client.

import asyncio

from threadsapi import ThreadsScraper, TwoFactorRequired


async def main() -> None:
    scraper = ThreadsScraper(
        bootstrap_strategy="auto",
        session_path="threads-session.json",
        persist_session=True,
        login_username="your_username",
        login_password="your_password",
        login_two_factor_code=None,  # set to auto-submit or leave None
    )
    try:
        await scraper.init()
    except TwoFactorRequired as exc:
        # Threads asked for a second factor: prompt and finish the login.
        code = input("2FA code: ")
        await scraper.complete_two_factor(exc.challenge, code)
    finally:
        # Release HTTP/browser resources even if login fails.
        await scraper.close()


asyncio.run(main())

After the session is saved, reuse it normally. The persisted authenticated session is preferred automatically when it is still fresh:

async with ThreadsScraper(
    bootstrap_strategy="auto",
    session_path="threads-session.json",
    persist_session=True,
) as scraper:
    posts = await scraper.crawl_pages(pages=2, include_full_thread=True)

From the TUI:

uv sync --extra tui
uv run python scripts/tui.py

Open Config & Info, set the session path, enter username/password, then click Direct Login. If Threads requires 2FA, enter the code and click Verify 2FA. Use Cancel Login to stop an active login attempt.

To validate login, check Config & Info: authenticated sessions show Mode: authenticated and Auth cookies: yes. You can also use Account Search, which relies on logged-in GraphQL variables. A successful public feed fetch alone does not prove login, because the feed also works anonymously.

From environment or YAML, credentials are enough to enable direct authenticated login:

THREADSAPI_USERNAME=your_username
THREADSAPI_PASSWORD=your_password

login:
  username: your_username
  password: your_password

Limitations:

Password encryption depends on runtime key material in the Threads web JS; the library raises PasswordEncryptionUnavailable when the keys cannot be found.
The login endpoint shape may change; direct login is experimental and may break without notice.
Direct login failures never fall back to anonymous scraping.

Error Model

ThreadsAPI uses typed exceptions for explicit handling:

ConfigError: invalid runtime configuration
BootstrapError: failed HTTP/browser bootstrap
AuthenticationError: direct web login failed
InvalidCredentialsError: username/password rejected
TwoFactorRequired: login requires 2FA (carries a challenge for complete_two_factor())
TwoFactorError: 2FA verification failed
PasswordEncryptionUnavailable: safe password encryption could not be performed
TransportError: terminal GraphQL transport failure
GraphQLDecodeError: non-JSON or invalid GraphQL response body
RateLimitError: retry budget exhausted for retryable rate-limit responses
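
To illustrate what "retry budget exhausted" means for RateLimitError, here is one way such a budget can work. This is a sketch, not the library's transport code: the `RateLimitError` class below is a local stand-in, `call_with_budget` is a hypothetical name, and the HTTP 429 check and backoff values are assumptions.

```python
import time
from typing import Callable


class RateLimitError(Exception):
    """Local stand-in for illustration; not imported from threadsapi."""


def call_with_budget(
    send: Callable[[], int],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry HTTP 429 responses with exponential backoff until the budget runs out."""
    for attempt in range(max_retries + 1):
        status = send()
        if status != 429:
            return status
        # Back off only when another attempt is still allowed.
        if attempt < max_retries:
            sleep(base_delay * 2**attempt)
    raise RateLimitError(f"still rate limited after {max_retries} retries")
```

Callers that catch the real RateLimitError can treat it as terminal for the current run, since the retries have already been spent by the time it is raised.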

Security Notes

Session JSON files contain sensitive cookies/tokens after login:

Do not commit session files to Git.
Keep them in a secure local/private environment.
Do not share token/cookie values in logs, issues, or public channels.
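
When session data has to appear in logs or bug reports, masking it first avoids leaking credentials. A minimal sketch, assuming the session is a JSON-like structure of dicts and lists; the key hints below are guesses at sensitive field names and `redact` is a hypothetical helper, not a ThreadsAPI function.

```python
from typing import Any

# Substrings that mark a key as sensitive (assumed names, extend as needed).
SENSITIVE_HINTS = ("cookie", "token", "password", "lsd", "fb_dtsg", "sessionid")


def redact(value: Any) -> Any:
    """Return a copy of a JSON-like structure with sensitive values masked."""
    if isinstance(value, dict):
        return {
            key: "***" if any(h in str(key).lower() for h in SENSITIVE_HINTS) else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


session = {"cookies": {"sessionid": "abc"}, "meta": {"user_agent": "UA"}}
print(redact(session))
```

The function builds a new structure, so the in-memory session stays usable for requests while only the masked copy is logged.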

Development

Run example:

uv run python example.py

Run tests:

uv run python -m unittest discover -s tests -p "test_*.py"
uv run python -m py_compile threadsapi/*.py tests/test_*.py example.py scripts/test_countries.py

Test country availability quickly:

uv run python scripts/test_countries.py
uv run python scripts/test_countries.py ID US JP world --per-page 3

Project Structure

threadsapi/
├── __init__.py      # public exports
├── client.py        # ThreadsScraper orchestration
├── session.py       # token/session model + JSON persistence
├── registry.py      # doc_id registry/discovery
├── bootstrap.py     # HTTP/browser/session bootstrap
├── web_auth.py      # direct Threads web login + 2FA
├── transport.py     # GraphQL request lifecycle and retry handling
└── parsers.py       # parsing and OpenSearch document mapping

Troubleshooting

| Issue | Resolution |
| --- | --- |
| `ModuleNotFoundError: No module named 'curl_cffi'` | Ensure `scrapling[fetchers]` is installed, then run `uv sync` |
| Missing browser executable | Run `uv run scrapling install` |
| Deprecated `Use StealthyFetcher.configure()` warning | Use `StealthyFetcher.configure(...)` and `StealthyFetcher.async_fetch(...)` |
