SiftMark

SiftMark is a zero-runtime-dependency Python CLI that turns public web pages into clean Markdown, JSON, JSONL, and link maps for AI agents.

It is built for people using OpenClaw, Hermes, Codex, Claude Code, Cursor, or any shell-capable agent that needs compact, citation-friendly web context without starting a browser stack.

Why It Exists

AI agents often need compact web context that is easy to cite, inspect, and reuse. Full browser automation is useful for complex workflows, but many research tasks only need a polite public-page distiller that can run anywhere Python runs.

SiftMark focuses on that narrow job:

one URL to clean Markdown for LLM context
small same-domain crawls to a reusable research bundle
JSON-LD, headings, links, images, and metadata extracted together
a portable SKILL.md generator for agent workflows
no runtime dependencies, no API key, no browser required
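This README does not reproduce SiftMark's extractor code. The sketch below shows one way to collect headings, links, images, and JSON-LD in a single pass using only the standard library, which fits the no-dependency goal. It is built on `html.parser.HTMLParser`. The `PageFacts` class and its attribute names are hypothetical and are not part of SiftMark's API.

```python
import json
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class PageFacts(HTMLParser):
    """Hypothetical single-pass collector (not SiftMark's extractor)."""

    def __init__(self, base_url: str) -> None:
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.headings: list[tuple[int, str]] = []
        self.links: list[str] = []
        self.images: list[str] = []
        self.json_ld: list = []
        self._level: Optional[int] = None  # level of the heading being read
        self._heading_parts: list[str] = []
        self._in_json_ld = False
        self._script_parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in HEADING_TAGS:
            self._level, self._heading_parts = int(tag[1]), []
        elif tag == "a" and attrs.get("href"):
            # Resolve relative hrefs against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld, self._script_parts = True, []

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            # Collapse whitespace left by nested inline tags such as <b> or <a>.
            text = " ".join("".join(self._heading_parts).split())
            self.headings.append((self._level, text))
            self._level = None
        elif tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._script_parts)))
            except json.JSONDecodeError:
                pass  # Skip malformed JSON-LD instead of failing the whole page.

    def handle_data(self, data):
        if self._level is not None:
            self._heading_parts.append(data)
        if self._in_json_ld:
            self._script_parts.append(data)
```

To use it, create `p = PageFacts(url)`, call `p.feed(html)` and then `p.close()`, and read the collected lists. To keep the sketch short, it leaves out Markdown conversion and `<meta>` tags.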

SiftMark is not an anti-bot bypass tool. It is a polite public-web distiller that respects robots.txt by default.
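A robots.txt check needs only the standard library. The sketch below runs `urllib.robotparser` on a robots.txt body that has already been downloaded. The function name `allowed_by_robots` and the default user-agent string `"siftmark"` are assumptions for illustration. They are not SiftMark's actual code or crawler identity.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "siftmark") -> bool:
    """Hypothetical helper: True if robots_txt lets user_agent fetch url."""
    # "siftmark" is a placeholder user agent, not SiftMark's documented default.
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so this does no network I/O;
    # a real crawler would first download <scheme>://<host>/robots.txt.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```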

Install

From a clone:

git clone https://github.com/xiaokillua/siftmark.git
cd siftmark
python3 -m pip install -e .

Directly from GitHub:

python3 -m pip install "git+https://github.com/xiaokillua/siftmark.git"

Check it:

siftmark version

Quick Start

Fetch one page as Markdown:

siftmark fetch https://example.com

Fetch one page as JSON:

siftmark fetch https://example.com --format json --output example.json

Create a small crawl bundle:

siftmark crawl https://example.com --depth 1 --max-pages 10 --output ./example-bundle

The bundle contains:

example-bundle/
  README.md
  index.json
  links.csv
  pages.jsonl
  pages/
    example.com.md
    example.com.json
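The bundle is made of plain files, so downstream tools can load it without SiftMark installed. This sketch assumes that `pages.jsonl` follows the usual JSON Lines convention (one JSON object per line) and that `links.csv` starts with a header row. The README does not document field names, so the hypothetical helpers `load_pages` and `load_links` return whatever keys the files contain.

```python
import csv
import json
from pathlib import Path
from typing import Union


# Hypothetical helpers, not part of SiftMark; no field names are assumed.
def load_pages(bundle_dir: Union[str, Path]) -> list[dict]:
    """Read pages.jsonl, skipping blank lines (assumed JSON Lines format)."""
    pages = []
    with open(Path(bundle_dir) / "pages.jsonl", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                pages.append(json.loads(line))
    return pages


def load_links(bundle_dir: Union[str, Path]) -> list[dict]:
    """Read links.csv into dicts keyed by its header row (assumed present)."""
    with open(Path(bundle_dir) / "links.csv", newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))
```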

Generate an agent skill:

siftmark skill --target openclaw --output ./skills/siftmark-web-research
siftmark skill --target hermes --output ./skills/siftmark-web-research
siftmark skill --target codex --output ./skills/siftmark-web-research

Open the local research console:

siftmark ui

Run the stdio MCP server for compatible agent clients:

siftmark mcp

Demo

The local console provides a quick visual workflow:

siftmark ui

In the console, enter a public URL, then use Fetch for one-page Markdown/JSON or Crawl for a small same-domain research bundle.

For agent clients, add SiftMark as a stdio MCP server:

{
  "mcpServers": {
    "siftmark": {
      "command": "siftmark",
      "args": ["mcp"]
    }
  }
}

See docs/MCP.md for details.
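If your client stores its MCP settings in a JSON file with the `mcpServers` shape shown above, you can also merge the entry in programmatically. The hypothetical helper `add_siftmark_server` below keeps any other configured servers. The location of this file differs between clients and is not covered here, so the caller supplies the path.

```python
import json
from pathlib import Path


def add_siftmark_server(config_path: Path) -> dict:
    """Hypothetical helper: merge a stdio 'siftmark' entry into an mcpServers config."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    servers = config.setdefault("mcpServers", {})
    # Leave other servers untouched; only (re)write the siftmark entry.
    servers["siftmark"] = {"command": "siftmark", "args": ["mcp"]}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```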

Python API

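from siftmark import CrawlOptions, FetchOptions, crawl, fetch_page, write_bundle

# Fetch one page and print its Markdown rendering.
page = fetch_page("https://example.com")
print(page.markdown)

# Crawl at most 5 pages up to depth 1 from the start URL,
# passing FetchOptions(timeout=10) to each page fetch.
result = crawl(
    "https://example.com",
    CrawlOptions(max_pages=5, depth=1, fetch=FetchOptions(timeout=10)),
)

# Write the result as a bundle directory (README.md, index.json, links.csv, pages.jsonl, pages/).
write_bundle(result, "example-bundle")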

CLI Reference

siftmark fetch URL [--format markdown|json] [--output PATH]
siftmark crawl URL [--depth N] [--max-pages N] [--output DIR]
siftmark skill [--target generic|openclaw|hermes|codex|claude-code] [--output DIR]
siftmark ui [--host 127.0.0.1] [--port 8765] [--no-open]
siftmark mcp
siftmark version

Useful flags:

--ignore-robots: skip robots.txt checks when you have permission
--user-agent: set a custom crawler identity
--max-bytes: cap page size before parsing
--external: allow off-domain links during crawls
--delay: add crawl delay between pages
--insecure: disable TLS verification only when your local Python certificate store is broken and you trust the target
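The following sketch shows how the crawl limits can fit together. It makes two assumptions. First, `--depth` counts link hops from the start URL, so depth 1 means the start page plus the pages it links to. Second, "same domain" means an identical host. Under those assumptions, a breadth-first planner that honors `--depth`, `--max-pages`, `--external`, and `--delay` might look like this. `plan_crawl` and its `get_links` callback are hypothetical. The callback stands in for fetching a page and extracting its links, which lets the logic run without network access.

```python
import time
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin, urlsplit


def plan_crawl(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],
    depth: int = 1,
    max_pages: int = 10,
    external: bool = False,
    delay: float = 0.0,
) -> list[str]:
    """Hypothetical BFS planner (not SiftMark's crawler); returns visit order.

    Assumption: depth counts link hops from start_url. Pages at the depth
    limit are still visited, but their links are not followed.
    """
    start_host = urlsplit(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url, level = queue.popleft()
        if visited and delay:
            time.sleep(delay)  # Pause between page visits, like --delay.
        visited.append(url)
        if level >= depth:
            continue  # Depth limit reached: don't expand this page's links.
        for href in get_links(url):
            # Resolve relative links and drop #fragments so duplicates collapse.
            link, _fragment = urldefrag(urljoin(url, href))
            parts = urlsplit(link)
            if parts.scheme not in ("http", "https"):
                continue  # Skip mailto:, javascript:, and similar links.
            if not external and parts.netloc != start_host:
                continue  # Same host only unless external links are allowed.
            if link not in seen:
                seen.add(link)
                queue.append((link, level + 1))
    return visited
```

With exact host matching, `www.example.com` and `example.com` count as different domains. SiftMark's actual rule may differ.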

Responsible Use

Use SiftMark only for public pages you are allowed to access. Respect robots.txt, terms of service, copyright, privacy, rate limits, and local laws. For JavaScript-heavy, login-only, paywalled, or protected pages, use a browser automation tool with explicit permission instead.

Roadmap

MCP client configuration examples for Claude Desktop, Cursor, Codex, and OpenClaw-compatible tools
optional Playwright adapter for JavaScript-rendered pages
selector memory for repeat extraction jobs
output templates for research reports and dataset cards
packaged examples for OSINT, docs migration, competitive research, and RAG prep

Development

python3 -m pip install -e .
python3 -m unittest discover -s tests

Release notes and PyPI preparation live in docs/PUBLISHING.md.

License

MIT
