MCP server for web search and content extraction — free, self-hosted, zero-tracking.
WIE is a Model Context Protocol (MCP) server that gives AI agents (Claude, Gemini, GPT-4, etc.) the ability to search the web and extract page content. It sits between your AI client and the internet, using a local SearXNG instance as its search engine — no external API keys required.
AI Agent (Claude, Cursor, Zed...)
│
▼
MCP Server (WIE) ← port 8000
├── web_search()
├── web_search_advanced()
├── site_search()
├── fetch_page()
├── get_contents()
└── answer()
│
▼
SearXNG (local) ← port 8080
├── google
├── duckduckgo
├── bing
├── wikipedia
└── startpage
│
▼
Internet
SearXNG runs locally in Docker. All queries are private — no third-party tracking, no API keys needed.
- Docker and Docker Compose
- Python 3.11+ (only for STDIO local mode)
- An AI client that supports the MCP protocol
git clone https://github.com/your-user/WIE_MCP.git
cd WIE_MCP
cp .env.example .env
Edit .env and change at least SEARXNG_SECRET:
SEARXNG_SECRET=replace-with-a-random-secure-string
docker compose up -d
This starts two containers:
- wie-mcp-server: the MCP server, on port 8000
- wie-searxng: SearXNG, on port 8080
Wait ~30 seconds for SearXNG to fully initialize.
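Instead of sleeping for a fixed 30 seconds, a script can poll until the ports accept connections. The sketch below is not part of WIE; `wait_for_port` is a hypothetical helper, and it assumes both ports are published on `localhost` as listed above.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only shows the port accepts connections,
            # not that SearXNG has finished loading its engines.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait and try again.
            time.sleep(interval)
    return False
```

Calling `wait_for_port("localhost", 8080)` and then `wait_for_port("localhost", 8000)` before the first search gives a rough readiness check. An open port is a weaker signal than a successful query, so the logs below remain the authoritative source.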
# Check containers are running
docker ps
# View MCP server logs
docker logs wie-mcp-server
# View SearXNG logs
docker logs wie-searxng
Use HTTP mode when the server is running via docker compose up. The server is then available at http://localhost:8000/mcp.
For Claude Desktop, Cursor, Windsurf, VS Code Cline, LM Studio:
{
  "mcpServers": {
    "wie": {
      "url": "http://localhost:8000/mcp"
    }
  }
}
Reference files: configs/http-remote.json, configs/lm-studio.json
Use STDIO mode when you want to run the MCP server directly in your terminal (SearXNG still needs to be running).
Prerequisites: install Python dependencies:
python -m venv venv
.\venv\Scripts\activate # Windows
# or: source venv/bin/activate # Linux/Mac
pip install -r requirements.txt
Configuration:
{
  "mcpServers": {
    "wie": {
      "command": "python",
      "args": ["-m", "src.server", "stdio"]
    }
  }
}
Reference files: configs/claude-desktop.json, configs/cursor.json, configs/zed.json, configs/windsurf.json, configs/vscode-cline.json
| Client | Configuration file path |
|---|---|
| Claude Desktop (Mac) | ~/Library/Application Support/Claude/claude_desktop_config.json |
| Claude Desktop (Windows) | %APPDATA%\Claude\claude_desktop_config.json |
| Cursor | Settings → MCP → Add new server |
| Zed | .zed/settings.json |
| Windsurf | Settings → MCP → Add new server |
| VS Code + Cline | .vscode/mcp.json |
| LM Studio | Settings → MCP Servers |
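For clients configured through a JSON file, the HTTP entry can be merged in programmatically so that existing servers are preserved. This is a sketch, not a WIE utility: `add_wie_server` is a hypothetical helper, and it assumes the client reads the `mcpServers` shape shown earlier. Some clients use a different top-level key, so check the client's documentation first.

```python
import json
from pathlib import Path


def add_wie_server(config_path: Path, url: str = "http://localhost:8000/mcp") -> dict:
    """Add or update the "wie" entry under "mcpServers", keeping other entries."""
    config = {}
    if config_path.exists():
        raw = config_path.read_text(encoding="utf-8")
        if raw.strip():
            config = json.loads(raw)
    # Create the section if the file did not define any servers yet.
    servers = config.setdefault("mcpServers", {})
    servers["wie"] = {"url": url}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `add_wie_server(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json")` would target the Claude Desktop path on macOS from the table above. Back up the file first, since the helper rewrites it.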
WIE exposes 6 MCP tools:
Multi-engine search with source authority tier scoring (tiers 1–4) and optional FlashRank reranking.
web_search(
    query="Python 3.13 release notes",
    time_range="month",   # hour | day | week | month | year (optional)
    categories="news",    # general | news | images | videos | it | science (optional)
    safesearch="0",       # "0"=off | "1"=moderate | "2"=strict (optional)
    limit=10,             # 1–20, default: 10
)
Search with query expansion, domain filters, date filters, category targeting, and variable depth modes.
web_search_advanced(
    query="impact of LLMs on software development",
    search_type="deep",                  # see table below
    num_results=15,                      # default: 10
    category="research_paper",           # see categories below
    include_domains=["arxiv.org"],       # only these domains
    exclude_domains=["reddit.com"],      # ignore these domains
    start_published_date="2024-01-01",   # YYYY-MM-DD
    end_published_date="2025-01-01",
    include_text=["transformer"],        # page must contain these words
    exclude_text=["tutorial"],           # page must not contain these words
    safesearch=0,                        # 0 | 1 | 2
    enable_highlights=True,              # extract relevant passages
    highlight_sentences=3,               # sentences per passage (default: 3)
    enable_summary=False,                # extractive summary per result
    additional_queries=True,             # use query expansion for deep modes
)
Search types (search_type):
| Type | Query variations | Reranking | Highlights | Use case |
|---|---|---|---|---|
| instant | 1 | ❌ | ❌ | Ultra-fast, top 3 results |
| fast | 1 | ❌ | ❌ | Quick, single-pass search |
| auto | 1 | ✅ | ✅ | Default, best balance |
| deep_lite | 3 | ✅ | ✅ | Moderate research |
| deep | 5 | ✅ | ✅ | Thorough research |
| deep_reasoning | 7 | ✅ | ✅ | Complex investigation |
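A client that chooses a search type on the user's behalf could encode the table as data. The sketch below is purely illustrative: `SearchProfile` and `pick_search_type` are hypothetical names and not WIE's internals. Only the numbers and flags come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchProfile:
    query_variations: int
    rerank: bool
    highlights: bool


# Values copied from the search-type table; the structure itself is an assumption.
SEARCH_PROFILES = {
    "instant": SearchProfile(1, False, False),
    "fast": SearchProfile(1, False, False),
    "auto": SearchProfile(1, True, True),
    "deep_lite": SearchProfile(3, True, True),
    "deep": SearchProfile(5, True, True),
    "deep_reasoning": SearchProfile(7, True, True),
}


def pick_search_type(max_queries: int, need_highlights: bool) -> str:
    """Pick the most thorough search type that stays within a query budget."""
    candidates = [
        name
        for name, profile in SEARCH_PROFILES.items()
        if profile.query_variations <= max_queries
        and (profile.highlights or not need_highlights)
    ]
    if not candidates:
        raise ValueError("no search type fits the given budget")
    # More query variations wins; reranking breaks ties, so "auto" beats "fast".
    return max(
        candidates,
        key=lambda n: (SEARCH_PROFILES[n].query_variations, SEARCH_PROFILES[n].rerank),
    )
```

With a budget of four query variations, `pick_search_type(4, True)` selects `deep_lite`. The result can be passed as `search_type` to `web_search_advanced`.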
Categories (category):
| Category | Prioritized domains |
|---|---|
| general | All engines |
| news | BBC, Reuters, AP News, NYT, The Guardian |
| research_paper | arXiv, Nature, IEEE, ACM, NeurIPS |
| company | LinkedIn, Bloomberg, Crunchbase |
| people | LinkedIn, GitHub, Google Scholar |
| financial_report | SEC EDGAR |
| product | Product Hunt, G2, Capterra |
| personal_site | Medium, Dev.to, Substack |
| code | GitHub, GitLab, Stack Overflow |
| video | YouTube, Vimeo, TED |
| image | Unsplash, Flickr, Pexels |
Runs a search restricted to one domain with the site:domain operator. Useful for finding official documentation or domain-specific content.
site_search(
    query="async io concurrency",
    site="docs.python.org",
    time_range="year",   # optional
    limit=5,             # default: 5
)
Extracts clean text content from a web page. It tries curl-cffi (anti-bot stealth) first, then falls back to nodriver (headless browser), and finally to httpx.
fetch_page(
    url="https://docs.python.org/3/whatsnew/3.13.html",
    max_tokens=8000,   # optional, default: 8000
)
Returns: title, description, headings, main content, tables, JSON-LD structured data, and a link summary.
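The fallback order that fetch_page uses (curl-cffi, then nodriver, then httpx) follows a common "first success wins" pattern. The sketch below shows that pattern only. `fetch_with_fallback` and the `Fetcher` type are hypothetical, the fetchers are plain callables supplied by the caller, and none of this is taken from WIE's source.

```python
from typing import Callable, Optional, Sequence

# A fetcher takes a URL and returns the page body, or None/"" when it got nothing.
Fetcher = Callable[[str], Optional[str]]


def fetch_with_fallback(url: str, fetchers: Sequence[tuple[str, Fetcher]]) -> tuple[str, str]:
    """Try each fetcher in order; return (fetcher_name, html) from the first success."""
    errors = []
    for name, fetch in fetchers:
        try:
            html = fetch(url)
        except Exception as exc:
            # Any failure (block page, timeout, TLS error) moves on to the next strategy.
            errors.append(f"{name}: {exc}")
            continue
        if html:
            return name, html
        # An empty or None body counts as a failed attempt.
        errors.append(f"{name}: empty response")
    raise RuntimeError(f"all fetchers failed for {url}: " + "; ".join(errors))
```

Ordering the list from cheapest to most expensive strategy keeps the common case fast. The collected error messages make it clear which stages were attempted when every one of them fails.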
Parallel fetch of up to 20 URLs (max 3 concurrent). Can extract highlights and summaries per page.
get_contents(
    urls=[
        "https://arxiv.org/abs/2401.04012",
        "https://github.com/openai/gpt-2",
    ],
    highlight_query="large language model training",   # optional
    highlight_sentences=3,                              # default: 3
    enable_summary=False,                               # default: False
    max_tokens=8000,                                    # per URL, default: 8000
)
Fetches the provided URLs, extracts the most relevant passages for the question, and returns an extractive answer.
answer(
    query="What is the maximum context window for Claude 3.5?",
    urls=["https://docs.anthropic.com/en/docs/about-claude/all-releases"],
)
All search results are classified into 4 reliability tiers:
| Tier | Emoji | Description | Examples |
|---|---|---|---|
| Tier 1 | 🟢 | Official / Definitive | github.com, docs.python.org, .gov, .edu |
| Tier 2 | 🔵 | Authoritative | wikipedia.org, stackoverflow.com, arxiv.org |
| Tier 3 | 🟡 | Reference | medium.com, reuters.com, dev.to |
| Tier 4 | ⚪ | General | Generic blogs, Reddit, SEO content |
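To make the tiers concrete, here is a minimal sketch of how a URL could be mapped to a tier by its hostname. It uses only the example domains from the table. The project keeps its own tier data in src/constants.py, which this does not reproduce, and `classify_tier` is a hypothetical name.

```python
from urllib.parse import urlparse

# Example domains copied from the tier table; an illustrative subset only.
TIER_DOMAINS = {
    1: {"github.com", "docs.python.org"},
    2: {"wikipedia.org", "stackoverflow.com", "arxiv.org"},
    3: {"medium.com", "reuters.com", "dev.to"},
}
TIER_1_SUFFIXES = (".gov", ".edu")


def classify_tier(url: str) -> int:
    """Return the reliability tier (1-4) for a URL, defaulting to 4."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host.endswith(TIER_1_SUFFIXES):
        return 1
    for tier, domains in TIER_DOMAINS.items():
        # Match the domain itself or any subdomain of it, e.g. en.wikipedia.org.
        if any(host == d or host.endswith("." + d) for d in domains):
            return tier
    return 4
```

Matching on `"." + domain` rather than a bare suffix keeps look-alike hosts such as `notgithub.com` out of tier 1.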
All variables are configured in the .env file:
| Variable | Default | Description |
|---|---|---|
| SEARXNG_HOST | http://searxng:8080 | Internal SearXNG URL |
| SEARXNG_ENGINES | google,duckduckgo,bing,wikipedia,startpage | Active engines (comma-separated) |
| SEARXNG_DEFAULT_CATEGORY | general | Default category when not specified |
| SEARXNG_SAFESEARCH | 0 | Safe search level: 0, 1, or 2 |
| SEARXNG_SECRET | (required) | SearXNG secret key; change before use |
| SEARCH_DEFAULT_TYPE | auto | Default search type: instant, fast, auto, deep_lite, deep, deep_reasoning |
| SEARCH_DEFAULT_LIMIT | 10 | Default result limit (1–20) |
| SEARCH_TIMEOUT_SECONDS | 10 | Search timeout in seconds |
| FETCH_TIMEOUT_SECONDS | 15 | Page fetch timeout in seconds |
| FETCH_MAX_CONTENT_LENGTH | 10000 | Max characters extracted per page |
| FETCH_TOKEN_BUDGET | 8000 | Token budget per page |
| MCP_SERVER_HOST | 0.0.0.0 | Host address for the MCP server |
| MCP_SERVER_PORT | 8000 | MCP server port |
| API_KEY | (empty) | Optional API key to restrict server access |
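The sketch below shows how a few of these variables and their defaults might be read and validated. It deliberately uses only the standard library, whereas the project itself loads configuration through Pydantic Settings in src/config.py. The `Settings` class and `load_settings` function here are hypothetical stand-ins, not the real ones.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    searxng_host: str
    searxng_engines: list[str]
    searxng_secret: str
    search_default_limit: int
    fetch_timeout_seconds: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read a subset of the variables, applying the defaults from the table."""
    secret = env.get("SEARXNG_SECRET", "")
    if not secret:
        # The table marks this variable as required.
        raise ValueError("SEARXNG_SECRET is required")
    limit = int(env.get("SEARCH_DEFAULT_LIMIT", "10"))
    if not 1 <= limit <= 20:
        raise ValueError("SEARCH_DEFAULT_LIMIT must be between 1 and 20")
    engines = env.get("SEARXNG_ENGINES", "google,duckduckgo,bing,wikipedia,startpage")
    return Settings(
        searxng_host=env.get("SEARXNG_HOST", "http://searxng:8080"),
        searxng_engines=[e.strip() for e in engines.split(",") if e.strip()],
        searxng_secret=secret,
        search_default_limit=limit,
        fetch_timeout_seconds=float(env.get("FETCH_TIMEOUT_SECONDS", "15")),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests, for example `load_settings({"SEARXNG_SECRET": "x"})`.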
WIE_MCP/
├── src/
│   ├── server.py                   # MCP server: registers all 6 tools
│   ├── config.py                   # Configuration via Pydantic Settings + .env
│   ├── constants.py                # Domain tiers, search types, categories
│   ├── models.py                   # Pydantic schemas (request/response)
│   ├── errors.py                   # Typed error classes
│   ├── searxng_client.py           # HTTP client for SearXNG
│   ├── tools/
│   │   ├── web_search.py           # web_search tool
│   │   ├── web_search_advanced.py  # web_search_advanced tool
│   │   ├── fetch_page.py           # fetch_page tool
│   │   ├── get_contents.py         # get_contents tool
│   │   ├── site_search.py          # site_search tool
│   │   └── answer.py               # answer tool
│   └── utils/
│       ├── dedup.py                # Result deduplication and scoring
│       ├── highlights.py           # Relevant passage extraction
│       ├── summarizer.py           # Extractive summarization
│       ├── text.py                 # Sentence splitter
│       ├── query_expander.py       # Query expansion for deep modes
│       ├── readability.py          # Readable content extraction
│       └── truncation.py           # Token-based truncation
├── configs/
│   ├── claude-desktop.json         # Claude Desktop config (STDIO)
│   ├── cursor.json                 # Cursor config (STDIO)
│   ├── zed.json                    # Zed config (STDIO)
│   ├── windsurf.json               # Windsurf config (STDIO)
│   ├── vscode-cline.json           # VS Code + Cline config (STDIO)
│   ├── http-remote.json            # HTTP config (Docker)
│   └── lm-studio.json              # LM Studio config (HTTP)
├── searxng/
│   └── settings.yml                # SearXNG configuration
├── docker-compose.yml              # Starts wie-mcp-server + wie-searxng
├── Dockerfile                      # MCP server container image
├── requirements.txt                # Python dependencies
├── .env.example                    # Environment variables template
└── pytest.ini                      # Test configuration
| Package | Purpose |
|---|---|
| mcp | MCP protocol (FastMCP) |
| httpx | Async HTTP requests |
| pydantic + pydantic-settings | Validation and configuration |
| beautifulsoup4 + readability-lxml | HTML parsing and content extraction |
| curl-cffi | HTTP client with bot-detection bypass |
| nodriver | Headless browser fallback |
| flashrank | Local relevance reranking (optional) |
| uvicorn + starlette | HTTP server for Streamable HTTP mode |
# Start everything
docker compose up -d
# Stream logs
docker compose logs -f
# Stop everything
docker compose down
# Rebuild after code changes
docker compose up -d --build
# Run tests
python -m pytest tests/ -v
# Run server locally (STDIO mode)
python -m src.server stdio
# Run server locally (HTTP mode)
python -m src.server http

- Check containers are running: docker ps
- Wait ~30s after docker compose up -d for SearXNG to fully initialize
- Check logs: docker logs wie-searxng

- SearXNG may have misconfigured engines
- Check searxng/settings.yml and ensure the engines are enabled

- Change the port in docker-compose.yml:
  ports:
    - "8001:8000" # uses port 8001 on the host
- Update the client config URL to http://localhost:8001/mcp

- Replace localhost with the IP of the machine running Docker: { "url": "http://192.168.1.100:8000/mcp" }
GNU Affero General Public License v3 (AGPLv3) — LICENSE
Copyright © 2025–2026 Jonathan Lima