WIE — Web Investigator Engine

MCP server for web search and content extraction — free, self-hosted, zero-tracking.

WIE is a Model Context Protocol (MCP) server that gives AI agents (Claude, Gemini, GPT-4, etc.) the ability to search the web and extract page content. It sits between your AI client and the internet, using a local SearXNG instance as its search engine — no external API keys required.

🇧🇷 Portuguese version

How it works

AI Agent (Claude, Cursor, Zed...)
        │
        ▼
   MCP Server (WIE)             ← port 8000
   ├── web_search()
   ├── web_search_advanced()
   ├── site_search()
   ├── fetch_page()
   ├── get_contents()
   └── answer()
        │
        ▼
   SearXNG (local)              ← port 8080
   ├── google
   ├── duckduckgo
   ├── bing
   ├── wikipedia
   └── startpage
        │
        ▼
      Internet

SearXNG runs locally in Docker. All queries are private — no third-party tracking, no API keys needed.
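
To make the flow above more concrete, here is a minimal sketch of how a request to the local SearXNG instance could be assembled. It is an illustration only, not WIE's actual client code: the helper name build_searxng_url is hypothetical, and it assumes the JSON output format is enabled in searxng/settings.yml.

```python
from urllib.parse import urlencode


def build_searxng_url(base_url, query, engines=None, time_range=None, safesearch=0):
    """Build a SearXNG /search URL that asks for JSON results (hypothetical helper)."""
    params = {"q": query, "format": "json", "safesearch": str(safesearch)}
    if engines:
        # SearXNG takes a comma-separated engine list.
        params["engines"] = ",".join(engines)
    if time_range:
        params["time_range"] = time_range
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"


url = build_searxng_url(
    "http://localhost:8080",
    "Python 3.13 release notes",
    engines=["duckduckgo", "wikipedia"],
    time_range="month",
)
print(url)
```

The resulting URL can be opened in a browser or passed to any HTTP client to check that SearXNG answers on port 8080.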

Requirements

Docker and Docker Compose
Python 3.11+ (only for STDIO local mode)
An AI client that supports the MCP protocol

Quick start

1. Clone the repository

git clone https://github.com/your-user/WIE_MCP.git
cd WIE_MCP

2. Configure environment

cp .env.example .env

Edit .env and change at least SEARXNG_SECRET:

SEARXNG_SECRET=replace-with-a-random-secure-string
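
One way to produce a suitable value is Python's standard secrets module. The helper name generate_secret is only illustrative, and the 32-byte length is an assumption rather than a requirement stated by the project.

```python
import secrets


def generate_secret(num_bytes: int = 32) -> str:
    """Return a random hex string with num_bytes bytes of entropy."""
    return secrets.token_hex(num_bytes)


# Prints a line that can be pasted into .env
print(f"SEARXNG_SECRET={generate_secret()}")
```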

3. Start the services

docker compose up -d

This starts two containers:

wie-mcp-server — the MCP server, on port 8000
wie-searxng — SearXNG, on port 8080

Wait ~30 seconds for SearXNG to fully initialize.

4. Verify

# Check containers are running
docker ps

# View MCP server logs
docker logs wie-mcp-server

# View SearXNG logs
docker logs wie-searxng
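
If you would rather script the wait than guess at 30 seconds, the sketch below polls a TCP port until it accepts connections. The function wait_for_port is a hypothetical helper, not part of WIE, and an open port only shows that the container is listening, not that every engine has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once host:port accepts a TCP connection, False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For example, wait_for_port("localhost", 8080) checks SearXNG and wait_for_port("localhost", 8000) checks the MCP server.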

MCP client configuration

HTTP mode (recommended with Docker)

Use when the server is running via docker compose up. The server is available at http://localhost:8000/mcp.

For Claude Desktop, Cursor, Windsurf, VS Code + Cline, and LM Studio:

{
  "mcpServers": {
    "wie": {
      "url": "http://localhost:8000/mcp"
    }
  }
}

Reference files: configs/http-remote.json, configs/lm-studio.json

STDIO mode (local, no Docker for MCP)

Use when you want to run the MCP server directly in your terminal (SearXNG still needs to be running).

Prerequisite: install the Python dependencies.

python -m venv venv
.\venv\Scripts\activate      # Windows
# or: source venv/bin/activate  # Linux/Mac
pip install -r requirements.txt

Configuration:

{
  "mcpServers": {
    "wie": {
      "command": "python",
      "args": ["-m", "src.server", "stdio"]
    }
  }
}

Reference files: configs/claude-desktop.json, configs/cursor.json, configs/zed.json, configs/windsurf.json, configs/vscode-cline.json

Where to place the config

Client	Configuration file path
Claude Desktop (Mac)	`~/Library/Application Support/Claude/claude_desktop_config.json`
Claude Desktop (Windows)	`%APPDATA%\Claude\claude_desktop_config.json`
Cursor	Settings → MCP → Add new server
Zed	`.zed/settings.json`
Windsurf	Settings → MCP → Add new server
VS Code + Cline	`.vscode/mcp.json`
LM Studio	Settings → MCP Servers
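
For clients that read a JSON file, the HTTP entry can be merged into an existing config without overwriting other servers. The following is a rough sketch under that assumption: add_wie_server is a hypothetical helper, and it only handles the mcpServers layout shown in this README. Back up the file before trying it.

```python
import json
from pathlib import Path


def add_wie_server(config_path, url="http://localhost:8000/mcp"):
    """Add or update the "wie" entry under mcpServers, keeping all other keys."""
    path = Path(config_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    # setdefault keeps any servers that are already configured.
    config.setdefault("mcpServers", {})["wie"] = {"url": url}
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling add_wie_server with the path from the table above (for example the Claude Desktop file) writes the updated config and returns it as a dict.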

Available tools

WIE exposes 6 MCP tools:

`web_search` — general web search

Multi-engine search with source authority tier scoring (tiers 1–4) and optional FlashRank reranking.

web_search(
    query="Python 3.13 release notes",
    time_range="month",   # hour | day | week | month | year (optional)
    categories="news",    # general | news | images | videos | it | science (optional)
    safesearch="0",       # "0"=off | "1"=moderate | "2"=strict (optional)
    limit=10,             # 1–20, default: 10
)

`web_search_advanced` — advanced search with filters

Search with query expansion, domain filters, date filters, category targeting, and variable depth modes.

web_search_advanced(
    query="impact of LLMs on software development",
    search_type="deep",               # see table below
    num_results=15,                   # default: 10
    category="research_paper",        # see categories below
    include_domains=["arxiv.org"],    # only these domains
    exclude_domains=["reddit.com"],   # ignore these domains
    start_published_date="2024-01-01",  # YYYY-MM-DD
    end_published_date="2025-01-01",
    include_text=["transformer"],     # page must contain these words
    exclude_text=["tutorial"],        # page must not contain these words
    safesearch=0,                     # 0 | 1 | 2
    enable_highlights=True,           # extract relevant passages
    highlight_sentences=3,            # sentences per passage (default: 3)
    enable_summary=False,             # extractive summary per result
    additional_queries=True,          # use query expansion for deep modes
)
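
The include_domains and exclude_domains filters can be pictured as a hostname check applied to each result URL. The sketch below is one plausible reading, not the project's implementation: the function names are hypothetical, and treating subdomains as matches is an assumption.

```python
from urllib.parse import urlparse


def domain_matches(host, domain):
    """True if host equals domain or is a subdomain of it."""
    host = host.lower().rstrip(".")
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def filter_by_domain(urls, include_domains=None, exclude_domains=None):
    """Keep URLs that pass the include list (if any) and are not on the exclude list."""
    kept = []
    for url in urls:
        host = urlparse(url).hostname or ""
        if include_domains and not any(domain_matches(host, d) for d in include_domains):
            continue
        if exclude_domains and any(domain_matches(host, d) for d in exclude_domains):
            continue
        kept.append(url)
    return kept
```

Matching on the parsed hostname rather than on a substring keeps a host such as notarxiv.org from passing an arxiv.org filter.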

Search types (search_type):

Type	Query variations	Reranking	Highlights	Use case
`instant`	1	❌	❌	Ultra-fast, top 3 results
`fast`	1	❌	❌	Quick, single-pass search
`auto`	1	✅	✅	Default — best balance
`deep_lite`	3	✅	✅	Moderate research
`deep`	5	✅	✅	Thorough research
`deep_reasoning`	7	✅	✅	Complex investigation
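
The table above maps naturally onto a small lookup structure. The sketch below copies the values from the table; the names SearchProfile, SEARCH_PROFILES, and resolve_profile are hypothetical, and falling back to auto for an unknown type is an assumption (the real definitions in src/constants.py may look different).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchProfile:
    query_variations: int
    rerank: bool
    highlights: bool


# Values transcribed from the search types table.
SEARCH_PROFILES = {
    "instant": SearchProfile(1, False, False),
    "fast": SearchProfile(1, False, False),
    "auto": SearchProfile(1, True, True),
    "deep_lite": SearchProfile(3, True, True),
    "deep": SearchProfile(5, True, True),
    "deep_reasoning": SearchProfile(7, True, True),
}


def resolve_profile(search_type="auto"):
    """Look up a profile; unknown types fall back to "auto" (an assumption)."""
    return SEARCH_PROFILES.get(search_type, SEARCH_PROFILES["auto"])
```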

Categories (category):

Category	Prioritized domains
`general`	All engines
`news`	BBC, Reuters, AP News, NYT, The Guardian
`research_paper`	arXiv, Nature, IEEE, ACM, NeurIPS
`company`	LinkedIn, Bloomberg, Crunchbase
`people`	LinkedIn, GitHub, Google Scholar
`financial_report`	SEC EDGAR
`product`	Product Hunt, G2, Capterra
`personal_site`	Medium, Dev.to, Substack
`code`	GitHub, GitLab, Stack Overflow
`video`	YouTube, Vimeo, TED
`image`	Unsplash, Flickr, Pexels

`site_search` — search within a specific domain

Runs a search of the form site:<domain> <query>, which is useful for finding official documentation or domain-specific content.

site_search(
    query="async io concurrency",
    site="docs.python.org",
    time_range="year",   # optional
    limit=5,             # default: 5
)
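
As a rough illustration of the query this tool sends, the helper below prefixes the search terms with a site: operator. build_site_query is a hypothetical name, and stripping a scheme or path from the site argument is an assumption about convenient input handling, not documented behavior.

```python
def build_site_query(query, site):
    """Return a search string restricted to one domain via the site: operator."""
    domain = site.strip()
    for prefix in ("https://", "http://"):
        if domain.lower().startswith(prefix):
            domain = domain[len(prefix):]
    # Drop any path so only the host remains.
    domain = domain.split("/", 1)[0]
    return f"site:{domain} {query.strip()}"


print(build_site_query("async io concurrency", "docs.python.org"))
```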

`fetch_page` — extract content from a URL

Extracts clean text content from a web page. It tries curl-cffi (anti-bot stealth) first, then falls back to nodriver (headless browser), and finally to httpx.

fetch_page(
    url="https://docs.python.org/3/whatsnew/3.13.html",
    max_tokens=8000,   # optional, default: 8000
)

Returns: title, description, headings, main content, tables, JSON-LD structured data, and a link summary.
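
The fallback order described above can be sketched as a loop over fetchers that stops at the first one returning content. This is a generic pattern, not the code in src/tools/fetch_page.py: fetch_with_fallback is a hypothetical name, and the fetchers are plain callables so that the example does not depend on curl-cffi, nodriver, or httpx being installed.

```python
from typing import Callable, Sequence, Tuple


def fetch_with_fallback(
    url: str, fetchers: Sequence[Tuple[str, Callable[[str], str]]]
) -> Tuple[str, str]:
    """Try each (name, fetch) pair in order; return (name, html) from the first success."""
    errors = []
    for name, fetch in fetchers:
        try:
            html = fetch(url)
            if html:
                return name, html
            errors.append(f"{name}: empty response")
        except Exception as exc:  # any failure moves on to the next fetcher
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all fetchers failed: " + "; ".join(errors))
```

In a real setup the list would hold wrappers around the three libraries, in the order given above.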

`get_contents` — fetch content from multiple URLs

Fetches up to 20 URLs in parallel, with at most 3 requests in flight at once. Can extract highlights and summaries per page.

get_contents(
    urls=[
        "https://arxiv.org/abs/2401.04012",
        "https://github.com/openai/gpt-2",
    ],
    highlight_query="large language model training",   # optional
    highlight_sentences=3,                              # default: 3
    enable_summary=False,                               # default: False
    max_tokens=8000,                                    # per URL, default: 8000
)
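
A common way to cap concurrency like this is an asyncio.Semaphore. The sketch below shows the pattern with the limits mentioned above (20 URLs, 3 concurrent); fetch_all and the fetch_one callback are hypothetical names, and returning exceptions alongside results is a design choice of this example, not necessarily of WIE.

```python
import asyncio


async def fetch_all(urls, fetch_one, max_urls=20, max_concurrent=3):
    """Fetch every URL with at most max_concurrent fetches running at once."""
    if len(urls) > max_urls:
        raise ValueError(f"at most {max_urls} URLs are allowed, got {len(urls)}")
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(url):
        async with semaphore:  # waits here while max_concurrent fetches are active
            try:
                return url, await fetch_one(url)
            except Exception as exc:
                # One failing page should not cancel the others.
                return url, exc

    # gather preserves the input order of the URLs.
    return await asyncio.gather(*(guarded(u) for u in urls))
```

Here fetch_one is any coroutine function that takes a URL and returns its content, so the same wrapper works with whichever HTTP client is in use.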

`answer` — direct answer from URLs

Fetches the provided URLs, extracts the most relevant passages for the question, and returns an extractive answer.

answer(
    query="What is the maximum context window for Claude 3.5?",
    urls=["https://docs.anthropic.com/en/docs/about-claude/all-releases"],
)
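
To give a feel for what "extractive" means here, the sketch below ranks sentences by how many words they share with the question and returns the best ones verbatim. It is a deliberately simple stand-in: the function names are hypothetical, there is no stop-word handling, and the project's own scoring in src/utils/highlights.py may work quite differently.

```python
import re


def split_sentences(text):
    """Split text into sentences on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def top_passages(question, text, k=3):
    """Return up to k sentences sharing the most words with the question."""
    terms = set(re.findall(r"\w+", question.lower()))
    scored = []
    for index, sentence in enumerate(split_sentences(text)):
        words = set(re.findall(r"\w+", sentence.lower()))
        score = len(terms & words)
        if score:
            scored.append((score, index, sentence))
    # Highest overlap first; ties keep document order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [sentence for _, _, sentence in scored[:k]]
```

Because the output consists of sentences copied from the page, nothing is generated that the source does not contain.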

Source authority tiers

All search results are classified into 4 reliability tiers:

Tier	Emoji	Description	Examples
Tier 1	🟢	Official / Definitive	`github.com`, `docs.python.org`, `.gov`, `.edu`
Tier 2	🔵	Authoritative	`wikipedia.org`, `stackoverflow.com`, `arxiv.org`
Tier 3	🟡	Reference	`medium.com`, `reuters.com`, `dev.to`
Tier 4	⚪	General	Generic blogs, Reddit, SEO content
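
A tier lookup along these lines could be written as follows. The domain sets contain only the examples from the table, so this is an illustration and not the full list in src/constants.py; classify_tier and the constant names are hypothetical.

```python
from urllib.parse import urlparse

# Only the example domains from the table above; the real lists are longer.
TIER_1_DOMAINS = {"github.com", "docs.python.org"}
TIER_1_SUFFIXES = (".gov", ".edu")
TIER_2_DOMAINS = {"wikipedia.org", "stackoverflow.com", "arxiv.org"}
TIER_3_DOMAINS = {"medium.com", "reuters.com", "dev.to"}


def _matches(host, domains):
    """True if host is one of the domains or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def classify_tier(url):
    """Return the authority tier (1 to 4) for a result URL."""
    host = (urlparse(url).hostname or "").lower()
    if host.endswith(TIER_1_SUFFIXES) or _matches(host, TIER_1_DOMAINS):
        return 1
    if _matches(host, TIER_2_DOMAINS):
        return 2
    if _matches(host, TIER_3_DOMAINS):
        return 3
    return 4  # everything else is treated as general content
```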

Environment variables

All variables are configured in the .env file:

Variable	Default	Description
`SEARXNG_HOST`	`http://searxng:8080`	Internal SearXNG URL
`SEARXNG_ENGINES`	`google,duckduckgo,bing,wikipedia,startpage`	Active engines (comma-separated)
`SEARXNG_DEFAULT_CATEGORY`	`general`	Default category when not specified
`SEARXNG_SAFESEARCH`	`0`	Safe search level: `0`, `1`, or `2`
`SEARXNG_SECRET`	(required)	SearXNG secret key — change before use
`SEARCH_DEFAULT_TYPE`	`auto`	Default search type: `instant`, `fast`, `auto`, `deep_lite`, `deep`, `deep_reasoning`
`SEARCH_DEFAULT_LIMIT`	`10`	Default result limit (1–20)
`SEARCH_TIMEOUT_SECONDS`	`10`	Search timeout in seconds
`FETCH_TIMEOUT_SECONDS`	`15`	Page fetch timeout in seconds
`FETCH_MAX_CONTENT_LENGTH`	`10000`	Max characters extracted per page
`FETCH_TOKEN_BUDGET`	`8000`	Token budget per page
`MCP_SERVER_HOST`	`0.0.0.0`	Host address for the MCP server
`MCP_SERVER_PORT`	`8000`	MCP server port
`API_KEY`	(empty)	Optional API key to restrict server access
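
The project loads these values through Pydantic Settings (see src/config.py in the project structure). For readers who want to see the idea without extra dependencies, here is a standard-library sketch covering a few of the variables. The Settings class and load_settings function are hypothetical and mirror only the defaults listed in the table.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    searxng_host: str
    search_default_limit: int
    search_timeout_seconds: float
    fetch_timeout_seconds: float


def load_settings(env=None):
    """Read a subset of WIE's variables, applying the documented defaults."""
    env = os.environ if env is None else env
    limit = int(env.get("SEARCH_DEFAULT_LIMIT", "10"))
    if not 1 <= limit <= 20:
        raise ValueError("SEARCH_DEFAULT_LIMIT must be between 1 and 20")
    return Settings(
        searxng_host=env.get("SEARXNG_HOST", "http://searxng:8080"),
        search_default_limit=limit,
        search_timeout_seconds=float(env.get("SEARCH_TIMEOUT_SECONDS", "10")),
        fetch_timeout_seconds=float(env.get("FETCH_TIMEOUT_SECONDS", "15")),
    )
```

Passing a plain dict as env makes the function easy to try out without touching the real environment.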

Project structure

WIE_MCP/
├── src/
│   ├── server.py              # MCP server — registers all 6 tools
│   ├── config.py              # Configuration via Pydantic Settings + .env
│   ├── constants.py           # Domain tiers, search types, categories
│   ├── models.py              # Pydantic schemas (request/response)
│   ├── errors.py              # Typed error classes
│   ├── searxng_client.py      # HTTP client for SearXNG
│   ├── tools/
│   │   ├── web_search.py          # web_search tool
│   │   ├── web_search_advanced.py # web_search_advanced tool
│   │   ├── fetch_page.py          # fetch_page tool
│   │   ├── get_contents.py        # get_contents tool
│   │   ├── site_search.py         # site_search tool
│   │   └── answer.py              # answer tool
│   └── utils/
│       ├── dedup.py               # Result deduplication and scoring
│       ├── highlights.py          # Relevant passage extraction
│       ├── summarizer.py          # Extractive summarization
│       ├── text.py                # Sentence splitter
│       ├── query_expander.py      # Query expansion for deep modes
│       ├── readability.py         # Readable content extraction
│       └── truncation.py          # Token-based truncation
├── configs/
│   ├── claude-desktop.json    # Claude Desktop config (STDIO)
│   ├── cursor.json            # Cursor config (STDIO)
│   ├── zed.json               # Zed config (STDIO)
│   ├── windsurf.json          # Windsurf config (STDIO)
│   ├── vscode-cline.json      # VS Code + Cline config (STDIO)
│   ├── http-remote.json       # HTTP config (Docker)
│   └── lm-studio.json         # LM Studio config (HTTP)
├── searxng/
│   └── settings.yml           # SearXNG configuration
├── docker-compose.yml         # Starts wie-mcp-server + wie-searxng
├── Dockerfile                 # MCP server container image
├── requirements.txt           # Python dependencies
├── .env.example               # Environment variables template
└── pytest.ini                 # Test configuration

Key dependencies

Package	Purpose
`mcp`	MCP protocol (FastMCP)
`httpx`	Async HTTP requests
`pydantic` + `pydantic-settings`	Validation and configuration
`beautifulsoup4` + `readability-lxml`	HTML parsing and content extraction
`curl-cffi`	HTTP client with bot-detection bypass
`nodriver`	Headless browser fallback
`flashrank`	Local relevance reranking (optional)
`uvicorn` + `starlette`	HTTP server for Streamable HTTP mode

Useful commands

# Start everything
docker compose up -d

# Stream logs
docker compose logs -f

# Stop everything
docker compose down

# Rebuild after code changes
docker compose up -d --build

# Run tests
python -m pytest tests/ -v

# Run server locally (STDIO mode)
python -m src.server stdio

# Run server locally (HTTP mode)
python -m src.server http

Troubleshooting

"Connection refused" or "Cannot reach SearXNG"

Check containers are running: docker ps
Wait ~30s after docker compose up -d for SearXNG to fully initialize
Check logs: docker logs wie-searxng

No search results

SearXNG may have misconfigured engines
Check searxng/settings.yml and ensure the engines are enabled

Port already in use

Change the port in docker-compose.yml:

ports:
  - "8001:8000"   # uses port 8001 on the host

Update the client config URL to http://localhost:8001/mcp

Accessing from another machine

Replace localhost with the IP of the machine running Docker:
```
{ "url": "http://192.168.1.100:8000/mcp" }
```

License

GNU Affero General Public License v3 (AGPLv3) — LICENSE
