AgentCyberRange/WebExploitBench

WebExploitBench

Overview

WebExploitBench is a benchmark for evaluating real-world web exploitation against isolated targets. The complete dataset contains 15 self-contained web applications with 110 documented vulnerabilities. Each application ships with its target environment, exploits, reports, and verifiers. The vulnerabilities are a mix of 0-day, 1-day, and synthetic cases, and cover common web vulnerability classes such as SQL injection, SSTI, and authorization bypass.

Important

This GitHub repository releases only a subset of WebExploitBench. The complete dataset is available on Hugging Face: https://huggingface.co/datasets/AgentCyberRange/WebExploitBench

Dataset

WebExploitBench is distributed in two layers:

  • GitHub (this repository) — the tooling (scripts/targetctl, scripts/fetch), this README, and a few targets you can run right after cloning (comfyui, prestashop).
  • Hugging Face — the full set of 15 targets.

Prerequisites: Docker Engine with Docker Compose v2, Git LFS, and huggingface_hub (for scripts/fetch; pip install -U huggingface_hub provides the hf CLI).
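
Before fetching or building anything, it can help to confirm that the prerequisites listed above are installed. The following is a minimal sketch, not part of the repository: missing_tools is a hypothetical helper, and it assumes git-lfs and hf are the binary names on your PATH.

```shell
# Print the name of each command that is not on PATH, one per line.
# missing_tools is a hypothetical helper, not part of scripts/.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# git-lfs is the binary behind `git lfs`; hf is installed by huggingface_hub.
missing=$(missing_tools docker git-lfs hf)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
fi

# Compose v2 is a Docker CLI plugin, so it is checked through docker itself.
if ! docker compose version >/dev/null 2>&1; then
  echo "Docker Compose v2 not available"
fi
```

If the script prints nothing, all four prerequisites were found.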

Pull the full dataset in place from the repository root:

scripts/fetch

scripts/fetch downloads the remaining target directories on top of this checkout. It only adds data — the repository's own README.md, LICENSE, and scripts/ are preserved — and it is resumable and safe to re-run. The dataset may be gated, so run hf auth login first if the download is refused.

To pull just one target instead of the full set, call the Hugging Face CLI directly with an --include filter (e.g. siyucms):

hf download AgentCyberRange/WebExploitBench --repo-type dataset \
  --local-dir . --include 'siyucms/*'
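
To pull several targets without downloading the full set, the same command can be wrapped in a loop. This is a sketch under assumptions: fetch_target, fetch_targets, and the HF_CMD variable are hypothetical names introduced here for illustration, and HF_CMD exists only so the command can be printed instead of executed.

```shell
# Download one target directory from the dataset into the current checkout.
# HF_CMD is a hypothetical override used only for dry runs; it defaults to hf.
fetch_target() {
  "${HF_CMD:-hf}" download AgentCyberRange/WebExploitBench --repo-type dataset \
    --local-dir . --include "$1/*"
}

# Fetch each named target in turn, stopping at the first failure.
fetch_targets() {
  for target in "$@"; do
    fetch_target "$target" || return 1
  done
}

# Dry run: print the command that would be executed instead of downloading.
HF_CMD=echo fetch_targets siyucms
```

Dropping the HF_CMD=echo prefix performs the real download. Pass additional target names as extra arguments.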

Local Target Management

You can stand up targets locally with the bundled tools, independent of any evaluation harness.

List the available targets:

scripts/targetctl list

Prepare all images:

scripts/targetctl build all

Start a target for manual testing:

scripts/targetctl up comfyui

The command publishes the application service on localhost and prints the accessible URL:

Accessible URLs for comfyui:
http://127.0.0.1:<port>

Inspect active targetctl-managed instances:

scripts/targetctl ps

Finally, shut down the target:

scripts/targetctl down comfyui
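
The up, probe, and down steps above can be chained into a quick smoke test. This sketch rests on assumptions that the README does not confirm: that scripts/targetctl up writes the URL lines to standard output and returns once the target has been started. extract_urls and smoke_test are hypothetical helpers, not part of targetctl.

```shell
# Extract every http(s) URL from text on stdin, one per line.
extract_urls() {
  grep -Eo 'https?://[^[:space:]]+'
}

# Start a target, probe its first printed URL, then stop it again.
# Assumes `targetctl up` prints its URLs on stdout (not confirmed by the README).
smoke_test() {
  target=$1
  status=0
  url=$(scripts/targetctl up "$target" | extract_urls | head -n 1)
  if [ -z "$url" ]; then
    echo "no URL printed for $target" >&2
    status=1
  else
    # --retry-connrefused keeps retrying while the service is still starting.
    curl --silent --output /dev/null \
      --retry 10 --retry-delay 3 --retry-connrefused "$url" || status=$?
  fi
  scripts/targetctl down "$target"
  return "$status"
}
```

Running smoke_test comfyui from the repository root returns zero if the application answered an HTTP request. The check only establishes that the service is reachable, not that the target is correctly configured.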

Evaluating with CAGE

This dataset is the target set for the WebExploitBench benchmark in CAGE. CAGE initializes it as a git submodule at examples/agent_pentest_bench/datasets/web_exploit_bench and reads the sample index examples/agent_pentest_bench/datasets/agent_pentest_bench.json, so targets become available to the benchmark as you fetch them. CAGE drives AI coding agents against these targets and handles prompt levels, scoring, proxy tracing, and the run inspector. See the CAGE README and the examples/agent_pentest_bench/ guide for the full installation and evaluation workflow.

License

Original content in this repository is licensed under the Apache License 2.0. Third-party source code, container images, and dependencies remain subject to their respective licenses.
