AgentOps Accelerator

Open-source framework and CLI for continuous evaluation, safety testing, and release readiness of Microsoft Foundry agents.
Can we ship it, and how do we know?

Overview

AgentOps Accelerator is an open-source framework and CLI that standardizes continuous evaluation, safety testing, and release readiness for enterprise AI agents on Microsoft Foundry. It connects Foundry Evaluations, ASSERT, the PyRIT-backed AI Red Teaming agent, Azure Monitor, and your CI/CD platform into one repeatable release loop, packaging every result into a stable evidence pack that proves the release is ready for production.

The output is a clear answer to the two questions reviewers actually ask: can we ship it, and how do we know?

Core outputs

| Artifact | Produced by | Audience |
| --- | --- | --- |
| `results.json` | `agentops eval run` | CI / automation |
| `report.md` | `agentops eval run` | PR reviewers |
| `.agentops/assert/latest.json` | `agentops assert run` | Evidence pack, CI gate |
| `.agentops/redteam/latest.json` | `agentops redteam run` | Evidence pack, CI gate |
| `evidence.json` / `evidence.md` | `agentops doctor --evidence-pack` | Release approver |
| Cockpit (localhost) | `agentops cockpit` | Engineer reviewing readiness |

Exit-code contract

AgentOps commands use three exit codes:

0 - execution succeeded and every gate passed.
1 - a runtime or configuration error occurred.
2 - execution succeeded, but a gate failed: a threshold, an ASSERT check, the red-team attack-success rate, or a Doctor severity gate.

Pipelines can rely on this contract without parsing output.
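As a rough illustration of how a pipeline script might consume this contract, the sketch below maps an exit code to a coarse outcome label. The helper names (`interpret_exit_code`, `run_gate`) and the labels are hypothetical and not part of AgentOps; only the meanings of the codes 0, 1, and 2 come from the contract above.

```python
import subprocess

# Exit codes as described in the contract above.
EXIT_MEANINGS = {
    0: "pass",         # execution succeeded and every gate passed
    1: "error",        # runtime or configuration error
    2: "gate_failed",  # execution succeeded but a gate failed
}


def interpret_exit_code(code: int) -> str:
    """Map an exit code to a coarse outcome label (labels are hypothetical)."""
    # Codes outside the documented contract are treated as errors here;
    # that fallback is a choice made for this sketch.
    return EXIT_MEANINGS.get(code, "error")


def run_gate(command: list[str]) -> str:
    """Run a command and return the outcome label for its exit code."""
    # check=False: a non-zero exit is an expected signal, not an exception.
    completed = subprocess.run(command, check=False)
    return interpret_exit_code(completed.returncode)
```

A CI step could call `run_gate(["agentops", "eval", "run"])` and treat `"gate_failed"` differently from `"error"`, for example by posting the report on a gate failure and retrying or alerting on an error.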

AgentOps and Microsoft Foundry

Foundry and AgentOps are designed to meet at the release boundary. Foundry is where teams create, deploy, run, observe, and investigate agents. AgentOps is the repo-side operating layer that turns those signals into a repeatable ship/no-ship workflow.

| Moment | Foundry / Azure does | AgentOps adds |
| --- | --- | --- |
| Build and version | Foundry portal, Foundry SDK/Toolkit, `microsoft-foundry` skill, azd | Pins the exact candidate in `agentops.yaml` and generates the PR/release gate around it |
| Evaluate and compare | Foundry Evaluations, `azd ai agent eval`, Rubric evaluator, and official CI actions/extensions | Keeps datasets and thresholds in the repo, records evidence, normalizes azd/Rubric outputs, and provides local/fallback runs for non-prompt targets |
| Probe safety | ASSERT framework, PyRIT-backed AI Red Teaming agent | Runs both as active CI steps via `agentops assert run` and `agentops redteam run`, normalizes verdicts, and gates the pipeline |
| Observe and investigate | Foundry Monitor, Traces, Azure Monitor, App Insights | Surfaces deep links, telemetry readiness, Doctor findings, and Cockpit navigation |
| Decide release | Branch protection, environments, approvals | Packages `evidence.json` / `evidence.md` for promotion review |
| Govern controls | ACS, Foundry Guardrails | References reviewed artifacts by path/hash/status without executing or applying the external controls |
| Improve from production | Production traces and Foundry datasets | Promotes reviewed trace learnings into regression candidates |

The rhythm is simple: build and operate the agent in Foundry, keep the release contract in the repo, and let AgentOps connect the two into a clean review loop.

Quickstart

1) Install

python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install -U pip
python -m pip install --upgrade "agentops-accelerator[foundry] @ git+https://github.com/Azure/agentops.git@main"

This installs the current AgentOps source from GitHub. After the next package release, you can switch the install line back to agentops-accelerator[foundry] from PyPI.

2) Bootstrap

agentops init

This writes a single agentops.yaml at the project root and an AgentOps-managed workspace under .agentops/ for seed data, run history, and generated evidence. It is not a second .foundry/ project directory.

3) Configure your agent

Pick one of these forms for the `agent:` field; AgentOps classifies the target automatically:

agent: "my-rag:3"                          # Foundry prompt agent (name:version)
agent: "https://...services.ai.azure.com/.../agents/<id>"  # Foundry hosted endpoint
agent: "https://api.example.com/chat"      # any HTTP/JSON agent (ACA, AKS, custom)
agent: "model:gpt-4o"                       # raw Foundry model deployment

AgentOps supports both Foundry Prompt Agents and Hosted Agents as evaluation and readiness targets. Create and deploy them with Foundry tools, then reference the published candidate in agentops.yaml.
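To make the four `agent:` forms concrete, here is a minimal sketch of how such a string could be classified. This is an illustrative heuristic only and not AgentOps' actual classification logic: the function name, the returned labels, and the specific rules (the host suffix and `/agents/` path check, the numeric version check) are all assumptions inferred from the example values above.

```python
from urllib.parse import urlparse


def classify_agent_target(agent: str) -> str:
    """Guess the target kind from an `agent:` value (hypothetical labels)."""
    value = agent.strip()

    # "model:<deployment>" refers to a raw model deployment.
    if value.startswith("model:"):
        return "model_deployment"

    # URLs are either a Foundry hosted endpoint or a generic HTTP/JSON agent.
    if value.startswith(("http://", "https://")):
        parsed = urlparse(value)
        host = parsed.hostname or ""
        # Assumption: hosted endpoints live under services.ai.azure.com
        # and contain an /agents/ path segment, as in the example above.
        if host.endswith(".services.ai.azure.com") and "/agents/" in parsed.path:
            return "foundry_hosted"
        return "http"

    # Otherwise expect "name:version" with a numeric version.
    name, sep, version = value.rpartition(":")
    if sep and name and version.isdigit():
        return "foundry_prompt"

    raise ValueError(f"Unrecognized agent target: {agent!r}")
```

Checking the `model:` prefix before the `name:version` form matters in this sketch, because both contain a colon; the URL check comes before the `name:version` check for the same reason.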

For the smoke dataset, create a Foundry prompt agent such as agentops-smoke and publish it with instructions that copy exact-answer requests verbatim:

If the user message starts with "Answer with exactly this sentence:",
copy only the sentence after that prefix. Do not add greetings,
markdown, citations, caveats, or explanations.

Evaluators are selected from the dataset shape: a `context` field triggers RAG checks, and `tool_calls` / `tool_definitions` fields trigger tool-use checks. Minimal config:

version: 1
agent: "agentops-smoke:2"  # Foundry saves the first published version as v2
dataset: .agentops/data/smoke.jsonl
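The dataset-shape rule described above can be pictured with a small sketch that scans JSONL rows and reports which check groups the fields would trigger. The field names `context`, `tool_calls`, and `tool_definitions` come from the text; the function name, the group labels, and the other row fields used in the usage note are hypothetical, and the real selection logic in AgentOps may differ.

```python
import json


def infer_check_groups(jsonl_text: str) -> set[str]:
    """Return the check groups implied by the fields present in a JSONL dataset."""
    groups: set[str] = set()
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines between records
        row = json.loads(line)
        # A `context` field suggests retrieval-grounded (RAG) checks.
        if "context" in row:
            groups.add("rag")
        # Tool call data or tool definitions suggest tool-use checks.
        if "tool_calls" in row or "tool_definitions" in row:
            groups.add("tool_use")
    return groups
```

For example, a dataset whose rows carry only hypothetical `input` and `expected` fields would yield an empty set in this sketch, while adding a `context` field to any row would add `"rag"`.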

4) Run

az login
$env:AZURE_AI_FOUNDRY_PROJECT_ENDPOINT = "https://<resource>.services.ai.azure.com/api/projects/<project>"
$env:AZURE_OPENAI_ENDPOINT = "https://<openai-resource>.openai.azure.com"
$env:AZURE_OPENAI_DEPLOYMENT = "gpt-4o-mini"
agentops eval analyze
agentops eval run
agentops doctor --evidence-pack

For Foundry targets, use either project_endpoint: in agentops.yaml or AZURE_AI_FOUNDRY_PROJECT_ENDPOINT. Config wins when both are set.
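The precedence rule above (config wins over the environment variable) can be sketched as follows. The function name is hypothetical and the config is modelled as an already-parsed mapping; only the `project_endpoint` key and the `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT` variable name come from the text.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "AZURE_AI_FOUNDRY_PROJECT_ENDPOINT"


def resolve_project_endpoint(
    config: Mapping[str, object],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Pick the project endpoint, preferring the config value over the env var."""
    configured = config.get("project_endpoint")
    if configured:
        # A value in agentops.yaml wins when both sources are set.
        return str(configured)
    # Fall back to the environment; treat an empty string as unset.
    return env.get(ENV_VAR) or None
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.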

Outputs land in .agentops/results/latest/:

results.json - machine-readable (versioned, stable schema)
report.md - human-readable, PR-friendly

Release evidence lands in .agentops/release/latest/:

evidence.json - machine-readable production-readiness projection
evidence.md - PR/release summary

Capture the first successful run as a baseline:

New-Item -ItemType Directory -Force .agentops\baseline | Out-Null
Copy-Item .agentops\results\latest\results.json .agentops\baseline\results.json

To see a visible comparison, publish a new agent version with a prompt that paraphrases instead of copying exact-answer requests, update agentops.yaml to that new name:version, and compare against the baseline:

agentops eval run --baseline .agentops/baseline/results.json

The report grows a Comparison vs Baseline section with per-metric deltas.
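To show what a per-metric delta means, the sketch below subtracts baseline scores from current scores for every metric both runs share. It works on plain metric-to-score dictionaries, which is an assumption made for illustration: it does not read or reflect the actual `results.json` schema, and the function and metric names are hypothetical.

```python
def metric_deltas(
    baseline: dict[str, float],
    current: dict[str, float],
) -> dict[str, float]:
    """Return current minus baseline for each metric present in both runs."""
    shared = baseline.keys() & current.keys()
    # Round to avoid floating-point noise in the reported deltas.
    return {
        name: round(current[name] - baseline[name], 6)
        for name in sorted(shared)
    }
```

A negative delta indicates the candidate scored lower than the baseline on that metric, which is what the paraphrasing prompt in the step above would be expected to produce on an exact-match style check.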

Commands

Install optional extras as needed: [foundry] for eval runtime, [agent] for Doctor/Cockpit, and [mcp] for MCP.

agentops --version - show installed version.
agentops init - bootstrap config and seed data.
agentops eval analyze - check eval readiness.
agentops eval init - bootstrap an azd `eval.yaml` recipe and wire `execution: azd`.
agentops eval run [--baseline PATH] - run an evaluation.
agentops eval promote-traces --source FILE [--apply] - promote traces.
agentops report generate - regenerate report.md.
agentops workflow analyze - recommend CI/CD shape.
agentops workflow generate - generate CI/CD workflows.
agentops skills install - install Copilot or Claude skills.
agentops mcp serve - start the MCP server.
agentops doctor [--evidence-pack] - run readiness checks.
agentops cockpit - open the local Cockpit.
agentops agent serve - serve Doctor as a Copilot Extension.

AgentOps Cockpit

agentops cockpit opens a localhost command center for the current workspace. It combines eval history, Doctor findings, workflow status, and links to the matching Foundry and Azure Monitor views.

Cockpit sections, in display order:

Foundry connection - project, tenant, agent, App Insights.
Foundry launchpad - links for the agent, project, and telemetry.
Observability readiness - tracing, evals, red team, alerts.
AgentOps Doctor - latest Doctor findings.
Eval gate summary - local and CI gate history.
Quality gate summary - score trends and regressions.
Production signal - App Insights health snapshot.
CI/CD Pipelines - GitHub Actions status.
Next actions - contextual recommendations.

Documentation

Foundry Prompt Agent tutorial - use this when the Foundry target is `agent: name:version`. Walks the sandbox → dev journey with a PR gate.
Hosted or HTTP Agent tutorial - use this when the target is a Foundry hosted or HTTP endpoint URL. Same sandbox → dev journey for endpoint-based agents.
End-to-end tutorial - extends either of the above with the full sandbox → dev → qa → prod promotion, Foundry red-team scans, and trace-to-regression promotion.
Core concepts
How it works
Doctor explained
CI/CD with GitHub Actions
Built-in evaluator reference
Release process

Contributing

See CONTRIBUTING.md for architecture rules, testing, and contribution flow.
