Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"
Updated Jul 16, 2025 - Python
Exploring and improving the quality of ChatGPT-generated code for LeetCode programming tasks.
Repository-level automated code repair agent using SWE-Bench dataset
AI-agent-based automated repair system for web services: the agent reads error logs, locates the bug, generates a patch, runs tests, submits a PR, and notifies developers for review.
Trusted autonomy T&E runtime that links mission needs, hazards, scenarios, telemetry, evidence, verification reports, and hash-chained ledgers so AI/autonomous decisions can be reviewed instead of merely trusted.
A reliability layer for AI-built systems: detect failures (tests or runtime drift), reproduce, repair one ticket at a time behind an approval gate, and prove the fix. The safety boundaries most AI agents skip.
AI proposes. Humans decide. Source-available AI engineering OS/control plane for governed code change: policy gates, model routing, PR/CI evidence binding, replayable evidence bundles, chained receipts, multi-repo traceability, and human review.
Production-grade autonomous AI platform using LangGraph, Docker sandboxing, GitHub APIs, and multi-agent orchestration to generate, validate, and iteratively repair code patches from GitHub issues.
An autonomous SWE-agent: give it a repo + a bug, it locates the code, edits it, runs the tests, iterates, and returns a verified git diff. Works on a throwaway copy. Bounded, inspectable, fully test-mockable. Includes a seeded-bug fix-rate benchmark.
Autonomous GitHub issue → validated patch pipeline: LangGraph agents research, plan, and generate search-replace patches, validated in a Docker sandbox (pytest · ruff · mypy), with structured retry feedback and auto PR creation.
Broken code indentation repair tool with automatic language detection
Autonomous AI code repair agent. Finds crashes, compiles code, and fixes bugs in real time for Python, Rust, Go, and C++.
Library-aware TypeScript error recovery for LLM-generated code. Deterministic VS Code Quick Fix + single-file LLM mend. 98.6% on a real-world bench at <$0.005 per fix.
Benchmark pipeline for evaluating file-level localization in repository-level LLM repair on SWE-bench Verified tasks.
Evidence-grounded AI agent for Java code repair with LLM-guided patches and human-in-the-loop review.
Fixed-scope Codex automation services: repo cleanup, tests/CI repair, CSV/data workflows, PR review, and small AI prototypes.