Pull requests: strands-agents/evals

Pull requests list

feat: map type labels to native issue type
Labels: area-community (Repo health, governance, contributor process, release process, and CI dependency bumps); enhancement (New feature or request)
#287 opened Jun 26, 2026 by yonib05 Member
fix(ci): sha-pin third-party GitHub Actions
Labels: area-community (Repo health, governance, contributor process, release process, and CI dependency bumps); chore (Maintenance tasks, dependency updates, CI changes, refactoring with no user-facing impact)
#285 opened Jun 25, 2026 by max-rattray-aws
feat: add P1 model-output corruption effects
Labels: area-chaos (Chaos/fault injection: experiments, recovery strategy, partial completion, failure communication); enhancement (New feature or request)
#284 opened Jun 24, 2026 by venkatkrish543re
1 of 9 tasks
ci: update langfuse requirement from <4,>=2.0.0 to >=2.0.0,<5
Labels: area-community (Repo health, governance, contributor process, release process, and CI dependency bumps); chore (Maintenance tasks, dependency updates, CI changes, refactoring with no user-facing impact); dependencies (Pull requests that update a dependency file); python (Pull requests that update python code)
#283 opened Jun 24, 2026 by dependabot Bot
ci: update mypy requirement from <2.0.0,>=1.15.0 to >=1.15.0,<3.0.0
Labels: area-community (Repo health, governance, contributor process, release process, and CI dependency bumps); chore (Maintenance tasks, dependency updates, CI changes, refactoring with no user-facing impact); dependencies (Pull requests that update a dependency file); python (Pull requests that update python code)
#282 opened Jun 24, 2026 by dependabot Bot
chore(detectors): added async detectors execution (WIP)
Labels: area-detectors (Failure detection and root cause analysis of agent sessions); chore (Maintenance tasks, dependency updates, CI changes, refactoring with no user-facing impact)
#277 opened Jun 17, 2026 by poshinchen Contributor
7 of 9 tasks
feat(experiment): add verbosity-aware stack traces to evaluation error reasons
Labels: area-core (Core eval framework: Case, Experiment, task handler, evaluation data stores); area-devx (Developer experience: papercuts, confusing public APIs, error messages, ergonomics, usability); enhancement (New feature or request)
#268 opened Jun 15, 2026 by AndyMc629
9 tasks done
Feature/skills aggregator: add SkillEvalAggregator for batch evaluation comparisons
Labels: area-evaluators (Evaluators: output, trajectory, tool use, interactions, and LLM-as-judge quality metrics); enhancement (New feature or request)
#229 opened May 14, 2026 by venkatkrish543re Draft
feat: Add EvaluationPlugin for agent invocation evaluation and retry
Labels: area-core (Core eval framework: Case, Experiment, task handler, evaluation data stores); area-evaluators (Evaluators: output, trajectory, tool use, interactions, and LLM-as-judge quality metrics); enhancement (New feature or request)
#166 opened Mar 18, 2026 by afarntrog Contributor
5 of 7 tasks
feat: add OTel test semantic convention attributes to Experiment spans
Labels: area-core (Core eval framework: Case, Experiment, task handler, evaluation data stores); area-tracing (Trace/session ingestion: providers, session mappers, extractors, telemetry/OTEL); enhancement (New feature or request)
#131 opened Feb 10, 2026 by anirudha Draft
feat: Optional Case specific Goal for GoalSuccessRateEvaluator
Labels: area-evaluators (Evaluators: output, trajectory, tool use, interactions, and LLM-as-judge quality metrics); enhancement (New feature or request)
#75 opened Dec 17, 2025 by dbermuehler Draft
7 tasks
feat: add ContextualFaithfulnessEvaluator
Labels: area-core (Core eval framework: Case, Experiment, task handler, evaluation data stores); area-evaluators (Evaluators: output, trajectory, tool use, interactions, and LLM-as-judge quality metrics); enhancement (New feature or request)
#64 opened Dec 7, 2025 by stefanoamorelli
7 tasks done
Mapper for parsing langfuse traces to standard format
Labels: area-tracing (Trace/session ingestion: providers, session mappers, extractors, telemetry/OTEL); enhancement (New feature or request)
#49 opened Nov 25, 2025 by deepakdalakoti Collaborator
6 tasks done