Issues
is:issue state:open
Issue creation is restricted in this repository
Search results
#262 [FEATURE] CLI improvements
Labels: area-cli, enhancement, python, triage. Status: Open. In strands-agents/evals.

#247 [FEATURE] Skill evaluation: with vs without skill batch-run, aggregation, comparison, analysis
Labels: area-evaluators, area-simulation, enhancement. Status: Open. In strands-agents/evals.

#242 [FEATURE] Strands-evals CLI
Labels: area-cli, enhancement. Status: Open. In strands-agents/evals.

#228 [FEATURE] Paired N-run experiments with batch aggregation and comparative analysis
Labels: area-core, area-evaluators, enhancement. Status: Open. In strands-agents/evals.

#226 [FEATURE] [P0] Pipeline refinements
Labels: area-redteam, enhancement. Status: Open. In strands-agents/evals.

#225 [FEATURE] [P0] Attack strategies
Labels: area-redteam, enhancement. Status: Open. In strands-agents/evals.

#222 [FEATURE][P1] Red-Teaming: Tool-graph case generation, deterministic verification, adaptive red-teaming agent
Labels: area-generators, area-redteam, enhancement. Status: Open. In strands-agents/evals.

#221 [FEATURE][P0.5] Red-Teaming: Agent-topology attacks & skill-based red-teaming
Labels: area-redteam, enhancement. Status: Open. In strands-agents/evals.

#204 [FEATURE] Wrap / Deploy Strands-evals evaluators to Agentcore Code-base Evaluators
Labels: area-evaluators, enhancement. Status: Open. In strands-agents/evals.

#186 feat(experiment): ExperimentTrendAnalyzer — cross-run overall_score regression detection
Labels: area-core, enhancement. Status: Open. In strands-agents/evals.

#177 [FEATURE] Built-in red teaming support
Labels: area-redteam, enhancement. Status: Open. In strands-agents/evals.

#152 [Experiment] Evals - Coding agent evaluation and iteration
Labels: area-evaluators, enhancement. Status: Open. In strands-agents/evals.

Label descriptions:
area-cli: CLI commands (run, report, validate, diagnose) and console display
area-core: Core eval framework: Case, Experiment, task handler, evaluation data stores
area-evaluators: Evaluators: output, trajectory, tool use, interactions, and LLM-as-judge quality metrics
area-generators: Automated experiment generation and topic planning
area-redteam: Red teaming: adversarial generation, attack strategies, attack success evaluation
area-simulation: Conversation simulation: actor simulator, tool simulator, profiles, multi-turn interactions
enhancement: New feature or request
python: Pull requests that update python code
triage: Needs triage by maintainers