diff --git a/aws-transform/POWER.md b/aws-transform/POWER.md index 5f3f0a16..9d53e1d4 100644 --- a/aws-transform/POWER.md +++ b/aws-transform/POWER.md @@ -2,9 +2,9 @@ name: "aws-transform" displayName: "AWS Transform" description: "Migrate, modernize, and upgrade codebases: .NET Framework to .NET 8/10, mainframe COBOL to Java, VMware VMs to EC2, SQL Server/Oracle/MySQL to Aurora, and Java/Python/Node.js version upgrades or AWS SDK migrations. Assess, plan, and execute code transformations from your IDE." -keywords: ["migrate", "modernize", "mainframe", "cobol", "vmware", "dotnet", ".net framework", "windows", "sql server", "oracle", "mysql", "aurora", "ec2 migration", "rehost", "lift-and-shift", "replatform", "legacy", "code upgrade", "sdk migration", "boto3", "java upgrade", "atx"] +keywords: ["migrate", "modernize", "mainframe", "cobol", "vmware", "dotnet", ".net framework", "windows", "sql server", "oracle", "mysql", "aurora", "ec2 migration", "rehost", "lift-and-shift", "replatform", "legacy", "code upgrade", "sdk migration", "boto3", "java upgrade", "atx", "continuous modernization", "AWS Transform - continuous modernization"] author: "AWS" -version: "2.0.0" +version: "2.1.0" --- # AWS Transform Power @@ -14,6 +14,7 @@ version: "2.0.0" Follow these steps IN ORDER. Do NOT skip ahead. Authentication is handled just-in-time — only when a chosen action actually needs it. Do NOT probe auth before the user has declared an intent. ``` +Step 0: Routing Gate → Identify workload + route (REQUIRED — see Workload Routing Gate) Step 1: Resume → Check .atx/context.json Step 2: Intent → Ask user what they want to do Step 3: Discovery → Scan workspace + query available agents @@ -25,6 +26,8 @@ Step 8: Tasks → Generate tasks.md Step 9: Execute → Run transforms, monitor, review diffs ``` +**Step 0 is the entry point for every request.** Read and apply the **Workload Routing Gate** section below (Steps A–D) before doing anything else. 
Step 1 (Resume) runs silently in parallel as bookkeeping, but the gate's classification — workload type, continuous modernization vs. workload-specific path — must be settled before Step 3 Discovery starts. VMware, SQL, and mainframe requests NEVER fall through to continuous modernization regardless of phrasing; .NET asks the three-way intent question before continuing. + **Discovery finds opportunities. Assessment produces detailed findings. Requirements come from the assessment — NOT from discovery.** **You CANNOT create requirements without an assessment report.** @@ -41,12 +44,80 @@ Step 9: Execute → Run transforms, monitor, review diffs - Never show options as text bullets — use AskUserQuestion - Never mix workflow descriptions with actual questions in the same numbered list, and never use count language like "two questions" when some items are informational steps rather than questions. Keep what-I-will-do separate from what-I-need-from-you. - Never modify code, upgrade dependencies, or run analysis manually — always use AWS Transform tooling +- Never probe `--help` to figure out a CLI invocation that the steering files already document. The capability-specific files under `steering/` (e.g. `workload-continuous-modernization-source.md`, `workload-continuous-modernization-analysis.md`, `workload-continuous-modernization-remediation.md`, custom transformation references) contain the canonical `atx ct …` and `atx custom …` commands with every required flag and example invocations — read the matching file and lift the command verbatim. The orchestrating files (`workload-continuous-modernization-guide.md`, `workload-continuous-modernization-setup.md`) explicitly point at them ("Use the `/source` skill for the exact commands"). `--help` is a fallback used ONLY when (a) no steering file covers the capability, or (b) a documented command demonstrably fails because the installed CLI version diverges from steering. 
Treat `--help` probes the user can see as a signal that the agent didn't read its own steering — that is the failure mode this rule prevents. - Never expose internal mechanics to the user. This means: do not name tools (get_status, list_resources), do not cite step numbers (Step 3), do not reference files you are reading (POWER.md, steering files, context.json), and do not narrate what you are about to do ("let me read the config", "now I'll check status"). Just do it silently and present the outcome in user terms. - Never frame HITL checkpoints, agent questions, or pending decisions as coming from "the web app", "the webapp", "the web UI", or a third-party "the agent is asking / the agent needs / the agent wants". The user is working with you in the IDE — you own the interaction. Present every checkpoint as your own first-person request, not a relayed message from elsewhere. **Wrong:** "The web app is asking how you want to deploy the landing zone." / "The agent is now asking about the replication subnet configuration." **Right:** "The next step is to choose how to deploy the landing zone." / "I need the replication subnet configuration to continue." - Never editorialize or use subjective language — no "interesting", "fascinating", "notably", "impressive", "remarkable". State findings as facts. Let users form their own opinions. - Never overclaim freshness. Two forms: (a) presenting cached state as current — if you did NOT fetch this turn, lead with "last I checked" (past tense throughout) and offer to refresh; (b) promising proactive surfacing when not polling — phrases like "I'll let you know when…" or "I'll surface those as they come up" mislead the user into assuming background monitoring. Say explicitly you don't watch in the background. See `steering/workflow.md` → Freshness & Source of Truth. - Never mix unrelated transformation goals in the same chat without warning. 
When the user shifts to a different transformation goal (different workload, different migration target, or clearly different body of work), suggest via AskUserQuestion that they start a new chat session with fresh context (they start it themselves), explain why (cross-contaminated answers), and wait for their choice. If the user declines, proceed to answer their question about the other job — do not refuse or redirect back to the original goal. Just avoid mixing cached state (e.g., don't apply VMware findings to the .NET question). See `steering/workflow.md` → Freshness & Source of Truth. - Never prompt for authentication, lecture about auth systems, or demand auth setup before the user has declared an intent. On a vague greeting like "I installed this power," present intent options — do not enumerate auth system names, do not ask the user to sign in, do not call `atx custom def list` (auth-required, and risks a user-visible CLI trust prompt). `get_status` is no-auth and Step 1 Resume calls it silently for returning users; that is allowed. The rule is about user-visible auth behavior, not about whether a specific tool may run internally. Auth prompts come from the tool a chosen action needs, framed around that action. +- Never quote specific pricing (dollar amounts, hourly rates, daily costs) or timing estimates (minutes, hours, ETAs) for AWS resources or analyses. Pricing depends on the customer's usage and AWS quotas. For pricing questions, redirect to https://aws.amazon.com/ec2/pricing/ and https://aws.amazon.com/transform/pricing/. + +--- + +## Step 0: Workload Routing Gate (apply BEFORE Step 1) + +**STOP. Before reading files, scanning workspaces, calling tools, or starting any workflow step, identify the workload first, then route.** This gate is **Step 0** in the MANDATORY SEQUENCE — it precedes Step 1 Resume. 
Run it the moment the user's intent becomes clear (immediately after Step 2 Intent if the user is fresh, or as soon as the resume message reveals their target if they're returning). Workload identification ALWAYS wins over keyword matching — do not let "analyze", "assess", "tech debt", or "security" phrasing override the rules below. + +> **Sequencing note.** Step 1 Resume's silent context refresh and Step 2 Intent's AskUserQuestion may run before Step 0's classification is final, because the gate often needs the user's first message to identify the workload. The contract is: by the time Step 3 Discovery starts, Step 0 MUST be settled. Never advance to Discovery, Scope, or any tool call that depends on workload type until the gate has produced a route. + +### Step A: Identify the workload + +Look for an explicit workload signal in the user's request — a named technology (`.NET`, `VMware`, `SQL Server`/`Aurora`/`Oracle`/`MySQL`, `mainframe`/`COBOL`), workload-specific terminology (Hyper-V, EC2 rehost, stored procs, CICS, JCL), or file/project signals already in the conversation. If no signal is present, treat the request as **workload-unspecified**. + +### Step B: Apply workload-specific routing + +Workload-specific rules ALWAYS win over the keyword list in Step C. Do not let "analysis" or "tech debt" phrasing override these. 
+ +| Workload | Route | +| ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **.NET** | AskUserQuestion: "For your .NET work, are you looking to **modernize to .NET 8/10** (port the code, change targets), **run an assessment for modernization** (scope the work, identify blockers, plan the port), or **analyze your repos for tech debt, security vulnerabilities, or CVEs**?" → "Modernize" or "Assessment for modernization" → continue with the standard MANDATORY SEQUENCE using `steering/workload-dotnet*.md`. → "Analyze for tech debt / security / CVEs" → route to continuous modernization (Step D). | +| **VMware** | Continue the standard MANDATORY SEQUENCE with `steering/workload-vmware*.md`. **NEVER route VMware requests to continuous modernization** — even when the user uses words like "analyze", "assess", or "find issues". VMware assessment is handled by the VMware workload agent. | +| **SQL / Database** | Continue with `steering/workload-sql*.md`. **NEVER route SQL/database requests to continuous modernization** — SQL Server, Oracle, MySQL, and Aurora migrations are handled by the SQL workload agent. | +| **Mainframe / COBOL** | Continue with `steering/workload-mainframe*.md`. **NEVER route mainframe requests to continuous modernization** — COBOL/CICS/JCL transformations are handled by the mainframe workload agent. 
| +| **Workload-unspecified** | Continue to Step C. | + +### Step C: Keyword-based routing (workload-unspecified only) + +This list applies **only** when the user has not named a workload. If any of the workload rules in Step B matched, you have already routed — do not re-evaluate against this list. + +If the user's workload-unspecified request matches any of these intents, route to **continuous modernization** — do NOT scan files yourself: + +- "analysis", "analyze", "find issues", "what's wrong", "tech debt", "security vulnerabilities", "CVEs" +- "what should I fix", "where do I start" +- "report", "dashboard", "compare", "trend" +- "remediate findings", "remediation", "fix what you found" +- "custom transform" +- "continuous modernization", "AWS Transform - continuous modernization" + +For deeper Custom-vs-continuous modernization nuance (named transformations, prior findings, mixed scope, adoption nudge), see `steering/workload-continuous-modernization-routing.md`. Do NOT use that file's keyword lists to override Step B — VMware, SQL, and mainframe never reach continuous modernization regardless of phrasing. + +### Step D: For continuous modernization requests + +Once routed to continuous modernization: + +1. **First-response telemetry notice (once per session).** The first time a request routes here in this session, ALWAYS prepend this line exactly to your reply to the user before doing anything else: + + > Note: this power collects telemetry. To opt out, see [here](https://docs.aws.amazon.com/transform/latest/userguide/transform-usage-telemetry.html). + + Show it exactly once per session. Do NOT repeat it on subsequent continuous modernization requests in the same session. If the user explicitly asks to disable telemetry, omit `--telemetry` for the rest of the session (see each `workload-continuous-modernization-*.md` file's Telemetry section for the flag format). +2. 
When invoking AWS Transform - continuous modernization commands, use `atx ct` (with a space). `atxct` (no space) is being deprecated; it remains functionally equivalent and hits the same backend, so an `atxct` invocation in the user's environment is not itself a problem. Do not warn the user about `atxct` and do not treat its presence as a failure cause. +3. Check if `atx ct` is installed AND up to date — run as a single command: + + ``` + INSTALLED=$(atx ct --version 2>/dev/null | head -1); LATEST=$(curl -fsSL "https://transform-cli.awsstatic.com/index.json" 2>/dev/null | grep -o '"latest"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/.*"latest"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/'); echo "Installed: ${INSTALLED:-not found}, Latest: ${LATEST:-unknown}" + ``` +4. If `INSTALLED` is empty or `LATEST` is newer: `curl -fsSL https://transform-cli.awsstatic.com/install.sh | bash && source ~/.bashrc` +5. If `atx ct` fails after install, the binary is rarely the cause — `atx ct` and `atxct` share the same backend and fail identically for env/auth/server reasons. Check those first: + - `ATXCT_FES_ENDPOINT` is set on the server process (not just the CLI shell) + - `AWS_PROFILE` points at a valid account with refreshed credentials + - The server is running (`atx ct status --health`) + + Only after those check out, verify `atx --help` shows the `ct` subcommand and that `atxct-plugin.mjs` is co-located with the `atx` binary. +6. Start the server using the [continuous-modernization-server.md](steering/workload-continuous-modernization-server.md) skill — it will ask the user for their region, validate it against the supported list, and start with the correct `AWS_REGION`. Wait 5s, then verify with `atx ct status --health`. +7. Then use the appropriate continuous modernization steering file — see `steering/workload-continuous-modernization-routing.md` and the `workload-continuous-modernization-*.md` files referenced from it. 
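The install check in Step D hinges on deciding whether `LATEST` is newer than `INSTALLED`. A minimal sketch of that comparison follows. It assumes both values are plain dotted version strings (the real `atx ct --version` output may need trimming first) and that `sort -V` is available. `needs_update` is a hypothetical helper, not part of the `atx` CLI, and treating an unknown `LATEST` as "no update" is a design choice of this sketch.

```bash
#!/usr/bin/env bash
# Hypothetical helper: returns 0 when an install or upgrade is needed, 1 otherwise.
# Assumes both arguments are plain dotted versions such as "1.4.2".
needs_update() {
  local installed="$1" latest="$2"
  # Nothing installed: always install.
  [ -z "$installed" ] && return 0
  # Latest unknown (index fetch failed): nothing to compare against.
  [ -z "$latest" ] && return 1
  # Identical versions: already up to date.
  [ "$installed" = "$latest" ] && return 1
  # sort -V orders version strings; if the installed one sorts first, it is older.
  local oldest
  oldest=$(printf '%s\n%s\n' "$installed" "$latest" | sort -V | head -n 1)
  [ "$oldest" = "$installed" ]
}
```

A caller could run the documented install command only when `needs_update "$INSTALLED" "$LATEST"` succeeds. Version-aware sorting matters here because a plain string comparison would rank `1.10.0` below `1.9.0`.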
+ +**When in doubt for a workload-unspecified request → continuous modernization.** This default applies ONLY after Step B has cleared — VMware, SQL, and mainframe never fall through to continuous modernization regardless of how the question is phrased; .NET only routes to continuous modernization after the user picks "analyze for tech debt / security / CVEs" in Step B's intent question (both "Modernize" and "Assessment for modernization" stay in the .NET workload). Once routed, do NOT manually read source files to find issues — that's what `atx ct analysis run` does. --- @@ -73,10 +144,19 @@ Check for `.atx/context.json` (workspace-relative). NEVER read `~/.aws/atx/kiro- ## Step 2: Intent +**If Step 0 routed the request to continuous modernization, skip this entire step.** Continuous modernization has its own self-contained onboarding flow — hand off directly to `steering/workload-continuous-modernization-guide.md`. Its first prompt (Mode selection: Local vs. AWS Infrastructure) is the user's first visible question. Do NOT show the generic intent menu first, and do NOT mix in non-continuous modernization options like "Browse My Jobs" or "Start a Specific Transform" — those are AWS Transform top-level capabilities, not continuous modernization features. + +For every other route — VMware, SQL, Mainframe, and .NET (modernize or assessment-for-modernization) — use the generic intent menu below. The menu's options (Discover Workspace, Browse Jobs, Start a Specific Transform, Scan for Issues) are how those workloads enter the standard MANDATORY SEQUENCE's Discovery → Scope → Assessment phases. + +### Generic intent menu + AskUserQuestion: "What would you like to focus on?" The first user-visible action in this step is the AskUserQuestion — no auth-probing tool calls precede it, no auth lecture precedes it. (Step 1's silent job-refresh calls are not auth probes; they are a status check for a known prior session and do not surface to the user.) 
-With projects: [Discover This Workspace] [Browse My Jobs] [Start a Specific Transform] -No projects: [Browse My Jobs] [Open a Project Folder] [Start from Scratch] +With projects: [Discover This Workspace] [Browse My Jobs] [Start a Specific Transform] [Scan for Issues] +No projects: [Browse My Jobs] [Open a Project Folder] [Start from Scratch] [Scan for Issues] + + +**Routing.** Once the user picks an intent from this generic menu, re-run the **Workload Routing Gate** (Step 0) before doing anything else. That gate identifies the workload first, applies workload-specific rules (VMware/SQL/mainframe never continuous modernization; .NET asks modernize vs. assessment-for-modernization vs. analyze-for-tech-debt), and only then falls back to keyword-based continuous modernization routing for still-unspecified requests. Deeper Custom-vs-continuous modernization nuance (prior findings, mixed scope) lives in `steering/workload-continuous-modernization-routing.md` — but its keyword lists do NOT override the gate. **Just-in-time auth.** Once the user picks an intent, the next tool that action needs may require auth. If so, prompt for auth then, framed around the action the user just chose ("to browse your jobs, sign in to AWS Transform"). Which auth each MCP tool needs is reported by the MCP server — read it from the tool's description, `get_status`, or the error the tool returns. CLI transforms use AWS credentials only — do NOT prompt for sign-in for CLI-only intents, even when sign-in is unconfigured. If the user picks something that needs no service call (e.g., "Open a Project Folder"), do not probe auth. 
diff --git a/aws-transform/steering/AWSTransformInfrastructureExecutorAccessEC2.json b/aws-transform/steering/AWSTransformInfrastructureExecutorAccessEC2.json new file mode 100644 index 00000000..3a741db7 --- /dev/null +++ b/aws-transform/steering/AWSTransformInfrastructureExecutorAccessEC2.json @@ -0,0 +1,149 @@ +{ + "Version": "2012-10-17", + "Statement": [ + { + "Sid": "CFNRead", + "Effect": "Allow", + "Action": ["cloudformation:DescribeStacks", "cloudformation:DescribeStackEvents", "cloudformation:DescribeStackResources", "cloudformation:DescribeStackDriftDetectionStatus"], + "Resource": "arn:aws:cloudformation:*:*:stack/atx-*/*" + }, + { + "Sid": "CFNValidateTemplate", + "Effect": "Allow", + "Action": "cloudformation:ValidateTemplate", + "Resource": "*" + }, + { + "Sid": "EC2Desc", + "Effect": "Allow", + "Action": ["ec2:DescribeInstances", "ec2:DescribeImages", "ec2:DescribeVpcs", "ec2:DescribeSubnets", "ec2:DescribeSecurityGroups", "ec2:DescribeKeyPairs", "ec2:DescribeRouteTables", "ec2:DescribeNatGateways", "ec2:DescribeInternetGateways"], + "Resource": "*", + "Condition": {"StringEquals": {"aws:ResourceAccount": "${aws:PrincipalAccount}"}} + }, + { + "Sid": "EC2PowerState", + "Effect": "Allow", + "Action": ["ec2:StartInstances", "ec2:StopInstances"], + "Resource": "arn:aws:ec2:*:*:instance/*", + "Condition": {"StringEquals": {"ec2:ResourceTag/atx-remote-infra": "true"}} + }, + { + "Sid": "SSMRead", + "Effect": "Allow", + "Action": ["ssm:GetCommandInvocation", "ssm:ListCommands", "ssm:ListCommandInvocations", "ssm:DescribeInstanceInformation", "ssm:DescribeSessions"], + "Resource": "*", + "Condition": { + "StringEquals": {"aws:ResourceAccount": "${aws:PrincipalAccount}"} + } + }, + { + "Sid": "SSMTgt", + "Effect": "Allow", + "Action": ["ssm:SendCommand", "ssm:StartSession"], + "Resource": "arn:aws:ec2:*:*:instance/*", + "Condition": {"StringEquals": {"ssm:resourceTag/atx-remote-infra": "true"}} + }, + { + "Sid": "SSMDocs", + "Effect": "Allow", + "Action": 
"ssm:SendCommand", + "Resource": "arn:aws:ssm:*::document/AWS-RunShellScript" + }, + { + "Sid": "S3Data", + "Effect": "Allow", + "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"], + "Resource": ["arn:aws:s3:::atx-source-code-*", "arn:aws:s3:::atx-source-code-*/*", "arn:aws:s3:::atx-ct-output-*", "arn:aws:s3:::atx-ct-output-*/*"], + "Condition": {"StringEquals": {"aws:ResourceAccount": "${aws:PrincipalAccount}"}} + }, + { + "Sid": "KMSEncryptDecrypt", + "Effect": "Allow", + "Action": ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey"], + "Resource": "arn:aws:kms:*:*:key/*", + "Condition": { + "StringEquals": {"aws:ResourceAccount": "${aws:PrincipalAccount}"}, + "ForAnyValue:StringEquals": {"kms:ResourceAliases": "alias/atx-encryption-key"} + } + }, + { + "Sid": "SM", + "Effect": "Allow", + "Action": ["secretsmanager:GetSecretValue", "secretsmanager:DescribeSecret"], + "Resource": "arn:aws:secretsmanager:*:*:secret:atx/*", + "Condition": {"StringEquals": {"aws:ResourceAccount": "${aws:PrincipalAccount}"}} + }, + { + "Sid": "SMList", + "Effect": "Allow", + "Action": "secretsmanager:ListSecrets", + "Resource": "*", + "Condition": {"StringEquals": {"aws:ResourceAccount": "${aws:PrincipalAccount}"}} + }, + { + "Sid": "SchedLifecycle", + "Effect": "Allow", + "Action": ["scheduler:CreateSchedule", "scheduler:DeleteSchedule", "scheduler:GetSchedule", "scheduler:UpdateSchedule"], + "Resource": "arn:aws:scheduler:*:*:schedule/atx-control-tower/*", + "Condition": {"StringEquals": {"aws:ResourceAccount": "${aws:PrincipalAccount}"}} + }, + { + "Sid": "SchedGroupRead", + "Effect": "Allow", + "Action": "scheduler:GetScheduleGroup", + "Resource": "arn:aws:scheduler:*:*:schedule-group/atx-control-tower", + "Condition": {"StringEquals": {"aws:ResourceAccount": "${aws:PrincipalAccount}"}} + }, + { + "Sid": "SchedList", + "Effect": "Allow", + "Action": ["scheduler:ListSchedules", "scheduler:ListScheduleGroups"], + "Resource": "*", + "Condition": 
{"StringEquals": {"aws:ResourceAccount": "${aws:PrincipalAccount}"}} + }, + { + "Sid": "IAMPassEC2InstanceRole", + "Effect": "Allow", + "Action": "iam:PassRole", + "Resource": "arn:aws:iam::*:role/atx-transform-role*", + "Condition": { + "StringEquals": { + "iam:PassedToService": "ec2.amazonaws.com", + "aws:ResourceAccount": "${aws:PrincipalAccount}" + } + } + }, + { + "Sid": "IAMPassSchedulerRole", + "Effect": "Allow", + "Action": "iam:PassRole", + "Resource": "arn:aws:iam::*:role/AtxSchedulerInvocationRole", + "Condition": { + "StringEquals": { + "iam:PassedToService": "scheduler.amazonaws.com", + "aws:ResourceAccount": "${aws:PrincipalAccount}" + } + } + }, + { + "Sid": "IAMReadRoles", + "Effect": "Allow", + "Action": ["iam:GetRole", "iam:ListAttachedRolePolicies", "iam:ListRolePolicies", "iam:GetRolePolicy"], + "Resource": "arn:aws:iam::*:role/*", + "Condition": {"StringEquals": {"aws:ResourceAccount": "${aws:PrincipalAccount}"}} + }, + { + "Sid": "IAMReadInstanceProfile", + "Effect": "Allow", + "Action": "iam:GetInstanceProfile", + "Resource": "arn:aws:iam::*:instance-profile/*", + "Condition": {"StringEquals": {"aws:ResourceAccount": "${aws:PrincipalAccount}"}} + }, + { + "Sid": "STS", + "Effect": "Allow", + "Action": "sts:GetCallerIdentity", + "Resource": "*" + } + ] +} diff --git a/aws-transform/steering/workload-continuous-modernization-analysis.md b/aws-transform/steering/workload-continuous-modernization-analysis.md new file mode 100644 index 00000000..9b217669 --- /dev/null +++ b/aws-transform/steering/workload-continuous-modernization-analysis.md @@ -0,0 +1,237 @@ +--- +name: analysis +description: Run/start/restart/cancel/delete analyses (tech-debt-quick, tech-debt-comprehensive, security, agentic-readiness, modernization-readiness, custom). Detects vulnerabilities, outdated dependencies, migration opportunities, modernization candidates. Custom type runs any TD. 
+--- + +# Analysis + +## Telemetry + +When running `atx ct analysis run` or `atx ct remediation create`, always include `--telemetry`. + +Format: `--telemetry "agent=,executionMode="` +- `agent` -- the AI assistant driving this session (lowercase, no spaces). Use the real assistant name -- e.g. kiro, claude, amazonq, copilot. +- `executionMode` -- `local` + +If the user explicitly asks to disable telemetry, omit `--telemetry` for the rest of the session. + +## Choose Compute (Before Running) + +**Explicit intent overrides repo count.** If the user's prompt contains words like "remotely", "on AWS", "on EC2", "on Fargate", "in the cloud", or "remote execution", route to the corresponding execution skill regardless of how many repos are in scope: + +- Mentions EC2 / "on an instance" → follow [continuous-modernization-ec2-execution](workload-continuous-modernization-ec2-execution.md) +- Mentions Batch / Fargate / "serverless" → follow [continuous-modernization-batch-execution](workload-continuous-modernization-batch-execution.md) +- Mentions "remotely" / "on AWS" / "in the cloud" (no specific compute) → ask which: EC2 or Batch (Fargate) + +**Otherwise**, for analyses with more than 9 repos, ask the customer: + +> "Do you want to run this locally, set up an EC2 instance in your AWS account, or submit to AWS Batch (Fargate)?" 
+ +- **Local** -- proceed with the commands below +- **EC2** -- follow [continuous-modernization-ec2-execution](workload-continuous-modernization-ec2-execution.md) +- **Batch** -- follow [continuous-modernization-batch-execution](workload-continuous-modernization-batch-execution.md) + +## Commands + +```bash +# Run analysis (returns immediately with analysis ID) +atx ct analysis run --type --source [--repo ::] --telemetry "agent=,executionMode=local" + +# Run and wait for completion +atx ct analysis run --type --source [--repo ::] --wait --telemetry "agent=,executionMode=local" + +# Run custom analysis with a specific transformation definition +atx ct analysis run --type custom --transformation-name --source --repo :: --wait --telemetry "agent=,executionMode=local" + +# Run custom analysis with configuration (file://, JSON, or key=value) +atx ct analysis run --type custom --transformation-name -g "additionalPlanContext=Focus on auth module" --source --repo :: --wait --telemetry "agent=,executionMode=local" + +# Get details (JSON for parsing) +atx ct analysis get --id --json + +# List all +atx ct analysis list --json + +# Filter on the server-side index (fast). Combine as needed. +atx ct analysis list --status --json +atx ct analysis list --type --json +atx ct analysis list --status complete --type security --json + +# Category is filtered client-side (does not reduce the fetch); only narrows what's printed. +atx ct analysis list --category "Tech Debt" --json + +# Cancel or delete +atx ct analysis cancel --id +atx ct analysis delete --id [--cascade-findings] +``` + +## Security analysis prerequisite + +If the analysis type is `security`, `agentic-readiness`, or `modernization-readiness` (these all use the security-agent under the hood), the agent space resource must be provisioned in the customer's account before the analysis can run. The first run in any new account triggers `securityagent:CreateAgentSpace`, which requires admin credentials. 
After that one-time bootstrap, every subsequent run finds the existing agent space via `list-agent-spaces` and never needs `CreateAgentSpace` again. + +**The agent MUST run this check before submitting a security/agentic-readiness/modernization-readiness analysis:** + +```bash +AGENT_SPACE_ID=$(atx ct setup security-agent --status 2>/dev/null | jq -r '.agentSpaceId // ""') + +if [ -z "$AGENT_SPACE_ID" ]; then + # First-time bootstrap required: agent space not yet provisioned in this account. + echo "agent space not yet provisioned" +fi +``` + +**If `agentSpaceId` is empty**, the agent MUST stop and emit an admin handoff. The customer needs to run the analysis once with admin credentials (creates the agent space, populates `agentSpaceId` in the local config), then re-run with whatever credentials they normally use. + +**Profile-name guidance for the agent.** When emitting the bootstrap handoff command, the agent MUST use the placeholder `` rather than guessing a profile name from the customer's local AWS config, environment variables, or shell history. Substituting a wrong name leads to confusing AccessDenied errors. + +Suggested phrasing: + +> "Before I can run a `` analysis, the agent space resource needs to be provisioned in your account. This is a one-time bootstrap that requires admin credentials -- only the first security/agentic-readiness/modernization-readiness analysis ever in this account needs this. Run this with your admin profile: +> +> ```bash +> AWS_PROFILE= atx ct analysis run \ +> --type \ +> --source \ +> --repo "::" \ +> --telemetry "agent=,executionMode=local" +> ``` +> +> Pick any one repo from your source -- the bootstrap doesn't depend on which one. After it completes, your local `~/.atxct/shared/security_agent_config.json` will have `agentSpaceId` populated. Re-run your original request and the analysis will proceed without admin creds." + +The agent then STOPS this turn. 
On the next user turn, re-check `--status`; if `agentSpaceId` is now populated, proceed with the original analysis request. + +**If `agentSpaceId` is populated** (already bootstrapped), proceed with the analysis normally using whatever credentials the customer's local profile has. + +## Custom Analysis + +The `custom` type runs any transformation definition (TD) against a repository. Unlike other analysis types, custom analysis does not generate findings -- it executes the TD directly. + +**Required flags for `--type custom`:** + +- `--transformation-name ` -- Name of the TD in the registry + +**Optional flags:** + +- `-g, --configuration ` -- Configuration passed directly to the TD. Accepts: + - Key-value: `"additionalPlanContext=Upgrade to Java 17,buildCommand=mvn clean test"` + - JSON: `'{"additionalPlanContext":"Upgrade to Java 17"}'` + - File path: `"file:///path/to/config.json"` + +**Constraints:** + +- `--transformation-name` is only valid with `--type custom` +- `-g` is only valid with `--type custom` +- Custom analysis will not generate findings + +## TD Discovery and Recommendation + +When the user asks to run a custom analysis or mentions a capability not covered by built-in types (e.g., "generate sequence diagrams", "check code quality", "run compliance scan"), use TD discovery to find the right transformation: + +### Workflow + +1. **List available TDs:** Run `atx custom def list` to fetch all available transformation definitions (both AWS-managed and customer-owned custom TDs). +2. **Match intent to TD:** Based on the user's description, match their intent against TD names and descriptions. +3. **Recommend and confirm:** Present the matched TD(s) to the user with a brief description. Wait for confirmation before executing. +4. 
**Execute:** Run `atx ct analysis run --type custom --transformation-name --source --repo --wait` + +### When to use TD discovery vs built-in types + +- If the user's request clearly maps to a built-in type (`tech-debt-quick`, `tech-debt-comprehensive`, `security`, `agentic-readiness`, `modernization-readiness`), use that type directly -- do NOT use custom. +- If the request mentions a specific capability not covered by built-in types, or asks about custom/customer-owned TDs, use TD discovery. +- If the user explicitly names a TD, skip discovery and run it directly with `--type custom --transformation-name `. + +## Repo slug rules + +When passing `--repo` to `analysis run`: + +- **Qualified slug** (`::`): always works, doesn't need `--source`. +- **Bare repo name** (``): only works if `--source ` is also supplied. +- **Bare `--repo` without `--source`**: hard error (`Unqualified repo slug(s)`). Don't generate this combination. +- **Multiple repos**: must all share the same source. A run that mixes repos from different sources is rejected with `repos span multiple sources`. + +Prefer qualified slugs so the source is unambiguous. + +## Status Values + +When polling with `atx ct analysis get --id --json`, the `status` field is **lowercase**: + +- `running` -- in progress +- `complete` -- finished (check findings) +- `cancelled` -- user cancelled +- `failed` -- error occurred + +**Note:** It's `complete`, NOT `COMPLETED` or `completed`. 
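The status values above can be turned into a small dispatch helper so that only the exact lowercase strings are accepted. This is a minimal sketch, not part of the CLI. `next_action_for_status` and `poll_analysis` are hypothetical names, the action labels are illustrative, the 10-second interval is arbitrary, and the loop assumes `status` is a top-level key in the `--json` output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a status string to what the caller should do next.
# Only the four lowercase values documented above are recognized.
next_action_for_status() {
  case "$1" in
    running)   echo "wait" ;;
    complete)  echo "fetch-findings" ;;
    cancelled) echo "stop" ;;
    failed)    echo "report-error" ;;
    *)         echo "unknown" ;;   # e.g. COMPLETED or completed
  esac
}

# Hypothetical polling loop (defined only, not invoked here).
# Assumes `status` is a top-level field of the JSON output; the interval is arbitrary.
poll_analysis() {
  local id="$1" status
  while :; do
    status=$(atx ct analysis get --id "$id" --json | jq -r '.status')
    [ "$(next_action_for_status "$status")" = "wait" ] || break
    sleep 10
  done
  echo "$status"
}
```

Matching on the exact strings means a mistyped `COMPLETED` falls into the `unknown` branch instead of being treated as success. When a run is started with `--wait`, a loop like this is unnecessary.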
+ +## Artifacts + +After an analysis completes, its report artifacts can be listed and retrieved: + +```bash +# List all artifacts for an analysis +atx ct analysis list-artifacts --id --json + +# Get content of a specific artifact +atx ct analysis get-artifact --id --repo :: --name +``` + +### Artifact names by analysis type + +| Analysis Type | Artifact Names | +| ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| tech-debt-comprehensive | `report`, `technical-debt-report/summary`, `technical-debt-report/outdated-components`, `technical-debt-report/maintenance-burden`, `technical-debt-report/remediation-plan` | +| agentic-readiness | `ara` (per repo); `_portfolio_ara` (portfolio-level) | +| modernization-readiness | `mod` (per repo); `_portfolio_mod` (portfolio-level) | + +## After Analysis Completes + +Once an analysis finishes, retrieve its findings by analysis ID and summarize for the user: + +```bash +# Get findings produced by a specific analysis +atx ct findings list --analysis-id --json + +# List artifacts to see available reports +atx ct analysis list-artifacts --id --json + +# Read a specific report +atx ct analysis get-artifact --id --repo :: --name report +``` + +## When an analysis returns 0 findings + +A `0 findings` result does NOT automatically mean the repo is clean. Each analysis type has its own scope. Do NOT report "clean" without running the right follow-up. + +| Type | What 0 findings means | What to do next | +| ------------------------- | ------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `tech-debt-quick` | Metadata files didn't expose any stale versions. 
**Inconclusive** -- quick scan only inspects manifests. | Tell the user the result is inconclusive (metadata-only). Offer to run `tech-debt-comprehensive` for a code-level analysis. | +| `tech-debt-comprehensive` | Bedrock did not surface tech-debt issues. Repo is likely well-maintained, but other dimensions weren't checked. | Offer `security` for CVEs, `agentic-readiness` for AI-readiness, and `modernization-readiness` for modernization opportunities. Mention these are separate scans. | +| `security` | Security Agent didn't surface CVEs or vulnerable patterns. | Verify the Security Agent is healthy (`atx ct setup security-agent --status`). If healthy, offer `tech-debt-comprehensive` for non-security issues. | +| `agentic-readiness` | Repo did not show AI-readiness gaps at the framework level. | Offer `modernization-readiness` for cloud/infrastructure modernization or `tech-debt-comprehensive` for general code health. | +| `modernization-readiness` | Repo did not show modernization opportunities (infrastructure, application, data, security, operations dimensions). | Offer `agentic-readiness` for AI-integration scope or `tech-debt-comprehensive` for general code health. | + +### Sanity check before reporting "clean" + +If an analysis returns 0 findings on a repo that's obviously stale (Java 8, Node 14, Python 2, .NET Framework, an old `pom.xml` from 4+ years ago), do NOT report the repo as clean. Treat it as a signal that the analysis type was wrong for the question and offer a follow-up. + +## Listing analyses + +`atx ct analysis list` exposes three filters. Pick the narrowest combination the question allows. 
+ +| Filter | Where it runs | Allowed values | +| ------------ | --------------------------------------- | ------------------------------------------------------------------------------------------------------------------ | +| `--status` | server-side (GSI-backed, fast) | `pending`, `running`, `complete`, `cancelled`, `failed` | +| `--type` | server-side (GSI-backed, fast) | `tech-debt-quick`, `tech-debt-comprehensive`, `security`, `agentic-readiness`, `modernization-readiness`, `custom` | +| `--category` | client-side (does not reduce the fetch) | `"Tech Debt"`, `"Security"`, `"Agentic Readiness"` | + +**Recommended shapes:** + +- "What completed analyses do we have?" → `atx ct analysis list --status complete --json` +- "What security analyses ran?" → `atx ct analysis list --type security --json` +- "Find completed security runs" → `atx ct analysis list --status complete --type security --json` +- One specific run → `atx ct analysis get --id --json` (point lookup; cheaper than list). + +`--category` is a client-side grouping; e.g. `"Tech Debt"` matches both `tech-debt-quick` and `tech-debt-comprehensive`. Use it when the user wants both subtypes together. + +`--status` and `--type` accept only the canonical values above. Off-canonical input (e.g. `--status completed`, `--type tech-debt`) returns an `INVALID_INPUT` error. diff --git a/aws-transform/steering/workload-continuous-modernization-batch-execution.md b/aws-transform/steering/workload-continuous-modernization-batch-execution.md new file mode 100644 index 00000000..f2d5099a --- /dev/null +++ b/aws-transform/steering/workload-continuous-modernization-batch-execution.md @@ -0,0 +1,1043 @@ +--- +name: batch-execution +description: Run continuous modernization analysis on AWS Batch (Fargate) using one container per submission. 
Each container runs `atx ct analysis run` (or `remediation create`) on the customer's logical source, then uploads artifacts via the upload script baked into the image at `/app/upload-ct-artifacts.sh`. +--- + +# continuous modernization Batch/Fargate Execution + +## ⚠️ MANDATORY: Permission Consent (MUST be first interaction with customer) + +**CRITICAL: The VERY FIRST thing the agent says after the customer chooses Batch/Fargate is the consent message below. Do NOT ask ANY questions (source, analysis type, region, etc.) before showing this message and getting confirmation. No exceptions.** + +"To run the analysis on Batch/Fargate, these resources are created in your account: AWS Transform API access, CloudFormation stacks, Batch compute environments and job queues, a Batch job definition, Lambda functions for job management, S3 buckets for source code and results, a KMS key for encryption, IAM roles for task execution, CloudWatch logs and dashboard, secrets for source credentials, and security agent resources for vulnerability scanning. You will need admin permissions to deploy. After that, executor-only permissions are needed (see `$HOME/.aws/atx/custom/remote-infra/AWSTransformInfrastructureExecutorAccessBatch.json`). Note: AWS Transform does NOT create VPCs, subnets, or NAT gateways — you provide those." + +If the customer says **yes** → proceed with the rest of the workflow. +If the customer says **no** → respond with: "You may encounter permission errors during the setup process. We'll continue, but some steps may fail if permissions are missing." Then proceed with the workflow. + +## Telemetry + +When running `atx ct analysis run` or `atx ct remediation create`, always include `--telemetry`. + +Format: `--telemetry "agent=,executionMode="` + +- `agent` — the AI assistant driving this session (lowercase, no spaces). Use the real assistant name — e.g. kiro, claude, amazonq, copilot. 
+- `executionMode` — `fargate` + +If the user explicitly asks to disable telemetry, omit `--telemetry` for the rest of the session. + +Run continuous modernization analysis or remediation on AWS Batch (Fargate) with **one container per submission** (default). Each container starts an `atx ct server`, sets up the source's local config + credentials, and runs analysis or remediation across all repos in the source. For analysis on any provider, and for remediation on `local`-provider sources, artifacts are uploaded to S3. For remediation on `github` / `gitlab` sources, the backend pushes to a result branch — no S3 upload needed. Multiple parallel containers per batch are supported via the submission patterns (B/C) when the customer wants multiple analysis types on one source or multiple sources in one batch. + +## When to Use + +- Analyzing or remediating one or more sources at scale on AWS-managed compute (no EC2 to provision) +- Running multiple analysis types in parallel on the same source +- Running analysis/remediation across multiple sources in parallel (e.g., team-A's github + team-B's gitlab) +- Running per-repo parallel analysis (one container per repo via `--repo`) when the customer wants maximum parallelism or per-repo failure isolation +- Customer already has the Custom CDK stack deployed (reuses infrastructure) +- Source contains many repos (atx ct parallelizes up to 8 inside a single container; use per-repo Pattern D for more parallelism) + +## Architecture + +``` +Customer's local machine + ↓ atx ct source add (registers source with the backend) + ↓ "Run analysis on Batch" + ↓ aws lambda invoke atx-trigger-batch-jobs (one job per submission) +AWS Batch (Fargate) -- single container per submission + └── Container (public.ecr.aws/d9h8z6l7/aws-transform:latest) + ├── JOB_COMMAND (analysis): + │ - Install / upgrade atx ct CLI + │ - Start atx ct server + │ - github / gitlab: place token in ~/.atxct/sources//_token + │ - local: pull repo bundle from S3, 
discovery scan with --path override + │ - atx ct analysis run --type --source --wait + │ └─ CT server clones each repo as needed (github / gitlab) and performs analysis + │ - curl + /app/upload-ct-artifacts.sh -- zips each repo's working dir to S3 + │ └─ skipped for tech-debt-quick (no analysis artifacts to capture) + └── Container exits +``` + +This pattern keeps `atx ct analysis run` as the unit of work. Source attribution is preserved end-to-end — findings carry the customer's source name. + +## Provider Compatibility + +| Provider | Container setup | Analysis output | Remediation flag | Remediation output | +| ---------- | ------------------------------------------------------------------------------------------------ | ------------------------- | -------------------- | ---------------------------------------------------- | +| **github** | Place `github_token` (no `source add` in container) | `code.zip` per repo in S3 | NO `--local` | Result branch pushed to source repo and PR is opened | +| **gitlab** | Place `gitlab_token` (no `source add` in container) | `code.zip` per repo in S3 | NO `--local` | Result branch pushed to source repo and MR is opened | +| **local** | Pull bundle from S3, `discovery scan --path` (idempotent — creates/updates local config + scans) | `code.zip` per repo in S3 | `--local` (required) | `code.zip` per repo in S3 | + +For github / gitlab, the customer must register the source on their own machine first via `atx ct source add --provider github|gitlab --org --token [--url ]`. The container only injects the token at runtime; everything else (provider type, base URL, identifier) comes from the backend's source record at clone time. + +## Step 0: Detect Infrastructure, Then Branch (Provision vs Operate) + +This is the entry gate for every Batch run. It is read-only and safe under any +credentials, including ReadOnly. + +**Network contract:** AWS Transform creates Batch, Lambdas, S3, KMS, IAM, and +security groups via CloudFormation. 
It does NOT create VPCs, subnets, NAT gateways, +or internet gateways — you provide those. If you don't specify a VPC, deployment +will fail rather than auto-provision. + +**Constraints:** + +- You MUST run this detection BEFORE asking the user anything about jobs, because + the answer decides whether the user needs to provision first or can submit work now. +- You MUST run: + + ```bash + aws cloudformation describe-stacks --stack-name AtxInfrastructureStack \ + --query 'Stacks[0].StackStatus' --output text 2>/dev/null || echo "NOT_DEPLOYED" + ``` + +- If the status is `CREATE_COMPLETE` or `UPDATE_COMPLETE`, You MUST treat the stack + as live and proceed to the OPERATE lifecycle (Step 1 onward — job submission). +- If the status is `NOT_DEPLOYED` or any non-complete state, You MUST enter the + PROVISION lifecycle (Step 0a–0c below) and You MUST NOT attempt to submit jobs, + because the Batch infrastructure does not yet exist. +- You MUST NOT run any provisioning command yourself in this step; detection is + read-only. +- You MUST NEVER run `aws ec2 create-default-vpc` or create any VPC, subnet, NAT + gateway, internet gateway, or route on the user's behalf. If the account has no + suitable VPC, surface the situation to the user and refuse to proceed. + +### Step 0a: Clone repo and discover VPCs + +**Constraints:** + +- You MUST clone (or update) the infra repo FIRST, before any reference to files + inside it: + + ```bash + ATX_INFRA_DIR="$HOME/.aws/atx/custom/remote-infra" + [ -d "$ATX_INFRA_DIR" ] || git clone -b atx-remote-infra --single-branch \ + https://github.com/aws-samples/aws-transform-custom-samples.git "$ATX_INFRA_DIR" + ``` + +- You MUST then discover the user's available VPCs by running: + + ```bash + aws ec2 describe-vpcs --query 'Vpcs[*].[VpcId,IsDefault,Tags[?Key==`Name`].Value|[0]]' --output table + ``` + +- Show the results to the user. 
If no VPCs exist (or none have private subnets with + NAT for Fargate), direct the user to the helper script: + + ``` + A utility script is available to create a Fargate-ready VPC with private subnets, + NAT gateway, and security group: + + cd "$HOME/.aws/atx/custom/remote-infra" && ./create-vpc.sh + + Run this from another terminal with admin credentials. It will print the VPC, + subnet, and security group IDs to use in cdk.json. + ``` + + You MUST NOT run `create-vpc.sh` yourself — present it for the user to run from + another terminal. After they run it, ask them for the output values to continue. +- Ask the user which VPC to use (MANDATORY). You MUST NOT recommend, suggest, or + guide the user toward any VPC (including the default VPC). Present all VPCs + neutrally without commentary on which is "simplest", "easiest", or "looks + pre-configured". The user must make their own informed choice. +- After the user picks a VPC, show its subnets and security groups: + + ```bash + aws ec2 describe-subnets --filters "Name=vpc-id,Values=" \ + --query 'Subnets[*].[SubnetId,AvailabilityZone,MapPublicIpOnLaunch,Tags[?Key==`Name`].Value|[0]]' --output table + aws ec2 describe-security-groups --filters "Name=vpc-id,Values=" \ + --query 'SecurityGroups[*].[GroupId,GroupName,Description]' --output table + ``` + +- Then ask the user to select: + 1. `existing_subnet_ids`: which subnets (MANDATORY). + 2. `existing_security_group_id`: which security group (MANDATORY). + 3. Source provider: `github`, `gitlab`, `bitbucket`, or `local`. +- You MUST refuse to proceed without explicit VPC, subnet, and security group + selection — there is no default or fallback path. +- You MUST NOT choose the VPC, subnets, or security group on the user's behalf, even + if only one option exists or one appears obvious. Always ask and wait for the user + to explicitly state their selection before writing to cdk.json. 
+- You MUST NOT write VPC/subnet/SG values to cdk.json until the user has explicitly + confirmed their choice. Show them what you will write and get a "yes" before + proceeding. + +### Step 0b: Validate network inputs and rewrite cdk.json + +**Constraints:** + +- You MUST verify, and report results to the user, the following BEFORE rewriting + config, because each is a silent deploy-time or runtime failure if wrong: + - The supplied subnets are in availability zones `${REGION}a` and `${REGION}b`, + because the stack hardcodes those AZs (`lib/infrastructure-stack.ts`). + - The subnets have egress (NAT gateway or VPC endpoints), because the stack does + NOT provision NAT. Tasks need outbound reach to the `atx ct` backend, ECR, S3, + and Secrets Manager. + - If the source is internal/self-hosted: a route exists from those subnets to the + internal git host (VPN / Direct Connect / peering). + - The security group's egress rules permit all of the above. +- You MUST NOT create NAT gateways, routes, VPNs, Direct Connect, or any other + network infrastructure on the user's behalf, because these are production network + changes with cost and blast-radius implications that require explicit human action. + If a precondition is missing, You MUST surface it and hand it to the user or their + network team. +- You MUST rewrite `$ATX_INFRA_DIR/cdk.json` `context` keys from the user's answers + and You MUST show the diff before deploy: + + ```json + "existingVpcId": "", + "existingSubnetIds": [""], + "existingSecurityGroupId": "" + ``` + +- You SHOULD leave `prebuiltImageUri` unchanged unless the user needs a runtime not + in the pre-built image, because blanking it switches to the Docker-required custom + image path. 
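The availability-zone precondition from Step 0b could be checked with a small helper like the one below. This is a sketch only: `check_subnet_azs` is a hypothetical name, and it assumes "in `${REGION}a` and `${REGION}b`" means every chosen subnet must sit in one of those two zones. It does not check egress or routing.

```bash
#!/usr/bin/env bash

# Usage: check_subnet_azs REGION AZ [AZ...]
# Returns 0 when every AZ is REGIONa or REGIONb, non-zero otherwise.
check_subnet_azs() {
  local region="$1"; shift
  local az bad=0
  if [ "$#" -eq 0 ]; then
    echo "no availability zones supplied" >&2
    return 2
  fi
  for az in "$@"; do
    case "$az" in
      "${region}a"|"${region}b") ;;   # zone the stack expects
      *) echo "unsupported AZ: $az" >&2; bad=1 ;;
    esac
  done
  return "$bad"
}

# Example wiring (read-only; the subnet IDs are placeholders):
#   azs=$(aws ec2 describe-subnets --subnet-ids subnet-aaa subnet-bbb \
#           --query 'Subnets[*].AvailabilityZone' --output text)
#   check_subnet_azs "us-east-1" $azs
```

The helper only reports; any fix (different subnets, NAT, routes) stays with the user, in line with the constraints above.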
+ +### Step 0c: Hand off the deploy (admin-gated) — DO NOT run it yourself + +**Constraints:** + +- You MUST present the deploy command for the user to run, and You MUST NOT run it + yourself, because the stack creates IAM roles and therefore requires admin / + role-creation permissions the agent should not assume. +- You MUST include this caveat verbatim in intent: "This stack creates IAM roles, so + deploying requires admin / role-creation permissions (`iam:CreateRole`, + `iam:PutRolePolicy`, `iam:PassRole`, instance profiles). Run it with an admin + identity. ReadOnly or runtime credentials are sufficient for everything afterward." +- You MUST present the command in in-session form so its output returns to the + conversation: + + ``` + ! cd "$HOME/.aws/atx/custom/remote-infra" && ./setup.sh + ``` + +- After deploy, You MUST direct the user to attach the executor policy + (`$HOME/.aws/atx/custom/remote-infra/AWSTransformInfrastructureExecutorAccessBatch.json`) + to their IAM role/user, so day-to-day job submission needs only least-privilege. +- You MUST stop the PROVISION lifecycle here and wait for the user to confirm the + deploy succeeded before entering the OPERATE lifecycle. +- On success You SHOULD persist `{executionModel:"batch", stackName, region, source, + byoNetwork}` to `.atx/context.json`, so a later session detects and reuses the stack. + +## Step 1: Verify Source and Enumerate Repos + +Before submitting jobs, confirm a source is registered locally (see [continuous-modernization-source.md](workload-continuous-modernization-source.md)) and run discovery (see [continuous-modernization-discovery.md](workload-continuous-modernization-discovery.md)) to get the list of repos. This determines what the container will analyze (Step 5). + +First, list existing registered sources to show the customer what's available: + +```bash +atx ct source list +``` + +Show the list to the customer and ask: + +1. **Which source to analyze?** (pick from the list above) +2. 
**Source type:** GitHub, GitLab, or local +3. **Repos to analyze:** all repos in source, or a specific subset +4. **Analysis type:** `tech-debt-comprehensive`, `tech-debt-quick`, `security`, `agentic-readiness`, `modernization-readiness` + +If the list is empty or the customer wants to register a new source first, run `atx ct source add` via the [continuous-modernization-source](workload-continuous-modernization-source.md) skill, then return here. + +Once the source is selected, run discovery and enumerate: + +```bash +LOGICAL_SOURCE_NAME="" + +# Run discovery +atx ct discovery scan --source "$LOGICAL_SOURCE_NAME" + +# Enumerate the discovered repos +mapfile -t REPOS < <(atx ct repository list --source "$LOGICAL_SOURCE_NAME" --json | jq -r '.items[].full_name') +REPO_COUNT=${#REPOS[@]} +``` + +Works for all source providers (github, gitlab, local) — no provider-specific API calls. See [continuous-modernization-discovery](workload-continuous-modernization-discovery.md) for scan details. + +If the customer wants only a subset of repos, filter `${REPOS[@]}` before submitting. + +## Step 2: Prep Credentials + +Give the user the relevant command below to run in their own terminal — do not ask them to paste the token into this chat. 
+ +**GitHub HTTPS — store the PAT** (the container fetches it from Secrets Manager at job start): + +```bash +read -s TOKEN && { aws secretsmanager create-secret --name "atx/github-token" \ + --secret-string "$TOKEN" 2>/dev/null \ + || aws secretsmanager put-secret-value --secret-id "atx/github-token" \ + --secret-string "$TOKEN"; }; unset TOKEN +``` + +**GitHub SSH — store the private key:** + +```bash +aws secretsmanager create-secret --name "atx/ssh-key" \ + --secret-string "$(cat )" 2>/dev/null \ + || aws secretsmanager put-secret-value --secret-id "atx/ssh-key" \ + --secret-string "$(cat )" +``` + +**GitLab HTTPS — store the PAT** (separate secret, the container fetches it from Secrets Manager at job start): + +```bash +read -s TOKEN && { aws secretsmanager create-secret --name "atx/gitlab-token" \ + --secret-string "$TOKEN" 2>/dev/null \ + || aws secretsmanager put-secret-value --secret-id "atx/gitlab-token" \ + --secret-string "$TOKEN"; }; unset TOKEN +``` + +**Bitbucket — store the API token** (the container fetches it from Secrets Manager at job start). Email and username are injected into the container command directly (not secrets — they're non-sensitive identifiers): + +```bash +read -s TOKEN && { aws secretsmanager create-secret --name "atx/bitbucket-token" \ + --secret-string "$TOKEN" 2>/dev/null \ + || aws secretsmanager put-secret-value --secret-id "atx/bitbucket-token" \ + --secret-string "$TOKEN"; }; unset TOKEN +``` + +**Private package registries** (if the analysis builds the project): see [custom-remote-execution#private-package-registries](custom-remote-execution.md#private-package-registries) for the `atx/credentials` JSON pattern. + +## Step 2b: Validate Credentials (MANDATORY) + +**MANDATORY**: The agent MUST verify that the required secret exists in Secrets Manager BEFORE proceeding to Step 4 (Confirm and Submit). Do NOT submit jobs without confirming the credential is present. 
If the secret is missing, give the user the command to run in their own terminal to create it (do not ask them to paste the token into this chat). + +The required secret depends on the source provider: + +| Provider | Required Secret | +| ------------- | ------------------------ | +| **github** | `atx/github-token` | +| **gitlab** | `atx/gitlab-token` | +| **bitbucket** | `atx/bitbucket-token` | +| **local** | (none — no token needed) | + +**Step A — Check secret exists:** + +```bash +# Replace with the provider-specific secret from the table above +aws secretsmanager describe-secret --secret-id --region 2>&1 +``` + +- If `ResourceNotFoundException` → inform the user that the secret is missing. Give them the command below to run in their own terminal (do not ask them to paste the token into this chat): + +```bash +read -s TOKEN && { aws secretsmanager create-secret --name "" \ + --secret-string "$TOKEN" --region 2>/dev/null \ + || aws secretsmanager put-secret-value --secret-id "" \ + --secret-string "$TOKEN" --region ; }; unset TOKEN +``` + +**Step B — Confirm SCM configuration with user (MANDATORY for non-local providers):** + +Before submitting any batch job (analysis or remediation), ask the user to confirm the SCM provider config that will be injected into the container. An incorrect identifier, email, or username causes clone failures inside the container after the setup phase. 
+ +Present the config to the user via AskUserQuestion: + +| Provider | Fields to confirm | +| --------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **github** | `identifier` (GitHub org or username that owns the repos) | +| **gitlab** | `identifier` (GitLab group or username that owns the repos) | +| **bitbucket cloud** | `identifier` (Bitbucket workspace), `email` (Atlassian account email for API auth), `username` (Bitbucket username for git clone — visible in clone URLs at bitbucket.org) | +| **bitbucket self-hosted (Data Center)** | `identifier` (Bitbucket project key), `base_url` (instance URL, e.g. `https://bitbucket.corp.example.com`) | + +Example confirmation prompt: + +> "I'll use this config for the Batch job: +> +> - Provider: github +> - Identifier: `github-username` +> +> Is this correct?" + +## Step 3: Prep Local Sources (local source only) + +The single container needs all repos on disk. The skill zips ALL repos in the source as one bundle and uploads it to the managed source bucket: + +```bash +ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text) +ZIP_NAME="-bundle" # name the bundle (e.g., my-bundle) + +# Zip all repos as siblings in one archive (preserves .git in each) +cd /path/to/repos +zip -qr "/tmp/${ZIP_NAME}.zip" */ -x '*/node_modules/*' +aws s3 cp "/tmp/${ZIP_NAME}.zip" "s3://atx-source-code-${ACCOUNT_ID}/repos/${ZIP_NAME}.zip" +rm -f "/tmp/${ZIP_NAME}.zip" +``` + +> **Important:** The zip MUST include each repo's `.git/` directory. atx ct's local-provider discovery scanner identifies repos by the presence of `.git`. If the customer pre-emptively excludes `.git`, the scan finds zero repos and analysis fails with `Available: (none)`. + +The skill runs this on the customer's machine. 
GitHub and GitLab sources don't need this step — the container clones directly from the source provider via PAT. + +## Step 3b: Security Analysis Prerequisites (security type only) + +If `ANALYSIS_TYPE` is `security` (or `agentic-readiness` / `modernization-readiness` which depend on it), the security agent must be configured before submitting. Skip this step for all other analysis types. + +**Constraints:** + +- You MUST verify `~/.atxct/shared/security_agent_config.json` exists locally. If + missing, the customer must run security agent setup first: + + ```bash + atx ct analysis configure-security + ``` + + See [continuous-modernization-setup](workload-continuous-modernization-setup.md) for details. +- You MUST confirm the config contains valid fields before proceeding: + + ```bash + jq -r '.s3Bucket, .role_arn // .roleArn' ~/.atxct/shared/security_agent_config.json + ``` + + If either field is empty or null, the security agent setup is incomplete — ask the + customer to re-run `atx ct analysis configure-security`. +- The CDK stack's `ATXBatchJobRole` already includes `securityagent:*` permissions, + S3 access to the security agent bucket, and `iam:PassRole` on the security agent + role. No additional IAM modifications are needed for the Batch path (unlike the EC2 + path which requires manual role augmentation). +- The security agent config is injected into the container at runtime via base64 + encoding in the job command (handled by `sec_config_inject()` in Step 5). No + manual sync to S3 or instance is needed. 
+ +## Step 4: Confirm and Submit + +**Gate checks** (only the checks relevant to the source provider must pass): + +| Check | github | gitlab | bitbucket cloud | bitbucket self-hosted | local | +| ------------------------------------------------------------- | ------------------ | ------------------ | ----------------------------- | --------------------- | ----- | +| **Credentials (Step 2b-A):** secret exists in Secrets Manager | `atx/github-token` | `atx/gitlab-token` | `atx/bitbucket-token` | `atx/bitbucket-token` | skip | +| **SCM config (Step 2b-B):** confirmed with user | identifier | identifier | identifier + email + username | identifier + base_url | skip | + +Tell the customer what will happen and wait for explicit confirmation. The exact prompt depends on provider: + +**For GitHub:** + +> "I'll submit a Batch job to run `` on Fargate against your GitHub source ``. The container will: +> +> - Place your GitHub PAT (from Secrets Manager `atx/github-token`) — your existing source `` is preserved (no new source created) +> - Run `atx ct analysis run --source ` — atx ct will clone each repo and analyze it +> +> Continue?" + +**For GitLab:** + +> "I'll submit a Batch job to run `` on Fargate against your GitLab source ``. The container will: +> +> - Place your GitLab PAT (from Secrets Manager `atx/gitlab-token`) — your existing source `` is preserved (no new source created) +> - Run `atx ct analysis run --source ` — atx ct will clone each repo and analyze it +> +> Continue?" + +**For Bitbucket Cloud:** + +> "I'll submit a Batch job to run `` on Fargate against your Bitbucket source ``. The container will: +> +> - Place your Bitbucket API token (from Secrets Manager `atx/bitbucket-token`) and inject email/username into config.json +> - Run `atx ct analysis run --source ` — atx ct will clone each repo and analyze it +> +> Continue?" + +**For Bitbucket Data Center:** + +> "I'll submit a Batch job to run `` on Fargate against your Bitbucket Data Center source ``. 
The container will: +> +> - Place your HTTP Access Token (from Secrets Manager `atx/bitbucket-token`) and inject base_url into config.json +> - Run `atx ct analysis run --source ` — atx ct will clone each repo and analyze it +> +> Continue?" + +**For Local:** + +> "I'll zip your repos at `` (with `.git` included) into a single bundle and upload to `s3://atx-source-code-${ACCOUNT_ID}/repos/.zip`, then submit a Batch job to run `` on Fargate. The container will: +> +> - Download + unzip the bundle +> - Register a new local source named `` in the backend (pointing at the container's `/home/atxuser/repos`) +> - Run discovery to enumerate the repos +> - Run `atx ct analysis run --source ` — atx ct analyzes all repos +> - Upload artifacts to `s3://atx-ct-output-${ACCOUNT_ID}//::/code.zip` +> +> Continue?" + +Do NOT submit until the customer confirms. + +## Step 5: Submit Per-Repo Batch Jobs + +Build one entry per repo and invoke `atx-trigger-batch-jobs`. The Lambda has strict input validation — to pass it, the per-job command must: + +- Start with an allowed prefix (`atx ct` or `atx custom def *`) +- Contain no `$VAR` references, no `$(...)` command substitution, no `${VAR}` brace expansion, no `()` subshells, no `{}` braces, no `*` glob, no `^` regex anchor, no backticks, no `-c` flag (e.g., `sh -c '...'`) +- Contain only ASCII characters — no em-dash `—`, en-dash `–`, smart quotes `“”` `‘’`, or any other Unicode (these get rejected by the Lambda's character allowlist) +- Be on a single line (no `\\` continuations, since post-`tr` they become literal escape-space characters) + +The skill therefore substitutes ALL variables (account ID, source name, repo name, etc.) into the command string locally, before submission. Runtime values that aren't known until the container runs (e.g., the analysis ID printed by `atx ct analysis run`) are extracted into temp files and fed to subsequent commands via `xargs -I VAR command VAR ...`. 
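The rules listed above can be approximated locally before a command string is sent to the Lambda. The sketch below is a conservative pre-flight check, not the Lambda's actual validator: `validate_job_command` is a hypothetical helper, and it rejects any `$`, parenthesis, brace, `*`, `^`, or backtick outright, which may be stricter than the real allowlist.

```bash
#!/usr/bin/env bash

# Returns 0 if the fully substituted command passes the documented rules.
validate_job_command() {
  local cmd="$1" forbidden='$(){}*^`' i ch

  # Rule 1: allowed prefix.
  case "$cmd" in
    "atx ct "*|"atx custom def "*) ;;
    *) echo "rejected: must start with 'atx ct' or 'atx custom def'" >&2; return 1 ;;
  esac

  # Rule 2: single line only.
  if [[ "$cmd" == *$'\n'* ]]; then
    echo "rejected: contains a newline" >&2; return 1
  fi

  # Rule 3: none of the listed shell metacharacters.
  for ((i = 0; i < ${#forbidden}; i++)); do
    ch="${forbidden:i:1}"
    if [[ "$cmd" == *"$ch"* ]]; then
      echo "rejected: contains '$ch'" >&2; return 1
    fi
  done

  # Rule 4: no standalone -c flag (e.g. sh -c '...').
  if [[ " $cmd " == *" -c "* ]]; then
    echo "rejected: contains a -c flag" >&2; return 1
  fi

  # Rule 5: printable ASCII only (space through tilde).
  if printf '%s' "$cmd" | LC_ALL=C grep -q '[^ -~]'; then
    echo "rejected: contains non-ASCII or control characters" >&2; return 1
  fi
  return 0
}
```

Running this against the output of each `build_command_*` helper before invoking `atx-trigger-batch-jobs` would surface an unsubstituted variable or a stray Unicode dash locally instead of as a Lambda rejection.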
+ +```bash +ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text) +BATCH_NAME="atxct-$(date +%s)" +ANALYSIS_TYPE="" # tech-debt-quick | tech-debt-comprehensive | security | agentic-readiness | modernization-readiness | custom +AGENT="" # AI assistant name (kiro, claude, amazonq, etc.) +LOGICAL_SOURCE_NAME="" # the source already registered with atx ct (used as-is) +GITHUB_ORG="" # github source only -- used to build config.json for remediation chains +GITLAB_GROUP="" # gitlab source only -- used to build config.json for remediation chains +BITBUCKET_WORKSPACE="" # bitbucket source only -- workspace (Cloud) or project key (DC) +BITBUCKET_EMAIL="" # bitbucket cloud only -- email for API auth +BITBUCKET_USERNAME="" # bitbucket cloud only -- username for git clone/push +BITBUCKET_BASE_URL="" # bitbucket DC only -- e.g. https://bitbucket.corp.example.com (empty for Cloud) +ZIP_NAME="" # local source only -- name of the bundle uploaded in Step 3 + +# Security analysis only -- base64-encode the security agent config for injection. +# Not a secret (contains resource identifiers only), so no Secrets Manager needed. +SEC_CONFIG_B64="" +if [[ "${ANALYSIS_TYPE}" == "security" ]]; then + SEC_CONFIG_B64=$(base64 < ~/.atxct/shared/security_agent_config.json | tr -d '\n') +fi + +# Upload script -- /app/upload-ct-artifacts.sh is baked into the container image. +# It iterates analysis.repos[] (or remediation.repos.keys[]), zips each repo's +# working directory, and uploads to s3:////::/code.zip. +# Auto-detects analysis vs remediation, resolves per-provider repo path. + +# Helper: security config injection snippet. Returns the command fragment to inject +# security_agent_config.json into the container when ANALYSIS_TYPE is "security". +# Empty string for non-security types. 
+sec_config_inject() { + if [[ -n "${SEC_CONFIG_B64}" ]]; then + echo " && mkdir -p /home/atxuser/.atxct/shared && echo ${SEC_CONFIG_B64} | base64 -d > /home/atxuser/.atxct/shared/security_agent_config.json" + fi +} + +# GitHub source (analysis) -- inject token (no source-add, no discovery), then run analysis. +# Source must be pre-registered locally by the customer (Step 1 verifies this). +# atx ct internally clones each repo as needed. Analysis details: see [continuous-modernization-analysis.md](workload-continuous-modernization-analysis.md). +# Server log is written to ~/.aws/atx/logs/server.log so it gets included in the upload zip for debugging. +build_command_github() { + local upload="" + if [[ "${ANALYSIS_TYPE}" != "tech-debt-quick" ]]; then + upload=" && grep -oE '01[A-Z0-9]+' /tmp/run.log | head -1 | xargs -I AID /app/upload-ct-artifacts.sh AID atx-ct-output-${ACCOUNT_ID}" + fi + local sec_inject=$(sec_config_inject) + echo "atx ct --version > /dev/null 2>&1 ; set -o pipefail && source /home/atxuser/.bashrc && export PATH=/home/atxuser/.local/bin:/usr/local/bin:/usr/bin:/bin && source /home/atxuser/.nvm/nvm.sh && nvm use 22 ; mkdir -p /home/atxuser/.aws/atx/logs ; atx ct server > /home/atxuser/.aws/atx/logs/server.log 2>&1 & sleep 15 ; mkdir -p /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME} && aws secretsmanager get-secret-value --secret-id atx/github-token --query SecretString --output text > /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/github_token${sec_inject} && atx ct analysis run --type ${ANALYSIS_TYPE} --source ${LOGICAL_SOURCE_NAME} --wait --telemetry \"agent=${AGENT},executionMode=fargate\" 2>&1 | tee /tmp/run.log${upload}" +} + +# GitLab source (analysis) -- same injection pattern as GitHub, just different secret/file names. 
+# atx ct's async provider resolution queries the backend for the source's provider type, +# so we don't need source-add or config.json injection in the container -- the locally-registered +# source's metadata (provider=gitlab, base_url, identifier) is fetched at clone time. +build_command_gitlab() { + local upload="" + if [[ "${ANALYSIS_TYPE}" != "tech-debt-quick" ]]; then + upload=" && grep -oE '01[A-Z0-9]+' /tmp/run.log | head -1 | xargs -I AID /app/upload-ct-artifacts.sh AID atx-ct-output-${ACCOUNT_ID}" + fi + local sec_inject=$(sec_config_inject) + echo "atx ct --version > /dev/null 2>&1 ; set -o pipefail && source /home/atxuser/.bashrc && export PATH=/home/atxuser/.local/bin:/usr/local/bin:/usr/bin:/bin && source /home/atxuser/.nvm/nvm.sh && nvm use 22 ; mkdir -p /home/atxuser/.aws/atx/logs ; atx ct server > /home/atxuser/.aws/atx/logs/server.log 2>&1 & sleep 15 ; mkdir -p /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME} && aws secretsmanager get-secret-value --secret-id atx/gitlab-token --query SecretString --output text > /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/gitlab_token${sec_inject} && atx ct analysis run --type ${ANALYSIS_TYPE} --source ${LOGICAL_SOURCE_NAME} --wait --telemetry \"agent=${AGENT},executionMode=fargate\" 2>&1 | tee /tmp/run.log${upload}" +} + +# Bitbucket source (analysis) -- inject token + config.json with email/username (Cloud) or base_url (DC). +# atx ct's async provider resolution queries the backend for the source's provider type. +# Cloud needs email (API auth) and username (git auth). DC needs base_url only. 
+build_command_bitbucket() { + local upload="" + if [[ "${ANALYSIS_TYPE}" != "tech-debt-quick" ]]; then + upload=" && grep -oE '01[A-Z0-9]+' /tmp/run.log | head -1 | xargs -I AID /app/upload-ct-artifacts.sh AID atx-ct-output-${ACCOUNT_ID}" + fi + local config_json + if [[ -n "${BITBUCKET_BASE_URL}" ]]; then + # Data Center: provider_config has base_url + config_json=$(printf '{"provider":"bitbucket","identifier":"%s","provider_config":{"base_url":"%s"}}' "${BITBUCKET_WORKSPACE}" "${BITBUCKET_BASE_URL}") + else + # Cloud: provider_config has email and username + config_json=$(printf '{"provider":"bitbucket","identifier":"%s","provider_config":{"email":"%s","username":"%s"}}' "${BITBUCKET_WORKSPACE}" "${BITBUCKET_EMAIL}" "${BITBUCKET_USERNAME}") + fi + local CONFIG_B64=$(echo "${config_json}" | base64 | tr -d '\n') # GNU base64 wraps at 76 columns; the job command must stay on one line + local sec_inject=$(sec_config_inject) + echo "atx ct --version > /dev/null 2>&1 ; set -o pipefail && source /home/atxuser/.bashrc && export PATH=/home/atxuser/.local/bin:/usr/local/bin:/usr/bin:/bin && source /home/atxuser/.nvm/nvm.sh && nvm use 22 ; mkdir -p /home/atxuser/.aws/atx/logs ; atx ct server > /home/atxuser/.aws/atx/logs/server.log 2>&1 & sleep 15 ; mkdir -p /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME} && echo ${CONFIG_B64} | base64 -d > /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/config.json && aws secretsmanager get-secret-value --secret-id atx/bitbucket-token --query SecretString --output text > /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/bitbucket_token${sec_inject} && atx ct analysis run --type ${ANALYSIS_TYPE} --source ${LOGICAL_SOURCE_NAME} --wait --telemetry \"agent=${AGENT},executionMode=fargate\" 2>&1 | tee /tmp/run.log${upload}" +} + +# Local source -- sync repo bundle from atx-source-code, unzip, source-add + discovery, run analysis. +# The bundle is a single zip containing all repos as subdirectories (zipped in Step 3). 
+# Source semantics: see [continuous-modernization-source.md](workload-continuous-modernization-source.md). Discovery: see [continuous-modernization-discovery.md](workload-continuous-modernization-discovery.md). Analysis: see [continuous-modernization-analysis.md](workload-continuous-modernization-analysis.md). +build_command_local() { + local upload="" + if [[ "${ANALYSIS_TYPE}" != "tech-debt-quick" ]]; then + upload=" && grep -oE '01[A-Z0-9]+' /tmp/run.log | head -1 | xargs -I AID /app/upload-ct-artifacts.sh AID atx-ct-output-${ACCOUNT_ID}" + fi + local sec_inject=$(sec_config_inject) + echo "atx ct --version > /dev/null 2>&1 ; set -o pipefail && source /home/atxuser/.bashrc && export PATH=/home/atxuser/.local/bin:/usr/local/bin:/usr/bin:/bin && source /home/atxuser/.nvm/nvm.sh && nvm use 22 ; mkdir -p /home/atxuser/.aws/atx/logs ; atx ct server > /home/atxuser/.aws/atx/logs/server.log 2>&1 & sleep 15 ; mkdir -p /home/atxuser/repos && aws s3 cp s3://atx-source-code-${ACCOUNT_ID}/repos/${ZIP_NAME}.zip /tmp/${ZIP_NAME}.zip && unzip -q /tmp/${ZIP_NAME}.zip -d /home/atxuser/repos/ && atx ct discovery scan --source ${LOGICAL_SOURCE_NAME} --path /home/atxuser/repos${sec_inject} && atx ct analysis run --type ${ANALYSIS_TYPE} --source ${LOGICAL_SOURCE_NAME} --wait --telemetry \"agent=${AGENT},executionMode=fargate\" 2>&1 | tee /tmp/run.log${upload}" +} + +# Single container per submission. Source-level analysis runs on ALL repos in the source +# (atx ct internally parallelizes repos within the container). 
+CMD=$(build_command_local) # or build_command_github, build_command_gitlab, build_command_bitbucket +JOB_NAME="atxct-${BATCH_NAME}" +JOBS_JSON=$(jq -nc --arg cmd "$CMD" --arg name "$JOB_NAME" '[{command: $cmd, jobName: $name}]') +PAYLOAD=$(jq -nc --arg bn "$BATCH_NAME" --argjson jobs "$JOBS_JSON" '{batchName: $bn, jobs: $jobs}') + +aws lambda invoke --function-name atx-trigger-batch-jobs \ + --payload "$PAYLOAD" \ + --cli-binary-format raw-in-base64-out /dev/stdout +``` + +The Lambda returns a `batchId`. Track it for status polling (Step 6). + +### Security analysis: automatic concurrency enforcement + +**When `ANALYSIS_TYPE` is `security`, jobs are automatically routed to a dedicated job queue (`atx-security-job-queue`)** backed by a compute environment capped at 5 concurrent tasks (`maxvCpus = 5 * fargateVcpu`). This means: + +- Submit ALL security jobs in a single batch — no chunking, no polling, no terminal blocking +- AWS Batch queues them and only runs 5 at a time +- As one completes, the next in queue starts immediately (no wasted slots) +- The Security Agent's concurrency limit is respected at the infrastructure level + +The Lambda detects security jobs by checking if the command contains `--type security` and routes them to `atx-security-job-queue` automatically. No special handling needed in the submission script. + +**This automatic routing applies only to security analysis.** For all other analysis types (tech-debt-quick, tech-debt-comprehensive, agentic-readiness, modernization-readiness), jobs go to the general queue with the full 128-job concurrency. + +### Submission patterns + +The Lambda's `jobs: [...]` array supports up to 128 jobs per batch. Jobs in a batch run **in parallel** (no `dependsOn` between them — all dispatch simultaneously, scheduled by AWS Batch as queue capacity allows). 
Pick the pattern that matches the workload: + +| Pattern | When to use | Jobs per batch | +| ---------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------- | +| **A. Single analysis on one source** (default above) | One source, one analysis type. Parallelism comes from atx ct's internal handling of repos within the container. | 1 | +| **B. Multiple analysis types on one source** | E.g., quick + comprehensive on the same repos. Findings persist independently with their own analysis_id, queryable separately. | N (one per type) | +| **C. Multiple sources in one batch** | E.g., team-A's github source + team-B's gitlab source. Each source gets its own container with its own logical source name and credentials. | N (one per source, or one per source × type combo) | +| **D. Per-repo parallel analysis** | Customer wants maximum parallelism — one container per repo. Each container analyzes a single repo via `--repo`. Useful when repos are large or the customer wants per-repo isolation/failure boundaries. 
| N (one per repo) | + +**Pattern B example** — quick + comprehensive on the same source, parallel containers: + +```bash +ANALYSIS_TYPE="tech-debt-quick" +JOB1_CMD=$(build_command_github) + +ANALYSIS_TYPE="tech-debt-comprehensive" +JOB2_CMD=$(build_command_github) + +JOBS_JSON=$(jq -nc \ + --arg cmd1 "$JOB1_CMD" --arg name1 "atxct-quick-${BATCH_NAME}" \ + --arg cmd2 "$JOB2_CMD" --arg name2 "atxct-comp-${BATCH_NAME}" \ + '[ + {command: $cmd1, jobName: $name1}, + {command: $cmd2, jobName: $name2} + ]') +PAYLOAD=$(jq -nc --arg bn "$BATCH_NAME" --argjson jobs "$JOBS_JSON" '{batchName: $bn, jobs: $jobs}') + +aws lambda invoke --function-name atx-trigger-batch-jobs \ + --payload "$PAYLOAD" \ + --cli-binary-format raw-in-base64-out /dev/stdout +``` + +**Pattern C example** — different sources (and providers) in parallel: + +```bash +LOGICAL_SOURCE_NAME="team-a-github" +JOB1_CMD=$(build_command_github) + +LOGICAL_SOURCE_NAME="team-b-gitlab" +JOB2_CMD=$(build_command_gitlab) + +JOBS_JSON=$(jq -nc \ + --arg cmd1 "$JOB1_CMD" --arg name1 "atxct-team-a-${BATCH_NAME}" \ + --arg cmd2 "$JOB2_CMD" --arg name2 "atxct-team-b-${BATCH_NAME}" \ + '[ + {command: $cmd1, jobName: $name1}, + {command: $cmd2, jobName: $name2} + ]') +PAYLOAD=$(jq -nc --arg bn "$BATCH_NAME" --argjson jobs "$JOBS_JSON" '{batchName: $bn, jobs: $jobs}') + +aws lambda invoke --function-name atx-trigger-batch-jobs \ + --payload "$PAYLOAD" \ + --cli-binary-format raw-in-base64-out /dev/stdout +``` + +**Pattern D** — per-repo parallel analysis (one container per repo): + +The `--repo` flag requires the repo's `slug` value from `atx ct repository list`. Always run `atx ct repository list --source --json` and use the `.slug` field (format: `::`). + +Per-repo build command variants — same as the source-level builders but insert `--repo ` before `--wait`: + +```bash +# GitHub source (per-repo analysis) -- one container per repo for maximum parallelism. 
+# repoSlug must be set before calling (from `atx ct repository list --json | jq -r '.items[].slug'`). +build_command_github_per_repo() { + local upload="" + if [[ "${ANALYSIS_TYPE}" != "tech-debt-quick" ]]; then + upload=" && grep -oE '01[A-Z0-9]+' /tmp/run.log | head -1 | xargs -I AID /app/upload-ct-artifacts.sh AID atx-ct-output-${ACCOUNT_ID}" + fi + local sec_inject=$(sec_config_inject) + echo "atx ct --version > /dev/null 2>&1 ; set -o pipefail && source /home/atxuser/.bashrc && export PATH=/home/atxuser/.local/bin:/usr/local/bin:/usr/bin:/bin && source /home/atxuser/.nvm/nvm.sh && nvm use 22 ; mkdir -p /home/atxuser/.aws/atx/logs ; atx ct server > /home/atxuser/.aws/atx/logs/server.log 2>&1 & sleep 15 ; mkdir -p /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME} && aws secretsmanager get-secret-value --secret-id atx/github-token --query SecretString --output text > /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/github_token${sec_inject} && atx ct analysis run --type ${ANALYSIS_TYPE} --source ${LOGICAL_SOURCE_NAME} --repo ${repoSlug} --wait 2>&1 | tee /tmp/run.log${upload}" +} + +# GitLab source (per-repo analysis) +build_command_gitlab_per_repo() { + local upload="" + if [[ "${ANALYSIS_TYPE}" != "tech-debt-quick" ]]; then + upload=" && grep -oE '01[A-Z0-9]+' /tmp/run.log | head -1 | xargs -I AID /app/upload-ct-artifacts.sh AID atx-ct-output-${ACCOUNT_ID}" + fi + local sec_inject=$(sec_config_inject) + echo "atx ct --version > /dev/null 2>&1 ; set -o pipefail && source /home/atxuser/.bashrc && export PATH=/home/atxuser/.local/bin:/usr/local/bin:/usr/bin:/bin && source /home/atxuser/.nvm/nvm.sh && nvm use 22 ; mkdir -p /home/atxuser/.aws/atx/logs ; atx ct server > /home/atxuser/.aws/atx/logs/server.log 2>&1 & sleep 15 ; mkdir -p /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME} && aws secretsmanager get-secret-value --secret-id atx/gitlab-token --query SecretString --output text > 
/home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/gitlab_token${sec_inject} && atx ct analysis run --type ${ANALYSIS_TYPE} --source ${LOGICAL_SOURCE_NAME} --repo ${repoSlug} --wait 2>&1 | tee /tmp/run.log${upload}" +} + +# Bitbucket source (per-repo analysis) +build_command_bitbucket_per_repo() { + local upload="" + if [[ "${ANALYSIS_TYPE}" != "tech-debt-quick" ]]; then + upload=" && grep -oE '01[A-Z0-9]+' /tmp/run.log | head -1 | xargs -I AID /app/upload-ct-artifacts.sh AID atx-ct-output-${ACCOUNT_ID}" + fi + local config_json + if [[ -n "${BITBUCKET_BASE_URL}" ]]; then + config_json=$(printf '{"provider":"bitbucket","identifier":"%s","provider_config":{"base_url":"%s"}}' "${BITBUCKET_WORKSPACE}" "${BITBUCKET_BASE_URL}") + else + config_json=$(printf '{"provider":"bitbucket","identifier":"%s","provider_config":{"email":"%s","username":"%s"}}' "${BITBUCKET_WORKSPACE}" "${BITBUCKET_EMAIL}" "${BITBUCKET_USERNAME}") + fi + local CONFIG_B64=$(echo "${config_json}" | base64 | tr -d '\n') # GNU base64 wraps at 76 columns; the job command must stay on one line + local sec_inject=$(sec_config_inject) + echo "atx ct --version > /dev/null 2>&1 ; set -o pipefail && source /home/atxuser/.bashrc && export PATH=/home/atxuser/.local/bin:/usr/local/bin:/usr/bin:/bin && source /home/atxuser/.nvm/nvm.sh && nvm use 22 ; mkdir -p /home/atxuser/.aws/atx/logs ; atx ct server > /home/atxuser/.aws/atx/logs/server.log 2>&1 & sleep 15 ; mkdir -p /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME} && echo ${CONFIG_B64} | base64 -d > /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/config.json && aws secretsmanager get-secret-value --secret-id atx/bitbucket-token --query SecretString --output text > /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/bitbucket_token${sec_inject} && atx ct analysis run --type ${ANALYSIS_TYPE} --source ${LOGICAL_SOURCE_NAME} --repo ${repoSlug} --wait 2>&1 | tee /tmp/run.log${upload}" +} + +# Local source (per-repo analysis) +build_command_local_per_repo() { + local upload="" + if [[ "${ANALYSIS_TYPE}" != "tech-debt-quick" ]]; then + 
upload=" && grep -oE '01[A-Z0-9]+' /tmp/run.log | head -1 | xargs -I AID /app/upload-ct-artifacts.sh AID atx-ct-output-${ACCOUNT_ID}" + fi + local sec_inject=$(sec_config_inject) + echo "atx ct --version > /dev/null 2>&1 ; set -o pipefail && source /home/atxuser/.bashrc && export PATH=/home/atxuser/.local/bin:/usr/local/bin:/usr/bin:/bin && source /home/atxuser/.nvm/nvm.sh && nvm use 22 ; mkdir -p /home/atxuser/.aws/atx/logs ; atx ct server > /home/atxuser/.aws/atx/logs/server.log 2>&1 & sleep 15 ; mkdir -p /home/atxuser/repos && aws s3 cp s3://atx-source-code-${ACCOUNT_ID}/repos/${ZIP_NAME}.zip /tmp/${ZIP_NAME}.zip && unzip -q /tmp/${ZIP_NAME}.zip -d /home/atxuser/repos/ && atx ct discovery scan --source ${LOGICAL_SOURCE_NAME} --path /home/atxuser/repos${sec_inject} && atx ct analysis run --type ${ANALYSIS_TYPE} --source ${LOGICAL_SOURCE_NAME} --repo ${repoSlug} --wait 2>&1 | tee /tmp/run.log${upload}" +} +``` + +**Pattern D usage** — iterate repos and build per-repo jobs: + +```bash +JOBS="[]" +while IFS= read -r REPO; do + [ -z "$REPO" ] && continue + repoSlug="$REPO" + REPO_SLUG=$(echo "$REPO" | tr '/:' '-') + CMD=$(build_command_github_per_repo) # or _gitlab_per_repo, _bitbucket_per_repo, _local_per_repo + JOBS=$(echo "$JOBS" | jq --arg cmd "$CMD" --arg name "$REPO_SLUG" '. 
+ [{command: $cmd, jobName: $name}]') +done < <(atx ct repository list --source "${LOGICAL_SOURCE_NAME}" --json | jq -r '.items[].slug') # one slug per line from the registered source + +PAYLOAD=$(jq -nc --arg bn "$BATCH_NAME" --argjson jobs "$JOBS" '{batchName: $bn, jobs: $jobs}') + +aws lambda invoke --function-name atx-trigger-batch-jobs \ + --payload "$PAYLOAD" \ + --cli-binary-format raw-in-base64-out /dev/stdout +``` + +**Why each piece is shaped this way:** + +| Concern | How addressed | +| ----------------------------------------------------- | ---------------------------------------------------------------------------------------------------- | +| Lambda allowlist requires `atx ...` prefix | Command starts with `atx ct --version > /dev/null 2>&1 ;` (allowlist token; exit code intentionally suppressed) | +| Image's `atx ct` is pre-baked | The container image `public.ecr.aws/d9h8z6l7/aws-transform:latest` ships with the production CLI pre-installed; no runtime install step is required. If the image lacks the CLI, the job fails fast with `command not found` and exit code 127. | +| atx ct 3.1+ requires Node 22 | `source nvm.sh && nvm use 22` switches Node version | +| Env from background subshell doesn't reach foreground | PATH + nvm setup in foreground BEFORE backgrounding the server | +| Server needs to be up before commands | `atx ct server > /home/atxuser/.aws/atx/logs/server.log 2>&1 &` then `sleep 15` | +| AID capture without `$()` | `atx ct analysis run 2>&1 \| tee /tmp/run.log` then `grep ... \| xargs -I AID` | +| Per-repo scoping | Use `--repo ` (singular). Do NOT use `--repos` (does not exist). The slug comes from `atx ct repository list --json \| jq -r '.items[].slug'` (format: `::`). | +| Customer's source name preserved (no per-repo suffix) | GitHub/GitLab: token-file injection bypasses `source add`'s backend conflict. Local: `discovery scan --path` overrides the path on the existing source without conflicting (idempotent across runs). 
| +| Re-running batch on the same source | Local provider: `discovery scan --path` updates the registered source's path without 409 errors. GitHub/GitLab: token injection has no equivalent conflict — works every run. | + +**Container's IAM role (`ATXBatchJobRole`)** auto-provides credentials for `aws s3 cp`, `aws secretsmanager get-secret-value`, and `atx ct` backend calls. Defined in the Custom CDK stack (`lib/infrastructure-stack.ts`) — no per-job credential setup needed. + +**MCP configuration (optional):** If the customer has a local MCP config (`~/.aws/atx/mcp.json`), include it on each job: + +```bash +MCP_CONFIG=$(cat ~/.aws/atx/mcp.json 2>/dev/null || echo "null") +# Add "mcpConfig": $MCP_CONFIG to each job entry above +``` + +### Remediation jobs (instead of analysis) + +**Gate check**: Before submitting any remediation batch job, the agent MUST confirm that Step 2b was completed for the relevant provider (see gate checks table in Step 4). Local providers skip credentials and SCM config checks. + +To run a remediation that fixes findings produced by a prior analysis, swap the `analysis run` block for `remediation create --ids` (see [continuous-modernization-remediation.md](workload-continuous-modernization-remediation.md) and [continuous-modernization-findings.md](workload-continuous-modernization-findings.md)). 
+ +The remediation flow differs by source provider: + +| Provider | Pattern | Output | +| ------------- | ------------------------------------------------------------------------------------- | ---------------------------------------------------- | +| **github** | NO `--local` — backend pushes to a result branch | Result branch pushed to source repo and PR is opened | +| **gitlab** | NO `--local` — backend pushes to a result branch | Result branch pushed to source repo and MR is opened | +| **bitbucket** | NO `--local` — backend pushes to a result branch | Result branch pushed to source repo and PR is opened | +| **local** | `--local` (required) — transform runs in the container, working dir is captured to S3 | `code.zip` per repo in S3 | + +`atx ct remediation create` does NOT support `--wait`, so we poll until terminal status (`complete`, `failed`, or `cancelled`). Use `while true` (no iteration cap) — AWS Batch's job timeout is the upper safety net. + +### Three remediation flag combinations + +`atx ct remediation create` accepts three valid flag combinations. Pick based on whether findings have `.fix` populated: + +| Combination | When to use | Capture pattern | +| ---------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------ | +| `--ids X` (alone) | Findings have `.fix` populated (typical for `tech-debt-quick`). Backend uses each finding's `.fix.transform_name` to pick the transformation. | `jq -r '.[] \| select(.fix != null) \| .id'` | +| `--ids X --transformation-name Y` | Findings WITHOUT `.fix` (typical for `tech-debt-comprehensive`, `security` issues without auto-fix). Customer overrides with an explicit transformation that applies to those findings. 
| `jq -r '.[] \| select(.category == "Java") \| .id'` (filter by category, not `.fix`) | +| `--transformation-name Y --repo Z` | No findings dependency. Run a transformation directly on a specific repo. | (no findings capture) | + +Set `TRANSFORMATION_NAME` to use the override; leave it empty to rely on each finding's `.fix`. + +```bash +# On customer's laptop: collect finding IDs to remediate + +# Pattern 1 (default): all auto-remediable findings (.fix populated) +FINDING_IDS=$(atx ct findings list --source ${LOGICAL_SOURCE_NAME} --json \ + | jq -r '.[] | select(.fix != null) | .id' \ + | tr '\n' ',' | sed 's/,$//') +TRANSFORMATION_NAME="" + +# Pattern 2 (hybrid): specific category of findings + explicit transformation override +# Example: all Java-category findings, regardless of .fix populated. Apply java-version-upgrade. +# FINDING_IDS=$(atx ct findings list --source ${LOGICAL_SOURCE_NAME} --json \ +# | jq -r '.[] | select(.category == "Java") | .id' \ +# | tr '\n' ',' | sed 's/,$//') +# TRANSFORMATION_NAME="AWS/java-version-upgrade" + +# Or filter by severity / repo / specific finding IDs + +REMEDIATION_NAME="multi-fix-$(date +%s)" # unique per submission + +# GitHub source (remediation) -- no --local, no upload. Backend dispatches a GitHub Actions +# workflow that runs the transform and pushes a result branch to the source repo. +# The customer reviews and opens a PR in github. +# +# WORKAROUND: atx ct discovery scan throws SETUP_REQUIRED for github sources when local +# config.json is missing -- even when the github_token file is present. +# We inject a minimal config.json via base64 to satisfy the local-config check. +build_command_remediation_github() { + local CONFIG_B64=$(printf '{"provider":"github","identifier":"%s"}' "${GITHUB_ORG}" | base64 | tr -d '\n') # GNU base64 wraps at 76 columns; the job command must stay on one line + # Optional transformation override (for findings without .fix populated, e.g. 
tech-debt-comprehensive) + local TRANSFORM_FLAG="" + [ -n "${TRANSFORMATION_NAME}" ] && TRANSFORM_FLAG="--transformation-name ${TRANSFORMATION_NAME} " + echo "atx ct --version > /dev/null 2>&1 ; set -o pipefail && source /home/atxuser/.bashrc && export PATH=/home/atxuser/.local/bin:/usr/local/bin:/usr/bin:/bin && source /home/atxuser/.nvm/nvm.sh && nvm use 22 ; mkdir -p /home/atxuser/.aws/atx/logs ; atx ct server > /home/atxuser/.aws/atx/logs/server.log 2>&1 & sleep 15 ; mkdir -p /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME} && echo ${CONFIG_B64} | base64 -d > /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/config.json && aws secretsmanager get-secret-value --secret-id atx/github-token --query SecretString --output text > /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/github_token && atx ct discovery scan --source ${LOGICAL_SOURCE_NAME} && atx ct remediation create --ids ${FINDING_IDS} ${TRANSFORM_FLAG}--name ${REMEDIATION_NAME} --telemetry \"agent=${AGENT},executionMode=fargate\" 2>&1 | tee /tmp/run.log && grep -oE '01[A-Z0-9]+' /tmp/run.log | head -1 > /tmp/rid.txt && while true ; do cat /tmp/rid.txt | xargs -I RID atx ct remediation status --id RID > /tmp/status.txt ; grep -qE 'complete|completed|failed|cancelled' /tmp/status.txt && break ; sleep 30 ; done" +} + +# GitLab source (remediation) -- same as github with gitlab_token. Backend pushes to a +# result branch and MR is opened. +# Same SETUP_REQUIRED workaround as github -- minimal config.json injected via base64. +build_command_remediation_gitlab() { + local CONFIG_B64=$(printf '{"provider":"gitlab","identifier":"%s"}' "${GITLAB_GROUP}" | base64 | tr -d '\n') # GNU base64 wraps at 76 columns; the job command must stay on one line + # Optional transformation override (for findings without .fix populated, e.g. 
tech-debt-comprehensive) + local TRANSFORM_FLAG="" + [ -n "${TRANSFORMATION_NAME}" ] && TRANSFORM_FLAG="--transformation-name ${TRANSFORMATION_NAME} " + echo "atx ct --version > /dev/null 2>&1 ; set -o pipefail && source /home/atxuser/.bashrc && export PATH=/home/atxuser/.local/bin:/usr/local/bin:/usr/bin:/bin && source /home/atxuser/.nvm/nvm.sh && nvm use 22 ; mkdir -p /home/atxuser/.aws/atx/logs ; atx ct server > /home/atxuser/.aws/atx/logs/server.log 2>&1 & sleep 15 ; mkdir -p /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME} && echo ${CONFIG_B64} | base64 -d > /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/config.json && aws secretsmanager get-secret-value --secret-id atx/gitlab-token --query SecretString --output text > /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/gitlab_token && atx ct discovery scan --source ${LOGICAL_SOURCE_NAME} && atx ct remediation create --ids ${FINDING_IDS} ${TRANSFORM_FLAG}--name ${REMEDIATION_NAME} --telemetry \"agent=${AGENT},executionMode=fargate\" 2>&1 | tee /tmp/run.log && grep -oE '01[A-Z0-9]+' /tmp/run.log | head -1 > /tmp/rid.txt && while true ; do cat /tmp/rid.txt | xargs -I RID atx ct remediation status --id RID > /tmp/status.txt ; grep -qE 'complete|completed|failed|cancelled' /tmp/status.txt && break ; sleep 30 ; done" +} + +# Bitbucket source (remediation) -- inject token + config.json, run remediation. +# Same pattern as github/gitlab -- config.json injected via base64. +# Cloud: email + username in provider_config. DC: base_url in provider_config. 
+build_command_remediation_bitbucket() { + local config_json + if [[ -n "${BITBUCKET_BASE_URL}" ]]; then + config_json=$(printf '{"provider":"bitbucket","identifier":"%s","provider_config":{"base_url":"%s"}}' "${BITBUCKET_WORKSPACE}" "${BITBUCKET_BASE_URL}") + else + config_json=$(printf '{"provider":"bitbucket","identifier":"%s","provider_config":{"email":"%s","username":"%s"}}' "${BITBUCKET_WORKSPACE}" "${BITBUCKET_EMAIL}" "${BITBUCKET_USERNAME}") + fi + local CONFIG_B64=$(echo "${config_json}" | base64 | tr -d '\n') # GNU base64 wraps at 76 columns; the job command must stay on one line + # Optional transformation override (for findings without .fix populated, e.g. tech-debt-comprehensive) + local TRANSFORM_FLAG="" + [ -n "${TRANSFORMATION_NAME}" ] && TRANSFORM_FLAG="--transformation-name ${TRANSFORMATION_NAME} " + echo "atx ct --version > /dev/null 2>&1 ; set -o pipefail && source /home/atxuser/.bashrc && export PATH=/home/atxuser/.local/bin:/usr/local/bin:/usr/bin:/bin && source /home/atxuser/.nvm/nvm.sh && nvm use 22 ; mkdir -p /home/atxuser/.aws/atx/logs ; atx ct server > /home/atxuser/.aws/atx/logs/server.log 2>&1 & sleep 15 ; mkdir -p /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME} && echo ${CONFIG_B64} | base64 -d > /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/config.json && aws secretsmanager get-secret-value --secret-id atx/bitbucket-token --query SecretString --output text > /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/bitbucket_token && atx ct discovery scan --source ${LOGICAL_SOURCE_NAME} && atx ct remediation create --ids ${FINDING_IDS} ${TRANSFORM_FLAG}--name ${REMEDIATION_NAME} --telemetry \"agent=${AGENT},executionMode=fargate\" 2>&1 | tee /tmp/run.log && grep -oE '01[A-Z0-9]+' /tmp/run.log | head -1 > /tmp/rid.txt && while true ; do cat /tmp/rid.txt | xargs -I RID atx ct remediation status --id RID > /tmp/status.txt ; grep -qE 'complete|completed|failed|cancelled' /tmp/status.txt && break ; sleep 30 ; done" +} + +# Local source (remediation) -- `--local` runs the transform in the container; results are +# 
zipped from the container's working dir and uploaded to S3. +build_command_remediation_local() { + # Optional transformation override (for findings without .fix populated, e.g. tech-debt-comprehensive) + local TRANSFORM_FLAG="" + [ -n "${TRANSFORMATION_NAME}" ] && TRANSFORM_FLAG="--transformation-name ${TRANSFORMATION_NAME} " + echo "atx ct --version > /dev/null 2>&1 ; set -o pipefail && source /home/atxuser/.bashrc && export PATH=/home/atxuser/.local/bin:/usr/local/bin:/usr/bin:/bin && source /home/atxuser/.nvm/nvm.sh && nvm use 22 ; mkdir -p /home/atxuser/.aws/atx/logs ; atx ct server > /home/atxuser/.aws/atx/logs/server.log 2>&1 & sleep 15 ; mkdir -p /home/atxuser/repos && aws s3 cp s3://atx-source-code-${ACCOUNT_ID}/repos/${ZIP_NAME}.zip /tmp/${ZIP_NAME}.zip && unzip -q /tmp/${ZIP_NAME}.zip -d /home/atxuser/repos/ && atx ct discovery scan --source ${LOGICAL_SOURCE_NAME} --path /home/atxuser/repos && atx ct remediation create --ids ${FINDING_IDS} ${TRANSFORM_FLAG}--name ${REMEDIATION_NAME} --local --telemetry \"agent=${AGENT},executionMode=fargate\" 2>&1 | tee /tmp/run.log && grep -oE '01[A-Z0-9]+' /tmp/run.log | head -1 > /tmp/rid.txt && while true ; do cat /tmp/rid.txt | xargs -I RID atx ct remediation status --id RID > /tmp/status.txt ; grep -qE 'complete|completed|failed|cancelled' /tmp/status.txt && break ; sleep 30 ; done && cat /tmp/rid.txt | xargs -I RID /app/upload-ct-artifacts.sh RID atx-ct-output-${ACCOUNT_ID}" +} +``` + +Then call `build_command_remediation_github`, `build_command_remediation_gitlab`, `build_command_remediation_bitbucket`, or `build_command_remediation_local` instead of the analysis builder. Everything else (Lambda invoke, polling, status retrieval) is identical to analysis. 
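Choosing among the four remediation builders can be reduced to a lookup on the provider name. The sketch below is illustrative only: `remediation_builder_for` is a hypothetical helper, not part of the skill, and it assumes the provider is one of the four values used in the table above.

```bash
# Hypothetical dispatcher: maps a provider name to the matching remediation
# builder function name. Prints the name; returns 1 for unknown providers.
remediation_builder_for() {
  case "$1" in
    github|gitlab|bitbucket|local)
      echo "build_command_remediation_$1"
      ;;
    *)
      echo "unknown provider: $1" >&2
      return 1
      ;;
  esac
}

# Usage (assumes the builder functions and their input variables are set):
#   BUILDER=$(remediation_builder_for "github") && CMD=$("$BUILDER")
```

Failing on an unrecognized provider keeps a typo from silently falling through to the wrong builder.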
+ +**Why `while true` instead of a bounded loop:** + +| Concern | Resolution | +| ---------------------------------------------------------------------- | ---------------------------------------------------------------------------------- | +| Remediation duration is unpredictable | No iteration cap — poll until terminal status | +| Container could hang forever on a stuck remediation | AWS Batch's Job Definition timeout (default 12h, configurable) is the upper bound | +| Customer wants to cancel | `aws batch terminate-job` from outside; or `atx ct remediation cancel` server-side | + +The container exits ONLY when: + +1. Status reaches `complete`/`failed`/`cancelled` → upload runs (local only) → exit 0 +2. Batch hits its job timeout → forced kill → exit non-zero (no upload) + +Keep your Job Definition timeout generous for remediation jobs. + +## Step 6: Poll Status + +Poll every 60 seconds for the first 10 polls, then every 5 minutes. Report only on status changes. + +```bash +aws lambda invoke --function-name atx-get-batch-status \ + --payload "{\"batchId\":\"${BATCH_ID}\"}" \ + --cli-binary-format raw-in-base64-out /dev/stdout +``` + +Job statuses: `SUBMITTED`, `PENDING`, `RUNNABLE`, `STARTING`, `RUNNING`, `SUCCEEDED`, `FAILED`. + +## Step 7: Get Findings and Artifacts + +**Findings** are persisted by the analysis runner during execution and queryable via: + +```bash +atx ct findings list --source --json +``` + +All findings are persisted under the customer's `LOGICAL_SOURCE_NAME` (single container does the analysis). One query gets everything (see [continuous-modernization-findings.md](workload-continuous-modernization-findings.md)): + +```bash +atx ct findings list --source ${LOGICAL_SOURCE_NAME} --json +``` + +**S3 artifacts** are uploaded by `/app/upload-ct-artifacts.sh` (baked into the container image). 
Analysis artifacts are written for any provider; remediation artifacts are only written for `local`-provider remediations (github/gitlab remediations push a result branch instead — no S3): + +``` +s3://atx-ct-output-{account-id}/// + code.zip -- the working directory after the analysis or remediation completes, + including a result branch with auto-committed changes (e.g., + `atx-result-staging-` for analysis documentation, or the + remediation's branch for `--local` runs). The customer can `git log` + and `git diff` to review what the bot changed. `.git/` is preserved + for this reason. + Excludes node_modules/, .env*, *.pem, *.key, .aws/. + logs.zip -- cherry-picked debug logs: + ATX CLI debug logs, error log, conversation transcript, + plan.json, validation_summary.md. +``` + +To download: + +```bash +ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text) + +# All artifacts for one analysis +aws s3 sync s3://atx-ct-output-${ACCOUNT_ID}/${ANALYSIS_ID}/ ./artifacts/ + +# Just one repo's reports +aws s3 cp s3://atx-ct-output-${ACCOUNT_ID}/${ANALYSIS_ID}/${REPO_SLUG}/code.zip ./ +``` + +Surface findings to the user as the primary result. Reference S3 artifacts only if the user asks for raw reports/logs or you need to debug a finding. + +**Cancellation note:** If a Batch job is terminated mid-run, the upload step does not run and that container's artifacts are lost. Findings already persisted survive (the analysis runner pushes them mid-flight, before the upload step). 
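
The download paths above follow a fixed pattern, which can be sketched as a small helper. `artifact_s3_uri` is a hypothetical name; the bucket and key layout are taken from the download example (`atx-ct-output-${ACCOUNT_ID}/${ANALYSIS_ID}/${REPO_SLUG}/code.zip`), and nothing here calls AWS.

```bash
# Hypothetical helper: build the S3 URI for a run's artifacts.
#   $1 = AWS account id
#   $2 = analysis (or local remediation) id
#   $3 = optional repo slug
#   $4 = optional file name, e.g. code.zip or logs.zip
artifact_s3_uri() (
  uri="s3://atx-ct-output-$1/$2/"
  if [ -n "${3:-}" ]; then uri="${uri}$3/"; fi
  if [ -n "${4:-}" ]; then uri="${uri}$4"; fi
  printf '%s\n' "$uri"
)
```

For example, `aws s3 cp "$(artifact_s3_uri "$ACCOUNT_ID" "$ANALYSIS_ID" "$REPO_SLUG" logs.zip)" ./` would fetch only the debug logs for one repo.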
+ +## Cancellation + +```bash +# Cancel one job +aws lambda invoke --function-name atx-terminate-job \ + --payload "{\"jobId\":\"\"}" \ + --cli-binary-format raw-in-base64-out /dev/stdout + +# Cancel all jobs in a batch +aws lambda invoke --function-name atx-terminate-batch-jobs \ + --payload "{\"batchId\":\"\"}" \ + --cli-binary-format raw-in-base64-out /dev/stdout +``` + +## Container Customization + +The Batch container is the pre-built `public.ecr.aws/d9h8z6l7/aws-transform:latest` image, which includes Java 8/11/17/21/25, Python 3.8-3.14, Node.js 16-24, Maven, Gradle, common build tools, AWS CLI v2, and the AWS Transform CLI (including `atx ct`) pre-installed. The JOB_COMMAND runs `atx ct --version` as a smoke test at job start — no runtime install step is required. + +For continuous modernization analyses, the pre-built image's defaults handle every runtime need. No customization required. + +If a customer brings their own TD that requires a runtime or tool not in the pre-built image (e.g., Rust, Go, .NET on Linux), follow the Custom Image Path in [custom-remote-execution](custom-remote-execution.md#custom-image-path-docker-required): + +1. Clone the infrastructure repo (already done if Custom is set up) +2. Edit the Dockerfile to add the required runtime/tool — see [Adding Languages or Tools](custom-remote-execution.md#adding-languages-or-tools) +3. 
Re-run `./setup.sh` from the cloned directory + +## Runtime Version Switching + +For remediation runs that target a specific language version (e.g., `AWS/java-version-upgrade` targeting Java 21), pass the version as an environment variable on each job in the `jobs` array: + +```json +{ + "command": "...", + "jobName": "...", + "environment": { + "JAVA_VERSION": "21", + "NODE_VERSION": "22", + "PYTHON_VERSION": "3.13" + } +} +``` + +Available versions: + +- **Java**: 8, 11, 17, 21, 25 (Amazon Corretto) +- **Python**: 3.8-3.14 (accepts `3.13` or `13`) +- **Node.js**: 16, 18, 20, 22, 24 + +For analyses (tech-debt-comprehensive, agentic-readiness, modernization-readiness), runtime switching is generally not needed. Pass these env vars only when running remediation TDs that need a specific target version. + +See [custom-remote-execution#version-switching-at-runtime](custom-remote-execution.md#version-switching-at-runtime) for the full reference. + +## Limits + +- Max 128 concurrent Batch jobs (per the existing CDK config) +- **Max 5 concurrent Batch jobs for `--type security` (infrastructure-enforced)** — the Security Agent backend caps concurrent code-review executions at 5. A dedicated compute environment (`atx-fargate-security`, `maxvCpus = 5 * fargateVcpu`) and job queue (`atx-security-job-queue`) enforce this at the AWS Batch level. The Lambda automatically routes security jobs to this queue — no client-side chunking needed. Submit all security jobs in one batch; AWS Batch queues excess jobs and runs them as slots free up. +- Max job duration: defined by the CDK stack +- Bedrock throughput is per-account — running many parallel continuous modernization containers shares the quota; large batches may throttle +- Backend (atx ct API) rate limits at ~30+ concurrent calls. 
Step 5's chunked-submit pattern (chunks of 8) keeps within limits + +## Error Handling + +| Error | Cause | Fix | +| ------------------------------------- | ----------------------------------------------------------------- | ---------------------------------------------------------------------------------- | +| Job stuck in `RUNNABLE` | No Fargate capacity | Wait or check service quota | +| Job fails with auth error | Task role missing Bedrock, SecurityAgent, or atx-ct-output access | Update task role; customer re-runs `setup.sh` | +| Container can't fetch PAT | `atx/github-token` (or `atx/gitlab-token`) secret missing | Customer creates the secret (see Step 2) | +| Container can't fetch SSH key | `atx/ssh-key` secret missing | Customer creates the secret (see Step 2) | +| Container can't read local source zip | S3 path incorrect or zip not uploaded | Verify `s3://atx-source-code-${ACCOUNT_ID}/repos/.zip` exists | +| `atx ct discovery scan` fails | Source registration failed (bad PAT, bad path, etc.) | Check container logs; fix credentials or path | +| `atx ct analysis run` clone fails | PAT expired, repo private to a different account, etc. | Verify customer's PAT has access to the repo | +| Findings missing after analysis | Server crashed before persisting | Check CloudWatch logs for errors | +| Artifacts missing from S3 | Upload script failed | Check container logs for `[date] Uploaded` or `Skip` lines from the staging script | + +## Pricing + +Direct customer to: + +- AWS Batch / Fargate pricing: https://aws.amazon.com/fargate/pricing/ +- AWS Transform agent minutes: https://aws.amazon.com/transform/pricing/ + +Do NOT quote specific dollar amounts or time estimates. + +## Cleanup + +After every Batch run completes, prompt the user with the following: + +> Your remote infrastructure is still deployed in your AWS account. All services +> are pay-per-use only — there are no fixed costs when idle. 
You can leave it in +> place for future analyses, or tear it down now. +> +> For pricing details: https://aws.amazon.com/transform/pricing/ +> +> If you tear down: +> +> - Batch compute, Lambda functions, IAM roles, log groups — all deleted +> - S3 buckets, KMS key, and VPC/network resources are preserved +> - You'll need to re-run setup next time you use remote mode +> +> Would you like to keep the infrastructure or tear it down? + +If the user chooses to tear down: + +**Constraints:** + +- You MUST NOT run `teardown.sh` yourself, because it deletes IAM roles and + CloudFormation stacks, which requires admin permissions. +- You MUST present the command for the user to run from another terminal: + + ```bash + cd "$HOME/.aws/atx/custom/remote-infra" && ./teardown.sh + ``` + +- You MUST tell the user: "Teardown requires the same admin permissions used for + setup. Run this from another terminal with admin credentials. The script will show + what will be deleted and ask for confirmation before proceeding. Use `--dry-run` to + preview without deleting." + +If the user chooses to keep it: "Infrastructure will stay deployed. Next time you run a remote analysis, everything will be ready immediately." diff --git a/aws-transform/steering/workload-continuous-modernization-discovery.md b/aws-transform/steering/workload-continuous-modernization-discovery.md new file mode 100644 index 00000000..098dee03 --- /dev/null +++ b/aws-transform/steering/workload-continuous-modernization-discovery.md @@ -0,0 +1,40 @@ +--- +name: discovery +description: Scan/discover repositories from GitHub orgs, GitLab groups/users, or local folders. Lists repos with language, branch, and workflow info. +--- + +# Discovery + +## Prerequisites + +Check if the server is running with `atx ct status --health`. If any command fails with a connection error, use the `server` skill to start the server.
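
The prerequisite check can be wrapped in a guard that runs before any discovery command. This is a sketch only: `require_server` is a hypothetical helper, and it assumes `atx ct status --health` exits non-zero when the server is not reachable, which the text above does not state.

```bash
# Hypothetical guard: stop early if the atx ct server is not reachable.
# Assumption: `atx ct status --health` returns a non-zero exit code on failure.
require_server() {
  if atx ct status --health >/dev/null 2>&1; then
    return 0
  fi
  echo "atx ct server not reachable; start it with the server skill, then retry" >&2
  return 1
}
```

A caller could then write `require_server && atx ct discovery status --source "$LOGICAL_SOURCE_NAME"`.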
+ +## Local sources: path is set at `source add` time + +For local sources, the directory path is provided when the source is first added (`atx ct source add --provider local --name --path `). It's stored on the source and reused automatically by subsequent `discovery scan --source ` calls — no `--path` needed at scan time. + +**Prerequisite:** The source must have been added with `--path` first. If `discovery scan` errors with `Source "" has no rootPath configured`, this machine doesn't have a local rootPath yet for that source (typically because the source was originally added on another machine — `rootPath` is machine-specific). Resolve by running `atx ct source add --provider local --name --path ` on this machine, OR by passing `--path ` to the scan command (which will set and store the rootPath locally). + +**Override:** Pass `--path ` to `discovery scan` ONLY when you want to overwrite the stored path. This silently changes the source's `rootPath`. Confirm with the user before passing `--path` to a previously-registered local source. + +**Path must be a parent directory:** The path (whether at `source add` or `discovery scan`) must point to a directory that *contains* git repos as subdirectories — not to a repo itself. The scanner looks for child directories with `.git`. If the path points directly at a single repo, the scan returns 0 repos silently. If a user reports 0 repos found, verify their path points to the parent (e.g. `/home/user/repos`) not a repo directly (e.g. `/home/user/repos/my-app`). 
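
Because a scan of a path that points at a single repo returns 0 repos silently, a pre-flight check on the path can catch the mistake before scanning. The sketch below is illustrative: `count_child_repos` and `check_scan_root` are hypothetical helpers that mirror the rule described above (child directories containing `.git`). They are not the scanner's actual implementation, and hidden child directories are not considered.

```bash
# Hypothetical pre-flight check for a local source path.
# Counts immediate (non-hidden) child directories that contain a .git entry.
count_child_repos() (
  root=$1
  count=0
  for d in "$root"/*/; do
    # If the glob matches nothing it stays literal and this test is false.
    if [ -e "${d}.git" ]; then
      count=$((count + 1))
    fi
  done
  echo "$count"
)

# Returns 0 when the path looks like a valid parent directory, 1 otherwise.
check_scan_root() (
  root=$1
  n=$(count_child_repos "$root")
  if [ "$n" -eq 0 ] && [ -e "$root/.git" ]; then
    echo "WARN: $root is itself a repo; pass its parent directory instead" >&2
    return 1
  fi
  [ "$n" -gt 0 ]
)
```

Running `check_scan_root /home/user/repos/my-app` would warn and fail, while `check_scan_root /home/user/repos` would succeed if that directory holds at least one repo.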
+ +## Commands + +```bash +# Scan a local source (path was set at `source add` time and is reused automatically) +atx ct discovery scan --source + +# Override the stored rootPath (overwrites the source's path -- confirm with user first) +atx ct discovery scan --source --path + +# Scan a GitHub source (use the bare name from `source add --name`) +atx ct discovery scan --source + +# Check scan status +atx ct discovery status --source +``` + +## After discovery completes + +When discovery finds many repos, offer to label a group of repos for targeted analysis: "Want to label a group of repos to focus your analysis? For example, you can label repos by team, priority, or migration wave, then run analysis on just that group." Use the `/source` skill's repository commands to apply labels. This is optional — skip if the user wants to analyze everything. diff --git a/aws-transform/steering/workload-continuous-modernization-ec2-execution.md b/aws-transform/steering/workload-continuous-modernization-ec2-execution.md new file mode 100644 index 00000000..9bb41040 --- /dev/null +++ b/aws-transform/steering/workload-continuous-modernization-ec2-execution.md @@ -0,0 +1,2259 @@ +--- +name: ec2-execution +description: Run continuous modernization analysis or remediation on a single EC2 instance with one long-running atx-ct container. Customer launches the instance once (or reuses an existing one); agent submits work via SSM SendCommand. Each submission auto-polls and auto-uploads artifacts. +--- + +# continuous modernization EC2 Execution + +## Telemetry + +When running `atx ct analysis run` or `atx ct remediation create`, always include `--telemetry`. + +Format: `--telemetry "agent=,executionMode="` +- `agent` -- the AI assistant driving this session (lowercase, no spaces). Use the real assistant name -- e.g. kiro, claude, amazonq, copilot. +- `executionMode` -- `ec2` + +If the user explicitly asks to disable telemetry, omit `--telemetry` for the rest of the session. 
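
The telemetry format above can be assembled by a small helper so the flag is built the same way for every command. This is a sketch: `telemetry_flag` and the `TELEMETRY_DISABLED` variable are hypothetical names used here to model the "user asked to disable telemetry" case.

```bash
# Hypothetical helper: print the --telemetry flag for the given agent and mode.
#   $1 = assistant name (normalized to lowercase with whitespace removed)
#   $2 = execution mode (for this file: ec2)
# Prints nothing when TELEMETRY_DISABLED=true (an assumed opt-out variable).
telemetry_flag() (
  if [ "${TELEMETRY_DISABLED:-false}" = "true" ]; then
    return 0
  fi
  agent=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '[:space:]')
  printf '%s "agent=%s,executionMode=%s"\n' '--telemetry' "$agent" "$2"
)
```

Because the output contains quotes, it is meant to be spliced into a command string that is later evaluated by a shell (as the `build_command_*()` functions do), not expanded unquoted on a live command line.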
+ +Run continuous modernization analysis or remediation on a single EC2 instance with **one long-running atx-ct container**. The customer provisions the instance via a **CloudFormation stack** (atomic deploy/rollback, single-command teardown) -- or reuses an existing one. The container hosts `atx ct server` and stays up between submissions. The agent submits work via SSM SendCommand -- each submission runs in the background on the instance and includes auto-upload of artifacts to S3 (for analysis on any provider, and for `--local` remediation; github/gitlab remediations push a result branch instead -- no S3 upload). + +## When to Use + +- Re-running multiple analyses against the same source(s) on the same compute +- Customer prefers a persistent dev box they can Session-Manager into +- Avoiding Batch cold-start (Fargate provisioning adds latency per job) +- Workloads that benefit from a warm container (multiple analyses without re-installing the CLI) + +For one-shot or fan-out workloads (many sources analyzed in parallel), use [continuous-modernization-batch-execution](workload-continuous-modernization-batch-execution.md) instead. + +## Architecture + +``` +Customer's local machine EC2 instance (CFN-managed) + ↓ atx ct source add ┌────────────────────────────────┐ + ↓ aws s3api create-bucket (idempotent) │ atx-ct container (long-running)│ + ↓ aws cloudformation create-stack │ - atx ct server (running) │ + │ │ +[Setup, once via CFN] │ CFN stack contains: │ + CFN provisions: IAM role, profile, │ - 1× EC2 instance │ + security group, EC2 instance. │ - 1× IAM role + profile │ + UserData installs Docker, pulls image, ────┼─→ - 1× security group │ + starts container, signals CREATE_COMPLETE │ (S3 buckets are outside the │ + │ stack -- persist across │ + │ delete-and-recreate) │ +[Per submission, via SSM] │ │ + build_command_*() returns a nohup'd ───────┼─→ background script: │ + chain; SSM returns immediately │ 1. fetch token from SecMgr │ + │ 2. 
atx ct analysis run │ +[Status check, on demand] │ 3. poll status until done │ + ssm_run "atx ct analysis get --id X" ──────┼─→ 4. /app/upload-ct-artifacts │ + └────────────────────────────────┘ +[Teardown] + aws cloudformation delete-stack + → instance + IAM + SG removed atomically + (S3 buckets and Secrets Manager entries persist) +``` + +The container persists across submissions. Customer can run many analyses (or one analysis followed by remediation) without re-provisioning anything. The build_command_*() functions encapsulate the entire submit → poll → upload flow as one nohup'd script, so the agent stays free during long-running work. CFN's `CreationPolicy` ensures `CREATE_COMPLETE` only fires after UserData verifies the container is up -- there's no "is the container ready yet?" race. + +## Provider Compatibility + +| Provider | Container setup at job time | Analysis output | Remediation flag | Remediation output | +|---|---|---|---|---| +| **github** | Fetch token from `atx/github-token` (Secrets Manager) → place in `~/.atxct/sources//github_token` | `code.zip` per repo in S3 | NO `--local` | Result branch pushed to source repo and PR is opened | +| **gitlab** | Fetch token from `atx/gitlab-token` → place in `~/.atxct/sources//gitlab_token` | `code.zip` per repo in S3 | NO `--local` | Result branch pushed to source repo and MR is opened | +| **bitbucket** | Fetch token from `atx/bitbucket-token` → place in `~/.atxct/sources//bitbucket_token` + inject `config.json` with email/username (Cloud) or base_url (DC) | `code.zip` per repo in S3 | NO `--local` | Result branch pushed to source repo and PR is opened | +| **local** | Pull bundle from S3 (Step 7), `discovery scan --path /home/atxuser/repos` | `code.zip` per repo in S3 | `--local` (required) | `code.zip` per repo in S3 | + +For github / gitlab / bitbucket, the customer must register the source on their own machine first via `atx ct source add --provider github|gitlab|bitbucket --org --token `. 
The token is then stored in Secrets Manager (Step 4) and fetched by the container at job time. atx ct's async provider resolution queries the backend for source metadata (provider type, base URL, identifier) at clone time. Bitbucket additionally requires a `config.json` with email/username (Cloud) or base_url (Data Center) injected into the container -- unlike github/gitlab where the backend has all needed metadata. + +## Multi-Worker Support + +**Default behavior (`WorkerCount=5`): five parallel containers** named `atx-ct-1` through `atx-ct-5`. Each runs its own atx ct server. This default provides headroom for parallel work without requiring later resize (which is destructive; see "Changing WorkerCount Requires Redeploy"). The agent targets a specific worker by setting `WORKER_NUM` (1..WorkerCount) before calling SSM helpers or `build_command_*()`. + +For customers with simpler workloads or tighter cost constraints, override `WORKER_COUNT=1` (single container named `atx-ct`, legacy behavior) or any value in 1-5. The auto-sized instance type and disk shrink to match. + +### Choosing WorkerCount + +| Customer intent | WorkerCount | +|---|---| +| "Scan my source" / "tech-debt-quick on source X" / "find vulnerabilities" | 1 is sufficient | +| Default (no specific parallelism request) | 5 (provides headroom; saves a future redeploy if customer later wants parallel work) | +| "Run tech-debt AND security in parallel" on the same source | 2+ (one per analysis type) | +| "Analyze sources A, B, C concurrently" | 3+ (one per source) | +| "Run a separate analysis on each repo in this source in parallel" | N where N is the repo count, capped at 5 | +| 6+ truly parallel jobs | Use the Batch path instead (Fargate scales to 64 concurrent) | + +If the customer's intent is unclear, ASK: "Do you want one analysis covering everything (single AID, simpler reporting), or N separate analyses running in parallel (one AID per item)?" 
Default to 5 if they have no preference; they pay for headroom but avoid a destructive resize later. + +When proposing a WorkerCount for a new CFN stack in the consent prompt, ALWAYS include this warning: "Note: WorkerCount is fixed at stack-create time. Changing it later requires the admin to redeploy the stack (which causes downtime)." This warning does NOT apply to the existing-instance path; there's no stack to redeploy, and WorkerCount can be changed by stopping/starting containers. + +### Sizing + +Each worker uses ~3-4 vCPU average and ~4-8 GB RAM for typical single-repo analyses, peaking at ~16 GB for monorepos. The skill auto-picks based on WorkerCount: + +- 1 worker: m5.2xlarge, 50 GB +- 2-4 workers: m5.4xlarge, 100-200 GB +- 5 workers: m5.8xlarge, 250-500 GB + +The pre-deploy confirmation prompt shows the resolved config before the customer commits. Customer can override `INSTANCE_TYPE` and `VOLUME_SIZE` env vars to deviate from the auto-pick. + +For monorepos (>5 GB working tree per repo) or running many source-wide analyses concurrently, override to `INSTANCE_TYPE=m5.12xlarge` (48 vCPU, 192 GB RAM) and `VOLUME_SIZE=1000`. The default sizing assumes typical single-repo fan-out work. Each worker has its own isolated filesystem inside its container, so repos cloned by worker 1 are not visible to worker 2. + +### Bridge Networking + +Multi-worker uses bridge networking (no `--net=host`) so each container has its own network namespace. The atx ct CLI to server communication is intra-container (`localhost`) so nothing outside needs to reach the server. SSM `docker exec` works the same as before. Single-worker (`WorkerCount=1`) also uses bridge mode, so there is no behavioral difference visible to the customer. + +### Changing WorkerCount Requires Redeploy + +**WorkerCount is fixed at stack-create time.** To change it, the customer runs a stack update which is destructive: the EC2 instance and EBS volume are REPLACED. 
+ +```bash +# Stack update: REPLACEMENT update because UserData contains ${WorkerCount} +aws cloudformation update-stack --stack-name atx-runner \ + --use-previous-template \ + --parameters \ + ParameterKey=WorkerCount,ParameterValue=$NEW_COUNT \ + ParameterKey=InstanceType,UsePreviousValue=true \ + ParameterKey=VpcId,UsePreviousValue=true \ + ParameterKey=SubnetId,UsePreviousValue=true \ + ParameterKey=VolumeSizeGB,UsePreviousValue=true \ + --capabilities CAPABILITY_NAMED_IAM \ + --region $REGION +``` + +**What is preserved:** +- Findings, AIDs, RIDs (live on the atx ct backend, independent of EC2) +- S3 artifacts already uploaded (`atx-ct-output-${ACCOUNT_ID}//...`) +- IAM role, security group (in-place CFN updates) +- Source registrations in the backend +- Secrets in Secrets Manager (managed outside the stack) + +**What is LOST with the EBS volume:** +- Cloned repo caches in `~/.atxct/sources//repos/` +- Build tool caches (Maven `.m2`, Gradle, npm, etc.) +- atx ct internal artifact cache +- Any in-flight wrapper scripts (analyses CONTINUE on the backend, but the EC2-side polling and S3 upload die) + +**Before redeploying, the agent should:** + +1. **Check for in-flight work** across all workers: + ```bash + for i in $(seq 1 $WORKER_COUNT); do + if [ "$WORKER_COUNT" -eq 1 ]; then CONT="atx-ct"; else CONT="atx-ct-${i}"; fi + ssm_run "sudo docker exec $CONT atx ct analysis list --json | jq '.items[] | select(.status == \"running\" or .status == \"pending\") | .id'" 2>/dev/null + ssm_run "sudo docker exec $CONT atx ct remediation list --json | jq '.items[] | select(.status == \"running\" or .status == \"pending\") | .id'" 2>/dev/null + done + ``` + +2. **Warn the customer**: in-flight analyses on the backend will COMPLETE, but the EC2-side wrapper (polling + S3 upload) will be killed. 
Customer can re-trigger artifact upload after redeploy via (substitute `atx-ct` for `atx-ct-1` if WorkerCount=1): + ```bash + ssm_run "sudo docker exec atx-ct-1 /app/upload-ct-artifacts.sh atx-ct-output-${ACCOUNT_ID}" + ``` + +3. **Recommend Batch path** if the customer's workload is HIGHLY VARIABLE in parallelism. EC2 multi-worker is best when WorkerCount is set once and rarely changed. For elastic demand (e.g., "1 baseline, occasional spike to 8"), Batch's Fargate scaling is cheaper and avoids the redeploy. + +## Fan-out: Run Analysis on Each Repo in Parallel + +When the customer says "run analysis on each repo in source X in parallel" (one AID per repo, all running concurrently), the agent fans out N analyses across the available workers. The pattern below handles three cases automatically: (1) fresh stack provisioning, (2) running on an existing stack with enough workers, and (3) running on an existing stack where REPO_COUNT exceeds WORKER_COUNT (chunked round-robin distribution). + +```bash +# ────────────────────────────────────────────────────────────────────────── +# Step 1: Discover how many repos are in the source +# ────────────────────────────────────────────────────────────────────────── +# `discovery scan` outputs human-readable text (no --json flag in current CLI). +# Format of repo lines: " / " +# We extract the basename (after the last "/") because the --repo flag in Step 5 +# expects "::" format (without the org prefix). +REPOS=$(atx ct discovery scan --source "$LOGICAL_SOURCE_NAME" 2>&1 \ + | awk 'NF >= 2 && $1 ~ /\// { sub(/.*\//, "", $1); print $1 }') +REPO_COUNT=$(echo "$REPOS" | wc -l | tr -d ' ') +[ -z "$REPOS" ] && REPO_COUNT=0 # echo "" | wc -l returns 1, guard against empty +echo "Source $LOGICAL_SOURCE_NAME has $REPO_COUNT repos" + +if [ "$REPO_COUNT" -eq 0 ]; then + echo "ERROR: No repos found in source $LOGICAL_SOURCE_NAME. Verify the source is registered and discovery scan succeeded." 
+ exit 1 +fi + +# ────────────────────────────────────────────────────────────────────────── +# Step 2: Determine WORKER_COUNT (read from existing stack, existing instance, or pick for new) +# ────────────────────────────────────────────────────────────────────────── +if aws cloudformation describe-stacks --stack-name "$STACK_NAME" --region $REGION >/dev/null 2>&1; then + WORKER_COUNT=$(aws cloudformation describe-stacks --stack-name "$STACK_NAME" --region $REGION \ + --query 'Stacks[0].Parameters[?ParameterKey==`WorkerCount`].ParameterValue' --output text 2>/dev/null) + WORKER_COUNT=$(echo "$WORKER_COUNT" | xargs) # strip whitespace defensively + [ -z "$WORKER_COUNT" ] || [ "$WORKER_COUNT" = "None" ] && WORKER_COUNT=1 + echo "Using existing stack '$STACK_NAME' (WorkerCount=$WORKER_COUNT)" +elif [ -n "$INSTANCE_ID" ]; then + # Existing instance (no CFN stack). WORKER_COUNT was set in Step C.1 based on + # customer's choice and the instance's vCPU/RAM capacity. Default to 1 if not set. + WORKER_COUNT="${WORKER_COUNT:-1}" + echo "Using existing instance '$INSTANCE_ID' (WorkerCount=$WORKER_COUNT)" +else + # No stack yet, no existing instance: pick WORKER_COUNT to match REPO_COUNT, capped at 5 + WORKER_COUNT=$(( REPO_COUNT > 5 ? 5 : REPO_COUNT )) + echo "Will provision new stack with WorkerCount=$WORKER_COUNT" + # ... agent runs the "Create New Instance" flow with this WORKER_COUNT ... 
+fi + +# ────────────────────────────────────────────────────────────────────────── +# Step 3: Decide strategy based on REPO_COUNT vs WORKER_COUNT +# ────────────────────────────────────────────────────────────────────────── +REPOS_PER_WORKER=$(( (REPO_COUNT + WORKER_COUNT - 1) / WORKER_COUNT )) # ceiling + +if [ "$REPO_COUNT" -le "$WORKER_COUNT" ]; then + echo "Strategy: 1:1 fan-out ($REPO_COUNT repos across $WORKER_COUNT workers, $((WORKER_COUNT - REPO_COUNT)) idle)" +elif [ "$REPO_COUNT" -le $((WORKER_COUNT * 2)) ]; then + echo "Strategy: chunked (slight overflow, no infra change needed)" + echo " $REPO_COUNT repos across $WORKER_COUNT workers, ~${REPOS_PER_WORKER} repos per worker" + echo " Alternative: ASK customer if they prefer the Batch path." +else + echo "WARNING: $REPO_COUNT repos significantly exceeds $WORKER_COUNT workers." + echo " Strongly recommend the Batch path (handles up to 64 concurrent Fargate tasks)." + # Agent should pause here and ask the customer to choose chunked vs Batch +fi + +# ────────────────────────────────────────────────────────────────────────── +# Step 4: Round-robin distribution of repos across workers +# ────────────────────────────────────────────────────────────────────────── +# Round-robin (NOT chunked-by-REPOS_PER_WORKER) ensures even utilization. +# Example for 7 repos / 5 workers: +# workers 1-2 get 2 repos each (run sequentially within the worker) +# workers 3-5 get 1 repo each (then idle) +declare -A WORKER_REPOS +i=0 +for REPO in $REPOS; do + WORKER_NUM=$(( (i % WORKER_COUNT) + 1 )) + WORKER_REPOS[$WORKER_NUM]+="$REPO " + i=$((i + 1)) +done + +# ────────────────────────────────────────────────────────────────────────── +# Step 4.5: For local provider, pre-sync the repos bundle to ALL workers +# ────────────────────────────────────────────────────────────────────────── +# Local provider analyses require the repos bundle to be present in each +# worker's filesystem (containers are filesystem-isolated). 
github/gitlab/ +# bitbucket fetch repos via API at job time, so no pre-sync needed. +# +# IMPORTANT: this sync is IDEMPOTENT but SOURCE-AWARE. We track which source's +# repos are present via /home/atxuser/repos/.atx_source_marker: +# - If marker matches the current source AND repo dirs are present: skip sync +# (preserves atx ct's result-staging branches from prior analyses on this source) +# - If marker is missing, mismatched, or no repo dirs exist: wipe and re-sync +# (prevents source A's repos from contaminating source B's analysis) +# Counting uses `find -type d` so stray files (e.g., .DS_Store, lock files) don't +# falsely satisfy the "has repos" check. +if [ "$PROVIDER" = "local" ]; then + echo "Checking local-provider bundle state on all $WORKER_COUNT workers..." + ssm_run "for c in \$(sudo docker ps --filter name=atx-ct --format '{{.Names}}'); do \ + CURRENT_SOURCE=\$(sudo docker exec \$c cat /home/atxuser/repos/.atx_source_marker 2>/dev/null || echo ''); \ + REPO_COUNT=\$(sudo docker exec \$c bash -c 'find /home/atxuser/repos -mindepth 1 -maxdepth 1 -type d 2>/dev/null | wc -l' || echo 0); \ + if [ \"\$CURRENT_SOURCE\" = \"${LOGICAL_SOURCE_NAME}\" ] && [ \"\$REPO_COUNT\" -gt 0 ]; then \ + echo \" \$c: has ${LOGICAL_SOURCE_NAME} repos (\$REPO_COUNT dir(s)), skipping sync to preserve atx ct state\"; \ + else \ + echo \" \$c: syncing ${LOGICAL_SOURCE_NAME} bundle from S3 (was: \${CURRENT_SOURCE:-empty})\"; \ + sudo docker exec \$c bash -c 'rm -rf /home/atxuser/repos && mkdir -p /home/atxuser/repos /tmp/zips && \ + aws s3 sync s3://atx-source-code-${ACCOUNT_ID}/repos/ /tmp/zips/ && \ + for zip in /tmp/zips/*.zip; do unzip -q -o \"\\\$zip\" -d /home/atxuser/repos/; done && \ + echo ${LOGICAL_SOURCE_NAME} > /home/atxuser/repos/.atx_source_marker'; \ + fi; \ + done" +fi + +# ────────────────────────────────────────────────────────────────────────── +# Step 5: Submit ALL workers via a SINGLE SSM command (each handles its assigned repos serially) +# 
────────────────────────────────────────────────────────────────────────── +# Why a single SSM command + double-fork instead of N per-worker submissions: +# SSM's process tracking holds the command slot until all descendants exit +# (the cgroup is empty). Per-worker SSM commands would each stay InProgress +# until each wrapper completes, saturating the SSM agent's worker pool +# (CommandWorkersLimit default 5) and blocking subsequent status-check calls. +# The `( ( bash X & ) & )` double-fork orphans the wrapper to init (PID 1), +# letting the SSM command return Success immediately while wrappers run. +TS=$(date +%s) +MASTER="#!/bin/bash +" +for WORKER_NUM in $(seq 1 $WORKER_COUNT); do + REPOS_FOR_THIS_WORKER="${WORKER_REPOS[$WORKER_NUM]}" + [ -z "$REPOS_FOR_THIS_WORKER" ] && continue # skip workers with no assignment + + # Resolve container name (legacy "atx-ct" if WorkerCount=1, else "atx-ct-N") + if [ "$WORKER_COUNT" -eq 1 ]; then + CONT="atx-ct" + else + CONT="atx-ct-${WORKER_NUM}" + fi + + # Build provider-specific TOKEN_PRELUDE (mirrors build_command_analysis()). + # github/gitlab: fetch token from Secrets Manager into ~/.atxct/sources//. + # bitbucket: fetch token + inject config.json with email/username (Cloud) or base_url (DC). + # local: no prelude (repos already pre-synced in Step 4.5). 
+ TOKEN_PRELUDE="" + if [ "$PROVIDER" = "github" ] || [ "$PROVIDER" = "gitlab" ]; then + SECRET_ID="atx/${PROVIDER}-token" + TOKEN_FILE="${PROVIDER}_token" + TOKEN_PRELUDE="sudo docker exec ${CONT} bash -c 'mkdir -p /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME} && aws secretsmanager get-secret-value --secret-id ${SECRET_ID} --query SecretString --output text > /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/${TOKEN_FILE} && chmod 600 /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/${TOKEN_FILE}'" + elif [ "$PROVIDER" = "bitbucket" ]; then + if [ -n "${BITBUCKET_BASE_URL}" ]; then + config_json=$(printf '{"provider":"bitbucket","identifier":"%s","provider_config":{"base_url":"%s"}}' "${BITBUCKET_WORKSPACE}" "${BITBUCKET_BASE_URL}") + else + config_json=$(printf '{"provider":"bitbucket","identifier":"%s","provider_config":{"email":"%s","username":"%s"}}' "${BITBUCKET_WORKSPACE}" "${BITBUCKET_EMAIL}" "${BITBUCKET_USERNAME}") + fi + CONFIG_B64=$(echo "${config_json}" | base64 -w 0) + TOKEN_PRELUDE="sudo docker exec ${CONT} bash -c 'mkdir -p /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME} && echo ${CONFIG_B64} | base64 -d > /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/config.json && aws secretsmanager get-secret-value --secret-id atx/bitbucket-token --query SecretString --output text > /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/bitbucket_token && chmod 600 /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/bitbucket_token'" + fi + + JOB_ID="fan-w${WORKER_NUM}-${TS}" + + # Per-worker script: loops through its assigned repos and runs analysis on each. + # Continues on failure (so other repos still get analyzed). Each repo emits its own AID. 
+ SCRIPT=$(cat <> \$LOG + +${TOKEN_PRELUDE} + +for REPO in ${REPOS_FOR_THIS_WORKER}; do + echo "--- \$(date) starting analysis on \$REPO (worker ${WORKER_NUM}) ---" >> \$LOG + sudo docker exec ${CONT} atx ct analysis run \\ + --type ${ANALYSIS_TYPE} ${EXTRA_FLAGS} --source ${LOGICAL_SOURCE_NAME} --repo "${LOGICAL_SOURCE_NAME}::\$REPO" --wait --telemetry "agent=${AGENT},executionMode=ec2" >> \$LOG 2>&1 + RC=\$? + AID=\$(tail -50 \$LOG | grep -oE '01[A-Z0-9]+' | tail -1) + echo "\$REPO -> \$AID (rc=\$RC)" >> \$AIDS_FILE + if [ "${UPLOAD_ARTIFACTS:-true}" = "true" ] && [ "${ANALYSIS_TYPE}" != "tech-debt-quick" ] && [ -n "\$AID" ] && [ \$RC -eq 0 ]; then + sudo docker exec ${CONT} /app/upload-ct-artifacts.sh \$AID atx-ct-output-${ACCOUNT_ID} >> \$LOG 2>&1 + fi +done + +echo "=== \$(date) [DONE] worker ${WORKER_NUM} ===" >> \$LOG +EOF +) + + B64=$(echo "$SCRIPT" | base64 | tr -d '\n') + # Append decode + double-fork stanza for this worker to the master launcher + MASTER+="echo ${B64} | base64 -d > /tmp/${JOB_ID}.sh && chmod +x /tmp/${JOB_ID}.sh && ( ( bash /tmp/${JOB_ID}.sh > /tmp/${JOB_ID}.stdout 2>&1 < /dev/null & ) & ) && echo Launched_w${WORKER_NUM} +" +done +MASTER+="echo ALL_LAUNCHED" + +# Submit the master launcher as a SINGLE SSM command. Wrappers run as orphaned +# background processes; this command itself exits immediately. +MASTER_B64=$(echo "$MASTER" | base64 | tr -d '\n') +SUBMIT_ID=$(ssm_submit "echo ${MASTER_B64} | base64 -d > /tmp/master-${TS}.sh && chmod +x /tmp/master-${TS}.sh && bash /tmp/master-${TS}.sh") +echo "Launched all $WORKER_COUNT workers via single SSM command (id: $SUBMIT_ID)" + +# ────────────────────────────────────────────────────────────────────────── +# Step 6: Poll status across all workers (agent helper) +# ────────────────────────────────────────────────────────────────────────── +# Each worker writes AIDs to /tmp/atxct-fan-w${N}-${TS}.aids as repos finish. 
+# To check overall progress: +# for w in $(seq 1 $WORKER_COUNT); do +# ssm_run "cat /tmp/atxct-fan-w${w}-*.aids 2>/dev/null" +# done +# When all workers' AIDS files match their assigned repo count, fan-out is complete. +``` + +The agent should **report all N AIDs back** to the customer once each worker emits them (typically as each repo finishes, staggered as workers complete their assigned repos in sequence). + +## Fan-out: Run Remediation on Each Repo in Parallel + +Same round-robin distribution as the analysis fan-out, but each worker runs `atx ct remediation create` (not `atx ct analysis run`). The provider differences match `build_command_remediation()`: github/gitlab/bitbucket push result branches automatically (no S3 upload); local provider requires explicit `--local` flag and S3 upload of the remediated code. + +Use this when the customer says "remediate each repo in source X in parallel" (one RID per repo). Steps 1-4 are identical to the analysis fan-out (discover repos, determine WORKER_COUNT, decide strategy, round-robin distribute). Step 4.5 (local-provider repo sync) also applies and is source-aware: if a prior analysis on the same source already populated `/home/atxuser/repos/` on each worker, Step 4.5 skips the re-sync to preserve atx ct's branch state from that operation. If the customer switched to a different source, Step 4.5 wipes and re-syncs to prevent contamination. The differences are in Step 5 below. + +**`--transformation-name` vs `--ids` mode**: the fan-out below runs Mode 3 from [continuous-modernization-remediation](workload-continuous-modernization-remediation.md), which uses `--transformation-name` per repo with `--repo "::"`. For finding-driven remediation (Modes 1 or 2), distribute finding IDs across workers in chunks instead of repos: each worker calls `atx ct remediation create --ids ` WITHOUT `--repo` (atx ct rejects `--repo` with `--ids` because repos are derived from findings). 
Extract auto-remediable IDs from a prior analysis with: `atx ct findings list --analysis-id $AID --json | jq -r '.[] | select(.auto_remediable == true) | .id'`. + +```bash +# Steps 1-4.5: same as analysis fan-out (REPOS, REPO_COUNT, WORKER_COUNT, +# WORKER_REPOS round-robin assignment, local-provider repo sync if applicable) + +# Build CREATE_ARGS based on remediation mode (mirrors build_command_remediation) +if [ -n "$FINDING_IDS" ]; then + CREATE_ARGS_BASE="--ids ${FINDING_IDS}" + [ -n "${TRANSFORMATION_NAME}" ] && CREATE_ARGS_BASE="${CREATE_ARGS_BASE} --transformation-name ${TRANSFORMATION_NAME}" +else + CREATE_ARGS_BASE="--transformation-name ${TRANSFORMATION_NAME}" +fi +[ -n "${CONFIGURATION}" ] && CREATE_ARGS_BASE="${CREATE_ARGS_BASE} -g \"${CONFIGURATION}\"" + +# Provider-aware --local flag and upload behavior +LOCAL_FLAG="" +UPLOAD_REMED_LINE='echo "[skip upload: github/gitlab/bitbucket pushes results to source repo]"' +if [ "$PROVIDER" = "local" ]; then + LOCAL_FLAG="--local" + # UPLOAD_REMED_LINE set per-worker below (uses ${CONT}) +fi + +# ────────────────────────────────────────────────────────────────────────── +# Step 5: Submit ALL remediations via a SINGLE SSM command (each worker handles assigned repos serially) +# ────────────────────────────────────────────────────────────────────────── +# Same single-SSM + double-fork rationale as the analysis fan-out: one command +# submission orphans all wrappers to init so SSM returns immediately and status +# checks aren't queued behind the wrappers. 
+TS=$(date +%s) +MASTER="#!/bin/bash +" +for WORKER_NUM in $(seq 1 $WORKER_COUNT); do + REPOS_FOR_THIS_WORKER="${WORKER_REPOS[$WORKER_NUM]}" + [ -z "$REPOS_FOR_THIS_WORKER" ] && continue + + # Resolve container name (same as analysis fan-out) + if [ "$WORKER_COUNT" -eq 1 ]; then + CONT="atx-ct" + else + CONT="atx-ct-${WORKER_NUM}" + fi + + # Build TOKEN_PRELUDE (same as analysis fan-out's logic for github/gitlab/bitbucket) + TOKEN_PRELUDE="" + if [ "$PROVIDER" = "github" ] || [ "$PROVIDER" = "gitlab" ]; then + SECRET_ID="atx/${PROVIDER}-token" + TOKEN_FILE="${PROVIDER}_token" + TOKEN_PRELUDE="sudo docker exec ${CONT} bash -c 'mkdir -p /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME} && aws secretsmanager get-secret-value --secret-id ${SECRET_ID} --query SecretString --output text > /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/${TOKEN_FILE} && chmod 600 /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/${TOKEN_FILE}'" + elif [ "$PROVIDER" = "bitbucket" ]; then + if [ -n "${BITBUCKET_BASE_URL}" ]; then + config_json=$(printf '{"provider":"bitbucket","identifier":"%s","provider_config":{"base_url":"%s"}}' "${BITBUCKET_WORKSPACE}" "${BITBUCKET_BASE_URL}") + else + config_json=$(printf '{"provider":"bitbucket","identifier":"%s","provider_config":{"email":"%s","username":"%s"}}' "${BITBUCKET_WORKSPACE}" "${BITBUCKET_EMAIL}" "${BITBUCKET_USERNAME}") + fi + CONFIG_B64=$(echo "${config_json}" | base64 -w 0) + TOKEN_PRELUDE="sudo docker exec ${CONT} bash -c 'mkdir -p /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME} && echo ${CONFIG_B64} | base64 -d > /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/config.json && aws secretsmanager get-secret-value --secret-id atx/bitbucket-token --query SecretString --output text > /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/bitbucket_token && chmod 600 /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/bitbucket_token'" + fi + + # Per-worker upload line (only meaningful for local provider) + 
WORKER_UPLOAD_LINE='echo "[skip upload: github/gitlab/bitbucket pushes to source repo]"' + if [ "$PROVIDER" = "local" ]; then + WORKER_UPLOAD_LINE="sudo docker exec ${CONT} /app/upload-ct-artifacts.sh \\\$RID atx-ct-output-${ACCOUNT_ID}" + fi + + JOB_ID="fan-rem-w${WORKER_NUM}-${TS}" + + # Per-worker remediation script: loops through assigned repos. + # Continues on failure. Polls each remediation until terminal (no --wait support + # for `remediation create`). Uploads to S3 only for local provider. + SCRIPT=$(cat <> \$LOG + +${TOKEN_PRELUDE} + +for REPO in ${REPOS_FOR_THIS_WORKER}; do + echo "--- \$(date) starting remediation on \$REPO (worker ${WORKER_NUM}) ---" >> \$LOG + sudo docker exec ${CONT} atx ct remediation create \\ + ${CREATE_ARGS_BASE} ${LOCAL_FLAG} --source ${LOGICAL_SOURCE_NAME} --repo "${LOGICAL_SOURCE_NAME}::\$REPO" --telemetry "agent=${AGENT},executionMode=ec2" >> \$LOG 2>&1 + RC=\$? + RID=\$(tail -50 \$LOG | grep -oE '01[A-Z0-9]+' | tail -1) + echo "\$REPO -> \$RID (rc=\$RC)" >> \$RIDS_FILE + + # Poll until terminal (atx ct remediation create has no --wait flag) + if [ -n "\$RID" ] && [ \$RC -eq 0 ]; then + while true; do + STATUS=\$(sudo docker exec ${CONT} atx ct remediation status --id \$RID --json 2>/dev/null | jq -r .status 2>/dev/null) + case "\$STATUS" in + complete|completed|failed|cancelled) break ;; + esac + sleep 30 + done + echo "\$REPO -> \$RID terminal: \$STATUS" >> \$LOG + + # Upload artifacts (only meaningful for local; github/gitlab/bitbucket skip) + if [ "\$STATUS" = "complete" ] || [ "\$STATUS" = "completed" ]; then + ${WORKER_UPLOAD_LINE} >> \$LOG 2>&1 + fi + fi +done + +echo "=== \$(date) [DONE] worker ${WORKER_NUM} ===" >> \$LOG +EOF +) + + B64=$(echo "$SCRIPT" | base64 | tr -d '\n') + # Append decode + double-fork stanza for this worker to the master launcher + MASTER+="echo ${B64} | base64 -d > /tmp/${JOB_ID}.sh && chmod +x /tmp/${JOB_ID}.sh && ( ( bash /tmp/${JOB_ID}.sh > /tmp/${JOB_ID}.stdout 2>&1 < /dev/null & ) & ) && 
echo Launched_w${WORKER_NUM} +" +done +MASTER+="echo ALL_LAUNCHED" + +# Submit the master launcher as a SINGLE SSM command. Wrappers run as orphaned +# background processes; this command itself exits immediately. +MASTER_B64=$(echo "$MASTER" | base64 | tr -d '\n') +SUBMIT_ID=$(ssm_submit "echo ${MASTER_B64} | base64 -d > /tmp/master-${TS}.sh && chmod +x /tmp/master-${TS}.sh && bash /tmp/master-${TS}.sh") +echo "Launched all $WORKER_COUNT remediation workers via single SSM command (id: $SUBMIT_ID)" + +# Step 6: Poll status across workers (same pattern as analysis fan-out) +# Each worker writes RIDs to /tmp/atxct-fan-rem-w${N}-${TS}.rids as repos finish. +``` + +The agent should **report all N RIDs back** to the customer. For github/gitlab/bitbucket, customers will see N PRs/MRs created in their source provider. For local, customers can download the remediated code from `s3://atx-ct-output-${ACCOUNT_ID}//` after each remediation completes. + +## Setup Paths + +Step 0 below detects which one of three states applies and routes accordingly: + +| Path | Trigger | Admin handoff? | +|------|---------|----------------| +| **Operate existing CFN stack** | `describe-stacks atx-runner` returns `CREATE_COMPLETE` or `UPDATE_COMPLETE` | No -- executor creds suffice | +| **Use existing non-CFN instance** | No stack, customer has their own EC2 instance | Only if the instance lacks the `atx-remote-infra=true` tag or the `atx-transform-access` inline policy on its role | +| **Provision a new CFN stack** | No stack, no existing instance | Yes -- admin runs `aws cloudformation create-stack` once | + +## Two-Persona Permission Model + +Provisioning and operating the runner are split across two distinct IAM personas. The skill respects this split: the agent NEVER asks the executor to run a privileged provisioning command, and NEVER asks the admin to run a routine job submission. 
+ +| Persona | Managed policy | Owns | When used | +|---|---|---|---| +| **Admin** | (any identity with the permissions listed below -- typically full admin / `AdministratorAccess` or an org-scoped admin role) | All resource-lifecycle mutations: `iam:Create/Put/Delete*` on `atx-transform-*`, `cloudformation:CreateStack`/`DeleteStack`, `ec2:RunInstances`/`CreateSecurityGroup`/`CreateKeyPair`/`TerminateInstances`/`DeleteSecurityGroup`/`DeleteKeyPair`, `s3:CreateBucket`+lifecycle, `scheduler:CreateScheduleGroup`, `ec2:AssociateIamInstanceProfile`/`ModifyInstanceMetadataOptions` | Initial setup (buckets + stack) and teardown (`delete-stack`) | +| **Executor** | (a least-privilege role scoped to the actions listed below) | Read-only CFN/EC2/SSM, SSM SendCommand on tagged, S3 data plane on `atx-*` buckets, `secretsmanager:GetSecretValue`/`DescribeSecret` on `atx/*` (read only), `ec2:Start/StopInstances` on tagged (power state), KMS-via-alias, `scheduler:CreateSchedule`/`DeleteSchedule`/`GetSchedule`/`UpdateSchedule`/`ListSchedules` (scoped to `atx-control-tower` group), `iam:PassRole` to `atx-transform-role*` (EC2) and `AtxSchedulerInvocationRole` (scheduler) only | Every analysis / remediation submission, status check, artifact fetch, schedule pause/resume | + +The executor policy has **zero IAM mutations and zero resource-lifecycle creations**. Privilege-escalation surface is bounded to the admin handoff at stack create/delete, not the day-to-day developer flow. + +### Skill flow at every entry + +``` +Entry (executor creds -- agent always assumes least privilege) + │ + └─ Step 5: DETECT (read-only -- describe-stacks) + ├─ NOT_DEPLOYED ─────► PROVISION + │ Agent prints template + create-stack command + │ with admin caveat. STOPS. User runs it + │ with admin creds outside the agent. User + │ re-enters the flow when done. + │ + └─ EXISTS & healthy ──► OPERATE + Steps 6–10 run with executor creds only. +``` + +Detection is read-only and always safe. 
Only `create-stack` and `delete-stack` need admin, and the agent hands those to the user. + +## Step 0: Detect Existing Infrastructure + +ALWAYS run this first, on every entry to the flow. Uses only read-only calls (`cloudformation:DescribeStacks`, `ec2:DescribeInstances`) -- both in the executor policy, so it never fails on permissions. The result decides which path the rest of the skill takes. + +```bash +STACK_NAME="${STACK_NAME:-atx-runner}" +REGION="${REGION:-$(aws configure get region || echo us-east-1)}" +ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text) + +STACK_STATUS=$(aws cloudformation describe-stacks --stack-name "$STACK_NAME" --region $REGION \ + --query 'Stacks[0].StackStatus' --output text 2>/dev/null) + +case "$STACK_STATUS" in + CREATE_COMPLETE|UPDATE_COMPLETE) + INSTANCE_ID=$(aws cloudformation describe-stacks --stack-name "$STACK_NAME" --region $REGION \ + --query 'Stacks[0].Outputs[?OutputKey==`InstanceId`].OutputValue' --output text) + echo "OPERATE (Case A) -- CFN stack '$STACK_NAME' exists. Instance: $INSTANCE_ID" + # Skip to Step 6 (Verify the Container is Running). Executor creds suffice. + ;; + CREATE_IN_PROGRESS|UPDATE_IN_PROGRESS) + echo "WAIT -- stack is mid-transition ($STACK_STATUS). Re-run when the admin's" + echo " deploy finishes:" + echo " aws cloudformation wait stack-create-complete \\" + echo " --stack-name $STACK_NAME --region $REGION" + exit 0 + ;; + DELETE_IN_PROGRESS) + echo "WAIT -- stack is being deleted ($STACK_STATUS). Wait for deletion to finish," + echo " then re-run this flow to provision a new stack:" + echo " aws cloudformation wait stack-delete-complete \\" + echo " --stack-name $STACK_NAME --region $REGION" + exit 0 + ;; + CREATE_FAILED|*ROLLBACK*|DELETE_FAILED) + echo "BLOCKED -- stack is in $STACK_STATUS. 
The admin must clean it up first:" + echo " aws cloudformation describe-stack-events --stack-name $STACK_NAME --region $REGION" + echo " aws cloudformation delete-stack --stack-name $STACK_NAME --region $REGION # admin" + exit 1 + ;; + "") + # No CFN stack -- agent asks which of the two remaining paths applies. + ;; +esac +``` + +If `STACK_STATUS` was empty, the agent MUST ask the customer: + +> "I don't see an `atx-runner` CFN stack in `${REGION}`. Which path applies? +> 1. **Reuse an existing EC2 instance** I already have (launched outside CFN -- e.g., my org's standard EC2, or a dev box I already use) +> 2. **Create a new CFN-managed runner from scratch** (requires admin to run a one-time deploy command outside the agent)" + +| Customer answer | Path | Admin needed? | +|---|---|---| +| 1. Existing instance | OPERATE -- see [Use Existing Instance (no CFN)](#use-existing-instance-no-cfn), then Step 6 | Only if the instance lacks the `atx-remote-infra=true` tag or the `atx-transform-access` inline policy on its role (Step C.0 detects and emits one combined handoff) | +| 2. Create new CFN stack | PROVISION -- Steps 1–5 below | **Yes**, for one command at the end of Step 5 | + +Steps 1–5 below only apply to path 2. + +## Provision Lifecycle (Steps 1–5, fresh CFN stack) + +These steps run **only when the customer chose path 2 in Step 0**. Steps 1–4 stay on executor creds -- they collect inputs and stage credentials. Step 5 then prints a self-contained `aws cloudformation create-stack` command for the customer's **admin** to run; the agent does NOT execute it. + +### Step 1: Verify Source and Enumerate Repos + +Confirm a source is registered locally (see [continuous-modernization-source.md](workload-continuous-modernization-source.md)) and run discovery (see [continuous-modernization-discovery.md](workload-continuous-modernization-discovery.md)) to get the list of repos. 
This determines instance size (Step 2) and gives the customer visibility into what will be analyzed. + +```bash +atx ct source list +``` + +Show the list to the customer and ask: +1. **Which source to analyze?** (pick from the list above) +2. **Source type:** GitHub, GitLab, Bitbucket, or local +3. **Analysis type:** `tech-debt-comprehensive`, `tech-debt-quick`, `security`, `agentic-readiness`, `modernization-readiness`, or `custom` + +If the list is empty or the customer wants to register a new source, run `atx ct source add` via the [continuous-modernization-source](workload-continuous-modernization-source.md) skill, then return here. + +```bash +LOGICAL_SOURCE_NAME="" + +atx ct discovery scan --source "$LOGICAL_SOURCE_NAME" + +mapfile -t REPOS < <(atx ct repository list --source "$LOGICAL_SOURCE_NAME" --json | jq -r '.items[].full_name') +REPO_COUNT=${#REPOS[@]} + +echo "Source ${LOGICAL_SOURCE_NAME} has ${REPO_COUNT} repos" +``` + +If the customer wants only a subset, set `REPO_FILTER="--repo ::"` for use in Step 8. The `--repo` flag accepts exactly ONE repo per invocation -- to analyze multiple repos in parallel, use the fan-out pattern (one submission per repo across workers). An empty `REPO_FILTER` analyzes the whole source. + +### Step 2: Determine Instance Size + +Default: `m5.2xlarge` (8 vCPU, 32 GB RAM) -- handles any number of repos when they are analyzed sequentially. If a repo is unusually large (>5 GB working tree, e.g., a monorepo), move up one instance size for RAM headroom. + +```bash +INSTANCE_TYPE="m5.2xlarge" +``` + +### Step 3: Confirm Account and Plan + +```bash +ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text) +REGION=$(aws configure get region || echo "us-east-1") +``` + +Tell the user: + +> "I'll set up CT analysis on a new EC2 instance in account `${ACCOUNT_ID}`, region `${REGION}`. 
This includes: +> - IAM role + instance profile (`atx-transform-role`, `atx-transform-profile`) with SSM Managed Instance Core for shell access +> - Security group (no inbound ports -- SSM SendCommand handles all access via outbound HTTPS) +> - `${INSTANCE_TYPE}` EC2 instance with 100GB volume +> - S3 source bucket (only if local source) and atx-ct-output bucket (always) +> +> Continue?" + +Wait for explicit confirmation. + +### Step 4: Prep Credentials + +Give the user the relevant command below to run in their own terminal -- do not ask them to paste the token into this chat. + +Tokens are stored in AWS Secrets Manager and fetched by the container at job submission time (the EC2 instance role has `secretsmanager:GetSecretValue` for `atx/*`). This is the same pattern as the batch skill -- store once, use for any number of analyses without re-staging files. + +**GitHub HTTPS -- store the PAT:** + +```bash +read -s TOKEN && { aws secretsmanager create-secret --name "atx/github-token" \ + --secret-string "$TOKEN" 2>/dev/null \ + || aws secretsmanager put-secret-value --secret-id "atx/github-token" \ + --secret-string "$TOKEN"; }; unset TOKEN +``` + +**GitLab HTTPS -- store the PAT:** + +```bash +read -s TOKEN && { aws secretsmanager create-secret --name "atx/gitlab-token" \ + --secret-string "$TOKEN" 2>/dev/null \ + || aws secretsmanager put-secret-value --secret-id "atx/gitlab-token" \ + --secret-string "$TOKEN"; }; unset TOKEN +``` + +**Bitbucket -- store the API token** (the container fetches it from Secrets Manager at job start). 
Email and username are injected into the container command directly (not secrets -- they're non-sensitive identifiers): + +```bash +read -s TOKEN && { aws secretsmanager create-secret --name "atx/bitbucket-token" \ + --secret-string "$TOKEN" 2>/dev/null \ + || aws secretsmanager put-secret-value --secret-id "atx/bitbucket-token" \ + --secret-string "$TOKEN"; }; unset TOKEN +``` + +**SSH -- store the private key:** + +```bash +aws secretsmanager create-secret --name "atx/ssh-key" \ + --secret-string "$(cat )" 2>/dev/null \ + || aws secretsmanager put-secret-value --secret-id "atx/ssh-key" \ + --secret-string "$(cat )" +``` + +**Private package registries** (if the analysis builds the project): see [custom-remote-execution#private-package-registries](custom-remote-execution.md#private-package-registries) for the `atx/credentials` JSON pattern. + +A single token can be used for any number of sources of the same provider (e.g., one PAT for all your GitHub orgs). The `build_command_*()` functions in Step 8 fetch the token and write it to the source-specific path the CT server expects. + +### Step 5: Provision Infrastructure via CloudFormation + +Provisioning is done through a single CloudFormation stack -- atomic deploy/rollback, single-command teardown, full visibility in the customer's CloudFormation console. The stack provisions: + +- IAM role + instance profile (with transform-custom + S3 + KMS + Secrets Manager + securityagent permissions and `AmazonSSMManagedInstanceCore` attached for SSM) +- Security group (no inbound; outbound default allow) +- EC2 instance (Amazon Linux 2023, 100 GB gp3 volume) +- UserData that installs Docker, pulls the atx-ct image, and starts the container + +S3 buckets (`atx-source-code-${ACCOUNT_ID}`, `atx-ct-output-${ACCOUNT_ID}`) are managed OUTSIDE the stack -- they hold persistent customer data, must survive stack delete-and-recreate, and are shared across multiple stacks if the customer ever runs more than one. 
+ +**Step 5a: Check whether S3 buckets exist (executor: read-only):** + +`s3:CreateBucket` and `s3:PutLifecycleConfiguration` are in the **admin** policy, not executor. Bucket creation is bundled into the admin handoff in Step 5d so the admin runs all the privileged setup commands in one shell session. Here, the agent only checks whether the buckets already exist (`head-bucket` is read-only). + +```bash +SOURCE_BUCKET_EXISTS=$(aws s3api head-bucket --bucket atx-source-code-${ACCOUNT_ID} 2>/dev/null && echo yes || echo no) +OUTPUT_BUCKET_EXISTS=$(aws s3api head-bucket --bucket atx-ct-output-${ACCOUNT_ID} 2>/dev/null && echo yes || echo no) +echo "atx-source-code-${ACCOUNT_ID}: $SOURCE_BUCKET_EXISTS" +echo "atx-ct-output-${ACCOUNT_ID}: $OUTPUT_BUCKET_EXISTS" +``` + +If both are `yes`, the agent omits the bucket-creation lines from Step 5d's admin handoff (they're idempotent, but cleaner to omit). If either is `no`, the admin handoff includes the `aws s3api create-bucket` + `put-bucket-lifecycle-configuration` calls before the `cloudformation create-stack` line. + +**Step 5b: Cascading list-and-pick -- VPC, then subnet, then security group, then final confirmation.** + +The skill **never creates a VPC, subnet, or NAT** -- those are customer-owned network resources. The agent's job is to **list what already exists in the account**, let the customer pick each, and run validations on the chosen network before the admin handoff fires. + +The flow is cascading: pick VPC first (so subnet/SG lists can be filtered to that VPC), then subnet, then SG, then a final summary the customer must explicitly confirm before Step 5c proceeds. Customers with self-hosted / internal git hosts can pick a VPC that has VPN / Direct Connect / peering to that host -- same flow, same UX, the customer just picks a different VPC. + +**MANDATORY interaction rules. 
The agent MUST follow these without exception:** + +- **The agent MUST NOT pre-select** a VPC, subnet, or security group -- not even if "obvious," "functionally equivalent," or "sensible default." Every choice belongs to the customer. +- **The agent MUST present each list and STOP**, waiting for the customer to type their explicit choice. No proceeding to the next step until the customer has answered the current one. +- **After the third selection** (security group), the agent MUST display all four selections (VPC, subnet, SG, AZ) in a final summary and **explicitly ask "proceed with these?"**. The agent MUST NOT advance to Step 5c (write CFN template) until the customer types `yes` or equivalent. +- **If the agent is inclined to skip an ask** ("they said use default VPC, I can pick the subnet myself"): STOP. The customer's "use default VPC" answer is ONLY about the VPC. Subnet and SG remain unanswered until the customer types those choices too. + +```bash +EXISTING_SG_ID="${EXISTING_SG_ID:-}" # empty means stack creates a new no-inbound SG + +# ────────────────────────────────────────────────────────────────────────── +# 1. List VPCs in the account+region. Show ID, default flag, Name tag. 
+# ────────────────────────────────────────────────────────────────────────── +echo "VPCs available in account ${ACCOUNT_ID}, region ${REGION}:" +aws ec2 describe-vpcs --region $REGION \ + --query 'Vpcs[*].{VpcId:VpcId,Default:IsDefault,Cidr:CidrBlock,Name:Tags[?Key==`Name`]|[0].Value}' \ + --output table + +VPC_COUNT=$(aws ec2 describe-vpcs --region $REGION --query 'length(Vpcs)' --output text) +DEFAULT_VPC_COUNT=$(aws ec2 describe-vpcs --region $REGION --filters Name=isDefault,Values=true --query 'length(Vpcs)' --output text) +``` + +**Two account-state cases the agent MUST handle explicitly before continuing:** + +**Case 1 -- `VPC_COUNT=0`: no VPCs at all in this account+region.** + +The skill **never creates VPCs** -- that's an infrastructure decision the customer's network team must make. The agent MUST stop and tell the customer: + +> "There are no VPCs in account `${ACCOUNT_ID}`, region `${REGION}`. +> +> The skill cannot proceed without a VPC, and it does NOT auto-create one -- VPCs are foundational network infrastructure that should be set up by your network or platform team, not by an analysis tool. +> +> Ask whoever owns AWS networking in your org to: +> 1. Create a VPC (or restore the deleted default VPC) in this region +> 2. Add at least one subnet with outbound internet access (NAT gateway, internet gateway, or transit gateway) +> 3. Optionally, prepare a security group with allow-all-egress on TCP 443 (or scope to specific endpoints) +> +> Once that exists, come back to this conversation and I'll re-run the VPC list." + +The agent then STOPS this turn. Don't try to work around it. + +**Case 2 -- `VPC_COUNT≥1` but `DEFAULT_VPC_COUNT=0`: VPCs exist but none are marked default.** + +This is normal in enterprise accounts where the default VPC was deliberately deleted (security baseline, AWS Landing Zone, Control Tower OUs). The customer just needs to pick one of the non-default VPCs. 
The agent presents the list with the same ASK as the next step -- the absence of a default VPC isn't an error, just slightly different framing: + +> "I don't see a default VPC in this region -- that's normal in enterprise accounts. Here are the VPCs that DO exist; please pick the one your runner should deploy into." + +The list-and-pick flow below handles both Case 2 and the simple-default case. Only Case 1 stops the flow. + +```bash +if [ "$VPC_COUNT" = "0" ]; then + echo "ERROR: No VPCs in account ${ACCOUNT_ID}, region ${REGION}." + echo " The skill does NOT auto-create VPCs. Ask your network team to provision" + echo " a VPC + subnet (with NAT/IGW/TGW egress) before re-running." + exit 1 +fi +``` + +**STOP HERE.** The agent MUST present the list above and ask the customer which VPC to use. The agent MUST NOT proceed to listing subnets until the customer has typed a VPC ID. Suggested phrasing: + +> "Here are the VPCs in your account. Which one should the runner be deployed in? +> If you have a self-hosted git host (GHES, GitLab self-managed, Bitbucket DC), pick the VPC that has VPN / Direct Connect / peering routes to it. +> If you're using public github/gitlab/bitbucket, the default VPC works, but you may prefer a workload VPC for better network isolation. +> Please reply with the VPC ID." + +```bash +read -p "VPC ID: " VPC_ID + +# Verify the customer's choice exists +VPC_EXISTS=$(aws ec2 describe-vpcs --vpc-ids "$VPC_ID" --region $REGION \ + --query 'Vpcs[0].VpcId' --output text 2>/dev/null) +if [ "$VPC_EXISTS" != "$VPC_ID" ]; then + echo "ERROR: VPC $VPC_ID not found in $REGION." + exit 1 +fi + +# ────────────────────────────────────────────────────────────────────────── +# 2. List subnets in the chosen VPC. Show ID, AZ, CIDR, public flag, Name. 
+# ────────────────────────────────────────────────────────────────────────── +echo "" +echo "Subnets in $VPC_ID:" +aws ec2 describe-subnets --region $REGION --filters "Name=vpc-id,Values=$VPC_ID" \ + --query 'Subnets[*].{SubnetId:SubnetId,AZ:AvailabilityZone,Cidr:CidrBlock,Public:MapPublicIpOnLaunch,Name:Tags[?Key==`Name`]|[0].Value}' \ + --output table + +SUBNET_COUNT=$(aws ec2 describe-subnets --region $REGION --filters "Name=vpc-id,Values=$VPC_ID" --query 'length(Subnets)' --output text) +if [ "$SUBNET_COUNT" = "0" ]; then + echo "ERROR: VPC $VPC_ID has no subnets. The skill cannot create one." + echo " Customer's network team must add a subnet. Bail out." + exit 1 +fi +``` + +**STOP HERE.** The agent MUST present the list above and ask the customer which subnet to use. The agent MUST NOT pre-pick "the first one" or "any of the AZ-a subnets" -- every subnet is the customer's call. The agent MUST NOT proceed to listing security groups until the customer has typed a subnet ID. Suggested phrasing: + +> "Pick a subnet for the runner. The subnet must have outbound internet access (NAT gateway, internet gateway, or transit gateway) so the runner can pull the atx-ct image from ECR Public, reach the atx ct backend, and talk to S3 / Secrets Manager. +> Public subnets (`Public: True`) auto-assign public IPs -- easiest for image pull, but exposes the instance to the internet. +> Private subnets (`Public: False`) need NAT or TGW egress -- typical for production workloads. +> Please reply with the subnet ID." + +```bash +read -p "Subnet ID: " SUBNET_ID + +# Validation #1: subnet is actually in the chosen VPC. +SUBNET_VPC=$(aws ec2 describe-subnets --subnet-ids "$SUBNET_ID" --region $REGION \ + --query 'Subnets[0].VpcId' --output text 2>/dev/null) +if [ "$SUBNET_VPC" != "$VPC_ID" ]; then + echo "ERROR: subnet $SUBNET_ID is not in VPC $VPC_ID (or doesn't exist in $REGION)." 
+ exit 1 +fi +SUBNET_AZ=$(aws ec2 describe-subnets --subnet-ids "$SUBNET_ID" --region $REGION \ + --query 'Subnets[0].AvailabilityZone' --output text) +echo " ✓ Subnet $SUBNET_ID is in $SUBNET_VPC, AZ $SUBNET_AZ." + +# Validation #2: subnet has a default route (egress exists). +ROUTE_TABLE_ID=$(aws ec2 describe-route-tables --region $REGION \ + --filters "Name=association.subnet-id,Values=$SUBNET_ID" \ + --query 'RouteTables[0].RouteTableId' --output text 2>/dev/null) +if [ "$ROUTE_TABLE_ID" = "None" ] || [ -z "$ROUTE_TABLE_ID" ]; then + # Subnet has no explicit association; it inherits the VPC's main route table. + ROUTE_TABLE_ID=$(aws ec2 describe-route-tables --region $REGION \ + --filters "Name=vpc-id,Values=$VPC_ID" "Name=association.main,Values=true" \ + --query 'RouteTables[0].RouteTableId' --output text) +fi +DEFAULT_ROUTE=$(aws ec2 describe-route-tables --route-table-ids "$ROUTE_TABLE_ID" --region $REGION \ + --query "RouteTables[0].Routes[?DestinationCidrBlock=='0.0.0.0/0'] | [0]" --output json) +EGRESS_TARGET=$(echo "$DEFAULT_ROUTE" | jq -r '.GatewayId // .NatGatewayId // .TransitGatewayId // .VpcPeeringConnectionId // "MISSING"') +if [ "$EGRESS_TARGET" = "MISSING" ] || [ "$EGRESS_TARGET" = "null" ]; then + echo "ERROR: subnet $SUBNET_ID has no default route (0.0.0.0/0). The runner won't" + echo " reach atx ct backend / ECR / S3. The customer's network team must add" + echo " a NAT gateway, internet gateway, or transit gateway route before deploying." + echo " (We do NOT auto-provision NAT -- those are real network changes.)" + exit 1 +fi +echo " ✓ Subnet has default route via $EGRESS_TARGET." + +# ────────────────────────────────────────────────────────────────────────── +# 3. List security groups in the chosen VPC. Show ID, Name, description. 
+# ────────────────────────────────────────────────────────────────────────── +echo "" +echo "Security groups in $VPC_ID:" +aws ec2 describe-security-groups --region $REGION --filters "Name=vpc-id,Values=$VPC_ID" \ + --query 'SecurityGroups[*].{GroupId:GroupId,Name:GroupName,Description:Description}' \ + --output table +``` + +**STOP HERE.** The agent MUST present the list above and ask the customer which security group to reuse, or whether to let the stack create a new one. The agent MUST NOT default to "let the stack create one" without asking -- that's the customer's choice. The agent MUST NOT proceed to the final confirmation step until the customer has typed an SG ID or `new`. Suggested phrasing: + +> "Pick a security group for the runner, or type 'new' to let the stack create a fresh one with no inbound and allow-all outbound. +> If you reuse an existing SG, it MUST allow outbound HTTPS (port 443) to atx ct backend, ECR, S3, Secrets Manager, and (if applicable) your internal git host. I'll verify outbound 443 is allowed before proceeding. +> Please reply with the SG ID or 'new'." + +```bash +read -p "Security group ID (or 'new' to create one): " SG_ANSWER +if [ "$SG_ANSWER" = "new" ] || [ -z "$SG_ANSWER" ]; then + EXISTING_SG_ID="" + echo " ✓ Stack will create a new no-inbound, allow-all-egress SG." +else + EXISTING_SG_ID="$SG_ANSWER" + + # Validation #4: reused SG allows outbound HTTPS. + EGRESS_443=$(aws ec2 describe-security-groups --group-ids "$EXISTING_SG_ID" --region $REGION \ + --query "SecurityGroups[0].IpPermissionsEgress[?FromPort==\`443\` || FromPort==null || IpProtocol=='-1'] | [0]" \ + --output json 2>/dev/null) + if [ "$EGRESS_443" = "null" ] || [ -z "$EGRESS_443" ]; then + echo "ERROR: security group $EXISTING_SG_ID does not appear to allow outbound HTTPS." + echo " Add an egress rule for TCP 443 to 0.0.0.0/0 (or to the specific atx ct," + echo " ECR, S3, Secrets Manager, and internal git host CIDRs) before deploying." 
+ exit 1 + fi + echo " ✓ Security group $EXISTING_SG_ID allows outbound HTTPS." +fi + +echo "" +echo "Final selections:" +echo " VPC: $VPC_ID" +echo " Subnet: $SUBNET_ID (AZ $SUBNET_AZ)" +[ -n "$EXISTING_SG_ID" ] && echo " SG: $EXISTING_SG_ID (reused)" || echo " SG: stack will create a new one" +``` + +**FINAL CONFIRMATION GATE.** The agent MUST present the four selections above (VPC, subnet, SG, AZ) to the customer in a clear summary and ask explicit confirmation before advancing to Step 5c. Suggested phrasing: + +> "Here's what I'll deploy with: +> - **VPC**: `$VPC_ID` +> - **Subnet**: `$SUBNET_ID` (AZ `$SUBNET_AZ`) +> - **Security Group**: `$EXISTING_SG_ID` (reused) ← OR → stack will create a new no-inbound, allow-all-egress SG +> - **WorkerCount / InstanceType / VolumeSize**: (from Step 2) +> +> Proceed with these? (yes / no -- type yes to continue to the admin handoff, or anything else to revise)" + +The agent MUST wait for the customer's explicit `yes` (or equivalent affirmative) before advancing. If the customer says no or wants to change something, the agent MUST loop back to the relevant step and re-ask. **The agent MUST NOT skip this confirmation, even if every selection looks reasonable** -- this is the last chance for the customer to catch a mistake before the admin is asked to deploy infrastructure. + +**The skill NEVER creates VPCs, subnets, or NAT gateways.** It only describes them, asks the customer to choose, validates the choice, and (on the SG side) lets the stack create one when the customer doesn't want to reuse one. All other network resources are customer-provisioned, customer-owned. If the account has no VPCs or no subnets in the chosen VPC, the skill bails and tells the customer to provision them first -- those are infrastructure changes that need network-team approval, not something the skill should silently do. 
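The confirmation gate above can be sketched as a pair of small helpers. This is a minimal illustration, not part of the skill: the function names `is_affirmative` and `print_selections` are hypothetical, and the accepted spellings (`yes`, `y`) are an assumption. The text only requires an explicit `yes` or equivalent affirmative, with anything else sending the agent back to revise.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the skill): succeed only on an explicit affirmative.
# The accepted spellings ("yes", "y") are an assumption; anything else, including
# an empty reply, is treated as "revise".
is_affirmative() {
  local answer
  # Normalise: lowercase, then trim leading and trailing whitespace.
  answer=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
  case "$answer" in
    yes|y) return 0 ;;
    *)     return 1 ;;
  esac
}

# Hypothetical summary printer mirroring the "Final selections" echo lines above.
# An empty fourth argument means the stack creates the security group.
print_selections() {
  local vpc="$1" subnet="$2" az="$3" sg="$4"
  echo "VPC: $vpc"
  echo "Subnet: $subnet (AZ $az)"
  if [ -n "$sg" ]; then
    echo "SG: $sg (reused)"
  else
    echo "SG: stack will create a new one"
  fi
}
```

A caller would print the summary, read the customer's reply, and advance to Step 5c only when `is_affirmative "$REPLY"` succeeds; any other reply loops back to the relevant selection step.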
+ +**Step 5c: Write the CFN template and create the stack:** + +```bash +STACK_NAME="${STACK_NAME:-atx-runner}" + +# Write the template inline. Customer can inspect /tmp/atx-ec2-stack.yaml before deploy. +cat > /tmp/atx-ec2-stack.yaml <<'CFN_EOF' +AWSTemplateFormatVersion: '2010-09-09' +Description: ATX CT runner - single EC2 instance with the atx-ct container running. + +Parameters: + InstanceType: + Type: String + Default: m5.2xlarge + AllowedValues: [m5.large, m5.xlarge, m5.2xlarge, m5.4xlarge, m5.8xlarge, m5.12xlarge] + ImageUri: + Type: String + Default: public.ecr.aws/d9h8z6l7/aws-transform:latest + VpcId: + Type: AWS::EC2::VPC::Id + Description: VPC where the runner will be deployed. If the git host is private/internal (self-managed GitLab, GHES, Bitbucket DC), provide a VPC with a route to it (VPN, Direct Connect, or peering). + SubnetId: + Type: AWS::EC2::Subnet::Id + Description: Subnet for the runner. Must have outbound internet access (NAT, IGW, or transit gateway) to reach the atx ct backend, ECR (image pulls), S3, and Secrets Manager. + ExistingSecurityGroupId: + Type: String + Default: '' + Description: Optional. If provided, the stack reuses this security group instead of creating a new one. The reused SG MUST allow outbound HTTPS (port 443) to the atx ct backend, ECR, S3, Secrets Manager, and (if applicable) the customer's internal git host. Leave empty to let the stack create a new no-inbound SG. + VolumeSizeGB: + Type: Number + Default: 100 + MinValue: 50 + WorkerCount: + Type: Number + Default: 5 + MinValue: 1 + MaxValue: 5 + Description: Number of parallel atx-ct containers (1-5). Default 5 provides headroom for parallel work without requiring later resize. WorkerCount=1 creates a single container named "atx-ct" (legacy behavior). WorkerCount>1 creates "atx-ct-1", "atx-ct-2", etc. For more than 5 parallel jobs, use the Batch path. 
+ +Conditions: + CreateNewSG: !Equals [!Ref ExistingSecurityGroupId, ''] + +Resources: + TransformRole: + Type: AWS::IAM::Role + Properties: + RoleName: !Sub 'atx-transform-role-${AWS::StackName}' + AssumeRolePolicyDocument: + Version: '2012-10-17' + Statement: + - Effect: Allow + Principal: { Service: ec2.amazonaws.com } + Action: sts:AssumeRole + ManagedPolicyArns: + - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore + Policies: + - PolicyName: atx-transform-access + PolicyDocument: + Version: '2012-10-17' + Statement: + - Effect: Allow + Action: 'transform-custom:*' + Resource: '*' + - Effect: Allow + Action: [s3:GetObject, s3:PutObject, s3:ListBucket, s3:DeleteObject] + Resource: + - !Sub 'arn:aws:s3:::atx-source-code-${AWS::AccountId}' + - !Sub 'arn:aws:s3:::atx-source-code-${AWS::AccountId}/*' + - !Sub 'arn:aws:s3:::atx-ct-output-${AWS::AccountId}' + - !Sub 'arn:aws:s3:::atx-ct-output-${AWS::AccountId}/*' + - Effect: Allow + Action: [kms:GenerateDataKey, kms:Decrypt, kms:Encrypt, kms:DescribeKey] + Resource: !Sub 'arn:aws:kms:*:${AWS::AccountId}:key/*' + Condition: + StringLike: { 'kms:ViaService': 's3.*.amazonaws.com' } + - Effect: Allow + Action: secretsmanager:GetSecretValue + Resource: !Sub 'arn:aws:secretsmanager:*:${AWS::AccountId}:secret:atx/*' + - Effect: Allow + Action: + - securityagent:ListAgentSpaces + - securityagent:CreateCodeReview + - securityagent:StartCodeReviewJob + - securityagent:ListCodeReviewJobsForCodeReview + - securityagent:ListFindings + - securityagent:BatchGetFindings + - securityagent:StartCodeRemediation + Resource: 'arn:aws:securityagent:*:*:agent-space*' + Condition: + StringEquals: { 'aws:ResourceAccount': !Ref AWS::AccountId } + - Effect: Allow + Action: [s3:GetObject, s3:ListBucket] + Resource: + - 'arn:aws:s3:::kct-security-agent-*' + - 'arn:aws:s3:::kct-security-agent-*/*' + - Effect: Allow + Action: s3:PutObject + Resource: 'arn:aws:s3:::kct-security-agent-*/security-scans/*' + - Effect: Allow + Action: iam:PassRole 
+ Resource: !Sub 'arn:aws:iam::${AWS::AccountId}:role/security-agent-*' + Condition: + StringEquals: + 'iam:PassedToService': securityagent.amazonaws.com + + TransformInstanceProfile: + Type: AWS::IAM::InstanceProfile + Properties: + InstanceProfileName: !Sub 'atx-transform-profile-${AWS::StackName}' + Roles: [!Ref TransformRole] + + SecurityGroup: + Type: AWS::EC2::SecurityGroup + Condition: CreateNewSG + Properties: + GroupDescription: ATX Transform EC2 - no inbound (access via SSM) + VpcId: !Ref VpcId + Tags: + - Key: Name + Value: !Sub 'atx-transform-sg-${AWS::StackName}' + + Instance: + Type: AWS::EC2::Instance + CreationPolicy: + ResourceSignal: { Timeout: PT15M, Count: 1 } + Properties: + InstanceType: !Ref InstanceType + ImageId: '{{resolve:ssm:/aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-x86_64}}' + IamInstanceProfile: !Ref TransformInstanceProfile + SubnetId: !Ref SubnetId + SecurityGroupIds: + - !If [CreateNewSG, !Ref SecurityGroup, !Ref ExistingSecurityGroupId] + MetadataOptions: + # Enforce IMDSv2 (token-based, defense against SSRF) and allow 2 hops so + # containers using bridge networking can reach IMDS for IAM credentials. + # AWS recommendation for Docker-on-EC2: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + HttpEndpoint: enabled + HttpTokens: required + HttpPutResponseHopLimit: 2 + BlockDeviceMappings: + - DeviceName: /dev/xvda + Ebs: { VolumeSize: !Ref VolumeSizeGB, VolumeType: gp3, DeleteOnTermination: true } + Tags: + - { Key: Name, Value: !Sub 'atx-ct-runner-${AWS::StackName}' } + - { Key: ManagedBy, Value: ATX-CFN } + - { Key: StackName, Value: !Ref AWS::StackName } + UserData: + Fn::Base64: !Sub | + #!/bin/bash + set -e + trap 'cfn-signal -e $? 
--stack ${AWS::StackName} --resource Instance --region ${AWS::Region}' ERR EXIT + dnf install -y docker + systemctl start docker + systemctl enable docker + usermod -aG docker ec2-user + docker pull ${ImageUri} + + # Worker naming convention: + # WorkerCount=1 -> single container named "atx-ct" (existing behavior) + # WorkerCount>1 -> "atx-ct-1", "atx-ct-2", ... "atx-ct-N" + # Bridge networking (no --net=host) so multiple containers can coexist. + if [ "${WorkerCount}" -eq 1 ]; then + CONTAINERS="atx-ct" + else + CONTAINERS=$(seq -f "atx-ct-%g" 1 ${WorkerCount}) + fi + + for name in $CONTAINERS; do + docker run -d --name "$name" --restart unless-stopped \ + --entrypoint /bin/bash \ + -e CT_OUTPUT_BUCKET=atx-ct-output-${AWS::AccountId} \ + -e AWS_REGION=${AWS::Region} \ + ${ImageUri} \ + -c 'mkdir -p /home/atxuser/.atxct/sources /home/atxuser/.atxct/shared && \ + source ~/.bashrc && atx ct server' + done + + # Wait for all containers to report healthy in PARALLEL (background each + # health-check, then wait on all PIDs). Sequential checking would not fit + # within the CFN CreationPolicy timeout for higher WorkerCount values. + # Note: ${!name} is the CFN !Sub escape: !Sub leaves it as literal ${name} + # for bash to resolve. Plain ${name} would error with "Unresolved resource + # dependencies [name]" because !Sub treats ${...} as CFN refs. + PIDS=() + for name in $CONTAINERS; do + ( + for i in $(seq 1 60); do + if docker exec "$name" bash -c 'atx ct status --health' > /dev/null 2>&1; then exit 0; fi + sleep 5 + done + exit 1 + ) & + PIDS+=($!) 
+ done + for pid in "${!PIDS[@]}"; do + wait "$pid" || { echo "Health check failed for one or more workers"; exit 1; } + done + for name in $CONTAINERS; do + docker ps --filter "name=^${!name}$" --filter status=running --format '{{.Names}}' | grep -q "^${!name}$" + docker exec "$name" bash -c 'atx ct status --health' > /dev/null 2>&1 + done + + trap - ERR EXIT + cfn-signal -e 0 --stack ${AWS::StackName} --resource Instance --region ${AWS::Region} + +Outputs: + StackName: { Value: !Ref AWS::StackName } + InstanceId: { Value: !Ref Instance } + RoleArn: { Value: !GetAtt TransformRole.Arn } + InstanceProfileName: { Value: !Ref TransformInstanceProfile } + SecurityGroupId: + Value: !If [CreateNewSG, !GetAtt SecurityGroup.GroupId, !Ref ExistingSecurityGroupId] + AccountId: { Value: !Ref AWS::AccountId } + Region: { Value: !Ref AWS::Region } +CFN_EOF + +# Worker count (default 5; max 5). Default of 5 provides headroom for parallel work +# without later resize (which is destructive; see "Changing WorkerCount" section). +# Customer can override to a smaller value (e.g., WORKER_COUNT=1 for legacy single-container +# behavior, or WORKER_COUNT=3 for moderate parallelism with lower cost). +WORKER_COUNT="${WORKER_COUNT:-5}" +if [ "$WORKER_COUNT" -lt 1 ] || [ "$WORKER_COUNT" -gt 5 ]; then + echo "ERROR: WORKER_COUNT must be 1-5. Got: $WORKER_COUNT. For more parallelism, use the Batch path." >&2 + exit 1 +fi + +# Auto-recommend InstanceType based on WorkerCount if customer did not override. +# Sizing assumes typical analyses (single-repo fan-out). For monorepos or 10x source-wide +# analyses simultaneously, customer should override INSTANCE_TYPE=m5.12xlarge. 
+if [ -z "$INSTANCE_TYPE" ]; then + if [ "$WORKER_COUNT" -le 1 ]; then INSTANCE_TYPE="m5.2xlarge" + elif [ "$WORKER_COUNT" -le 4 ]; then INSTANCE_TYPE="m5.4xlarge" + else INSTANCE_TYPE="m5.8xlarge" + fi +fi + +# Auto-recommend disk size: 50 GB per worker (covers typical and heavy use; override +# to 100 GB/worker for monorepos via VOLUME_SIZE env var). +VOLUME_SIZE="${VOLUME_SIZE:-$((50 * WORKER_COUNT))}" + +# ────────────────────────────────────────────────────────────────────────── +# Pre-deploy confirmation: ASK THE CUSTOMER before creating the stack. +# Show them the resolved config so they can override WorkerCount, InstanceType, +# or VolumeSize before commit. Customer is responsible for checking AWS pricing. +# ────────────────────────────────────────────────────────────────────────── +cat < **Admin handoff -- one-time setup** +> +> I've written the CloudFormation template to `/tmp/atx-ec2-stack.yaml`. **This stack creates IAM roles, so deploying requires admin / role-creation permissions (`iam:CreateRole`, `iam:PutRolePolicy`, `iam:PassRole`, instance profiles). Run it with an admin identity. Read-only or runtime credentials are enough for everything afterward.** +> +> The agent MUST include the following sentence verbatim in every Step 5d handoff, immediately after the admin-identity sentence above and before the command block. Do NOT abbreviate, drop, or paraphrase it -- customers onboarding a new executor identity rely on this pointer: +> +> For reference, the executor policy this skill expects is in `references/AWSTransformInfrastructureExecutorAccessEC2.json`. +> +> Those permissions are admin-scope; the executor permissions I'm running under intentionally do not grant them, so day-to-day analysis runs cannot escalate privileges. +> +> Ask someone in your account with admin / role-creation permissions (or yourself if you have a separate admin profile) to run these commands from the same shell, in the same region. 
**Replace `` with the AWS profile name that has admin / role-creation permissions in your environment.** + +**Profile-name guidance for the agent.** When emitting this admin handoff (or any of the other admin handoffs in this skill), the agent MUST use the placeholder `` rather than guessing a profile name from the customer's local AWS config, environment variables, or shell history. Customers commonly have multiple AWS profiles configured locally and the agent has no reliable way to identify which one carries admin permissions. Substituting a wrong name leads to confusing AccessDenied errors during deploy. Examples: + +- ❌ `AWS_PROFILE=atx-zerog-admin aws cloudformation create-stack ...` (the agent guessed from `~/.aws/config`) +- ❌ `AWS_PROFILE=admin aws cloudformation create-stack ...` (the agent assumed a name) +- ✅ `AWS_PROFILE= aws cloudformation create-stack ...` (placeholder for the customer to fill in) + +This rule applies to **every admin handoff in this skill**: create-stack, delete-stack, security-agent bootstrap, instance-tag handoff, instance-role-policy handoff, anywhere else admin is invoked. + +The full handoff command set (admin runs in their shell, in `${REGION}`): +> +> ```bash +> # 1. Create the persistent S3 buckets (only if Step 5a reported them missing). +> # These live OUTSIDE the CFN stack so they survive delete-and-recreate. +> # us-east-1 quirk: --create-bucket-configuration LocationConstraint=us-east-1 +> # is rejected by the API; omit the flag in that one region. 
+> LOC_CONSTRAINT="" +> [ "$REGION" != "us-east-1" ] && LOC_CONSTRAINT="--create-bucket-configuration LocationConstraint=$REGION" +> +> aws s3api create-bucket --bucket atx-source-code-${ACCOUNT_ID} --region $REGION $LOC_CONSTRAINT +> aws s3api put-bucket-lifecycle-configuration --bucket atx-source-code-${ACCOUNT_ID} \ +> --lifecycle-configuration '{"Rules":[{"ID":"expire-7d","Status":"Enabled","Expiration":{"Days":7},"Filter":{"Prefix":""}}]}' +> +> aws s3api create-bucket --bucket atx-ct-output-${ACCOUNT_ID} --region $REGION $LOC_CONSTRAINT +> aws s3api put-bucket-lifecycle-configuration --bucket atx-ct-output-${ACCOUNT_ID} \ +> --lifecycle-configuration '{"Rules":[{"ID":"expire-30d","Status":"Enabled","Expiration":{"Days":30},"Filter":{"Prefix":""}}]}' +> +> # 2. Create the CFN stack (instance, IAM role/profile, security group). +> aws cloudformation create-stack \ +> --stack-name "$STACK_NAME" \ +> --template-body file:///tmp/atx-ec2-stack.yaml \ +> --capabilities CAPABILITY_NAMED_IAM \ +> --parameters \ +> ParameterKey=VpcId,ParameterValue=$VPC_ID \ +> ParameterKey=SubnetId,ParameterValue=$SUBNET_ID \ +> ParameterKey=InstanceType,ParameterValue=$INSTANCE_TYPE \ +> ParameterKey=WorkerCount,ParameterValue=$WORKER_COUNT \ +> ParameterKey=VolumeSizeGB,ParameterValue=$VOLUME_SIZE \ +> ParameterKey=ExistingSecurityGroupId,ParameterValue="$EXISTING_SG_ID" \ +> --region $REGION \ +> --tags Key=atx-remote-infra,Value=true +> +> aws cloudformation wait stack-create-complete \ +> --stack-name "$STACK_NAME" --region $REGION +> ``` +> +> When the deploy finishes, come back to this conversation and tell me -- I'll re-detect the stack via `describe-stacks` (which my executor creds CAN do) and continue from Step 6. + +The agent then STOPS this turn. The admin runs the commands in their own terminal, outside the chat. On the next user turn, re-run **Step 0 (Detect)** -- the stack should now be `CREATE_COMPLETE` and the flow resumes at Step 6. 
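The `us-east-1` quirk in the bucket-creation step is easy to get wrong, so the sketch below isolates it. The function name `bucket_location_args` is hypothetical; the behavior (omit `--create-bucket-configuration` in `us-east-1`, pass `LocationConstraint=$REGION` everywhere else) follows the handoff commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is illustrative): print the extra arguments that
# `aws s3api create-bucket` needs for the given region.
bucket_location_args() {
  local region="$1"
  if [ "$region" = "us-east-1" ]; then
    # us-east-1 rejects an explicit LocationConstraint, so emit nothing.
    return 0
  fi
  printf '%s %s\n' "--create-bucket-configuration" "LocationConstraint=$region"
}
```

An admin could then run `aws s3api create-bucket --bucket "atx-source-code-${ACCOUNT_ID}" --region "$REGION" $(bucket_location_args "$REGION")`, leaving the command substitution unquoted so the two words are passed as separate arguments.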
+ +**Why CloudFormation:** + +| Concern | CFN advantage | +|---|---| +| Audit trail | Single stack event log shows every resource created | +| Atomic deploy | Failure rolls back entire stack -- no orphaned IAM roles or instances | +| Drift detection | Customer can run `aws cloudformation detect-stack-drift` to see if anything changed manually | +| Teardown | Single `aws cloudformation delete-stack` cleans up everything in the stack | +| Multi-stack support | Customer can run `STACK_NAME=dev`, `STACK_NAME=prod` for isolated runners | +| Visibility | Customer's CloudFormation console shows the resources, parameters, and outputs | + +### Step 6: Verify the Container is Running + +The CFN stack's `CreationPolicy` ensures `CREATE_COMPLETE` fires only after the UserData script signals success -- meaning Docker is installed, the image is pulled, and the atx-ct container is up. So verification is a quick confidence check. + +```bash +# Define the SSM helpers (used by all subsequent steps for short status calls +# and fire-and-forget submissions of long-running work): +# +# ssm_submit -- fire-and-forget. Returns SSM CommandId immediately. NEVER blocks. +# Use for build_command_*() submissions. +# ssm_run -- submit + wait + get output. Blocks until command completes +# (~100s SSM-side timeout). Use for short status commands. +# +# DO NOT use ssm_run for build_command_*() -- the wrapper runs for hours. 
+ +ssm_submit() { + aws ssm send-command --region $REGION \ + --instance-ids "$INSTANCE_ID" \ + --document-name AWS-RunShellScript \ + --parameters "commands=[\"$1\"]" \ + --query 'Command.CommandId' --output text +} + +ssm_run() { + local cmd="$1" + local CMD_ID=$(aws ssm send-command --region $REGION \ + --instance-ids "$INSTANCE_ID" \ + --document-name AWS-RunShellScript \ + --parameters "commands=[\"$cmd\"]" \ + --query 'Command.CommandId' --output text) + aws ssm wait command-executed --command-id "$CMD_ID" --instance-id "$INSTANCE_ID" --region $REGION 2>/dev/null || true + aws ssm get-command-invocation --region $REGION \ + --command-id "$CMD_ID" --instance-id "$INSTANCE_ID" \ + --query 'StandardOutputContent' --output text +} + +# Resolve CONTAINER_NAME based on stack's WorkerCount + the desired worker. +# WorkerCount=1: single container "atx-ct" (legacy behavior; also the fallback below when the stack reports no WorkerCount parameter). +# WorkerCount>1 (the template default is 5): containers "atx-ct-1", "atx-ct-2", ..., "atx-ct-N". +# WORKER_NUM is the 1-indexed worker to target (1..WorkerCount). Defaults to 1. +WORKER_COUNT=$(aws cloudformation describe-stacks --stack-name "$STACK_NAME" --region $REGION \ + --query 'Stacks[0].Parameters[?ParameterKey==`WorkerCount`].ParameterValue' --output text 2>/dev/null) +WORKER_COUNT=$(echo "$WORKER_COUNT" | xargs) # strip whitespace defensively +{ [ -z "$WORKER_COUNT" ] || [ "$WORKER_COUNT" = "None" ]; } && WORKER_COUNT=1 +WORKER_NUM="${WORKER_NUM:-1}" +if [ "$WORKER_COUNT" -eq 1 ]; then + CONTAINER_NAME="atx-ct" +else + if [ "$WORKER_NUM" -lt 1 ] || [ "$WORKER_NUM" -gt "$WORKER_COUNT" ]; then + echo "ERROR: WORKER_NUM ($WORKER_NUM) must be 1-${WORKER_COUNT}."
>&2 + exit 1 + fi + CONTAINER_NAME="atx-ct-${WORKER_NUM}" +fi + +# Confirm container is running and atx ct server is healthy +ssm_run "sudo docker ps --filter \"name=^${CONTAINER_NAME}$\" --filter status=running --format '{{.Names}}: {{.Status}}'" +ssm_run "sudo docker exec ${CONTAINER_NAME} atx ct status --health" +``` + +If either check fails, inspect the container logs: + +```bash +ssm_run "sudo docker logs ${CONTAINER_NAME} 2>&1 | tail -50" +``` + +For a fully failed bootstrap, the stack would be in `ROLLBACK_COMPLETE` (UserData failed → cfn-signal sent error → stack rolled back). Check stack events: + +```bash +aws cloudformation describe-stack-events --stack-name "$STACK_NAME" --region $REGION \ + --query 'StackEvents[?ResourceStatus==`CREATE_FAILED`].[ResourceType,ResourceStatusReason]' --output table +``` + +**Local source preparation (local provider only):** if `PROVIDER=local`, sync repo bundles into the container after the container is up: + +```bash +if [ "$PROVIDER" = "local" ]; then + # Customer must have already uploaded zips to s3://atx-source-code-${ACCOUNT_ID}/repos/ + ssm_run "sudo docker exec ${CONTAINER_NAME} bash -c 'mkdir -p /home/atxuser/repos /tmp/zips && \ + aws s3 sync s3://atx-source-code-${ACCOUNT_ID}/repos/ /tmp/zips/ && \ + for zip in /tmp/zips/*.zip; do unzip -q -o \"\$zip\" -d /home/atxuser/repos/; done'" +fi +``` + +#### Security analysis prerequisite + +If `ANALYSIS_TYPE=security` (or `agentic-readiness` / `modernization-readiness` which depend on it), the security agent must be set up first. See [continuous-modernization-setup](workload-continuous-modernization-setup.md) for `atx ct setup security-agent`. + +The S3 + `iam:PassRole` grants the instance role needs for security analysis are **always-on** in the CFN template (the `securityagent:*` actions, `s3:*` on `kct-security-agent-*`, and `iam:PassRole` on `security-agent-*` are part of the base role policy). 
No stack redeploy is required if the customer decides to run security analysis after the stack is up -- the role already has what's needed. + +**One-time agent-space bootstrap.** The first time anyone in this account runs a security analysis, the runtime calls `securityagent:CreateAgentSpace` to provision the agent-space resource. That permission is intentionally NOT granted to the EC2 instance role (the role only has the operate-mode `securityagent:*` actions). So the first security analysis MUST run locally with admin credentials, which create the agent space and populate `agentSpaceId` in `~/.atxct/shared/security_agent_config.json`. Every subsequent run -- on EC2 or local, by any caller -- finds the existing agent space via `list-agent-spaces` and never needs `CreateAgentSpace` again. + +**The agent MUST run this check before submitting a security/agentic-readiness/modernization-readiness analysis on EC2:** + +```bash +atx ct setup security-agent --status 2>/dev/null > /tmp/sa-status.json +AGENT_SPACE_ID=$(jq -r '.agentSpaceId // ""' /tmp/sa-status.json) + +if [ -z "$AGENT_SPACE_ID" ]; then + # First-time bootstrap required. + echo "agent space not yet provisioned" +fi +``` + +**If `agentSpaceId` is empty**, the agent MUST stop the EC2 flow and emit an admin handoff. Phrasing: + +> "Before I can run security analysis on EC2, the agent space resource needs to be provisioned in your account. This is a one-time bootstrap that requires admin credentials (only the first security analysis ever in this account needs this). Run this on your laptop with your admin profile: +> +> ```bash +> AWS_PROFILE= AWS_REGION=$REGION atx ct analysis run \ +> --type security \ +> --source $LOGICAL_SOURCE_NAME \ +> --repo "$LOGICAL_SOURCE_NAME::" \ +> --telemetry "agent=$AGENT,executionMode=local" +> ``` +> +> Pick any single repo from your source -- the bootstrap doesn't depend on which one. 
After this completes, your local `~/.atxct/shared/security_agent_config.json` will have `agentSpaceId` populated. Come back to this conversation and I'll sync it to the EC2 container and run the actual analysis there." + +The agent then STOPS this turn. On the next user turn, re-check `--status`; if `agentSpaceId` is now populated, proceed to the config-sync step below. + +**If `agentSpaceId` is populated** (either from a prior bootstrap, or because the customer just ran the one-time admin-creds analysis), sync the config file into the EC2 container so the runtime can find the existing agent space: + +```bash +# Sync security agent config from laptop into all atx-ct containers. +# The loop applies to single-worker (just "atx-ct") and multi-worker (atx-ct-1..N) stacks. +aws s3 cp ~/.atxct/shared/security_agent_config.json \ + s3://atx-source-code-${ACCOUNT_ID}/temp/security_agent_config.json +ssm_run "aws s3 cp s3://atx-source-code-${ACCOUNT_ID}/temp/security_agent_config.json /tmp/sa.json && \ + for c in \$(sudo docker ps --filter name=atx-ct --format '{{.Names}}'); do \ + sudo docker cp /tmp/sa.json \$c:/home/atxuser/.atxct/shared/security_agent_config.json && \ + sudo docker exec \$c chown 1000:1000 /home/atxuser/.atxct/shared/security_agent_config.json; \ + done" +aws s3 rm s3://atx-source-code-${ACCOUNT_ID}/temp/security_agent_config.json +``` + +### Step 7: Confirm and Submit + +Tell the customer what will happen and wait for explicit confirmation. + +**For GitHub:** +> "I'll submit `` on EC2 instance `${INSTANCE_ID}` against your GitHub source ``. The container is already configured with your GitHub PAT. The submission will: +> - Run `atx ct analysis run --type --source ` in the background +> - Poll status until complete +> - Upload artifacts to `s3://atx-ct-output-${ACCOUNT_ID}///code.zip` +> +> Continue?" + +**For GitLab:** same as GitHub with `atx/gitlab-token`. 
+ +**For Bitbucket Cloud:** +> "I'll submit `` on EC2 instance `${INSTANCE_ID}` against your Bitbucket source ``. The container will: +> - Place your Bitbucket API token (from Secrets Manager `atx/bitbucket-token`) and inject email/username into config.json +> - Run `atx ct analysis run --type --source ` in the background +> - Poll status until complete +> - Upload artifacts to S3 +> +> Continue?" + +**For Bitbucket Data Center:** +> "I'll submit `` on EC2 instance `${INSTANCE_ID}` against your Bitbucket Data Center source ``. The container will: +> - Place your HTTP Access Token (from Secrets Manager `atx/bitbucket-token`) and inject base_url into config.json +> - Run `atx ct analysis run --type --source ` in the background +> - Poll status until complete +> - Upload artifacts to S3 +> +> Continue?" + +**For Local:** same as GitHub with bundle synced to `/home/atxuser/repos`. + +Do NOT submit until the customer confirms. + +### Step 8: Submit Work + +Build the nohup'd command via `build_command_*()` (returns one self-contained script that runs analysis → polls status → uploads artifacts) and submit via SSM. The SSM call returns immediately because the script is backgrounded. The agent stays free during the long-running work. + +```bash +ANALYSIS_TYPE="" # tech-debt-quick | tech-debt-comprehensive | security | agentic-readiness | modernization-readiness | custom +AGENT="" # AI assistant name (kiro, claude, amazonq, etc.) +JOB_ID="atxct-$(date +%s)" # unique per submission; per-job state files keyed by this +REPO_FILTER="" # empty = whole source; or "--repo ::" (ONE repo only, never multiple) +EXTRA_FLAGS="" # for --type custom: "--transformation-name -g 'KEY=VAL'" +BITBUCKET_WORKSPACE="" # bitbucket only -- workspace (Cloud) or project key (DC) +BITBUCKET_EMAIL="" # bitbucket cloud only -- email for API auth +BITBUCKET_USERNAME="" # bitbucket cloud only -- username for git clone/push +BITBUCKET_BASE_URL="" # bitbucket DC only -- e.g. 
https://bitbucket.corp.example.com (empty for Cloud) +``` + +The script written to the instance follows this shape: + +```bash +# (1) Submit analysis (no --wait; returns AID immediately) +sudo docker exec ${CONTAINER_NAME} atx ct analysis run --type $ANALYSIS_TYPE $EXTRA_FLAGS --source $SOURCE $REPO_FILTER > /tmp/run.log 2>&1 +AID=$(grep -oE '01[A-Z0-9]+' /tmp/run.log | head -1) + +# (2) Poll status until terminal +while true; do + STATUS=$(sudo docker exec ${CONTAINER_NAME} atx ct analysis get --id $AID --json | jq -r .status) + case "$STATUS" in + complete|completed) break ;; + failed|cancelled) exit 1 ;; + *) sleep 60 ;; + esac +done + +# (3) Upload artifacts (skipped for tech-debt-quick -- read-only scan) +sudo docker exec ${CONTAINER_NAME} /app/upload-ct-artifacts.sh $AID atx-ct-output-$ACCOUNT_ID +``` + +#### `build_command_*()` builders + +Each builder emits the wrapper script body from a local heredoc; the skill then base64-encodes that body, decodes it on the instance, and launches it in the background. Local-side bash substitutes `${LOGICAL_SOURCE_NAME}`, `${ANALYSIS_TYPE}`, etc.; runtime values like `$AID` and `$STATUS` are escaped (`\$`) so they're evaluated on the instance. + +```bash +# Analysis on github / gitlab / local -- same wrapper shape, same upload step. +# Token (github/gitlab) is fetched from Secrets Manager at job time and placed in +# the container's source dir. atx ct's async provider resolution queries the +# backend for source metadata, so no config.json is needed in the container. +# +# IMPORTANT: build_command_analysis() returns the script BODY only (clean bash, no +# heredoc tricks). The skill base64-encodes the body and submits a short SSM command +# that decodes-and-runs it. This avoids the multi-level quote-escaping nightmare that +# happens when you try to pass a multi-line bash script through `aws ssm send-command +# --parameters "commands=[\"...\"]"` (the JSON layer + the bash layer collide).
+build_command_analysis() { + local UPLOAD_LINE="sudo docker exec ${CONTAINER_NAME} /app/upload-ct-artifacts.sh \$AID atx-ct-output-${ACCOUNT_ID}" + [ "${ANALYSIS_TYPE}" = "tech-debt-quick" ] && UPLOAD_LINE='echo "[skip upload -- tech-debt-quick is read-only]"' + + # Token-injection prelude (runs INSIDE the container at job start) + local TOKEN_PRELUDE="" + if [ "$PROVIDER" = "github" ] || [ "$PROVIDER" = "gitlab" ]; then + local SECRET_ID="atx/${PROVIDER}-token" + local TOKEN_FILE="${PROVIDER}_token" + TOKEN_PRELUDE="sudo docker exec ${CONTAINER_NAME} bash -c 'mkdir -p /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME} && aws secretsmanager get-secret-value --secret-id ${SECRET_ID} --query SecretString --output text > /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/${TOKEN_FILE} && chmod 600 /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/${TOKEN_FILE}'" + elif [ "$PROVIDER" = "bitbucket" ]; then + # Bitbucket requires token + config.json with email/username (Cloud) or base_url (DC). + # BITBUCKET_WORKSPACE, BITBUCKET_EMAIL, BITBUCKET_USERNAME, BITBUCKET_BASE_URL must be set by caller. 
+ local config_json + if [ -n "${BITBUCKET_BASE_URL}" ]; then + config_json=$(printf '{"provider":"bitbucket","identifier":"%s","provider_config":{"base_url":"%s"}}' "${BITBUCKET_WORKSPACE}" "${BITBUCKET_BASE_URL}") + else + config_json=$(printf '{"provider":"bitbucket","identifier":"%s","provider_config":{"email":"%s","username":"%s"}}' "${BITBUCKET_WORKSPACE}" "${BITBUCKET_EMAIL}" "${BITBUCKET_USERNAME}") + fi + local CONFIG_B64=$(echo "${config_json}" | base64 | tr -d '\n') # portable single-line base64 (avoids the GNU-only -w 0 flag) + TOKEN_PRELUDE="sudo docker exec ${CONTAINER_NAME} bash -c 'mkdir -p /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME} && echo ${CONFIG_B64} | base64 -d > /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/config.json && aws secretsmanager get-secret-value --secret-id atx/bitbucket-token --query SecretString --output text > /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/bitbucket_token && chmod 600 /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/bitbucket_token'" + fi + + # Return the script body. Local bash substitutes ${ANALYSIS_TYPE}, ${LOGICAL_SOURCE_NAME}, + # ${TOKEN_PRELUDE}, etc.; runtime references like \$AID and \$LOG stay as-is.
+ cat <<EOF +# Wrapper log read later by the status-check step; the [START] marker wording is an assumption. +LOG=/tmp/atxct-${JOB_ID}.log +echo "=== \$(date) [START] ===" >> \$LOG + +${TOKEN_PRELUDE} + +sudo docker exec ${CONTAINER_NAME} atx ct analysis run --type ${ANALYSIS_TYPE} ${EXTRA_FLAGS} --source ${LOGICAL_SOURCE_NAME} ${REPO_FILTER} --telemetry "agent=${AGENT},executionMode=ec2" >> \$LOG 2>&1 +AID=\$(grep -oE '01[A-Z0-9]+' \$LOG | head -1) +[ -z "\$AID" ] && { echo "ERROR: no analysis ID extracted" >> \$LOG; exit 1; } +echo \$AID > /tmp/atxct-${JOB_ID}.aid + +while true; do + STATUS=\$(sudo docker exec ${CONTAINER_NAME} atx ct analysis get --id \$AID --json 2>/dev/null | jq -r .status 2>/dev/null) + case "\$STATUS" in + complete|completed) echo "=== \$(date) [DONE] analysis \$AID ===" >> \$LOG; break ;; + failed|cancelled) echo "=== \$(date) [\$STATUS] analysis \$AID ===" >> \$LOG; exit 1 ;; + *) echo "\$(date) status=\${STATUS:-pending}" >> \$LOG; sleep 60 ;; + esac +done + +${UPLOAD_LINE} >> \$LOG 2>&1 +echo "=== \$(date) [DONE] upload ===" >> \$LOG +EOF +} + +# Build the script body, base64-encode it (avoids quoting hell when submitting via SSM), +# and submit a single short SSM command that decodes + runs it. +SCRIPT=$(build_command_analysis) +B64=$(echo "$SCRIPT" | base64 | tr -d '\n') + +# Compose a single-line SSM command: +# 1. echo $B64 | base64 -d > /tmp/atxct-.sh (decode script to disk) +# 2. chmod +x ... (make executable) +# 3. ( ( bash ... > log 2>&1 < /dev/null & ) & ) (double-fork orphans wrapper to init) +# 4. echo Started_... (so SSM sees a quick exit) +# +# IMPORTANT: the double-fork is required. Without it, SSM's AWS-RunShellScript +# tracks the wrapper via cgroup and keeps the command slot pinned until the +# wrapper exits, saturating the SSM agent's worker pool. The double-fork +# `( ( bash X & ) & )` reparents the wrapper to init (PID 1) so SSM marks +# the launch command Success immediately.
+LAUNCH_CMD="echo ${B64} | base64 -d > /tmp/atxct-${JOB_ID}.sh && chmod +x /tmp/atxct-${JOB_ID}.sh && ( ( bash /tmp/atxct-${JOB_ID}.sh > /tmp/atxct-${JOB_ID}.stdout 2>&1 < /dev/null & ) & ) && echo Started_${JOB_ID}" + +SUBMIT_ID=$(ssm_submit "$LAUNCH_CMD") +echo "Submitted job $JOB_ID (SSM command: $SUBMIT_ID). Ask me to check status anytime." +``` + +The agent prints "Submitted job $JOB_ID" and is free to interact with the user. The wrapper continues on the instance independently -- analysis, polling, and upload all happen there. + +#### Remediation (instead of analysis) + +Same shape. The build differs by source provider -- github / gitlab use the backend's branch-push flow (no `--local`, no S3 upload); `local` uses `--local` and uploads artifacts. + +```bash +build_command_remediation() { + local CREATE_ARGS="" + if [ -n "$FINDING_IDS" ]; then + CREATE_ARGS="--ids ${FINDING_IDS}" + [ -n "${TRANSFORMATION_NAME}" ] && CREATE_ARGS="${CREATE_ARGS} --transformation-name ${TRANSFORMATION_NAME}" + else + CREATE_ARGS="--transformation-name ${TRANSFORMATION_NAME} ${REPO_FILTER}" + fi + [ -n "${CONFIGURATION}" ] && CREATE_ARGS="${CREATE_ARGS} -g \"${CONFIGURATION}\"" + + local LOCAL_FLAG="" + local UPLOAD_LINE='echo "[skip upload -- github/gitlab remediation pushes a branch]"' + if [ "$PROVIDER" = "local" ]; then + LOCAL_FLAG="--local" + UPLOAD_LINE="sudo docker exec ${CONTAINER_NAME} /app/upload-ct-artifacts.sh \$RID atx-ct-output-${ACCOUNT_ID}" + fi + + # Token-injection prelude (runs INSIDE the container at job start) + local TOKEN_PRELUDE="" + if [ "$PROVIDER" = "github" ] || [ "$PROVIDER" = "gitlab" ]; then + local SECRET_ID="atx/${PROVIDER}-token" + local TOKEN_FILE="${PROVIDER}_token" + TOKEN_PRELUDE="sudo docker exec ${CONTAINER_NAME} bash -c 'mkdir -p /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME} && aws secretsmanager get-secret-value --secret-id ${SECRET_ID} --query SecretString --output text > 
/home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/${TOKEN_FILE} && chmod 600 /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/${TOKEN_FILE}'" + elif [ "$PROVIDER" = "bitbucket" ]; then + # Bitbucket requires token + config.json with email/username (Cloud) or base_url (DC). + # BITBUCKET_WORKSPACE, BITBUCKET_EMAIL, BITBUCKET_USERNAME, BITBUCKET_BASE_URL must be set by caller. + local config_json + if [ -n "${BITBUCKET_BASE_URL}" ]; then + config_json=$(printf '{"provider":"bitbucket","identifier":"%s","provider_config":{"base_url":"%s"}}' "${BITBUCKET_WORKSPACE}" "${BITBUCKET_BASE_URL}") + else + config_json=$(printf '{"provider":"bitbucket","identifier":"%s","provider_config":{"email":"%s","username":"%s"}}' "${BITBUCKET_WORKSPACE}" "${BITBUCKET_EMAIL}" "${BITBUCKET_USERNAME}") + fi + # Strip newlines with tr rather than the GNU-only `-w 0` flag, so the encoding also works with non-GNU base64. + local CONFIG_B64=$(echo "${config_json}" | base64 | tr -d '\n') + TOKEN_PRELUDE="sudo docker exec ${CONTAINER_NAME} bash -c 'mkdir -p /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME} && echo ${CONFIG_B64} | base64 -d > /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/config.json && aws secretsmanager get-secret-value --secret-id atx/bitbucket-token --query SecretString --output text > /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/bitbucket_token && chmod 600 /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/bitbucket_token'" + fi + + # Returns clean script body (no heredoc tricks). The skill base64-encodes and submits + # via a short SSM command (same pattern as build_command_analysis above).
+ cat <<EOF +# Wrapper log read later by the status-check step; the [START] marker wording is an assumption. +LOG=/tmp/atxct-${JOB_ID}.log +echo "=== \$(date) [START] ===" >> \$LOG + +${TOKEN_PRELUDE} + +sudo docker exec ${CONTAINER_NAME} atx ct remediation create ${CREATE_ARGS} ${LOCAL_FLAG} --source ${LOGICAL_SOURCE_NAME} --telemetry "agent=${AGENT},executionMode=ec2" >> \$LOG 2>&1 +RID=\$(grep -oE '01[A-Z0-9]+' \$LOG | head -1) +[ -z "\$RID" ] && { echo "ERROR: no remediation ID" >> \$LOG; exit 1; } +echo \$RID > /tmp/atxct-${JOB_ID}.rid + +while true; do + STATUS=\$(sudo docker exec ${CONTAINER_NAME} atx ct remediation status --id \$RID --json 2>/dev/null | jq -r .status 2>/dev/null) + case "\$STATUS" in + complete|completed) echo "=== \$(date) [DONE] remediation \$RID ===" >> \$LOG; break ;; + failed|cancelled) echo "=== \$(date) [\$STATUS] remediation \$RID ===" >> \$LOG; exit 1 ;; + *) sleep 60 ;; + esac +done + +${UPLOAD_LINE} >> \$LOG 2>&1 +echo "=== \$(date) [DONE] upload ===" >> \$LOG +EOF +} + +# Same base64 pattern as analysis +SCRIPT=$(build_command_remediation) +B64=$(echo "$SCRIPT" | base64 | tr -d '\n') +LAUNCH_CMD="echo ${B64} | base64 -d > /tmp/atxct-${JOB_ID}.sh && chmod +x /tmp/atxct-${JOB_ID}.sh && ( ( bash /tmp/atxct-${JOB_ID}.sh > /tmp/atxct-${JOB_ID}.stdout 2>&1 < /dev/null & ) & ) && echo Started_${JOB_ID}" + +SUBMIT_ID=$(ssm_submit "$LAUNCH_CMD") +echo "Submitted remediation job $JOB_ID (SSM command: $SUBMIT_ID). Ask me to check status anytime." +``` + +### Step 9: Status Checking + +When the customer asks for status, ask `atx ct` for the authoritative state. The wrapper's log file is only useful for DEBUGGING the wrapper itself (e.g., "did the wrapper start? did it parse the AID?"); for "is my analysis done?" the answer comes from the atx ct server. + +```bash +# Authoritative status. What the customer actually wants to know. +AID=$(ssm_run "cat /tmp/atxct-${JOB_ID}.aid 2>/dev/null" | tr -d '[:space:]') + +if [ -z "$AID" ]; then + # No AID means the wrapper failed before extracting an analysis ID.
Most + # likely cause: instance role lacks a permission needed by the wrapper or + # by atx ct's first backend call. Surface the specific error from the + # wrapper log instead of reporting "running" or "pending". + ssm_run "grep -iE 'AccessDenied|not authorized|Error:' /tmp/atxct-${JOB_ID}.log 2>&1 | head -5" + echo "Wrapper failed to dispatch the analysis. See the errors above." + echo "Common causes: missing transform-custom:* (instance role) or secretsmanager:GetSecretValue." + echo "Tell the customer the specific permission identified in the AccessDenied message and ask them to attach it." +else + ssm_run "sudo docker exec ${CONTAINER_NAME} atx ct analysis get --id $AID --json" | \ + jq '{status, repos_total: (.repos | length), findings_count}' +fi +``` + +Or, if you don't have the JOB_ID handy, list all in-flight analyses on the instance: + +```bash +ssm_run "sudo docker exec ${CONTAINER_NAME} atx ct analysis list --json | jq '.items[] | select(.status == \"running\" or .status == \"pending\")'" +``` + +For remediation jobs, swap `analysis get` → `remediation status` and `*.aid` → `*.rid`. + +**Wrapper log tail is only for debugging** (when you need to see what the wrapper is doing on the instance, not what the analysis is doing on the server): + +```bash +ssm_run "tail -20 /tmp/atxct-${JOB_ID}.log" +``` + +To list all in-flight jobs on the instance: + +```bash +ssm_run "ls -la /tmp/atxct-*.aid /tmp/atxct-*.rid 2>/dev/null" +``` + +### Step 10: Get Findings and Artifacts + +**Findings** are persisted by the analysis runner during execution and queryable from anywhere with CT access. **`atx ct findings list --json` returns a top-level array** (no `.items` wrapper): + +```json +[ + { + "id": "01ABC...", + "severity": "high|medium|low", + "category": "security|performance|maintainability|...", + "repo": "::", + "title": "Short description", + "description": "Full description", + "fix": null | { ... }, + ... + }, + ... 
+] +``` + +> **Heads up -- JSON shape inconsistency across `atx ct` commands.** Some commands return `{"items": [...]}` (e.g., `repository list`, `analysis list`); others return a bare `[...]` (e.g., `findings list`, `source list`). Always assume bare array for `findings list` -- use `.[]` not `.items[]` in jq. + +Common queries: + +```bash +# Total finding count +atx ct findings list --source ${LOGICAL_SOURCE_NAME} --json | jq 'length' + +# Group by severity +atx ct findings list --source ${LOGICAL_SOURCE_NAME} --json | \ + jq 'group_by(.severity) | map({severity: .[0].severity, count: length})' + +# Group by category +atx ct findings list --source ${LOGICAL_SOURCE_NAME} --json | \ + jq 'group_by(.category) | map({category: .[0].category, count: length})' + +# Auto-remediable findings only (have a fix proposal) +atx ct findings list --source ${LOGICAL_SOURCE_NAME} --json | \ + jq '[.[] | select(.fix != null)] | length' + +# Per-repo summary as TSV (severity, category, repo, title) +atx ct findings list --source ${LOGICAL_SOURCE_NAME} --json | \ + jq -r '.[] | [.severity, .category, .repo, .title] | @tsv' + +# Filter to a specific analysis +atx ct findings list --analysis-id ${AID} --json | jq 'length' +``` + +These commands work from anywhere with `atx ct` CLI access (customer's laptop, the EC2 container via `sudo docker exec ${CONTAINER_NAME} ...`, or any other machine with the same backend access). Findings are server-state, not instance-state. + +**S3 artifacts** are uploaded by `/app/upload-ct-artifacts.sh` automatically when the wrapper completes. Analysis artifacts are written for any provider; remediation artifacts are only written for `--local` remediations. + +``` +s3://atx-ct-output-{account-id}//::/ + code.zip -- the working directory after the analysis or remediation completes, + including a result branch with auto-committed changes (e.g., + `atx-result-staging-` for analysis documentation, or the + remediation's branch for `--local` runs). 
The customer can `git log` + and `git diff` to review what the bot changed. `.git/` is preserved + for this reason. + Excludes node_modules/, .env*, *.pem, *.key, .aws/. + logs.zip -- cherry-picked debug logs (ATX CLI debug, error log, conversation + transcript, plan.json, validation_summary.md). +``` + +To download: + +```bash +ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text) + +# All artifacts for one analysis +aws s3 sync s3://atx-ct-output-${ACCOUNT_ID}/${AID}/ ./artifacts/ + +# Just one repo's reports +aws s3 cp s3://atx-ct-output-${ACCOUNT_ID}/${AID}/${SOURCE}::${REPO}/code.zip ./ +``` + +Surface findings to the user as the primary result. Reference S3 artifacts only when the user asks for raw reports/logs. + +## Cancellation + +To cancel an in-flight job: + +```bash +JOB_ID="" + +# Read the AID/RID and the wrapper PID +AID=$(ssm_run "cat /tmp/atxct-${JOB_ID}.aid 2>/dev/null") +WRAPPER_PID=$(ssm_run "pgrep -f 'atxct-${JOB_ID}.sh'") + +# Kill the wrapper (stops the polling loop on the instance) +ssm_run "sudo kill -TERM $WRAPPER_PID 2>/dev/null" + +# Cancel the in-flight CT analysis (server-side) +[ -n "$AID" ] && ssm_run "sudo docker exec ${CONTAINER_NAME} atx ct analysis cancel --id $AID" + +# Clean up this job's temp files +ssm_run "rm -f /tmp/atxct-${JOB_ID}.*" +``` + +Findings already persisted (from earlier in the analysis) survive the cancel. The upload step does NOT run if the wrapper is killed -- recover via: + +```bash +ssm_run "sudo docker exec ${CONTAINER_NAME} /app/upload-ct-artifacts.sh $AID atx-ct-output-${ACCOUNT_ID}" +``` + +## Use Existing Instance (no CFN) + +Reached when **Step 0 returned no stack** and the customer chose path 1 (existing EC2 instance launched outside CFN). Steps C.1–C.7 verify the instance, bootstrap the atx-ct container, and resume at Step 6. 
+ +**At most ONE admin handoff** is needed in this path -- Step C.0 pre-flights both the instance tag and the role permissions in read-only mode, then emits a single combined admin handoff if either is missing. After admin runs that one bundle, the executor proceeds through C.2–C.6 (Docker install, image pull, container start) without further interruption. If both the tag and role were already in place, the handoff is skipped entirely. + +#### Step C.0: Pre-flight + Combined Admin Handoff + +Capture the basics first: + +```bash +INSTANCE_ID="" +REGION="" +ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text) + +# Discover the instance's IAM role (executor's iam:GetInstanceProfile is account-scoped, so this works). +PROFILE_ARN=$(aws ec2 describe-instances --instance-ids $INSTANCE_ID --region $REGION \ + --query 'Reservations[0].Instances[0].IamInstanceProfile.Arn' --output text) +PROFILE_NAME=$(echo "$PROFILE_ARN" | awk -F/ '{print $NF}') +INSTANCE_ROLE_NAME=$(aws iam get-instance-profile --instance-profile-name "$PROFILE_NAME" \ + --query 'InstanceProfile.Roles[0].RoleName' --output text) + +if [ -z "$INSTANCE_ROLE_NAME" ] || [ "$INSTANCE_ROLE_NAME" = "None" ]; then + echo "ERROR: instance has no IAM instance profile attached. The customer's admin must" + echo " create one and attach it before this skill can proceed. This is a much larger" + echo " handoff than tagging or policy-setting; bail out and ask the customer." + exit 1 +fi +``` + +**Read-only pre-flight checks** (executor creds suffice for all of these): + +```bash +# Check 1: Is the instance tagged atx-remote-infra=true? 
+TAG_VALUE=$(aws ec2 describe-instances --instance-ids $INSTANCE_ID --region $REGION \ + --query 'Reservations[0].Instances[0].Tags[?Key==`atx-remote-infra`].Value | [0]' --output text) +TAG_OK="no"; [ "$TAG_VALUE" = "true" ] && TAG_OK="yes" + +# Check 2: Does the role have transform-custom:* (the marker action that proves the +# full Instance Role IAM spec was applied)? Inspect inline policies. +ROLE_POLICIES=$(aws iam list-role-policies --role-name "$INSTANCE_ROLE_NAME" --query 'PolicyNames' --output text) +ROLE_OK="no" +for POLICY in $ROLE_POLICIES; do + if aws iam get-role-policy --role-name "$INSTANCE_ROLE_NAME" --policy-name "$POLICY" \ + --query 'PolicyDocument.Statement[].Action' --output json 2>/dev/null \ + | grep -q '"transform-custom:\*"'; then + ROLE_OK="yes"; break + fi +done + +# Check 3: Is AmazonSSMManagedInstanceCore attached? +SSM_OK="no" +aws iam list-attached-role-policies --role-name "$INSTANCE_ROLE_NAME" \ + --query 'AttachedPolicies[?PolicyName==`AmazonSSMManagedInstanceCore`].PolicyName' \ + --output text 2>/dev/null | grep -q AmazonSSMManagedInstanceCore && SSM_OK="yes" + +echo "Tag atx-remote-infra=true: $TAG_OK" +echo "Role has transform-custom:* etc: $ROLE_OK" +echo "AmazonSSMManagedInstanceCore attached: $SSM_OK" +``` + +**If all three are `yes`**: skip the handoff and proceed directly to Step C.1. + +**If any is `no`**: emit ONE combined admin handoff covering all the missing pieces. Tell the customer: + +> **Admin handoff -- one-time setup for `$INSTANCE_ID`** +> +> This bundle (a) tags the instance so the executor can SSM into it, (b) attaches the full ATX Control Tower instance role policy, and (c) ensures `AmazonSSMManagedInstanceCore` is attached. All three are admin-only operations (`ec2:CreateTags`, `iam:PutRolePolicy`, `iam:AttachRolePolicy`). 
Run with admin / role-creation permissions: +> +> ```bash +> INSTANCE_ID="$INSTANCE_ID" +> INSTANCE_ROLE_NAME="$INSTANCE_ROLE_NAME" +> ACCOUNT_ID="$ACCOUNT_ID" +> REGION="$REGION" +> +> # 1. Tag the instance so executor's tag-conditioned SSM permissions activate +> aws ec2 create-tags \ +> --resources "$INSTANCE_ID" \ +> --tags Key=atx-remote-infra,Value=true \ +> --region "$REGION" +> +> # 2. Attach the full ATX Control Tower instance role policy (the FULL spec from +> # the Instance Role IAM section -- do NOT subset by analysis type). +> aws iam put-role-policy \ +> --role-name "$INSTANCE_ROLE_NAME" \ +> --policy-name atx-transform-access \ +> --policy-document '{ +> "Version": "2012-10-17", +> "Statement": [ +> {"Effect": "Allow", "Action": "transform-custom:*", "Resource": "*"}, +> {"Effect": "Allow", +> "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket", "s3:DeleteObject"], +> "Resource": ["arn:aws:s3:::atx-source-code-'$ACCOUNT_ID'", +> "arn:aws:s3:::atx-source-code-'$ACCOUNT_ID'/*", +> "arn:aws:s3:::atx-ct-output-'$ACCOUNT_ID'", +> "arn:aws:s3:::atx-ct-output-'$ACCOUNT_ID'/*"]}, +> {"Effect": "Allow", "Action": "secretsmanager:GetSecretValue", +> "Resource": "arn:aws:secretsmanager:*:'$ACCOUNT_ID':secret:atx/*"}, +> {"Effect": "Allow", +> "Action": ["securityagent:ListAgentSpaces", +> "securityagent:CreateCodeReview", "securityagent:StartCodeReviewJob", +> "securityagent:ListCodeReviewJobsForCodeReview", +> "securityagent:ListFindings", "securityagent:BatchGetFindings", +> "securityagent:StartCodeRemediation"], +> "Resource": "arn:aws:securityagent:*:*:agent-space*", +> "Condition": {"StringEquals": {"aws:ResourceAccount": "'$ACCOUNT_ID'"}}}, +> {"Effect": "Allow", +> "Action": ["s3:GetObject", "s3:ListBucket"], +> "Resource": ["arn:aws:s3:::kct-security-agent-*", +> "arn:aws:s3:::kct-security-agent-*/*"]}, +> {"Effect": "Allow", "Action": "s3:PutObject", +> "Resource": "arn:aws:s3:::kct-security-agent-*/security-scans/*"}, +> {"Effect": "Allow", 
"Action": "iam:PassRole", +> "Resource": "arn:aws:iam::'$ACCOUNT_ID':role/security-agent-*", +> "Condition": {"StringEquals": {"iam:PassedToService": "securityagent.amazonaws.com"}}}, +> {"Effect": "Allow", "Action": ["kms:GenerateDataKey", "kms:Decrypt", "kms:Encrypt", "kms:DescribeKey"], +> "Resource": "arn:aws:kms:*:'$ACCOUNT_ID':key/*", +> "Condition": {"StringLike": {"kms:ViaService": "s3.*.amazonaws.com"}}} +> ] +> }' +> +> # 3. Ensure the SSM agent's managed policy is attached (idempotent). +> aws iam attach-role-policy \ +> --role-name "$INSTANCE_ROLE_NAME" \ +> --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore 2>&1 | grep -v "EntityAlreadyExists" || true +> ``` +> +> When this finishes, come back to the conversation. I'll re-run the pre-flight checks and continue from Step C.1. + +The handoff bundles all admin operations the existing-instance path needs -- there are no other admin handoffs later in the flow. + +The agent MUST omit any block from the handoff that's already correct. Example: if `TAG_OK=yes` but `ROLE_OK=no`, drop block #1, keep #2 and #3. The point is to print exactly what's needed, not the full template every time. + +#### Step C.1: Customer provides WorkerCount + +`INSTANCE_ID`, `REGION`, and `ACCOUNT_ID` were already captured in Step C.0. The remaining input the customer chooses is WorkerCount: + +```bash +# WorkerCount: how many parallel atx-ct containers to run on this instance. +# Default 1 (single container). For multi-repo parallelism, customer chooses +# N sized to their instance's vCPU/RAM. +WORKER_COUNT="${WORKER_COUNT:-1}" +``` + +ALWAYS ask before proceeding: "How many parallel containers do you want on this instance? Each worker uses ~3-4 vCPU and ~4-8 GB RAM. Default 1." Ask explicitly even when the customer is only running a single analysis, so they know multi-worker is an option. 
Sizing guidance based on instance type: + +- t3.medium / t3.large (2 vCPU): 1 worker +- m5.xlarge / m5.2xlarge (4-8 vCPU): 2-4 workers +- m5.4xlarge or larger (16+ vCPU): up to 5 workers (cap; for more parallelism use the Batch path) + +For monorepos or memory-heavy analyses, scale down. The skill does NOT auto-detect the instance's capacity for an existing instance; the customer is responsible for sizing. + +#### Step C.2: Verify SSM is online + +```bash +PING=$(aws ssm describe-instance-information \ + --filters "Key=InstanceIds,Values=$INSTANCE_ID" \ + --query 'InstanceInformationList[0].PingStatus' --output text --region $REGION) +[ "$PING" = "Online" ] || { echo "ERROR: SSM agent not Online (got: ${PING:-no response})"; exit 1; } +``` + +If not Online, the instance is missing `AmazonSSMManagedInstanceCore` on its IAM role, or the SSM agent is not running. Customer must fix this before proceeding. + +Define the SSM helpers used by the rest of the steps: + +```bash +ssm_submit() { + aws ssm send-command --region $REGION \ + --instance-ids "$INSTANCE_ID" --document-name AWS-RunShellScript \ + --parameters "commands=[\"$1\"]" --query 'Command.CommandId' --output text +} + +ssm_run() { + local cmd="$1" + local CMD_ID=$(aws ssm send-command --region $REGION \ + --instance-ids "$INSTANCE_ID" --document-name AWS-RunShellScript \ + --parameters "commands=[\"$cmd\"]" --query 'Command.CommandId' --output text) + aws ssm wait command-executed --command-id "$CMD_ID" --instance-id "$INSTANCE_ID" --region $REGION 2>/dev/null || true + aws ssm get-command-invocation --region $REGION \ + --command-id "$CMD_ID" --instance-id "$INSTANCE_ID" \ + --query 'StandardOutputContent' --output text +} +``` + +#### Step C.3: Verify Docker is installed (install if missing) + +```bash +DOCKER_STATUS=$(ssm_run "command -v docker >/dev/null 2>&1 && echo INSTALLED || echo MISSING") +``` + +If `MISSING`, install: + +```bash +ssm_run "if command -v dnf >/dev/null; then sudo dnf install -y 
docker; \ + elif command -v apt-get >/dev/null; then sudo apt-get update -qq && sudo apt-get install -y docker.io; \ + elif command -v yum >/dev/null; then sudo yum install -y docker; \ + else echo 'ERROR: unsupported package manager' >&2; exit 1; fi && \ + sudo systemctl start docker && sudo systemctl enable docker" +``` + +Verify with `ssm_run "docker --version"`. If it fails, ask the customer to install Docker manually and re-try. + +#### Step C.4: Pull the public docker image + +Reachability check (any HTTP response code means reachable; the public ECR API legitimately returns 401 for anonymous requests): + +```bash +HTTP_CODE=$(ssm_run "curl -sS --max-time 10 -o /dev/null -w '%{http_code}' https://public.ecr.aws/v2/") +``` + +Expected: `200` or `401`. If `000`, the instance has no path to public.ecr.aws. Mitigation: customer adds NAT, OR mirrors the image to ECR Private and overrides `ATX_IMAGE_URI` in Step C.5 below. + +Pull: + +```bash +ssm_run "sudo docker pull public.ecr.aws/d9h8z6l7/aws-transform:latest" +``` + +If the pull fails: typical causes are network egress, insufficient disk space, or a private-registry override needed. + +#### Step C.5: Launch the atx-ct container(s) + +If `WORKER_COUNT=1`, launch a single container named `atx-ct` (matches CFN single-worker naming). If `WORKER_COUNT>1`, launch `atx-ct-1`, `atx-ct-2`, ..., `atx-ct-N`. Each container runs `atx ct server` as the foreground process; this mirrors the CFN UserData pattern: override the image's job-runner entrypoint with bash and run the server (which keeps the container alive). The container image must already contain `atx ct` -- there is no runtime install step. 
+ +```bash +if [ "$WORKER_COUNT" -eq 1 ]; then + CONTAINERS="atx-ct" +else + CONTAINERS=$(seq -f "atx-ct-%g" 1 $WORKER_COUNT) +fi + +for name in $CONTAINERS; do + ssm_run "sudo docker rm -f $name 2>/dev/null; \ + sudo docker run -d --name $name --restart unless-stopped \ + --entrypoint /bin/bash \ + -e CT_OUTPUT_BUCKET=atx-ct-output-${ACCOUNT_ID} \ + -e AWS_REGION=${REGION} \ + public.ecr.aws/d9h8z6l7/aws-transform:latest \ + -c 'mkdir -p /home/atxuser/.atxct/sources /home/atxuser/.atxct/shared && \ + source ~/.bashrc && atx ct server'" +done +``` + +Multi-worker uses bridge networking (no `--net=host`) so each container has its own network namespace. Launches happen sequentially via SSM, so total launch time scales with `WORKER_COUNT`. + +#### Step C.6: Wait for all container(s) healthy + +```bash +for name in $CONTAINERS; do + for i in $(seq 1 18); do + STATUS=$(ssm_run "sudo docker ps --filter 'name=^${name}$' --format '{{.Status}}'") + echo "$STATUS" | grep -q '(healthy)' && { echo "$name healthy after $((i*5))s"; break; } + sleep 5 + done +done +``` + +If any container fails to reach healthy after 90s, inspect that specific container's logs: `ssm_run "sudo docker logs 2>&1 | tail -30"`. Common causes are install network failure (one container's network namespace differs) and per-worker port conflicts (rare with bridge networking). + +Verify the CT CLI in one container (all containers share the same image, so verifying one is enough): + +```bash +FIRST_CONTAINER=$(echo $CONTAINERS | awk '{print $1}') +ssm_run "sudo docker exec $FIRST_CONTAINER bash -c 'source ~/.bashrc && atx --version'" +``` + +#### Step C.7: Verify Instance Role Permissions (sanity check) + +Step C.0's pre-flight should have caught any missing role permissions before we got here, but the `atx ct server` startup runs real backend calls (resume remediations, list sources) that exercise permissions in ways the static check can't fully simulate. This step is a runtime sanity check. 
+ +Check the first container's startup log for AccessDenied errors: + +```bash +FIRST_CONTAINER=$(echo $CONTAINERS | awk '{print $1}') +ssm_run "sudo docker logs $FIRST_CONTAINER 2>&1 | grep -iE 'AccessDenied|not authorized' | head -5" +``` + +If the grep returns empty: server initialized cleanly. Proceed to **Step 7 (Confirm and Submit)**. + +If the grep returns matches: this means the role's policy is somehow incomplete relative to the [Instance Role IAM](#instance-role-iam) section, despite Step C.0 saying it had `transform-custom:*`. Most likely cause: a custom inline policy that has only some of the required statements. Re-emit the **same combined admin handoff from Step C.0** (`aws iam put-role-policy --policy-name atx-transform-access ...` with the FULL spec from the Instance Role IAM section -- do NOT subset it). This will overwrite the partial policy with the complete one. + +## Instance Role IAM + +The EC2 instance's IAM role (`atx-transform-role` from Step 4) needs: + +```json +{ + "Statement": [ + {"Effect": "Allow", "Action": "transform-custom:*", "Resource": "*"}, + {"Effect": "Allow", + "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket", "s3:DeleteObject"], + "Resource": ["arn:aws:s3:::atx-source-code-${ACCOUNT_ID}", + "arn:aws:s3:::atx-source-code-${ACCOUNT_ID}/*", + "arn:aws:s3:::atx-ct-output-${ACCOUNT_ID}", + "arn:aws:s3:::atx-ct-output-${ACCOUNT_ID}/*"]}, + {"Effect": "Allow", "Action": "secretsmanager:GetSecretValue", + "Resource": "arn:aws:secretsmanager:*:${ACCOUNT_ID}:secret:atx/*"}, + {"Effect": "Allow", + "Action": ["securityagent:ListAgentSpaces", + "securityagent:CreateCodeReview", "securityagent:StartCodeReviewJob", + "securityagent:ListCodeReviewJobsForCodeReview", + "securityagent:ListFindings", "securityagent:BatchGetFindings", + "securityagent:StartCodeRemediation"], + "Resource": "arn:aws:securityagent:*:*:agent-space*", + "Condition": {"StringEquals": {"aws:ResourceAccount": "${ACCOUNT_ID}"}}}, + {"Effect": "Allow", + 
"Action": ["s3:GetObject", "s3:ListBucket"], + "Resource": ["arn:aws:s3:::kct-security-agent-*", + "arn:aws:s3:::kct-security-agent-*/*"]}, + {"Effect": "Allow", "Action": "s3:PutObject", + "Resource": "arn:aws:s3:::kct-security-agent-*/security-scans/*"}, + {"Effect": "Allow", "Action": "iam:PassRole", + "Resource": "arn:aws:iam::${ACCOUNT_ID}:role/security-agent-*", + "Condition": {"StringEquals": {"iam:PassedToService": "securityagent.amazonaws.com"}}}, + {"Effect": "Allow", "Action": ["kms:GenerateDataKey", "kms:Decrypt", "kms:Encrypt", "kms:DescribeKey"], + "Resource": "arn:aws:kms:*:${ACCOUNT_ID}:key/*", + "Condition": {"StringLike": {"kms:ViaService": "s3.*.amazonaws.com"}}} + ] +} +``` + +Plus the AWS-managed policy `arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore` (attached separately) so the SSM agent can phone home. + +## Container Customization + +The pre-built `public.ecr.aws/d9h8z6l7/aws-transform:latest` image includes Java 8/11/17/21/25, Python 3.8-3.14, Node.js 16-24, Maven, Gradle, common build tools, AWS CLI v2, and AWS Transform CLI. CT CLI is installed at container start time via the curl install. + +For continuous modernization analyses, the pre-built image's defaults handle every runtime need. For custom TDs requiring a runtime not in the image (Rust, Go, .NET on Linux), follow [custom-remote-execution#custom-image-path](custom-remote-execution.md#custom-image-path-docker-required). + +## Runtime Version Switching + +For remediation runs that target a specific language version (e.g., Java 21, Python 3.13), pass the version as an environment variable on the `docker run` (Step 6): + +```bash +ssm_run "sudo docker run -d --name atx-ct ... 
\ + -e JAVA_VERSION=21 \ + -e PYTHON_VERSION=3.13 \ + -e NODE_VERSION=22 \ + $IMAGE -c '...'" +``` + +Available versions: +- **Java**: 8, 11, 17, 21, 25 (Amazon Corretto) +- **Python**: 3.8-3.14 (accepts `3.13` or `13`) +- **Node.js**: 16, 18, 20, 22, 24 + +For analyses, runtime switching is generally not needed. + +## Limits + +- Per-job temp files (LOG, AID_FILE, STDOUT_LOG) keyed by JOB_ID let multiple concurrent jobs coexist +- Bedrock throughput is per-account -- running many parallel continuous modernization containers shares the quota; large workloads may throttle + +## Error Handling + +| Error | Cause | Fix | +|---|---|---| +| SSM agent not Online | Instance role missing `AmazonSSMManagedInstanceCore` or no outbound internet | Re-attach the managed policy; verify VPC has NAT or public IP | +| Container exits / restarts on launch | `atx ct server` crashed on startup, or the image failed to pull | Check container logs: `ssm_run "sudo docker logs ${CONTAINER_NAME} 2>&1 \| tail -50"`. If the image failed to pull, verify outbound internet (NAT/public IP) so the host can reach `public.ecr.aws`. If the server crashed, check for port conflicts on 8081. 
If UserData itself failed, the stack is in `ROLLBACK_COMPLETE`; check `aws cloudformation describe-stack-events --stack-name $STACK_NAME` | +| `atx ct analysis run` clone fails | PAT expired or repo private to a different account | Verify customer's PAT has access; re-stage source config (Step 6) | +| Findings missing after analysis | Server crashed before persisting | Check `tail /tmp/atxct-.log`; recover via `analysis get --id $AID` | +| Artifacts missing from S3 | Wrapper killed before upload step | Re-run upload manually (see Cancellation section) | +| Polling never completes | atx ct server hung or container down | `ssm_run "sudo docker ps"` and `ssm_run "sudo docker logs ${CONTAINER_NAME} \| tail"` to diagnose | +| Container starts but all AWS API calls return "credentials not available" or fail to reach IMDS | Bridge networking + IMDSv2 hop limit = 1 (default). The IMDSv2 token response is sent with an IP TTL equal to the hop limit, so it is dropped at the extra bridge hop before it reaches the container's network namespace. | On an existing instance, the customer can run `aws ec2 modify-instance-metadata-options --instance-id --http-put-response-hop-limit 2 --region `. We do NOT modify metadata options on customer instances automatically; it's a side-effect on resources they own. The CFN-managed flow does not hit this in practice with current Docker bridge defaults. | +| Status-check ssm_run calls hang during fan-out | Older fan-out submissions kept SSM agent worker slots occupied until each wrapper exited (CommandWorkersLimit default 5). Mitigated by submitting all workers via a single SSM command and using `( ( bash X & ) & )` double-fork to orphan each wrapper to init. SSM marks the launch Success immediately. | If you still observe queueing, list in-flight commands with `aws ssm list-commands --instance-id --query 'Commands[?Status==\`InProgress\`].CommandId'` and cancel orphaned ones via `aws ssm cancel-command --command-id `.
Read wrapper progress through a single batched `ssm_run` reading `/tmp/atxct-fan-w*-*.{log,aids,rids}` rather than many small calls. | +| `atx ct analysis run` hangs cloning from internal/self-hosted git host | Subnet has 0.0.0.0/0 egress (Step 5b validation passed) but no route to the customer's internal git host. VPN / Direct Connect / VPC peering missing or filtered. | The skill cannot auto-verify routes to corporate-network git hosts. Confirm with the network team that the subnet's route table includes a path to the git host CIDR. Test from any instance in the same subnet: `nslookup ` then `curl -v https:///`. | +| Stack create fails with "no NAT/route" or container UserData times out after Step 5b validation passed | Subnet's 0.0.0.0/0 default route was removed (or the subnet was switched to a different route table) between Step 5b validation and stack deploy. | Re-run Step 5b validation on the current state of the route table. The check uses `ec2:DescribeRouteTables`; if the network team is making concurrent changes, run validation immediately before the admin handoff. | + +## Pricing + +Direct customer to: +- EC2 pricing: https://aws.amazon.com/ec2/pricing/ +- AWS Transform agent minutes: https://aws.amazon.com/transform/pricing/ + +Do NOT quote specific dollar amounts or time estimates. + +## Cleanup + +**Never delete the stack or stop/terminate the instance without explicit customer confirmation.** `delete-stack` is destructive -- it removes the instance, IAM role, and security group, and any in-flight analyses on the instance will be terminated. Even if the customer said "I'm done" earlier in the conversation, ask again before issuing the delete. + +When the customer indicates they're finished, prompt with options and **wait for explicit confirmation** before running any of the commands below: + +> Your EC2 stack `${STACK_NAME}` is still running and incurring charges. What would you like to do? +> 1. 
**Delete the stack** (admin handoff) -- removes the instance, IAM role, and security group atomically. Stops all charges. Your S3 buckets and Secrets Manager entries persist (so analysis history survives). Requires admin creds -- I'll print the command for someone with `cloudformation:DeleteStack` + `iam:Delete*` to run. +> 2. **Stop the instance** -- keeps the stack but stops the EC2. No compute charges, small EBS storage charge. Container needs to re-initialize after restart. I CAN run this with executor creds (`ec2:StopInstances` on the tagged instance). +> 3. **Keep running** -- instance stays up. Hourly EC2 charges continue. Useful if another analysis is coming. +> +> Reply with 1, 2, or 3. + +Do NOT run delete-stack proactively. Do NOT assume option 1 because the customer's last analysis finished. The customer must explicitly choose. + +**Option 1: Delete the entire stack -- admin handoff** (only after explicit confirmation) + +The agent does NOT run `delete-stack` itself. Deleting the stack tears down the IAM role, instance profile, and security group, which requires `iam:DeleteRole`, `iam:DeleteRolePolicy`, `iam:DeleteInstanceProfile`, and `cloudformation:DeleteStack` -- all of these live in the admin policy, not the executor policy. The agent prints the command and the customer's admin runs it: + +```bash +# Admin runs: +aws cloudformation delete-stack --stack-name "$STACK_NAME" --region $REGION +aws cloudformation wait stack-delete-complete --stack-name "$STACK_NAME" --region $REGION +``` + +This removes everything in the stack atomically. 
If anything fails to delete, the stack moves to `DELETE_FAILED` and the customer can inspect what's left (executor creds CAN read this): + +```bash +aws cloudformation describe-stack-events --stack-name "$STACK_NAME" --region $REGION \ + --query 'StackEvents[?ResourceStatus==`DELETE_FAILED`]' +``` + +**Option 2: Stop the instance** (only after explicit confirmation) + +```bash +aws ec2 stop-instances --instance-ids $INSTANCE_ID --region $REGION +``` + +The stack stays in `CREATE_COMPLETE`. Customer can `aws ec2 start-instances` later to bring it back. Note: container needs time to become healthy after start (atx ct server has to re-initialize). + +**Option 3: Keep running** + +No-op. Customer continues to pay EC2 hourly charges. Useful when expecting another analysis soon. + +**What persists across delete-stack:** + +| Resource | Persists? | Why | +|---|---|---| +| S3 buckets (`atx-source-code-${ACCOUNT_ID}`, `atx-ct-output-${ACCOUNT_ID}`) | ✅ Yes -- managed outside the stack | Customer's analysis results survive stack lifecycles | +| Secrets Manager (`atx/github-token`, etc.) | ✅ Yes -- managed outside the stack | Customer's tokens persist for next run | +| Customer-supplied VPC, subnet, security group (when reused) | ✅ Yes -- never owned by the skill | Customer or their network team owns these | +| Stack-managed resources (instance, IAM role, profile, SG when stack-created) | ❌ No -- deleted with stack | Recreated on next `create-stack` | + +**Resources the skill MUST NEVER delete, under any circumstances:** + +- **Customer-supplied VPCs, subnets, route tables, NAT gateways, internet gateways, transit gateways, VPC peering connections** -- these are network infrastructure owned by the customer or their network team. The skill never created them (we don't have permission to), and we never delete them. If a customer asks "clean up everything including the VPC," refuse and explain that VPC lifecycle is a network-team responsibility. 
+- **Customer-supplied security groups** (when the customer chose "reuse" at Step 5b's SG ask) -- these existed before our stack and persist after it. Only stack-created SGs (when the customer typed `new`) get cleaned up via `delete-stack`. +- **S3 buckets** (`atx-source-code-*`, `atx-ct-output-*`) -- these hold customer analysis history and are intentionally outside the stack to survive stack delete-and-recreate. The skill never empties them or removes them. Lifecycle policies auto-expire objects (7 days for source bundles, 30 days for output artifacts), so residual storage cost converges to zero without intervention. +- **Secrets** (`atx/github-token`, `atx/gitlab-token`, etc.) -- these hold customer credentials and persist across stacks. Customers may want to keep them for future analyses on a fresh stack. The skill never deletes them. + +If the customer asks for "complete teardown" or "delete everything," the agent's response is: "I can run `delete-stack` which removes the runner instance, IAM role, instance profile, and any stack-created security group. Your S3 buckets, secrets, and customer-owned network resources (VPC/subnets/SGs you supplied) stay -- those aren't owned by this skill. If you want to remove those too, please do it directly in the AWS console or CLI; I won't run those commands because they're destructive of data and infrastructure outside the skill's scope." diff --git a/aws-transform/steering/workload-continuous-modernization-findings.md b/aws-transform/steering/workload-continuous-modernization-findings.md new file mode 100644 index 00000000..9c19a831 --- /dev/null +++ b/aws-transform/steering/workload-continuous-modernization-findings.md @@ -0,0 +1,95 @@ +--- +name: findings +description: List/filter/get/update/delete findings (vulnerabilities, tech-debt issues, upgrade opportunities) by repo, source, severity (exact via --severity or threshold via --min-severity), status, analysis type, or auto-fix transform. 
+--- + +# Findings + +## Telemetry + +When running `atx ct analysis run` or `atx ct remediation create`, always include `--telemetry`. + +Format: `--telemetry "agent=,executionMode="` +- `agent` — the AI assistant driving this session (lowercase, no spaces). Use the real assistant name — e.g. kiro, claude, amazonq, copilot. +- `executionMode` — `local` + +If the user explicitly asks to disable telemetry, omit `--telemetry` for the rest of the session. + +```bash +# List with JSON output (machine-readable). Always pass --json from agents. +atx ct findings list --json + +# Filter by repo, source, severity, type, status, analysis, or fix transform (pass only the flags you need; --severity and --min-severity cannot be combined) +atx ct findings list \ + --repo :: \ + --source \ + --severity \ + --min-severity \ + --type \ + --status \ + --analysis-id \ + --fix-transform + +# Severity flags (mutually exclusive -- pass at most one): +# --severity Exact match. e.g. --severity high returns only high findings. +# --min-severity Threshold. e.g. --min-severity medium returns medium AND high. +# For "show me findings at least " prompts, use --min-severity. + +# Get a single finding by ID +atx ct findings get --id + +# Update a finding (status, notes, dismiss) +atx ct findings update --id --status --reason "dismiss reason" --notes "notes" + +# Batch update multiple findings +atx ct findings batch-update --ids --status --reason "reason" + +# Delete a finding (must be dismissed or obsolete) +atx ct findings delete --id +``` + +## Status set + +`open`, `dismissed`, `obsolete`. Transitions a user can drive: `open ↔ dismissed`. `obsolete` is a terminal state set by the system when a re-analysis no longer produces the finding — users do not transition into or out of it. + +## Filter shapes — pick the narrowest one + +Filtering at the CLI is materially faster than pulling everything and filtering after the fact. Each shape below is backed by a server-side index. 
Combinations that don't match one of these degrade to a full account scan with in-memory filtering and get slow on accounts with thousands of findings. + +| User intent | Filter shape | +|---|---| +| Findings from one analysis run | `--analysis-id ` (alone or combined with anything) | +| Live findings on one repo | `--repo --status ` | +| Account-wide triage | `--status ` (optionally `+ --severity ` for one level, or `+ --min-severity ` for a threshold) | +| One repo, one analysis type | `--repo --type ` (single type only) | +| Everything under one source | `--source ` (alone) | +| Auto-fixable by a known transform | `--fix-transform ` (alone or combined) | + +### Anti-patterns + +- Calling `atx ct findings list --json` with no filters and post-filtering in the model. Always filter at the CLI. +- Per-repo loops when a single `--source` filter would cover the whole batch. +- Omitting `--status open` when the user only cares about live findings — `dismissed` and `obsolete` pile up over time. +- Passing `--type` and `--analysis-id` together when `--analysis-id` alone already pins the result set to one run. +- "Auto-fixable" without a transform name → narrow with `--type tech-debt-quick` first. `tech-debt-quick` findings carry an ATX-transform fix; `security` findings carry a security-agent fix (see the [remediation](workload-continuous-modernization-remediation.md) skill). Findings without a `fix` field may still be remediable — see the [remediation](workload-continuous-modernization-remediation.md) skill's decision tree. +- `--type` alone or `--type --severity`/`--type --min-severity` (no status, no repo) → add `--status open` to anchor on the live-triage shape. +- Passing both `--severity` and `--min-severity` in the same call → the CLI rejects this. Pick one. + +### Multi-repo, multi-type questions + +`--repo` accepts one slug. For multi-repo questions, prefer `--source` (one call covers every repo under that source). 
For multi-type questions, call once per type and merge — combining `--repo` with multiple types is not supported by a single index path. + +## Remediating findings + +Auto-remediable findings can be fixed by passing their IDs to `remediation create`: + +```bash +atx ct findings list --type security --json # find auto-remediable security findings +atx ct remediation create --ids --name "Fix name" --telemetry "agent=,executionMode=local" +``` + +- **Security findings** (`--type security`) route to the AWS Security Agent and produce a code diff or, for GitHub sources, an auto-opened pull request. +- **Tech-debt / upgrade findings** route to an ATX transform (PR/CR). + +See the [remediation](workload-continuous-modernization-remediation.md) skill for outcomes by source provider and for handling findings without a `fix` field. diff --git a/aws-transform/steering/workload-continuous-modernization-guide.md b/aws-transform/steering/workload-continuous-modernization-guide.md new file mode 100644 index 00000000..9e0c89f1 --- /dev/null +++ b/aws-transform/steering/workload-continuous-modernization-guide.md @@ -0,0 +1,231 @@ +--- +name: guide +description: Interactive onboarding guide — walks new users through the full AWS Transform - continuous modernization (continuous modernization) workflow step by step, detects current state, explains concepts, and drives the user forward. +--- + +# Guide + +You are now in guided onboarding mode. Your job is to walk the user through the full AWS Transform - continuous modernization (continuous modernization) workflow one step at a time. Be proactive — you drive the conversation, not the user. + +For the exact commands at each step, use the corresponding skill (`/source`, `/discovery`, `/analysis`, `/findings`, `/remediation`, `/reporting`). This guide focuses on workflow orchestration — detecting state, explaining concepts, and moving the user forward. 
+ +## Two Modes + +### Local Mode + +- Storage: local (`~/.atxct/`) +- Execution: local (this machine) +- No scheduling, no team sharing +- Good for: trying it out, small repos, individual use + +### Infrastructure Mode + +- Storage: S3 +- Execution: Fargate or EC2 +- Supports scheduling, team sharing, CI/CD +- Good for: teams, recurring analysis, scale + +## Routing + +This guide handles continuous modernization onboarding only. For routing across Custom vs. continuous modernization (named transforms, prior findings, edge cases), see [continuous modernization routing](workload-continuous-modernization-routing.md). Do not duplicate routing logic here. + +## On Start — Detect State (Prereq check /setup skill) + +ALWAYS begin by running: + +```bash +atx ct status --health +``` + +DO NOT share this command with the customer in your response; run it only to check the current status. + +The command returns sources, repo counts, analyses, findings, and remediations. Use them with the table below, which is an internal guide and not something to show the customer, to determine where the user is and which step to start at: + +| Condition | Start at | +| -------------------------------------------------------- | ----------------------------------------- | +| No mode selected, nothing configured | Step 1 | +| Mode selected but no source configured | Step 2 | +| Source exists but 0 repos discovered | Step 2 (re-scan) | +| Infrastructure mode, no execution environment configured | Step 3 | +| All infra configured, no analysis ever run | Step 4 | +| Analyses or findings exist | Step 4 (show progress, offer next action) | + +## Step 1: Mode Selection + +Explain for first-time users: "Hi, I am AWS Transform - continuous modernization. I can help analyze your codebase for tech debt, security issues, and upgrade opportunities, then help you fix them. You can also run targeted upgrades like Java 8→21 or migrate AWS SDKs. 
AWS Transform - continuous modernization can run in two modes: Local and on AWS Infrastructure." + +Explain: "How do you want to run AWS Transform - continuous modernization? + +- Local — Everything runs on this machine. Good for testing or small repos. +- Your AWS infrastructure — S3 + Fargate/EC2. Supports teams, scheduling, scale." + +After selection, proceed to Step 2 to set up sources. + +## Step 2: Source + +Explain: "A **source** tells AWS Transform - continuous modernization where your repositories are — a GitHub org, a GitLab group/user, a Bitbucket workspace/project, or a local folder." + +Ask the user, "Where does your code live?": + +- **GitHub org** — needs an org name and a Personal Access Token (PAT) +- **GitLab group/user** — needs a group or username and a Personal Access Token (PAT). Supports self-hosted instances. +- **Bitbucket workspace/project** — needs a workspace (Cloud) or project key (Data Center) and an API token. Supports self-hosted instances. +- **Local folder** — just needs a path on disk + +**If the user picks an unsupported source.** AWS Transform - continuous modernization currently supports only GitHub, GitLab, Bitbucket, and local folders. If the user names anything else, do NOT stop or fail. Acknowledge it's not directly supported, then offer the local-folder workaround: + +> "We don't yet support direct integration with every source control system. In the meantime, the easiest way to try AWS Transform - continuous modernization on a few of your repositories is to clone them to your local machine — I can walk you through it. Once they're local, AWS Transform - continuous modernization will analyze them and, when you run a remediation, apply the fixes directly to the local files. From there, you can diff and push back to your repository the way you normally would." + +Wait for them to confirm. If they agree, restart Step 2 with **Local folder**. If they want to skip for now, follow the "Let them skip" rule. 
+ +Use the `/source` skill for the exact commands to add a source. + +For local folders: the `/discovery` skill scans the path you provide; never guess or use the current working directory. + +If the user doesn't have a GitHub PAT, explain: "You'll need a Personal Access Token with `repo` scope. Create one at GitHub → Settings → Developer settings → Personal access tokens. For analysis only, read-only is fine. For auto-fix PRs (remediation), you'll need write access." + +If the user doesn't have a GitLab PAT, explain: "You'll need a Personal Access Token with `api` scope. Create one at GitLab → Settings → Access Tokens → Personal Access Tokens. The `api` scope covers reading projects, pushing branches, and creating Merge Requests for remediation." + +If the user doesn't have a Bitbucket token, explain: "For Bitbucket Cloud, go to https://id.atlassian.com/manage-profile/security/api-tokens and click 'Create API token with scopes'. Select these scopes: `read:repository:bitbucket`, `write:repository:bitbucket`, `read:pullrequest:bitbucket`, `write:pullrequest:bitbucket`. You'll also need your Bitbucket account email (for API auth, pass via `--email`) and your Bitbucket username (for git clone/push, pass via `--username` — visible in your clone URLs at bitbucket.org). For Bitbucket Data Center (self-hosted), create an HTTP Access Token in your project/repo settings and pass `--url` with your instance URL." + +If Infrastructure mode, explain: "As next steps, you need to set up your infrastructure and environment.", proceed to Step 3. +If Local mode, explain: "As next steps, you can run different types of analysis", move to Step 4. + +After success, move to Step 3 (Infrastructure mode) or Step 4 (Local mode). + +## Step 3: Setup Execution Environment (Infrastructure mode only) + +This step only runs in Infrastructure mode. Local mode runs on this machine automatically. 
+ +Explain: "Execution environment is used for analysis (detecting tech debt, security issues, upgrade opportunities) and remediation (running transforms that generate fixes; PR creation uses the GitHub API)." + +Explain: "Where should analysis and remediations run? + +- Fargate (recommended) — Managed containers. Scales automatically. +- EC2 — Your own instance. Good for existing build servers." + +If EC2, follow the `/ec2-execution` skill (existing instance: provide instance ID or IP; new instance: launch with AWS Transform - continuous modernization runtime pre-installed). If Fargate, follow the `/batch-execution` skill (creates ECS cluster, task definition, IAM roles). + +After completion, move to Step 4. + +## Step 4: Analysis + +### Local Mode Summary + +Show a summary of the status of the current setup if running in local mode: + +``` +Setup complete. + + ✓ Mode: Local + ✓ Source: GitHub (acme-corp) -- 127 repos + ✓ Execution: This machine +``` + +### Infrastructure Mode Summary + +Show a summary of the status of the current setup if running in infrastructure mode: + +``` +Setup complete. + + ✓ Mode: Infrastructure + ✓ Source: GitHub (acme-corp) -- 127 repos + ✓ Execution: Fargate +``` + +### Select and Start an Analysis + +**Render this menu as plain numbered markdown text in your response and wait for the user to type a choice. Do NOT route it through any structured choice/picker tool (e.g., `AskUserQuestion` in Claude Code, or any equivalent multi-select/option UI in other harnesses) — those tools impose option caps that silently drop Agentic Readiness and Modernization Readiness. All six options below MUST appear verbatim.** + +``` +What do you want to do next? + + 1. Tech Debt -- Quick + Outdated dependencies and easy wins. + 2. Tech Debt -- Comprehensive + Deeper analysis, more findings. + 3. Security analysis + Vulnerabilities and CVEs. + 4. Agentic Readiness + Analyze how ready your repos are for AI agents (frameworks, APIs, docs). + 5. 
Modernization Readiness + Analyze modernization opportunities (infrastructure, application, data, security, operations). + 6. Run remediation + Skip analysis and go straight to an upgrade (e.g., Java 8→21, AWS SDK migrations). +``` + +Use the `/analysis` skill for the exact commands. Show progress while it runs. After completion, summarize findings by severity: + +``` +Analysis complete + +Found **N findings** across M repos: + - **X high** -- fix these first + - **Y medium** + - **Z low** + +What would you like to do next? + + • List all findings (uses /findings) + • Schedule continuous analysis (Infrastructure mode) + • Auto-remediate high-severity issues + • Auto-remediate everything + • Later -- Save for next time +``` + +### Remediation Selected + +Remediation requires: + +1. **Execution environment** — already configured in Step 3 (Infrastructure) or local. +2. **GitHub write access** — to create branches and PRs. If the token from Step 2 was read-only, prompt the user to update it with `repo` scope. +3. **GitLab write access** — to push branches and create Merge Requests. The token needs `api` scope. +4. **Bitbucket write access** — to push branches and create Pull Requests. Cloud needs API token with `write:repository:bitbucket` + `write:pullrequest:bitbucket` scopes. Data Center needs HTTP Access Token with write permissions. + +After token is sufficient, list available remediations grouped by language (e.g., Java: `java8-to-java21`, `aws-sdk-v1-to-v2`; Python: `python39-to-python312`, `boto2-to-boto3`; Node.js: `node18-to-node22`, `aws-sdk-v2-to-v3`). + +Use the `/remediation` skill for the exact commands. After execution, show summary (repos upgraded, repos needing manual review) and offer to open PRs. + +### Scheduling Selected + +Scheduling requires Infrastructure mode. If user is in Local mode, explain: "Scheduling requires Infrastructure mode (S3 + Fargate/EC2). Local mode runs on-demand only — no background jobs. 
Switch to Infrastructure mode to enable continuous analysis, continuous remediation, and team notifications." + +If already in Infrastructure mode: + +- **Recurring analysis** — ask cadence (Daily / Weekly / Custom cron). Sets up an EventBridge rule. +- **Continuous remediation** — monitors for new findings and auto-fixes them. Requires recurring analysis and GitHub write access. Offers severity thresholds (high → auto-fix immediately; medium → auto-fix batched daily; low → log only). + +## When User Wants to Exit Onboarding + +If user says "cancel", "stop", "later", "skip setup", or wants to do something else: + +``` +Setup paused. + +Progress saved: + ✓ Source: GitHub (acme-corp) -- 127 repos + ○ Execution: Not configured +``` + +Let them exit. Pick up where they left off if they want to proceed with an action. + +## Completion + +When all steps are done, show a recap of what was accomplished in this session. Use the `/reporting` skill to generate an HTML report. + +## Rules + +1. **One question at a time.** Don't ask multiple things in one message. +2. **Explain briefly, then ask.** 1-2 sentences of context max. +3. **Offer defaults.** Have a recommended option. Make it easy to proceed. +4. **Show commands.** Always display the `atx ct` command you're running so the user learns the CLI. +5. **Handle errors plainly.** Say what failed, offer a fix or alternative: + - Connection error → "The AWS Transform - continuous modernization server isn't running. Starting it now: `atx ct server`" + - Invalid token → "That token didn't work. Make sure it has `repo` scope." + - No repos found → "No repos found in that source. Double-check the org name or path." +6. **Let them skip.** "skip", "later", "not now" — move on. +7. **Let them go back.** If they want to redo a step, accommodate. +8. **Show progress.** For long operations, show status. +9. **End with action.** Finish by doing something, not just "setup complete". +10. 
**Save progress.** If the user cancels or hits an error, let them resume. diff --git a/aws-transform/steering/workload-continuous-modernization-remediation.md b/aws-transform/steering/workload-continuous-modernization-remediation.md new file mode 100644 index 00000000..91c74c44 --- /dev/null +++ b/aws-transform/steering/workload-continuous-modernization-remediation.md @@ -0,0 +1,144 @@ +--- +name: remediation +description: Create/retry/list/delete remediation campaigns — auto-fix findings by applying ATX transforms or run custom TDs directly on repos, create PRs/CRs with fixes. +--- + +# Remediation + +## Before offering remediation + +When the user wants to remediate specific findings, fetch each one with `atx ct findings get --id ` and inspect its `fix` field before presenting options: + +- **`fix` is set** — the finding is auto-remediable via `--ids` alone. +- **`fix` is null and `recommendation` names a transformation definition** — offer `--ids --transformation-name `. +- **`fix` is null and no `recommendation`** — use [Transformation Definition Discovery for Remediation](#transformation-definition-discovery-for-remediation) to find a matching transformation definition. + +When using `--transformation-name`, ask the user if they have additional instructions (e.g. a target version or specific guidance) before running. If they do, pass them via `-g "additionalPlanContext="`. + +## Telemetry + +When running `atx ct analysis run` or `atx ct remediation create`, always include `--telemetry`. + +Format: `--telemetry "agent=,executionMode="` +- `agent` — the AI assistant driving this session (lowercase, no spaces). Use the real assistant name — e.g. kiro, claude, amazonq, copilot. +- `executionMode` — `local` + +If the user explicitly asks to disable telemetry, omit `--telemetry` for the rest of the session. 
+ +```bash +# Create from finding IDs (uses each finding's fix.transform_name) +atx ct remediation create --ids --name "Fix name" --telemetry "agent=,executionMode=local" + +# Create from finding IDs with a custom transformation definition (TD) override (ignores finding's fix field) +atx ct remediation create --ids --transformation-name --telemetry "agent=,executionMode=local" + +# Create directly on a repo with a custom TD (no findings required) +atx ct remediation create --transformation-name --repo :: --telemetry "agent=,executionMode=local" + +# Create with configuration passed to the TD +atx ct remediation create --transformation-name --repo :: -g "additionalPlanContext=Upgrade to Node.js 22" --telemetry "agent=,executionMode=local" + +# Create with local execution (runs ATX transform on the server instead of GitHub Actions) +atx ct remediation create --ids --name "Fix name" --local --telemetry "agent=,executionMode=local" + +# List all +atx ct remediation list + +# Check status +atx ct remediation status --id + +# Retry failed +atx ct remediation retry --id + +# Delete +atx ct remediation delete --id +``` + +## Security Remediation + +Security findings (from `atx ct analysis run --type security`) are auto-remediable with the **same** `remediation create` command as any other finding — no `--transformation-name` is needed. Security findings carry a `security-agent` fix, which routes to the AWS Security Agent code-remediation API instead of an ATX transform; the fix is generated server-side. + +```bash +# 1. Find the security findings to remediate +atx ct findings list --type security --json + +# 2. Create a remediation from one or more security finding IDs +# (same command as any other remediation, including --telemetry) +atx ct remediation create --ids --name "Fix SQL injection" --telemetry "agent=,executionMode=local" + +# 3. 
Check status -- the result is a code diff or, for GitHub sources, a pull request +atx ct remediation status --id +``` + +### Outcomes by source provider + +The result link surfaces in `remediation status` and in the remediation record's `execution_artifacts`. What you get depends on the repo's source provider: + +| Source provider | Per-repo status | Artifact | Meaning | +|-----------------|-----------------|----------|---------| +| **github** | `pr_open` | `pull_request_link` | AWS Transform - continuous modernization (continuous modernization) applies the diff on the scanned commit and **opens a pull request** automatically. | +| **gitlab** / **bitbucket** / **local** | `diff_ready` | `code_diff_link` | A presigned URL to a unified diff. No PR is opened — apply the diff yourself. | + +- For **GitHub** sources, the diff is applied on a fresh clone pinned to the scanned commit and pushed as a pull request (idempotent per finding — re-running updates the same PR). +- For **gitlab**, **bitbucket**, and **local** sources, security remediation stays **diff-only**. GitHub is the only provider that gets an auto-opened PR from a security diff. (This differs from tech-debt/transform remediation, where GitLab opens a Merge Request and Bitbucket opens a Pull Request — security diffs are not pushed to those providers.) +- The PR step is **fail-soft**: if opening the PR fails, the usable diff is preserved (status stays `diff_ready`, `code_diff_link` set) and the reason is recorded in `execution_artifacts.pr_bridge_error`. A bridge failure never discards a good diff. + +### Requirements + +- The `AWSSecurityAgentWebAppPolicy` IAM policy already required to run `analysis --type security` also grants the remediation permission — **no additional setup is needed** beyond `atx ct setup security-agent`. +- The finding must come from a security analysis whose code review is still resolvable. 
If it has aged out, the finding carries no fix (`fix: null`) and is manual-only — re-run the security analysis to make it remediable again. + +## Custom Transformation Definition Remediation + +Remediation supports running any transformation definition directly, with or without existing findings. + +### Three modes: + +1. **Findings-based (existing):** `--ids ` — uses each finding's `fix.transform_name` to determine which transformation definition to run on each repo. + +2. **Findings + transformation definition override:** `--ids --transformation-name ` — uses the repos from the findings but runs the specified transformation definition instead of the finding's `fix.transform_name`. Findings without a `fix` field are accepted (they would normally be rejected). + +3. **Direct transformation definition on repo (no findings):** `--transformation-name --repo ::` — runs the transformation definition directly on the specified repo without requiring any findings. Repos must be discovered first (`atx ct discovery scan`). + +### Configuration (`-g`) + +The `-g`/`--configuration` flag passes configuration directly to the transformation definition. Accepts three formats: +- Key-value: `"additionalPlanContext=Upgrade to Node.js 22,buildCommand=npm test"` +- JSON: `'{"additionalPlanContext":"Upgrade to Node.js 22"}'` +- File path: `"file:///path/to/config.json"` + +Only valid with `--transformation-name`. 
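
The three `-g` value shapes listed above can be sketched in shell as follows. This is a minimal illustration, not the CLI's own logic: the variable names (`plan_context`, `kv_config`, `json_config`, `config_file`, `file_config`) are hypothetical, the temp-file location is an assumption, and the JSON is built with a plain `printf`, which is only safe when the value contains no quotes or backslashes. The script only prints the values; it does not invoke `atx`.

```bash
#!/usr/bin/env bash
# Sketch: build the three documented -g value shapes for one setting.
# All variable names here are illustrative and not part of the atx CLI.
set -euo pipefail

# Example value taken from the documentation above.
plan_context="Upgrade to Node.js 22"

# 1. Key-value form: comma-separated key=value pairs in one string.
kv_config="additionalPlanContext=${plan_context}"

# 2. JSON form: the same setting as a JSON object.
#    Assumes the value needs no JSON escaping (no quotes or backslashes).
json_config=$(printf '{"additionalPlanContext":"%s"}' "$plan_context")

# 3. File form: write the JSON to a file and reference it with a file:// URL.
#    mktemp returns an absolute path, so the result starts with file:///.
config_file="$(mktemp)"
printf '%s\n' "$json_config" > "$config_file"
file_config="file://${config_file}"

# Print each shape so it can be pasted after -g in a documented command.
printf '%s\n' "$kv_config" "$json_config" "$file_config"
```

Whichever shape is chosen, it is passed as the single argument to `-g` alongside `--transformation-name`. The file form is probably the easiest to review when the configuration carries more than one or two keys; remove the temp file once the remediation has been created.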
+ +### Constraints + +- At least one of `--ids` or `--transformation-name` is required +- `--repo` cannot be used together with `--ids` (repos are derived from findings) +- `--repo` is required when `--transformation-name` is used without `--ids` +- `-g` is only valid with `--transformation-name` +- Repos must be discovered (`atx ct discovery scan`) before remediation can target them + +## Transformation Definition Discovery for Remediation + +When the user asks to remediate with a custom transformation definition, or a finding has no `fix` field and no `recommendation` that mentions a transformation definition, use transformation definition discovery to find the right transformation definition. If a finding already has a `recommendation` naming a transformation definition, skip discovery and use that name directly. + +### Workflow + +1. **List available transformation definitions:** Run `atx custom def list` to fetch all available transformation definitions. +2. **Match intent:** Based on the user's description of what they want to fix, match against transformation definition names and descriptions. +3. **Recommend and confirm:** Present the matched transformation definition(s) to the user. Wait for confirmation. +4. **Ask for additional instructions:** Ask the user if they have additional instructions (e.g. a target version or specific guidance) before running. If they do, pass them via `-g "additionalPlanContext="`. +5. **Execute:** Run `atx ct remediation create --transformation-name --repo ::` (with `-g` if the user provided additional instructions). + + +## Options + +### `--local` flag (remediation create) + +When `--local` is passed, the ATX transform runs directly on the server against a cloned copy of the repository instead of dispatching a GitHub Actions workflow. 
This is useful for: + +- GitHub-sourced repos where you want faster feedback without waiting for CI +- Environments where GitHub Actions workflows are not configured or available +- Testing transforms locally before committing to a full workflow run + +The execution mode is persisted on the remediation record (`compute_mode = 'local'`), so subsequent `retry` and `resume` operations automatically honour the original intent without needing to re-specify the flag. \ No newline at end of file diff --git a/aws-transform/steering/workload-continuous-modernization-reporting.md b/aws-transform/steering/workload-continuous-modernization-reporting.md new file mode 100644 index 00000000..c1f59d31 --- /dev/null +++ b/aws-transform/steering/workload-continuous-modernization-reporting.md @@ -0,0 +1,318 @@ +--- +name: reporting +description: Generate a self-contained HTML report visualizing the user's continuous modernization journey — sources connected, repos discovered, analyses run, findings, remediations launched. Use when: report, dashboard, show me everything, recap, status report, what have we done. +argument-hint: "[--repo ::]" +--- + +# Reporting + +Generate a single self-contained HTML report that walks through everything continuous modernization has done in this account: sources connected, repos discovered, analyses run, findings produced, remediations launched (with PR URLs). Claude assembles the HTML inline from the data it gathered and opens it in the browser. + +The report is a **static snapshot**: the HTML has all data baked in as JS consts, so it's portable (emailable, openable offline) and reflects the moment the report was generated. + +## Prerequisites + +- Server running: `atx ct status --health` returns `healthy`. If not, use the `server` skill to start it. + +## Data sources + +Populate the report from the live `atx ct` server. 
+ +```bash +atx ct source list --json +atx ct repository list --json +atx ct analysis list --json +atx ct findings list --json +atx ct remediation list --json +``` + +### Raw response shapes + +The five commands do NOT return the same envelope. Read each carefully — `repository list` wraps results in `{"items": [...]}`; the other four return a flat array. All field names are snake_case. + +**`source list --json`** → flat array: +```jsonc +[ { "source": "...", "provider": "github", "identifier": "...", + "oidcConfigured": false, "githubAppConfigured": false } ] +``` + +**`repository list --json`** → object with `items` array: +```jsonc +{ "items": [ + { "id": "::", "slug": "::", + "full_name": "...", "default_branch": "main", + "language": null, "private": false, "archived": false, + "has_workflow": false, "assessed": false, "source": "...", "labels": [] } +] } +``` + +**`analysis list --json`** → flat array. Note: there is NO `findings` array on an analysis row — the count must be joined from `findings.json`. +```jsonc +[ { "id": "01K...", "status": "complete|running|failed|cancelled|pending|null", + "analysis_type": "security|tech-debt|...", "category": "Security", + "repos": ["::", ...], + "started_at": "2026-...", "completed_at": "2026-...", "failure_reason": null } ] +``` + +**`findings list --json`** → flat array: +```jsonc +[ { "id": "01K...", "analysis_id": "01K..." | "manual:01K...", + "repo": "::", "analysis_type": "...", "severity": "high|medium|low", + "category": "...", "title": "...", "description": "...", + "status": "open|dismissed|obsolete", // tech-debt + "metadata": { "status": "ACTIVE|RESOLVED" }, // security + "file_refs": ["path/to/file.java#L1-L10"], + "fix": { "kind": "atx-transform", "transform_name": "AWS/...", "effort": "Low" } | null } ] +``` + +**`remediation list --json`** → flat array. `repos` is an OBJECT keyed by slug, NOT an array. Statuses are lowercase. PR URL is `repos[].execution_artifacts.pr_url`. 
+```jsonc +[ { "id": "01K...", "name": "...", "transform_name": "...", + "status": "succeeded|failed|in_progress|pending|cancelled|...", // lowercase + "started_at": "...", "completed_at": "...", "finding_ids": [...], + "repos": { + "::": { + "status": "succeeded|failed|...", // lowercase + "transform_name": "...", "finding_id": "...", + "execution_artifacts": { "pr_url": "https://..." }, + "error": "..." } + } } ] +``` + +### Normalization + +**Findings.** `findingId=id`, `repositoryId=repo`, `severity`, `analysisType=analysis_type`, `category`, `title`, `fileRefs=file_refs`, `fix={transformName: fix.transform_name}` (only if set). Status: for `security` analyses use `metadata.status === 'ACTIVE'` → `open`; for everything else use the top-level `status` (default `open` if missing). + +**Analyses.** `id`, `analysisType=analysis_type`, `status`, `repos`, `startedAt=started_at`, `completedAt=completed_at`, `failureReason=failure_reason`. To compute `findingsCount`, build a map first: `findingsByAnalysisId = groupBy(findings, f => f.analysis_id)`. Manual findings carry `analysis_id` of the form `"manual:"` — also key by the unprefixed `` so manual analyses match. Then `findingsCount = (findingsByAnalysisId[analysis.id] || []).length`. **Drop analyses with status `null`** (the literal string) — these are integ-test artifacts that don't belong in the report. + +**Remediations.** Convert `repos` (object) to `repoStatuses` (array): +```js +// raw: r.repos = { "": { status, execution_artifacts: { pr_url }, error } } +// normalized: r.repoStatuses = [{ slug, status, executionRefs: { prUrl }, error }, ...] 
+const repoStatuses = Object.entries(r.repos || {}).map(([slug, rs]) => ({ + slug, + status: rs.status, + executionRefs: { prUrl: rs.execution_artifacts?.pr_url }, + error: rs.error, +})); +``` +Top-level fields: `id`, `name`, `transformName=transform_name`, `status` (lowercase), `repos = Object.keys(raw.repos)`, `findingIds=finding_ids`, `startedAt=started_at`, `completedAt=completed_at`. + +### Scoping with `--repo ::` + +If `--repo ::` was passed, scope the report to that repo: +- Replace `findings list --json` with `findings list --repo :: --json`. +- Filter analyses client-side to those whose `repos[]` includes the slug. +- Filter remediations client-side to those whose `repos` include the slug. + +If a list is empty (no remediations yet, no analyses yet), **skip that section entirely** — don't render an empty placeholder. + +## Flow + +### Step 1: Gather data + +Verify server health, then run the CLI calls above to load the five entity arrays (`sources`, `repositories`, `analyses`, `findings`, `remediations`). Normalize per the shape rules above so the renderer can stay simple. + +### Step 2: Assemble the HTML + +**Generation runs in a subagent — never inline in the main loop.** Producing this report is iterative: write a Python generator, run it, hit a JSON-shape mismatch or a Chart.js misconfig, fix, rerun. When that work happens inline, every Write/Edit/Bash retry is visible to the user and the run reads as broken. Delegating to a single subagent keeps all of it private — the user only sees the API calls (Step 1) and the final HTML (Step 3). + +**Save the raw JSON before dispatching.** Persist the five Step 1 outputs to `~/.atxct/shared/reports/raw//` as `sources.json`, `repositories.json`, `analyses.json`, `findings.json`, `remediations.json` (`mkdir -p` first). The subagent reads them off disk, not from the prompt — JSON for a real account is too large to pass inline. + +**Dispatch one subagent.** Inputs: +- The five JSON paths above. 
+- Output path: `~/.atxct/shared/reports/continuous-modernization-report-<timestamp>.html` (`mkdir -p` first). +- A pointer to this skill — it reads "Raw response shapes," "Normalization," and "Sections" as its spec. +- Approach hint: write a Python generator (more reliable HTML escaping than inline JS templates), run it, validate the HTML file is non-empty and opens, then return. + +The subagent's return value is ONE of: +- `{"path": "", "summary": ""}` +- `{"error": ""}` — only after exhausting reasonable retries (3–4). + +Anything else it learned mid-run — intermediate errors, retry counts, scripts written and discarded, JSON-shape surprises — is dropped on the floor and never relayed to the parent or the user. + +**HTML output requirements (the subagent must satisfy these):** + +- The `<title>` element and the main `<h1>` MUST both be exactly `AWS Transform - continuous modernization Report` — note "continuous modernization" is lowercase, "AWS Transform" stays capitalized. Do not paraphrase or substitute the product name. +- Chart.js loaded via CDN: `<script src="https://cdn.jsdelivr.net/npm/chart.js@4"></script>` +- All CSS inlined in a `<style>` block +- All data inlined as JS `const` declarations (`SOURCES`, `REPOSITORIES`, `ANALYSES`, `FINDINGS`, `REMEDIATIONS`) — JSON-stringified, then safe-escaped before embedding: replace `</` with `<\/` (a finding's text containing literal `</script>` will otherwise close the data block and break the page) and strip U+2028 / U+2029 (valid in JSON, illegal as JS string literals). Verify by counting `</script>` in the output — expected exactly 2 (Chart.js CDN closer + inline data closer); more means a payload broke containment. +- No `fetch()` calls — the report must open offline + +Use a clean modern look: light theme, system font stack, generous whitespace, ~1100px max-width centered. Severity colors: high `#dc2626`, medium `#f59e0b`, low `#10b981`.
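
The safe-escaping requirement above is easy to get subtly wrong, so here is a minimal Python sketch of one way the generator might do it. The function names `embed_json_const` and `count_script_closers` are hypothetical and not part of any `atx` tooling; the escaping rules are the ones listed in the requirements.

```python
import json


def embed_json_const(name, data):
    """Return a JS `const` declaration intended for an inline <script> block.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # ensure_ascii=False keeps non-ASCII text readable in the report; it also
    # means U+2028 / U+2029 arrive unescaped and must be handled explicitly.
    payload = json.dumps(data, ensure_ascii=False)
    # "<\/" is a valid escape in both JSON and JS strings, so the value is
    # unchanged while "</script>" can no longer terminate the data block.
    payload = payload.replace("</", "<\\/")
    # Strip the two line-separator code points, as the requirements ask.
    payload = payload.replace("\u2028", "").replace("\u2029", "")
    return f"const {name} = {payload};"


def count_script_closers(html):
    """Count literal </script> closers for the containment check."""
    return html.count("</script>")
```

A generator could call `embed_json_const("FINDINGS", findings)` once per entity and then compare `count_script_closers(html)` against the expected closer count before returning the path.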
+ +### Step 3: Open the report + +```bash +open ~/.atxct/shared/reports/continuous-modernization-report-<timestamp>.html +``` + +Tell the user the path and what's in the report. + +### Step 4: Clean up the raw JSON dir + +The HTML report has all data baked in — once Step 3 succeeds, the raw JSON files at `~/.atxct/shared/reports/raw/<UNIX-TIMESTAMP>/` are no longer needed: + +```bash +rm -rf ~/.atxct/shared/reports/raw/<UNIX-TIMESTAMP>/ +``` + +## Sections (top to bottom) + +Each section renders only if its data is non-empty. + +### Snapshot header + +KPI cards across the top, one number per entity: + +``` +[ N sources ] [ N repos ] [ N analyses ] [ N open findings ] [ N remediations ] +``` + +No chart. Counts pulled from the lengths of each list (open findings = `findings.filter(f => f.status === 'open').length`). + +### Sources + +Chart: horizontal bar — repos per source. **Cap the chart at top 15 sources by repo count** + +Drilldown table — **top 25 sources by repo count**, not all of them. Note the total count above the table and link to `atx ct source list` for the full set. + +| Name | Provider | Identifier | Repos | +|------|----------|------------|-------| + +Fields (normalized): `name` (raw: `source`), `provider`, `identifier`, `repos_count` (computed from repository list). + +### Repositories + +Chart: doughnut — language distribution (group by `language`, count repos). **Cap at top 12 languages**; bucket the tail under "other" if needed. Treat missing `language` as `"unknown"`. + +No table by default — repo lists get too long. Mention that `atx ct repository list` shows the full table. + +Fields (raw → normalized): `slug`, `language`, `default_branch` → `defaultBranch`, `has_workflow`, `source`. + +### Analyses + +**Drop analyses with `status === "null"` (literal string) before charting or counting.** These are integ-test artifacts and would dominate the chart. 
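
The drop rule above and the findings-count join from the Normalization section could be combined in the generator roughly as follows. This is a sketch under the assumption that the raw rows match the shapes documented earlier; the function names are hypothetical.

```python
from collections import defaultdict


def index_findings_by_analysis(findings):
    """Group findings by analysis_id, also keying manual findings unprefixed."""
    by_id = defaultdict(list)
    for finding in findings:
        analysis_id = finding.get("analysis_id") or ""
        by_id[analysis_id].append(finding)
        # Manual findings carry "manual:<id>"; index the bare id as well so
        # the matching analysis row picks them up.
        if analysis_id.startswith("manual:"):
            by_id[analysis_id[len("manual:"):]].append(finding)
    return by_id


def normalize_analyses(analyses, findings):
    """Drop literal-"null" rows and attach a computed findingsCount."""
    by_id = index_findings_by_analysis(findings)
    rows = []
    for raw in analyses:
        if raw.get("status") == "null":  # the literal string, per the spec
            continue
        rows.append({
            "id": raw["id"],
            "analysisType": raw.get("analysis_type"),
            "status": raw.get("status"),
            "repos": raw.get("repos", []),
            "startedAt": raw.get("started_at"),
            "completedAt": raw.get("completed_at"),
            "failureReason": raw.get("failure_reason"),
            "findingsCount": len(by_id.get(raw["id"], [])),
        })
    return rows
```

Running the filter once during normalization means the KPI card, the chart, and the drilldown table all see the same set of analyses.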
+ +Chart: stacked bar by `analysis_type`, segments = status (`complete`, `running`, `failed`, `cancelled`, `pending`). + +**Tooltip configuration is mandatory:** stacked bars in this chart can have segments that are pixel-thin (e.g., `agentic-readiness` with 3 entries next to `tech-debt` with 7,000). Default Chart.js hover requires the cursor to land inside the segment, which is unusable at that scale. Apply: + +```js +options: { + interaction: { mode: 'index', intersect: false }, + plugins: { + tooltip: { + mode: 'index', + intersect: false, + filter: (item) => item.parsed.y > 0, // hide zero-count rows + itemSort: (a, b) => b.parsed.y - a.parsed.y, // largest first + }, + }, + scales: { x: { stacked: true }, y: { stacked: true, beginAtZero: true } }, +} +``` + +Hovering anywhere over a column then surfaces every non-zero segment, sorted by count. + +Drilldown table — most recent 10 by `startedAt` desc: + +| ID (short) | Type | Status | Repos | Findings | Duration | +|------------|------|--------|-------|----------|----------| + +- Short ID: first 8 chars of `id`. +- Findings count: looked up from the precomputed `findingsByAnalysisId` map (NOT a field on the analysis row). +- Duration: `completedAt - startedAt` formatted (e.g. "2m 14s"). Blank if still running. +- For `failed` rows, render `failureReason` as a tooltip or expandable row. + +Fields (raw → normalized): `id`, `analysis_type` → `analysisType`, `status`, `repos`, `started_at` → `startedAt`, `completed_at` → `completedAt`, `failure_reason` → `failureReason`. `findingsCount` is computed via the join described in Normalization. + +### Findings + +Two charts side-by-side: +1. Bar — severity counts. Use `status === 'open'` only. **Only include severity buckets that have at least one finding** — don't render zero-count columns. Iterate `['high','medium','low']` in that order, filter to non-zero, then plot. +2. 
 Doughnut — analysis-type split (`quick-scan`, `tech-debt`, `security`, `agentic-readiness`, `custom`, `manual`). Same rule: only include types with at least one finding. + +**Severity enum is `high | medium | low`. There is no `critical`.** + +Two drilldown tables: + +**Top risks** — group open findings by `title`, sort by repo count desc, take top 10: + +| Title | Severity | Repos affected | Auto-fix? | +|-------|----------|----------------|-----------| + +Auto-fix? = whether `fix.transformName` is set on any finding in the group. + +**Top auto-fix transforms** — group findings whose `fix.transformName` is set, by transform name: + +| Transform | Findings | Repos | Auto Remediable | +|-----------|----------|-------|-----------------| + +Auto Remediable = whether the transform is built in, i.e. its name starts with `AWS/`. Customer-namespace transforms (anything else) render as ❌. + +Fields: `findingId`, `repositoryId`, `severity`, `status`, `analysisType`, `category`, `title`, `fileRefs`, `fix.transformName`. + +### Remediations + +**Statuses are lowercase** (`succeeded`, `completed`, `complete`, `failed`, `in_progress`, `pending`, `cancelled`, `running`) — never pattern-match against uppercase. + +#### Trends chart (cumulative line) + +Replace any "by aggregate status" bar with a **cumulative line chart over time**. Three series: + +1. **Total created** — every remediation, keyed by `startedAt` date. +2. **Succeeded with PR** — remediations whose top-level `status` is in `{succeeded, complete, completed}` AND at least one repo has a non-null `executionRefs.prUrl`. Keyed by `completedAt` date (fall back to `startedAt` if missing). This is the strict definition of success — a transform can be marked `completed` without producing a PR (e.g., target version already met, or PR-publish step failed after a clean run). Only "with PR" represents real code in flight, so it's the only success line worth charting. +3. **Failed** — remediations with top-level `status === "failed"`.
Keyed by `completedAt` date (fall back to `startedAt`). + +Bucket by ISO date (`startedAt.slice(0, 10)`), accumulate day by day, sort labels ascending. + +```js +const SUCCESS = new Set(['succeeded', 'complete', 'completed']); +const hasPR = r => (r.repoStatuses || []).some(rs => rs.executionRefs?.prUrl); +// per-day buckets: { created, succeededWithPR, failed } +// then cumulative running totals across sorted days +``` + +Chart configuration: + +- `type: 'line'`, three datasets in this order: Total created (blue, filled area), Succeeded with PR (green), Failed (red). +- `interaction: { mode: 'index', intersect: false }` and matching tooltip mode so a single hover surfaces all three series for that day. +- Y-axis: `beginAtZero: true`, ticks formatted with `Number.toLocaleString()`. +- X-axis: ISO date strings, `maxRotation: 0`, `autoSkip: true`. +- Legend at bottom. + +Below the chart, render a one-line summary: date range, succeeded-with-PR count and rate, failed count. + +#### Recent remediations with PRs + +Cap at **15 most recent** (by `startedAt` desc) where at least one repo has a PR URL. Note the total remediation count below. + +Drilldown — one card per remediation: + +``` +<Name> · <transformName> · <aggregate status> +N repos: X succeeded · Y failed · Z in progress + +PRs: + • <repo-slug> → <prUrl> + • <repo-slug> → <prUrl> + ... +Failures: + • <repo-slug>: <error> +``` + +PR URLs come from `repoStatuses[<repoSlug>].executionRefs.prUrl` (also accept `transform_pr_url` for older entries). Render as `<a href="...">` so they're clickable. + +Fields: `id`, `name`, `transformName`, `status` (aggregate), `repos`, `repoStatuses` (per-repo: `status`, `executionRefs.prUrl`, `error`, `startedAt`, `completedAt`), `findingIds`. + +## Tone + +Data-driven. The HTML is the deliverable. After Step 3, your reply is ONLY: + +1. The output path. +2. A 1–2 sentence summary, sourced from the subagent's `summary` field (e.g. "1 analysis failed", "3 PRs ready for review"). 
+ +**Never relay subagent iteration state to the user.** No retry counts, no "I fixed an issue with X," no narration of intermediate scripts or errors. The visible surface across the whole run is: the Step 1 API calls, the Step 3 `open` command, and these one or two sentences. Nothing in between. + +If the subagent returned `{"error": ...}`, surface that one sentence — don't try to redo the work inline (that would re-leak every retry). diff --git a/aws-transform/steering/workload-continuous-modernization-routing.md b/aws-transform/steering/workload-continuous-modernization-routing.md new file mode 100644 index 00000000..20f87dd5 --- /dev/null +++ b/aws-transform/steering/workload-continuous-modernization-routing.md @@ -0,0 +1,170 @@ +# Custom vs continuous modernization Routing + +Route customer requests to the correct skill set. continuous modernization supports local, local-parallel, +new EC2, existing EC2, and Fargate + AWS Batch as compute options; Custom supports local +and Fargate + AWS Batch. The compute choice is independent of the Custom-vs-continuous modernization choice; +pick routing first, then surface compute options. + +## ⚠️ MANDATORY: Permission Consent After Compute Choice + +**When the customer chooses a remote compute option (EC2 or Batch/Fargate), the VERY FIRST response to the customer MUST be the permission consent message from the chosen execution skill. Do NOT ask any setup questions (source, analysis type, region, existing instance, etc.) before showing the consent message. If the customer says no, warn them about potential permission errors but continue anyway.** + +## Prerequisite: workload check + +This file applies ONLY when the request has cleared the workload-identification step in +POWER.md. The decision tables below assume you have already established that the request is +not VMware, SQL/database, or mainframe, and — for .NET — that the user explicitly chose +"analyze for tech debt / security" over "modernize." 
Do NOT use this file's keyword lists to +override the workload-identification step: + +- VMware → never continuous modernization; use `workload-vmware*.md`. +- SQL / Database (SQL Server, Oracle, MySQL, Aurora) → never continuous modernization; use `workload-sql*.md`. +- Mainframe / COBOL → never continuous modernization; use `workload-mainframe*.md`. +- .NET → ask intent first (three options: modernize / assessment for modernization / analyze for tech debt or security or CVEs). Only the "analyze for tech debt or security or CVEs" choice routes here. "Modernize" and "Assessment for modernization" both stay in the .NET workload. + +The "Routes to continuous modernization (always)" list below means "always relative to Custom" — it does NOT +mean "override the workload identification step." + +## Decision 1: Analysis-Time Routing (Starting Work) + +| Customer Intent | Route To | Notes | +| -------------------------------------------------------------------------------- | ------------------- | ------------------------------------------------ | +| "analyze / analysis / find / what's wrong / where do I start / evaluate my code" | **continuous modernization analysis** | Default for new customers and ambiguous requests | +| "Find security vulnerabilities / CVEs / security check / is my code secure" | **continuous modernization analysis** | continuous modernization-exclusive; Custom has no security TD | +| "Generate report / dashboard / trend / compare" | **continuous modernization reporting** | continuous modernization-exclusive; Custom is stateless | +| Customer mentions "continuous-modernization" by name | **continuous modernization** | Explicit request; honor the ask | +| Named transformation (e.g., "Upgrade Java 8 to 21"), no prior findings | **Custom** | Greenfield, no audit trail needed | +| "Run our internal/org-specific TD" | **Custom** | TD authoring/execution, no portfolio context | +| Customer not sure / first-time | **continuous modernization analysis** | 
Adoption bias; continuous modernization is the default front door | + +## Decision 2: Remediation-Time Routing (Fixing Existing Findings) + +Before answering, check: do the repos in scope have any prior continuous modernization analysis findings? + +| State | Route To | Why | +| ----------------------------------------------------------- | -------------------------------- | ------------------------------------------------------------------------ | +| Prior continuous modernization findings exist | **continuous modernization remediation** (always) | Must write to event log; otherwise next analysis can't attribute the fix | +| No prior findings, customer names a specific transformation | **Custom** | Stateless one-shot, no event log needed | +| No prior findings, customer asks "fix what you can find" | **continuous modernization analysis → remediation** | Run analysis first, then remediate through same surface | + +**Mixed scope** (some repos have findings, some don't): Split the request. Route repos +with findings through continuous modernization. Route others through Custom OR ask if they want unified +continuous modernization flow (recommended for adoption). + +## Quick Routing Reference + +### Routes to Custom (only if NO prior continuous modernization findings on these repos) + +- "Upgrade Java / Java 8 to Java 21" +- "Migrate to Java 21" +- "Migrate to Python 3.13" +- "Bump Node 16 to Node 22" +- "Migrate AWS SDK v1 to v2 / boto2 to boto3 / aws-sdk v2 to v3" +- "Spring Boot 2 to 3 / Angular to React / log4j to slf4j" +- "x86 Java to Graviton" +- "Run our internal/org-specific transformation" +- "Run this exact recipe across N repos" + +### Routes to continuous modernization (always) + +- "What's the state of our codebase?" +- "Scan our repos for issues" +- "What tech debt do we have?" +- "Find security vulnerabilities / CVEs" (continuous modernization-exclusive) +- "Are our repos ready for AI agents?" +- "Which repos can be modernized?" 
+- "Generate a modernization plan" +- "Where do I start with these 200 repos?" +- "Tell me what's outdated" +- "Auto-fix whatever you can find" +- "Inventory our GitHub org" +- "Find auto-remediable upgrades" +- "Continuous code health monitoring" (continuous modernization-exclusive) +- "Single repo — what should I fix?" +- "Compare repos against best practices" +- "I have a CVE, where else does it appear?" +- "Show me a dashboard of code health" (continuous modernization exclusive) +- "Compare this quarter to last quarter" (continuous modernization exclusive) +- "Generate a report for leadership" (continuous modernization exclusive) +- "I scanned last week, now fix the findings" (continuous modernization exclusive) +- "Apply the remediation we discussed" (continuous modernization exclusive) +- "I'm not sure what we need" +- "What can AWS Transform do for us?" +- "Audit my repos" +- "Evaluate my code" +- Any prompt mentioning "continuous-modernization" by name + +## Edge Cases + +| Situation | Route To | +| ---------------------------------------------------------------------- | -------------------------------------------------------------------- | +| Customer names a transformation AND prior continuous modernization findings exist | **continuous modernization** (audit trail wins over phrasing) | +| Customer says "analyze and upgrade Java" | **continuous modernization** (analysis surfaces the work, then continuous modernization dispatches Custom) | +| Customer has clear target but >50 repos | **continuous modernization** (scope discovery first), then continuous modernization remediation | +| Customer has clear target, <10 repos, no prior findings | **Custom** | +| Mixed scope (some repos with findings, some without) | Split; offer unified continuous modernization flow as default | +| Cross-type (tech-debt findings exist; customer asks for security scan) | **continuous modernization** (any continuous modernization history → all subsequent work in continuous 
modernization) | +| Ambiguous request | **continuous modernization** (adoption bias) | +| Explicit "use Custom" / "use continuous modernization" | Honor the ask | + +## Net Rule + +**Named transformation + no prior findings → Custom. Anything else → continuous modernization.** When in doubt, continuous modernization. + +## How to Check for Prior Findings + +**Prerequisite:** Before running any `atx ct` command, verify it's installed AND up to date: + +```bash +INSTALLED=$(atx ct --version 2>/dev/null | head -1) +LATEST=$(curl -fsSL "https://transform-cli.awsstatic.com/index.json" 2>/dev/null | grep -o '"latest"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/.*"latest"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/') +echo "Installed: ${INSTALLED:-not found}, Latest: ${LATEST:-unknown}" +``` + +If `INSTALLED` is empty or `LATEST` is newer, follow [workload-continuous-modernization-setup.md](workload-continuous-modernization-setup.md) to install/update it first. + +```bash +atx ct status +``` + +If this returns sources, repos, or findings → continuous modernization has been used before. +Route through continuous modernization for any remediation work. + +If `atx ct` is not configured or returns empty → no prior continuous modernization history. Custom is fine for +named transformations. + +## Adoption Nudge (After Custom Completes) + +After a Custom transformation completes successfully, present this message: + +> "Want to see what else might be worth fixing across your repos? AWS Transform - continuous modernization can scan for +> security, tech debt, and modernization opportunities — and keep a record of every +> remediation so future scans can tell you what got fixed and what didn't." 
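
Returning to the Net Rule stated earlier in this file, the core decision can be written as a very small function. This is only an illustrative sketch: the function name, its parameters, and the returned labels are hypothetical, it assumes the workload-identification step in POWER.md has already been cleared, and it deliberately ignores the mixed-scope and large-portfolio edge cases from the tables.

```python
def route_request(named_transformation, prior_findings, explicit_choice=None):
    """Apply the Net Rule: named transformation + no prior findings -> Custom.

    Hypothetical helper; labels are illustrative, not CLI values.
    """
    # An explicit "use Custom" / "use continuous modernization" ask is honored.
    if explicit_choice in ("custom", "continuous-modernization"):
        return explicit_choice
    # Stateless one-shot: a named transformation and no audit trail to keep.
    if named_transformation and not prior_findings:
        return "custom"
    # Everything else, including ambiguous requests, goes to the default.
    return "continuous-modernization"
```

For example, a request like "Upgrade Java 8 to 21" on repos with existing findings still routes to continuous modernization, matching the first row of the Edge Cases table.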
+ +## continuous modernization Skills Reference + +| Skill | When to Use | +| ---------------------------------------------------------------------- | ------------------------------------------------------------------------------------- | +| [workload-continuous-modernization-guide.md](workload-continuous-modernization-guide.md) | New user onboarding, "how do I start?" | +| [workload-continuous-modernization-discovery.md](workload-continuous-modernization-discovery.md) | Analyze/discover repos from sources | +| [workload-continuous-modernization-analysis.md](workload-continuous-modernization-analysis.md) | Run security, tech-debt, agentic-readiness, modernization-readiness analyses | +| [workload-continuous-modernization-findings.md](workload-continuous-modernization-findings.md) | List/filter/manage findings | +| [workload-continuous-modernization-remediation.md](workload-continuous-modernization-remediation.md) | Create remediation campaigns, auto-fix findings | +| [workload-continuous-modernization-status.md](workload-continuous-modernization-status.md) | System overview and health check | +| [workload-continuous-modernization-source.md](workload-continuous-modernization-source.md) | Manage source connections | +| [workload-continuous-modernization-setup.md](workload-continuous-modernization-setup.md) | Infrastructure setup and configuration | +| [workload-continuous-modernization-server.md](workload-continuous-modernization-server.md) | Start, stop, or restart the AWS Transform - continuous modernization (continuous modernization) server | +| [workload-continuous-modernization-ec2-execution.md](workload-continuous-modernization-ec2-execution.md) | Run CT analysis/remediation on EC2 (new or existing instance) | +| [workload-continuous-modernization-batch-execution.md](workload-continuous-modernization-batch-execution.md) | Run CT analysis on AWS Batch (Fargate) — single job, AWS-managed compute | +| 
[workload-continuous-modernization-schedule.md](workload-continuous-modernization-schedule.md) | Schedule recurring analyses on an existing EC2 instance (EventBridge Scheduler + SSM) | +| [workload-continuous-modernization-reporting.md](workload-continuous-modernization-reporting.md) | Generate an HTML report of continuous modernization analyses | +| [workload-continuous-modernization-security-agent.md](workload-continuous-modernization-security-agent.md) | Security agent setup (admin) and runtime verification (executor) | + +## Custom Skills Reference + +| Skill | When to Use | +| ------------------------------------------------------------------------------------ | ----------------------------------- | +| [workload-custom.md](workload-custom.md) | Named transformations, TD execution | +| [workload-custom-remote-execution.md](workload-custom-remote-execution.md) | Batch/Fargate remote execution | +| [workload-custom-single-transformation.md](workload-custom-single-transformation.md) | Single repo transformation | +| [workload-custom-multi-transformation.md](workload-custom-multi-transformation.md) | Multi-repo parallel transformation | diff --git a/aws-transform/steering/workload-continuous-modernization-schedule.md b/aws-transform/steering/workload-continuous-modernization-schedule.md new file mode 100644 index 00000000..a2479657 --- /dev/null +++ b/aws-transform/steering/workload-continuous-modernization-schedule.md @@ -0,0 +1,1188 @@ +# continuous modernization Recurring Analysis Scheduling + +## Telemetry + +When running `atx ct analysis run` or `atx ct remediation create`, always include `--telemetry`. + +Format: `--telemetry "agent=<agent>,executionMode=<mode>"` + +- `agent` -- the AI assistant driving this session (lowercase, no spaces). Use the real assistant name -- e.g. kiro, claude, amazonq, copilot. 
+- `executionMode` -- `ec2` for the EC2 path, `fargate` for the Batch path + +If the user explicitly asks to disable telemetry, omit `--telemetry` for the rest of the session. + +Create and manage scheduled AWS Transform - continuous modernization work using Amazon EventBridge Scheduler. Schedules fire on a cron expression (or one-shot via `at()`) and dispatch the work to either: + +- **EC2 path**: SSM SendCommand to the customer's running atx-ct container, or +- **Batch path**: Lambda invocation of `atx-trigger-batch-jobs` to submit a Fargate job + +The skill supports two job types: + +- **Analysis** (`JOB_TYPE=analysis`, default): runs `atx ct analysis run`, persists findings to the backend, uploads artifacts (for analysis types that produce code changes). Customer reviews and acts on findings later. +- **Remediation** (`JOB_TYPE=remediation`): runs `atx ct remediation create` against a pre-determined target (specific finding IDs OR a transformation+repo combo). Customer captures the target NOW (before scheduling); the schedule fires it later. + +Either way, results land in the same place as a manual run -- findings in the backend, PRs/MRs pushed by the backend (github/gitlab/bitbucket), or `code.zip` uploaded to S3 (local provider).
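
For the Telemetry format described at the top of this file, a small helper can keep the `--telemetry` value consistent across both paths. This is a sketch only: `telemetry_args` and `EXECUTION_MODES` are hypothetical names, and the whitespace check is an assumption derived from the "lowercase, no spaces" rule rather than documented CLI validation.

```python
# Maps the schedule path to the documented executionMode value.
EXECUTION_MODES = {"ec2": "ec2", "batch": "fargate"}


def telemetry_args(agent, path_type):
    """Build the --telemetry argument pair for an atx ct invocation.

    Hypothetical helper; not part of the atx CLI.
    """
    name = agent.strip().lower()
    # Assumption: an empty name or one containing whitespace is rejected.
    if not name or any(ch.isspace() for ch in name):
        raise ValueError("agent must be a non-empty name without spaces")
    if path_type not in EXECUTION_MODES:
        raise ValueError(f"unknown path type: {path_type!r}")
    mode = EXECUTION_MODES[path_type]
    return ["--telemetry", f"agent={name},executionMode={mode}"]
```

If the user has asked to disable telemetry, the caller would simply skip this helper and append nothing.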
+ +## When to Use This Skill + +The customer's intent involves recurring or delayed work on a schedule: + +- "schedule this analysis to run weekly" +- "automate the scan, run it every Monday" +- "set up a cron job for tech-debt analysis" +- "I want this to run nightly / daily / monthly" +- "apply these fixes Friday at 9am" +- "delay the remediation until off-hours" +- "schedule the Java upgrade for next week" +- "I have findings to fix -- schedule it for tonight" + +**This is NOT for:** + +- One-shot analyses → use [continuous-modernization-analysis](workload-continuous-modernization-analysis.md), [continuous-modernization-ec2-execution](workload-continuous-modernization-ec2-execution.md), or [continuous-modernization-batch-execution](workload-continuous-modernization-batch-execution.md) +- Local cron on the customer's laptop -- use the OS's native cron; no AWS resources needed + +## Choose the Path + +Ask the customer (or infer from context): + +1. **EC2 path** -- fires on a long-running EC2 instance (one container, persistent). Best when the customer already has an EC2 stack from `continuous-modernization-ec2-execution` and wants to reuse it. +2. **Batch path** -- fires a Fargate job per scheduled invocation. Best when the customer uses `continuous-modernization-batch-execution` and prefers serverless / fan-out execution. + +```bash +PATH_TYPE="${PATH_TYPE:-ec2}" # or "batch" +``` + +If the customer has both available and didn't specify, default to whichever they used most recently for one-shot analyses. + +## Choosing the Right Mode + +The skill supports two job types. Match customer intent before committing. + +### Mode 1: Scheduled Analysis (`JOB_TYPE=analysis`) + +Use when the customer wants **recurring visibility** into their codebase health. Findings populate the backend; the customer reviews and acts on them later (manually or via a separate scheduled remediation). 
+ +Customer-intent signals: + +- "schedule a weekly tech-debt scan" +- "run analysis every Monday" +- "track our code quality over time" +- "audit my repos monthly for security issues" +- "find new vulnerabilities each week" + +Setup: customer chooses an analysis type and cadence. Schedule fires `atx ct analysis run` and uploads artifacts (for analysis types that produce code changes). + +### Mode 2: Scheduled Remediation (`JOB_TYPE=remediation`) + +Use when the customer wants **delayed action** on a known set of issues. They've reviewed findings (or know the transformation to apply) and want to fire the remediation later -- e.g., during a maintenance window or after a code freeze. + +Customer-intent signals: + +- "apply these fixes Friday at 9am" +- "schedule the Java upgrade for next week" +- "delay the remediation until off-hours" +- "run AWS/java-version-upgrade on these repos every Sunday night" +- "fix these 50 findings tonight" + +Two sub-modes: + +- **`REMEDIATION_MODE=findings`**: customer provides explicit finding IDs (captured from a prior `atx ct findings list`). Schedule fires with those IDs hardcoded. Best for one-shot delayed remediation. + - Default: findings must have `.fix` populated (typical of `tech-debt-quick`). The backend uses each finding's `.fix.transform_name` automatically. + - **Hybrid mode** (optional): set `TRANSFORMATION_NAME` to override. Lets you remediate findings WITHOUT `.fix` populated (typical of `tech-debt-comprehensive`, `security` findings without auto-fix). The skill appends `--transformation-name <name>` to apply the same transformation to all selected findings -- so filter findings to one coherent category (e.g., all "Java" findings + `AWS/java-version-upgrade`) before scheduling. +- **`REMEDIATION_MODE=transformation`**: customer provides a transformation name + repo (e.g., `AWS/java-version-upgrade` on `my-org::my-repo`). Schedule fires that transformation directly without referencing findings. 
Best for recurring upgrades. + +### Routing a customer's request + +| Customer says | Mode | +| ----------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- | +| "weekly tech-debt scan", "monthly security audit", "track findings over time" | Mode 1 (analysis) | +| "remediate these findings on Friday", "apply this fix tonight", "I have IDs to fix" (findings have `.fix`) | Mode 2, findings sub-mode (no `TRANSFORMATION_NAME`) | +| "remediate these comprehensive findings with `AWS/java-version-upgrade`", "apply this transformation to these specific finding IDs" | Mode 2, findings sub-mode + hybrid (set `TRANSFORMATION_NAME`) | +| "run `AWS/java-version-upgrade` every Sunday on my repos", "weekly Python upgrade" | Mode 2, transformation sub-mode | +| "scan AND auto-fix" | Decompose: run analysis NOW (one-shot), then route to Mode 2 findings sub-mode | +| Mixed/unclear | Ask: "Are you trying to (a) regularly scan your repos for visibility, or (b) schedule a fix you've already decided on?" | + +```bash +JOB_TYPE="${JOB_TYPE:-analysis}" # analysis | remediation +REMEDIATION_MODE="" # findings | transformation (only when JOB_TYPE=remediation) +TRANSFORMATION_NAME="" # optional override for findings sub-mode (when .fix == null) +``` + +## Prerequisites + +### For the EC2 path + +Customer **MUST** already have an EC2 instance running with the `atx-ct` container set up via [continuous-modernization-ec2-execution](workload-continuous-modernization-ec2-execution.md). The schedule reuses that instance. + +Specifically: + +1. EC2 instance running with one or more atx-ct containers active (CFN-managed via `atx-runner` stack, or any other source; both supported) +2. Container has the CT server up (`docker exec ${CONTAINER_NAME} atx ct status --health` succeeds) +3. 
Instance has `AmazonSSMManagedInstanceCore` attached (so SSM can target it) +4. Customer has previously registered the source via `atx ct source add` (so a manual `atx ct analysis run` would work) + +The schedule skill auto-discovers the instance via CFN stack outputs first (preferred), then falls back to tag-based search. If neither finds the instance, the customer is asked to set `INSTANCE_ID` manually. + +**Multi-worker stacks**: if the customer's stack was deployed with `WorkerCount > 1`, the schedule routes to a specific worker via the optional `WORKER_NUM` env var (1..WorkerCount, default 1). Container naming: single-worker stacks use `atx-ct`; multi-worker stacks use `atx-ct-1`, `atx-ct-2`, etc. The skill auto-detects WorkerCount from the CFN stack parameter, so the customer only sets `WORKER_NUM` when they want a specific worker (otherwise worker 1 is used). To fan out N parallel scheduled jobs across N workers, create N schedules each with a distinct `WORKER_NUM` (1..N). + +If any prerequisite is missing, hand off to [continuous-modernization-ec2-execution](workload-continuous-modernization-ec2-execution.md) first. + +### For the Batch path + +Customer **MUST** already have the Custom CDK stack (`AtxInfrastructureStack`) deployed via [continuous-modernization-batch-execution](workload-continuous-modernization-batch-execution.md). The schedule reuses the existing Lambda functions and Batch infrastructure. + +Specifically: + +1. `AtxInfrastructureStack` in `CREATE_COMPLETE` or `UPDATE_COMPLETE` state +2. `atx-trigger-batch-jobs` Lambda function is callable +3. Customer has previously registered the source via `atx ct source add` (same as EC2 path) +4. For local sources: source bundle uploaded to `s3://atx-source-code-${ACCOUNT_ID}/repos/` + +If any prerequisite is missing, hand off to [continuous-modernization-batch-execution](workload-continuous-modernization-batch-execution.md) first. 
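The multi-worker container naming rule from the EC2 prerequisites can be sketched as follows. This is a minimal illustration, not the skill's own code: the function name `resolve_container_name` is hypothetical, and the worker count is passed in as an argument here, whereas the skill reads it from the CFN stack parameter.

```bash
# Hypothetical helper: map (WorkerCount, WORKER_NUM) to the container name.
# Single-worker stacks use "atx-ct"; multi-worker stacks use "atx-ct-<n>".
resolve_container_name() {
  local worker_count="${1:-1}" worker_num="${2:-1}"

  # Both values must be positive integers.
  if ! [[ "$worker_count" =~ ^[1-9][0-9]*$ && "$worker_num" =~ ^[1-9][0-9]*$ ]]; then
    echo "ERROR: WorkerCount and WORKER_NUM must be positive integers" >&2
    return 1
  fi

  # WORKER_NUM must fall inside 1..WorkerCount.
  if [ "$worker_num" -gt "$worker_count" ]; then
    echo "ERROR: WORKER_NUM=$worker_num is outside 1..$worker_count" >&2
    return 1
  fi

  if [ "$worker_count" -eq 1 ]; then
    echo "atx-ct"
  else
    echo "atx-ct-${worker_num}"
  fi
}

resolve_container_name 1 1   # atx-ct
resolve_container_name 3 2   # atx-ct-2
```

Rejecting an out-of-range `WORKER_NUM` up front avoids creating a schedule that targets a container that does not exist on the instance.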
+ +## Two-Persona Permission Model + +This skill respects the same admin/executor split as `continuous-modernization-ec2-execution.md`. Schedule lifecycle and IAM mutations require admin; everything else is executor. + +| Persona | Managed policy | Owns | When used in this skill | +|---|---|---|---| +| **Admin** | `AdministratorAccess` (or equivalent) | `iam:Create/Put/Delete*` on `AtxSchedulerInvocationRole`, `scheduler:CreateScheduleGroup` | Step 3 (one-time IAM + group setup) only | +| **Executor** | (a least-privilege role scoped to the actions listed below) | `scheduler:CreateSchedule`/`DeleteSchedule`/`GetSchedule`/`UpdateSchedule`/`ListSchedules` (scoped to `atx-control-tower` group), `iam:PassRole` on `AtxSchedulerInvocationRole`, all read calls | Steps 1, 2, 4 (verify, identity-detect, parameter collection), Step 5 (create-schedule), Step 6 (verify), entire Schedule Management section | + +The agent NEVER runs admin actions itself, even if an admin profile is reachable locally -- every admin step prints a handoff command and waits for the user to run it. + +## Step 1: Verify Path Prerequisites + +Verify the customer's chosen path is ready before creating any schedules. Branches on `PATH_TYPE`: + +```bash +PROFILE="${AWS_PROFILE:-default}" +REGION=$(aws --profile $PROFILE configure get region 2>/dev/null || echo "us-east-1") +ACCOUNT_ID=$(aws --profile $PROFILE sts get-caller-identity --query Account --output text) +PATH_TYPE="${PATH_TYPE:-ec2}" # "ec2" or "batch" + +if [ "$PATH_TYPE" = "ec2" ]; then + # ───────────────────────────────────────────────────────────────── + # EC2 path: discover the instance, verify SSM, verify container + # ───────────────────────────────────────────────────────────────── + # Discovery order: (1) CFN stack outputs (preferred), (2) instance tags, (3) ask customer. 
+ # If the EC2 was provisioned via the CFN-based continuous-modernization-ec2-execution skill, the stack + # (default name: atx-runner) has the InstanceId in its outputs. + + STACK_NAME="${STACK_NAME:-atx-runner}" + INSTANCE_ID="" + ROLE_ARN="" + + # (1) Try the CFN stack first + STACK_STATUS=$(aws --profile $PROFILE --region $REGION cloudformation describe-stacks \ + --stack-name "$STACK_NAME" --query 'Stacks[0].StackStatus' --output text 2>/dev/null) + + case "$STACK_STATUS" in + CREATE_COMPLETE|UPDATE_COMPLETE) + INSTANCE_ID=$(aws --profile $PROFILE --region $REGION cloudformation describe-stacks \ + --stack-name "$STACK_NAME" \ + --query 'Stacks[0].Outputs[?OutputKey==`InstanceId`].OutputValue' --output text) + ROLE_ARN=$(aws --profile $PROFILE --region $REGION cloudformation describe-stacks \ + --stack-name "$STACK_NAME" \ + --query 'Stacks[0].Outputs[?OutputKey==`RoleArn`].OutputValue' --output text) + echo "Found CFN stack '$STACK_NAME'. Instance: $INSTANCE_ID, Role: $ROLE_ARN" + ;; + esac + + # (2) If no stack found, try tag-based discovery (handles legacy / non-CFN setups) + if [ -z "$INSTANCE_ID" ]; then + echo "No CFN stack '$STACK_NAME' found. Falling back to tag-based discovery." + + MATCHES=$(aws --profile $PROFILE --region $REGION ec2 describe-instances \ + --filters "Name=tag:Name,Values=atx-ct-runner,atx-ct-runner-*" \ + "Name=instance-state-name,Values=running" \ + --query 'Reservations[].Instances[].[InstanceId, LaunchTime]' \ + --output text 2>/dev/null | sort -k2 -r) + + INSTANCE_COUNT=$(echo "$MATCHES" | grep -c '^i-' || true) + + if [ "$INSTANCE_COUNT" = "0" ]; then + echo "No running instance tagged Name=atx-ct-runner* and no CFN stack '$STACK_NAME'." 
+ echo "List candidates:" + aws --profile $PROFILE --region $REGION ec2 describe-instances \ + --filters "Name=instance-state-name,Values=running" \ + --query 'Reservations[].Instances[].[InstanceId,Tags[?Key==`Name`]|[0].Value]' \ + --output table + echo "Ask the customer for the instance ID and set INSTANCE_ID before continuing." + return 1 + fi + + if [ "$INSTANCE_COUNT" -gt 1 ]; then + echo "WARNING: $INSTANCE_COUNT running instances tagged atx-ct-runner*:" + echo "$MATCHES" | column -t + echo "" + echo "Picking the most recently launched. If wrong, set INSTANCE_ID manually." + fi + + INSTANCE_ID=$(echo "$MATCHES" | head -1 | awk '{print $1}') + + # Try to derive role ARN from the instance's profile (for non-CFN setups) + PROFILE_ARN=$(aws --profile $PROFILE --region $REGION ec2 describe-instances \ + --instance-ids "$INSTANCE_ID" \ + --query 'Reservations[0].Instances[0].IamInstanceProfile.Arn' --output text 2>/dev/null) + PROFILE_NAME=$(echo "$PROFILE_ARN" | awk -F/ '{print $NF}') + ROLE_ARN=$(aws --profile $PROFILE iam get-instance-profile \ + --instance-profile-name "$PROFILE_NAME" \ + --query 'InstanceProfile.Roles[0].Arn' --output text 2>/dev/null) + fi + + echo "Using instance: $INSTANCE_ID" + echo "Instance role: $ROLE_ARN" + + # Verify the SSM agent on the instance is online + SSM_STATUS=$(aws --profile $PROFILE --region $REGION ssm describe-instance-information \ + --filters "Key=InstanceIds,Values=$INSTANCE_ID" \ + --query 'InstanceInformationList[0].PingStatus' --output text 2>/dev/null) + + if [ "$SSM_STATUS" != "Online" ]; then + echo "ERROR: SSM agent is not online for instance $INSTANCE_ID (status: $SSM_STATUS)." + echo "Verify the instance role has AmazonSSMManagedInstanceCore." + echo " CFN-managed instances: it's attached automatically -- check stack status." + echo " Ad-hoc instances: aws iam list-attached-role-policies --role-name <role>" + echo "Wait 30-90s after attaching the policy for the agent to phone home." 
+ return 1 + fi + + # Helper for sending commands to the instance (drop-in for the prior $SSH "..." pattern). + ssm_run() { + local cmd="$1" + local CMD_ID=$(aws --profile $PROFILE --region $REGION ssm send-command \ + --instance-ids "$INSTANCE_ID" \ + --document-name AWS-RunShellScript \ + --parameters "commands=[\"$cmd\"]" \ + --query 'Command.CommandId' --output text) + aws --profile $PROFILE --region $REGION ssm wait command-executed \ + --command-id "$CMD_ID" --instance-id "$INSTANCE_ID" 2>/dev/null || true + aws --profile $PROFILE --region $REGION ssm get-command-invocation \ + --command-id "$CMD_ID" --instance-id "$INSTANCE_ID" \ + --query 'StandardOutputContent' --output text + } + + CONTAINER_STATUS=$(ssm_run "sudo docker inspect -f '{{.State.Status}}' atx-ct 2>/dev/null || echo missing") + if [ "$CONTAINER_STATUS" != "running" ]; then + echo "Container atx-ct is not running (status: $CONTAINER_STATUS)." + echo "Customer must restart it before scheduling. See continuous-modernization-ec2-execution Step 6 (verify) or Step 7 (start container)." + return 1 + fi + + echo "Container atx-ct: running" + +elif [ "$PATH_TYPE" = "batch" ]; then + # ───────────────────────────────────────────────────────────────── + # Batch path: verify CDK stack and Lambda function exist + # ───────────────────────────────────────────────────────────────── + CDK_STACK_NAME="AtxInfrastructureStack" + + # Verify the CDK stack is deployed + CDK_STACK_STATUS=$(aws --profile $PROFILE --region $REGION cloudformation describe-stacks \ + --stack-name "$CDK_STACK_NAME" --query 'Stacks[0].StackStatus' --output text 2>/dev/null) + + case "$CDK_STACK_STATUS" in + CREATE_COMPLETE|UPDATE_COMPLETE) + echo "Found CDK stack '$CDK_STACK_NAME' in $CDK_STACK_STATUS state." + ;; + "") + echo "ERROR: CDK stack '$CDK_STACK_NAME' not found in account $ACCOUNT_ID, region $REGION." + echo "The Batch path requires the CDK stack to be deployed first." 
+ echo "Hand off to continuous-modernization-batch-execution and run setup.sh, then return here." + return 1 + ;; + *) + echo "ERROR: CDK stack '$CDK_STACK_NAME' is in $CDK_STACK_STATUS state. Wait for completion or investigate." + return 1 + ;; + esac + + # Verify the trigger Lambda exists and is callable + LAMBDA_FN="atx-trigger-batch-jobs" + LAMBDA_STATUS=$(aws --profile $PROFILE --region $REGION lambda get-function \ + --function-name "$LAMBDA_FN" \ + --query 'Configuration.State' --output text 2>/dev/null) + + if [ "$LAMBDA_STATUS" != "Active" ]; then + echo "ERROR: Lambda function '$LAMBDA_FN' is not Active (status: ${LAMBDA_STATUS:-not found})." + echo "Verify the CDK stack deployment completed without errors." + return 1 + fi + + echo "Lambda '$LAMBDA_FN': Active" + echo "Batch path is ready for scheduling." + + # No INSTANCE_ID, no ROLE_ARN -- Batch path uses Lambda + Batch infrastructure managed by CDK. + INSTANCE_ID="" + ROLE_ARN="" + +else + echo "ERROR: PATH_TYPE must be 'ec2' or 'batch' (got: '$PATH_TYPE')" + return 1 +fi +``` + +## Step 2: Detect Identity Type + +Different identity types need different IAM setup. Detect once before doing any IAM work: + +```bash +CALLER_ARN=$(aws --profile $PROFILE sts get-caller-identity --query Arn --output text) + +case "$CALLER_ARN" in + *":user/"*) + IDENTITY_TYPE="iam_user" + USER_NAME=$(echo "$CALLER_ARN" | awk -F'/' '{print $NF}') + echo "Identity: IAM user $USER_NAME" + ;; + *":assumed-role/"*) + IDENTITY_TYPE="federated" + ROLE_NAME=$(echo "$CALLER_ARN" | awk -F'/' '{print $(NF-1)}') + echo "Identity: federated role $ROLE_NAME" + echo "Will skip put-user-policy. Federated roles inherit perms from their attached policies." + ;; + *) + IDENTITY_TYPE="unknown" + echo "Identity type not recognized: $CALLER_ARN" + echo "Will attempt schedule creation. 
If AccessDenied, customer's admin must grant" + echo " scheduler:CreateSchedule/DeleteSchedule/GetSchedule/UpdateSchedule/ListSchedules" + echo " on arn:aws:scheduler:*:\$ACCOUNT_ID:schedule/atx-control-tower/*" + echo " plus iam:PassRole on arn:aws:iam::\$ACCOUNT_ID:role/AtxSchedulerInvocationRole" + ;; +esac +``` + +For Amazon engineers using Isengard / SSO-federated access, the result is `federated`. For most enterprise customers using IAM Identity Center, also `federated`. IAM users are uncommon outside legacy setups. + +## Step 3: One-Time IAM Setup (Admin Handoff) + +This step provisions the `AtxSchedulerInvocationRole` (the role EventBridge Scheduler assumes when firing each schedule) and the schedule group. Every action in this step is **admin-only**: + +| Action | Why admin | +|---|---| +| `iam:AttachRolePolicy` (3a -- SSM safety net for ad-hoc instances) | IAM mutation | +| `iam:CreateRole` (3c -- `AtxSchedulerInvocationRole`) | IAM mutation | +| `iam:PutRolePolicy` (3d -- inline policy on that role) | IAM mutation | +| `scheduler:CreateScheduleGroup` (3f -- `atx-control-tower`) | Resource lifecycle | +| `iam:PutUserPolicy` (3g -- for IAM-user identities only) | IAM mutation | + +**The agent does NOT run these commands itself**, even if an admin profile is reachable locally. It prepares the inputs, prints the bundle as a single admin handoff, and waits for the user to come back. This is the same pattern Step 5d uses in `continuous-modernization-ec2-execution.md`. + +**Profile-name guidance for the agent.** When emitting this admin handoff (or any admin handoff in this skill -- including the security-agent bootstrap below in Mode 1), the agent MUST use the placeholder `<your-admin-profile>` rather than guessing a profile name from the customer's local AWS config, environment variables, or shell history. Customers commonly have multiple AWS profiles configured locally and the agent has no reliable way to identify which one carries admin permissions. 
Substituting a wrong name leads to confusing AccessDenied errors during execution. + +This step is also idempotent -- re-running it on an already-set-up account is safe (`grep -v EntityAlreadyExists` and `grep -v ConflictException` swallow the no-op cases). So the admin runs it once per account; subsequent schedules reuse the same role and group. + +The agent assembles the input values (`ACCOUNT_ID`, `REGION`, `PATH_TYPE`, `INSTANCE_ROLE_NAME` if EC2, `IDENTITY_TYPE` + `USER_NAME` from Step 2), then prints: + +> **Admin handoff -- one-time scheduler setup** +> +> The schedule cannot be created until your account has the `AtxSchedulerInvocationRole` and the `atx-control-tower` schedule group provisioned. This requires admin / role-creation permissions (`iam:CreateRole`, `iam:PutRolePolicy`, `iam:PassRole`, `scheduler:CreateScheduleGroup`). Run it with an admin identity. Read-only or runtime credentials are enough for everything afterward. +> +> Ask someone with admin permissions to run this from the same shell, in the same region: +> +> ```bash +> # 3a. (EC2 path only) Ensure the instance role has AmazonSSMManagedInstanceCore. +> # CFN-managed instances (atx-runner stack) already have it via the stack's role definition. +> if [ "$PATH_TYPE" = "ec2" ] && [ -n "$INSTANCE_ROLE_NAME" ]; then +> aws iam attach-role-policy \ +> --role-name "$INSTANCE_ROLE_NAME" \ +> --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore 2>&1 | grep -v "EntityAlreadyExists" || true +> fi +> +> # 3c. Create the Scheduler invocation role (both paths share the role; policies differ). +> aws iam create-role \ +> --role-name AtxSchedulerInvocationRole \ +> --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"scheduler.amazonaws.com"},"Action":"sts:AssumeRole"}]}' 2>&1 | grep -v "EntityAlreadyExists" || true +> +> # 3d. Attach the path-specific inline policy. put-role-policy is idempotent (overwrites +> # the named policy). 
Each path uses a different policy name so both can coexist on the +> # same role -- useful when a customer schedules on both EC2 and Batch from the same account. +> if [ "$PATH_TYPE" = "ec2" ]; then +> POLICY_NAME="ssm-send-command" +> POLICY_DOC='{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":"ssm:SendCommand","Resource":"arn:aws:ec2:'$REGION':'$ACCOUNT_ID':instance/*","Condition":{"StringEquals":{"ssm:resourceTag/atx-remote-infra":"true"}}},{"Effect":"Allow","Action":"ssm:SendCommand","Resource":"arn:aws:ssm:'$REGION'::document/AWS-RunShellScript"}]}' +> else +> POLICY_NAME="lambda-invoke-batch-trigger" +> POLICY_DOC='{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":"lambda:InvokeFunction","Resource":"arn:aws:lambda:'$REGION':'$ACCOUNT_ID':function:atx-trigger-batch-jobs"}]}' +> fi +> aws iam put-role-policy \ +> --role-name AtxSchedulerInvocationRole \ +> --policy-name "$POLICY_NAME" \ +> --policy-document "$POLICY_DOC" +> +> # 3e. Brief wait for IAM propagation (eventual consistency). +> sleep 5 +> +> # 3f. Create the scheduler group to isolate our schedules. +> aws --region $REGION scheduler create-schedule-group \ +> --name atx-control-tower 2>&1 | grep -v "ConflictException" || true +> +> # 3g. (IAM-user identities only) Grant the user permission to manage schedules. +> # For federated/SSO identities, grant scheduler:CreateSchedule + iam:PassRole on +> # AtxSchedulerInvocationRole through the same mechanism your org uses for that role. 
+> if [ "$IDENTITY_TYPE" = "iam_user" ]; then +> aws iam put-user-policy \ +> --user-name "$USER_NAME" \ +> --policy-name atx-scheduler-access \ +> --policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":["scheduler:CreateSchedule","scheduler:DeleteSchedule","scheduler:GetSchedule","scheduler:UpdateSchedule","scheduler:ListSchedules","scheduler:GetScheduleGroup","scheduler:ListScheduleGroups"],"Resource":["arn:aws:scheduler:*:'$ACCOUNT_ID':schedule/atx-control-tower/*","arn:aws:scheduler:*:'$ACCOUNT_ID':schedule-group/atx-control-tower"]},{"Effect":"Allow","Action":"iam:PassRole","Resource":"arn:aws:iam::'$ACCOUNT_ID':role/AtxSchedulerInvocationRole"}]}' +> fi +> ``` +> +> When this finishes, come back to the conversation. I'll re-detect the role and group via read-only `iam:GetRole` + `scheduler:GetScheduleGroup` and continue from Step 4. + +The agent then STOPS this turn. The admin runs the commands in their own terminal, outside the chat. On the next user turn, re-run Step 2 (Detect Identity Type) -- it should succeed now -- and continue. + +**Why one role, two inline policies:** A customer who schedules analyses on BOTH paths from the same account ends up with `AtxSchedulerInvocationRole` having both `ssm-send-command` AND `lambda-invoke-batch-trigger` inline policies. Each schedule's target uses whichever it needs -- the role grants both, and each inline policy allows only its own action (`ssm:SendCommand` or `lambda:InvokeFunction`). + +## Step 4: Collect Schedule Parameters + +Parameter collection branches on `JOB_TYPE`. Common parameters first, then mode-specific. + +### Common parameters (both modes) + +```bash +# Customer's choice, lowercase + hyphens +SCHEDULE_NAME="atxct-weekly-techdebt" + +# The atx ct source name (must match `atx ct source list`) +LOGICAL_SOURCE_NAME="my-org-github" + +AGENT="<AGENT>" # AI assistant name (kiro, claude, amazonq, etc.) + +# Provider -- needed for Batch path to pick the right build_command_*() template. 
+# For EC2 path, ignored (uses the running container's existing source config). +PROVIDER="github" # github | gitlab | local + +# Cron expression (or at() for one-shot) +CRON_EXPR="cron(0 9 ? * MON *)" # Monday 9am +TIMEZONE="UTC" # or "America/Los_Angeles", "Europe/London", etc. + +# Batch + local provider only -- name of the source bundle in S3 +ZIP_NAME="" # e.g., "my-org-bundle" + +# Optional repo scope (analysis mode) -- leave empty for source-wide +REPO_FILTER="" # e.g., "--repo my-org::my-repo" +``` + +Common cron expressions: + +| Customer says | Cron expression | +| -------------------------- | ------------------------- | +| every Monday at 9am | `cron(0 9 ? * MON *)` | +| daily at 2am | `cron(0 2 * * ? *)` | +| weekdays at 5am | `cron(0 5 ? * MON-FRI *)` | +| first of every month | `cron(0 9 1 * ? *)` | +| every 15 minutes (testing) | `cron(0/15 * * * ? *)` | +| every hour | `cron(0 * * * ? *)` | +| one-shot at specific time | `at(2026-06-15T09:00:00)` | + +For other patterns: AWS Scheduler uses 6-field cron (`min hr day-of-month month day-of-week year`). Day-of-month and day-of-week can't both be `*` -- use `?` for the unused one. + +### Verify the source is registered + +The schedule skill creates the schedule but does NOT register sources -- that's the [continuous-modernization-source](workload-continuous-modernization-source.md) skill's job. Verify the source exists before creating the schedule, otherwise the customer gets a "successful" schedule that fails at fire time when the container can't find the source. + +```bash +echo "Verifying source '$LOGICAL_SOURCE_NAME' is registered..." +SOURCE_EXISTS=$(atx ct source list --json 2>/dev/null \ + | jq -r --arg n "$LOGICAL_SOURCE_NAME" '.[] | select(.name == $n) | .name' 2>/dev/null) + +if [ -z "$SOURCE_EXISTS" ]; then + echo "" + echo "❌ Source '$LOGICAL_SOURCE_NAME' is NOT registered in atx ct." 
+ echo "" + echo "Register it first via the continuous-modernization-source skill, then re-run this step:" + echo " github/gitlab : atx ct source add --name $LOGICAL_SOURCE_NAME --provider <github|gitlab> --org <org-name> --token <PAT>" + echo " local : atx ct source add --name $LOGICAL_SOURCE_NAME --provider local --path <local-path>" + echo "" + echo "After registration, also ensure credentials are in Secrets Manager" + echo "(see continuous-modernization-ec2-execution Step 3 or continuous-modernization-batch-execution Step 2 for the put-secret-value pattern)." + return 1 +fi +echo "✅ Source '$LOGICAL_SOURCE_NAME' is registered (provider=$(atx ct source list --json | jq -r --arg n "$LOGICAL_SOURCE_NAME" '.[] | select(.name == $n) | .provider'))" +``` + +### Verify credentials are accessible (Batch path only) + +**Constraints:** + +- You MUST verify that the caller has access to the required credential secret BEFORE + creating the schedule. A schedule without accessible credentials will fail silently + at fire time. +- You MUST NOT create a schedule if this check fails — inform the user and refuse to + proceed. + +For non-local providers, verify the provider secret exists and is accessible: + +```bash +PROVIDER=$(atx ct source list --json | jq -r --arg n "$LOGICAL_SOURCE_NAME" '.[] | select(.name == $n) | .provider') + +if [ "$PROVIDER" != "local" ]; then + SECRET_ID="atx/${PROVIDER}-token" + + echo "Verifying access to credential secret '${SECRET_ID}'..." + if ! aws secretsmanager describe-secret --secret-id "$SECRET_ID" 2>/dev/null; then + echo "" + echo "❌ Cannot access secret '${SECRET_ID}'." + echo "" + echo "Either the secret does not exist, or you do not have permission to access it." + echo "Create it first (see continuous-modernization-batch-execution Step 2)." + echo "" + echo "Schedule creation blocked — the schedule would fail at fire time without valid credentials." 
+ return 1 + fi + echo "✅ Credential secret '${SECRET_ID}' is accessible" +fi +``` + +### Mode-specific parameters + +#### Mode 1: Analysis (`JOB_TYPE=analysis`) + +Ask the customer: + +1. **Analysis type** -- `tech-debt-quick`, `tech-debt-comprehensive`, `agentic-readiness`, `modernization-readiness`, `security`, or `custom` +2. **(For `custom` only)** Transformation name and optional configuration + +```bash +if [ "$JOB_TYPE" = "analysis" ]; then + ANALYSIS_TYPE="tech-debt-quick" # tech-debt-quick | tech-debt-comprehensive | security | agentic-readiness | modernization-readiness | custom + + # Required only when ANALYSIS_TYPE=custom + TRANSFORMATION_NAME="" + CONFIGURATION="" + + if [ "$ANALYSIS_TYPE" = "custom" ]; then + [ -z "$TRANSFORMATION_NAME" ] && { echo "ERROR: ANALYSIS_TYPE=custom requires TRANSFORMATION_NAME"; return 1; } + fi +fi +``` + +**Security analysis bootstrap pre-check.** If `ANALYSIS_TYPE` is `security`, `agentic-readiness`, or `modernization-readiness`, the agent MUST verify the agent space has been bootstrapped before creating the schedule. Otherwise the schedule fires and the analysis fails almost immediately (the runtime tries `securityagent:CreateAgentSpace`, which the executor role doesn't grant). + +```bash +if [ "$JOB_TYPE" = "analysis" ] && [[ "$ANALYSIS_TYPE" =~ ^(security|agentic-readiness|modernization-readiness)$ ]]; then + AGENT_SPACE_ID=$(atx ct setup security-agent --status 2>/dev/null | jq -r '.agentSpaceId // ""') + + if [ -z "$AGENT_SPACE_ID" ]; then + cat <<'EOF' +ERROR: Security agent space not bootstrapped. + +Before scheduling a security/agentic-readiness/modernization-readiness analysis, +run ONE security analysis locally with admin credentials. This populates the +agent-space ID in your config so subsequent runs (including this schedule) +can use it without admin permissions. 
+ + AWS_PROFILE=<your-admin-profile> AWS_REGION=$REGION atx ct analysis run \ + --type security \ + --source $LOGICAL_SOURCE_NAME \ + --repo "$LOGICAL_SOURCE_NAME::<one-repo>" \ + --telemetry "agent=$AGENT,executionMode=local" + +After it completes, re-run the schedule setup. The check will see agentSpaceId +populated and proceed. +EOF + return 1 + fi +fi +``` + +The agent MUST stop and emit the bootstrap admin handoff to the customer if `agentSpaceId` is empty. **Do NOT create the schedule until bootstrap is done** -- creating it anyway just produces a schedule that fails on every fire. + +#### Mode 2: Remediation (`JOB_TYPE=remediation`) + +Ask the customer which sub-mode: + +- **`findings`** -- they have specific finding IDs to fix (from a prior `atx ct findings list`). The skill captures those IDs NOW and bakes them into the schedule. +- **`transformation`** -- they want to run a specific transformation on a specific repo on schedule (no findings dependency). + +##### Sub-mode: findings + +Pre-flight: capture finding IDs before creating the schedule. The skill uses these as a frozen list -- at fire time, the schedule remediates exactly these IDs (no fresh discovery). + +`TRANSFORMATION_NAME` is **optional**: + +- **Leave empty** when findings have `.fix` populated (typical for `tech-debt-quick`). The backend uses each finding's `.fix.transform_name` automatically. +- **Set explicitly** when findings DON'T have `.fix` populated (typical for `tech-debt-comprehensive`, `security` issues without auto-fix). The skill appends `--transformation-name $TRANSFORMATION_NAME` to override, applying the same transformation to all selected finding IDs. 
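The two cases above reduce to a single decision on whether `TRANSFORMATION_NAME` is set. The sketch below shows that decision in isolation. The helper names `findings_capture_mode` and `findings_override_flag` are hypothetical; only the `--transformation-name` flag comes from the text above.

```bash
# Hypothetical helpers illustrating the default vs. hybrid decision.

# Name the capture mode implied by TRANSFORMATION_NAME.
findings_capture_mode() {
  if [ -n "$1" ]; then
    echo "hybrid"    # explicit override; findings do not need .fix
  else
    echo "default"   # backend reads each finding's .fix.transform_name
  fi
}

# Emit the override flag only when a transformation name was supplied.
findings_override_flag() {
  local transformation_name="$1"
  if [ -n "$transformation_name" ]; then
    printf -- '--transformation-name %s' "$transformation_name"
  fi
}

findings_capture_mode ""                          # default
findings_override_flag "AWS/java-version-upgrade" # --transformation-name AWS/java-version-upgrade
```

Keeping the flag construction in one place means the default path emits nothing extra, so the remediation command stays identical to a manual run unless the customer opted into the override.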
+ +Capture pattern depends on which case you're in: + +```bash +if [ "$JOB_TYPE" = "remediation" ] && [ "$REMEDIATION_MODE" = "findings" ]; then + # Optional explicit transformation override (for findings without .fix populated) + TRANSFORMATION_NAME="" # e.g., "AWS/java-version-upgrade" + + if [ -z "$TRANSFORMATION_NAME" ]; then + # ── Default capture: only findings with .fix populated ── + # Backend will pick the transformation per finding from .fix.transform_name + if [ -n "$AID" ]; then + FINDING_IDS=$(atx ct findings list --analysis-id "$AID" --json \ + | jq -r '.[] | select(.fix != null and .status == "open") | .id' \ + | paste -sd, -) + else + FINDING_IDS=$(atx ct findings list --source "$LOGICAL_SOURCE_NAME" --json \ + | jq -r '.[] | select(.fix != null and .status == "open") | .id' \ + | paste -sd, -) + fi + + if [ -z "$FINDING_IDS" ]; then + echo "" + echo "❌ No auto-remediable findings (fix != null, status == open) on source '$LOGICAL_SOURCE_NAME'." + echo "" + echo "Two options:" + echo " 1. Run a fresh tech-debt-quick analysis to surface auto-fixable findings:" + echo " atx ct analysis run --type tech-debt-quick --source $LOGICAL_SOURCE_NAME --wait" + echo " 2. Set TRANSFORMATION_NAME explicitly to remediate findings WITHOUT .fix populated" + echo " (e.g., from a tech-debt-comprehensive analysis). Then capture by category instead:" + echo " FINDING_IDS=\$(atx ct findings list --analysis-id <AID> --json \\" + echo " | jq -r '.[] | select(.category == \"Java\") | .id' | paste -sd, -)" + echo " TRANSFORMATION_NAME=\"AWS/java-version-upgrade\"" + return 1 + fi + + else + # ── Hybrid capture: any findings, override transformation explicitly ── + # Customer specifies TRANSFORMATION_NAME, so .fix is not required. + # Filter by category/severity/repo as needed (must produce a coherent group + # the chosen transformation applies to). 
+ if [ -z "$AID" ]; then + echo "ERROR: TRANSFORMATION_NAME requires AID (analysis ID) so we can scope finding capture" + return 1 + fi + # Default: capture ALL open findings under the analysis. Customer should + # narrow this by category/repo for a coherent transformation target. + FINDING_IDS=$(atx ct findings list --analysis-id "$AID" --json \ + | jq -r '.[] | select(.status == "open") | .id' \ + | paste -sd, -) + + if [ -z "$FINDING_IDS" ]; then + echo "❌ No open findings under analysis '$AID'" + return 1 + fi + fi + + COUNT=$(echo "$FINDING_IDS" | tr ',' '\n' | wc -l | tr -d ' ') + if [ -n "$TRANSFORMATION_NAME" ]; then + echo "✅ Captured $COUNT finding IDs to remediate with transformation: $TRANSFORMATION_NAME" + else + echo "✅ Captured $COUNT auto-remediable finding IDs (each will use its own .fix.transform_name)" + fi + echo "First 3: $(echo $FINDING_IDS | cut -d, -f1-3)..." +fi +``` + +**How to choose TRANSFORMATION_NAME for hybrid mode:** + +When `TRANSFORMATION_NAME` is needed (findings without `.fix`), the agent should: + +1. List the finding categories present in the captured set: + + ```bash + atx ct findings list --analysis-id "$AID" --json \ + | jq -r '[.[] | .category] | unique | .[]' + ``` + +2. Match category to a known transformation. Common mappings: + + | Finding category/title | Likely TRANSFORMATION_NAME | + | ------------------------------------ | ------------------------------------ | + | "Java" / "Java 8" / "Java 11" | `AWS/java-version-upgrade` | + | "Python" / "Python 2" / "Python 3.6" | `AWS/python-version-upgrade` | + | "Node.js" / "Node 14" / "Node 16" | `AWS/nodejs-version-upgrade` | + | "AWS SDK" / "boto2" / "JS SDK v2" | `AWS/aws-sdk-upgrade` | + | ".NET Framework" / ".NET Core" | `AWS/dotnet-upgrade` | + | "Code Quality" / "Complexity" | (no transformation -- manual review) | + | "Deprecated APIs" (mixed) | varies -- match to specific upgrade | + +3. 
**Filter `FINDING_IDS` to only the matching category** -- applying one transformation to mixed findings is incorrect. Re-run the capture with `select(.category == "Java")` or whichever category matches. +4. Confirm with the customer before scheduling. + +##### Sub-mode: transformation + +Customer provides a transformation + repo. No findings discovery needed. + +```bash +if [ "$JOB_TYPE" = "remediation" ] && [ "$REMEDIATION_MODE" = "transformation" ]; then + # Required: transformation name and at least one repo + TRANSFORMATION_NAME="AWS/java-version-upgrade" + REPO_FILTER="--repo my-org-github::my-java-repo" # required: a single repo, or a comma-separated list for multiple repos + REMEDIATION_CONFIG="" # optional, becomes the `-g` flag value + + [ -z "$TRANSFORMATION_NAME" ] && { echo "ERROR: REMEDIATION_MODE=transformation requires TRANSFORMATION_NAME"; return 1; } + [ -z "$REPO_FILTER" ] && { echo "ERROR: REMEDIATION_MODE=transformation requires --repo (REPO_FILTER)"; return 1; } +fi +``` + +If the customer hasn't run `atx ct source add` yet, hand off to [continuous-modernization-source](workload-continuous-modernization-source.md) BEFORE creating the schedule. Otherwise the schedule fires against an unregistered source and fails silently, which is confusing to debug later. + +## Step 5: Construct and Create the Schedule + +The schedule's payload depends on `PATH_TYPE` and `JOB_TYPE`: + +- **EC2 path**: a wrapper script base64-encoded inside an SSM SendCommand. Same pattern as the EC2 skill's `build_command_*()` (avoids quoting issues when the SSM payload contains nested quotes). +- **Batch path**: a JSON payload that EventBridge Scheduler passes directly to the `atx-trigger-batch-jobs` Lambda. The Lambda submits a Fargate job with the command baked in. 
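The EC2-path bullet above says base64 avoids quoting issues. A minimal, self-contained sketch of that idea follows. The helper names `encode_script` and `decode_script` and the sample body are hypothetical illustrations, not part of the skill, and the sketch assumes a `base64` binary that accepts `-d` (GNU coreutils or a recent macOS).

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the base64 wrapper pattern; not part of the skill.

# Encode a script body as a single base64 line (alphabet A-Za-z0-9+/= only).
encode_script() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Decode a base64 line back to the original script body.
decode_script() {
  printf '%s' "$1" | base64 -d
}

# A body with nested single and double quotes, the kind that breaks naive embedding.
SAMPLE_BODY='echo "it'\''s \"quoted\"" | tr a-z A-Z'

B64=$(encode_script "$SAMPLE_BODY")
ROUND_TRIP=$(decode_script "$B64")

# The encoded form contains no quotes, spaces, or shell metacharacters,
# so it can pass through JSON, SSM, and shell layers untouched.
case "$B64" in
  *[!A-Za-z0-9+/=]*) echo "unexpected character in encoded payload" ;;
  *) echo "encoded payload is quoting-safe" ;;
esac

[ "$ROUND_TRIP" = "$SAMPLE_BODY" ] && echo "round trip preserved the script body"
```

The design point is that only the outermost layer (`echo <base64> | base64 -d`) has to survive quoting; everything inside the wrapper can then use ordinary bash.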
+ +The wrapper/command body branches on `JOB_TYPE`: + +- `analysis` -- runs `atx ct analysis run` +- `remediation` -- runs `atx ct remediation create` (findings or transformation sub-mode), polls until terminal, optionally uploads (only for `local` provider) + +### Build common pieces (used by both paths) + +```bash +# For analysis mode: extra flags for --type custom +EXTRA_FLAGS="" +if [ "$JOB_TYPE" = "analysis" ] && [ "$ANALYSIS_TYPE" = "custom" ]; then + [ -z "$TRANSFORMATION_NAME" ] && { echo "ERROR: --type custom requires TRANSFORMATION_NAME"; return 1; } + EXTRA_FLAGS="--transformation-name $TRANSFORMATION_NAME" + [ -n "$CONFIGURATION" ] && EXTRA_FLAGS="$EXTRA_FLAGS -g \"$CONFIGURATION\"" +fi + +# For remediation+transformation mode: extra flags +REMED_FLAGS="" +if [ "$JOB_TYPE" = "remediation" ] && [ "$REMEDIATION_MODE" = "transformation" ]; then + REMED_FLAGS="--transformation-name $TRANSFORMATION_NAME $REPO_FILTER" + [ -n "$REMEDIATION_CONFIG" ] && REMED_FLAGS="$REMED_FLAGS -g \"$REMEDIATION_CONFIG\"" +fi +``` + +### EC2 path: build SSM SendCommand target + +The EC2 wrapper is base64'd through SSM, so we can use full bash idioms (`$()`, `select(...)`, etc.) -- the Lambda allowlist does NOT apply here. + +```bash +if [ "$PATH_TYPE" = "ec2" ]; then + # ───────────────────────────────────────────────────────────────── + # Resolve target worker container (multi-worker stacks) + # ───────────────────────────────────────────────────────────────── + # WorkerCount comes from the CFN stack parameter (defaults to 1 for legacy stacks + # that don't have the parameter). WORKER_NUM is the 1-indexed worker to schedule + # against (defaults to 1). For WorkerCount=1, container is "atx-ct" (existing + # behavior). For WorkerCount>1, container is "atx-ct-${WORKER_NUM}". 
+ WORKER_COUNT=$(aws cloudformation describe-stacks --stack-name "$STACK_NAME" --region $REGION \ + --query 'Stacks[0].Parameters[?ParameterKey==`WorkerCount`].ParameterValue' --output text 2>/dev/null) + WORKER_COUNT=$(echo "$WORKER_COUNT" | xargs) # strip whitespace defensively + [ -z "$WORKER_COUNT" ] || [ "$WORKER_COUNT" = "None" ] && WORKER_COUNT=1 + WORKER_NUM="${WORKER_NUM:-1}" + if [ "$WORKER_COUNT" -eq 1 ]; then + CONTAINER_NAME="atx-ct" + else + if [ "$WORKER_NUM" -lt 1 ] || [ "$WORKER_NUM" -gt "$WORKER_COUNT" ]; then + echo "ERROR: WORKER_NUM ($WORKER_NUM) must be 1-${WORKER_COUNT} for this multi-worker stack." >&2 + return 1 + fi + CONTAINER_NAME="atx-ct-${WORKER_NUM}" + fi + echo "Targeting container: $CONTAINER_NAME (worker $WORKER_NUM of $WORKER_COUNT)" + + # ───────────────────────────────────────────────────────────────── + # Build the wrapper script body based on JOB_TYPE + # ───────────────────────────────────────────────────────────────── + + if [ "$JOB_TYPE" = "analysis" ]; then + # Analysis mode: skip upload for tech-debt-quick (read-only) + UPLOAD_LINE="sudo docker exec ${CONTAINER_NAME} /app/upload-ct-artifacts.sh \$AID atx-ct-output-${ACCOUNT_ID}" + [ "$ANALYSIS_TYPE" = "tech-debt-quick" ] && UPLOAD_LINE='echo "[skip upload -- tech-debt-quick is read-only]"' + + SCRIPT_BODY=$(cat <<EOF +#!/bin/bash +LOG=/tmp/atxct-sched-\$(date +%s).log +echo "=== \$(date) [START] scheduled $ANALYSIS_TYPE analysis on $LOGICAL_SOURCE_NAME ===" >> \$LOG + +sudo docker exec ${CONTAINER_NAME} bash -c "source /home/atxuser/.nvm/nvm.sh && nvm use 22 >/dev/null 2>&1 && export PATH=/home/atxuser/.local/bin:\\\$PATH && atx ct analysis run --type $ANALYSIS_TYPE $EXTRA_FLAGS --source $LOGICAL_SOURCE_NAME $REPO_FILTER --wait --telemetry \"agent=${AGENT},executionMode=ec2\"" >> \$LOG 2>&1 +ANALYSIS_RC=\$? 
+ +AID=\$(grep -oE '01[A-Z0-9]+' \$LOG | head -1) +[ -n "\$AID" ] && echo "ANALYSIS_STARTED: \$AID" # to stdout for 'aws ssm get-command-invocation' visibility (regardless of pass/fail) + +if [ \$ANALYSIS_RC -ne 0 ]; then + echo "=== \$(date) [ERROR] analysis failed (rc=\$ANALYSIS_RC, AID=\$AID) ===" >> \$LOG + exit \$ANALYSIS_RC +fi + +[ -z "\$AID" ] && { echo "=== \$(date) [ERROR] success but no AID extracted ===" >> \$LOG; exit 1; } +echo "=== \$(date) [DONE] analysis \$AID ===" >> \$LOG + +$UPLOAD_LINE >> \$LOG 2>&1 +echo "=== \$(date) [DONE] upload ===" >> \$LOG +EOF +) + + elif [ "$JOB_TYPE" = "remediation" ]; then + # Remediation mode: build the create command, poll until terminal, + # upload artifacts only for local provider (github/gitlab/bitbucket + # push results to source repo automatically). + + # Build the remediation create line based on sub-mode + if [ "$REMEDIATION_MODE" = "findings" ]; then + [ -z "$FINDING_IDS" ] && { echo "ERROR: REMEDIATION_MODE=findings requires FINDING_IDS"; return 1; } + # Optional --transformation-name override (when findings don't have .fix populated, e.g. 
comprehensive) + REMED_TRANSFORM_FLAG="" + [ -n "$TRANSFORMATION_NAME" ] && REMED_TRANSFORM_FLAG=" --transformation-name $TRANSFORMATION_NAME" + REMED_CREATE_LINE="atx ct remediation create --ids $FINDING_IDS$REMED_TRANSFORM_FLAG --name $SCHEDULE_NAME-rem" + elif [ "$REMEDIATION_MODE" = "transformation" ]; then + REMED_CREATE_LINE="atx ct remediation create $REMED_FLAGS --name $SCHEDULE_NAME-rem" + else + echo "ERROR: REMEDIATION_MODE must be 'findings' or 'transformation'" + return 1 + fi + + # --local flag for local provider (github/gitlab/bitbucket: backend pushes to source repo) + [ "$PROVIDER" = "local" ] && REMED_CREATE_LINE="$REMED_CREATE_LINE --local" + + # Upload only for local provider + UPLOAD_REMED_LINE='echo "[skip upload -- github/gitlab/bitbucket pushes results to source repo]"' + [ "$PROVIDER" = "local" ] && UPLOAD_REMED_LINE="sudo docker exec ${CONTAINER_NAME} /app/upload-ct-artifacts.sh \$RID atx-ct-output-${ACCOUNT_ID}" + + SCRIPT_BODY=$(cat <<EOF +#!/bin/bash +LOG=/tmp/atxct-sched-\$(date +%s).log +echo "=== \$(date) [START] scheduled remediation ($REMEDIATION_MODE) on $LOGICAL_SOURCE_NAME ===" >> \$LOG + +sudo docker exec ${CONTAINER_NAME} bash -c "source /home/atxuser/.nvm/nvm.sh && nvm use 22 >/dev/null 2>&1 && export PATH=/home/atxuser/.local/bin:\\\$PATH && $REMED_CREATE_LINE --telemetry \"agent=${AGENT},executionMode=ec2\"" >> \$LOG 2>&1 +CREATE_RC=\$? 
+ +RID=\$(grep -oE '01[A-Z0-9]+' \$LOG | tail -1) +[ -n "\$RID" ] && echo "REMEDIATION_STARTED: \$RID" # to stdout for SSM visibility (regardless of pass/fail) + +if [ \$CREATE_RC -ne 0 ]; then + echo "=== \$(date) [ERROR] remediation create failed (rc=\$CREATE_RC, RID=\$RID) ===" >> \$LOG + exit \$CREATE_RC +fi + +[ -z "\$RID" ] && { echo "=== \$(date) [ERROR] success but no RID extracted ===" >> \$LOG; exit 1; } +echo "=== \$(date) [REMED] remediation \$RID started -- polling status ===" >> \$LOG + +# Poll every 30s until terminal status (atx ct remediation create does not support --wait) +STATUS="" +while true; do + STATUS=\$(sudo docker exec ${CONTAINER_NAME} bash -c "source /home/atxuser/.nvm/nvm.sh && nvm use 22 >/dev/null 2>&1 && export PATH=/home/atxuser/.local/bin:\\\$PATH && atx ct remediation status --id \$RID --json" 2>>\$LOG | jq -r .status 2>/dev/null) + case "\$STATUS" in + complete|completed|failed|cancelled) + echo "=== \$(date) [REMED] remediation \$RID terminal: \$STATUS ===" >> \$LOG + break + ;; + esac + sleep 30 +done + +$UPLOAD_REMED_LINE >> \$LOG 2>&1 +echo "=== \$(date) [DONE] remediation flow complete (status=\$STATUS) ===" >> \$LOG + +# Exit non-zero unless the remediation succeeded (so the SSM invocation reports failure). +# The poll loop accepts both 'complete' and 'completed' as terminal, so treat both as success here. +[ "\$STATUS" != "complete" ] && [ "\$STATUS" != "completed" ] && exit 1 +exit 0 +EOF +) + else + echo "ERROR: JOB_TYPE must be 'analysis' or 'remediation' (got: '$JOB_TYPE')" + return 1 + fi + + # ───────────────────────────────────────────────────────────────── + # Encode and submit via SSM SendCommand + # ───────────────────────────────────────────────────────────────── + + # Encode the script body -- base64 chars (A-Za-z0-9+/=) survive any quoting layer + B64=$(echo "$SCRIPT_BODY" | base64 | tr -d '\n') + + # The command the schedule fires on the instance: decode the script and run it. 
+ COMMAND="echo $B64 | base64 -d > /tmp/atxct-sched.sh && bash /tmp/atxct-sched.sh" + + # Timeouts: 4h for analysis-only, 8h if remediation involved. + # Bump for source-level comprehensive analyses on many repos. + # TimeoutSeconds is SSM's delivery timeout; the run-time limit is the + # AWS-RunShellScript executionTimeout parameter (document default 3600s, + # max 172800s), so both are set to the same value here. + TIMEOUT=14400 + [ "$JOB_TYPE" = "remediation" ] && TIMEOUT=28800 + + INPUT_JSON=$(jq -n \ + --arg id "$INSTANCE_ID" \ + --arg cmd "$COMMAND" \ + --argjson timeout $TIMEOUT \ + '{InstanceIds: [$id], DocumentName: "AWS-RunShellScript", TimeoutSeconds: $timeout, Parameters: {commands: [$cmd], executionTimeout: [($timeout | tostring)]}}') + + TARGET=$(jq -n \ + --arg arn "arn:aws:scheduler:::aws-sdk:ssm:sendCommand" \ + --arg role "arn:aws:iam::$ACCOUNT_ID:role/AtxSchedulerInvocationRole" \ + --arg input "$INPUT_JSON" \ + '{Arn: $arn, RoleArn: $role, Input: $input}') +fi +``` + +### Batch path: build Lambda Invoke target + +The Batch JOB_COMMAND must comply with the atx-trigger-batch-jobs Lambda allowlist (see [continuous-modernization-batch-execution.md Step 5](workload-continuous-modernization-batch-execution.md) for the canonical rules). Lambda rejects strings containing `$`, `^`, `()`, `{}`, `*`, backticks, or non-ASCII characters (em-dashes, en-dashes, smart quotes, and any other non-ASCII punctuation). + +We build the JOB_COMMAND in three pieces: + +1. **Provider preamble** (per-provider source/token setup) -- same across job types +2. **Job body** (analysis run OR remediation create) -- branches on `JOB_TYPE` +3. 
**Trailing chain** (poll, upload) -- per-mode, with provider-specific upload suffix + +```bash +if [ "$PATH_TYPE" = "batch" ]; then + # ───────────────────────────────────────────────────────────────── + # Build the JOB_BODY based on JOB_TYPE × REMEDIATION_MODE × PROVIDER + # ───────────────────────────────────────────────────────────────── + + # Skip analysis-artifact upload for tech-debt-quick (read-only) + ANALYSIS_UPLOAD_SUFFIX="" + if [ "$JOB_TYPE" = "analysis" ] && [ "$ANALYSIS_TYPE" != "tech-debt-quick" ]; then + ANALYSIS_UPLOAD_SUFFIX=" && grep -oE '01[A-Z0-9]+' /tmp/run.log | head -1 | xargs -I AID /app/upload-ct-artifacts.sh AID atx-ct-output-${ACCOUNT_ID}" + fi + + # Remediation suffix (poll status until terminal, upload only if local provider) + # Lambda-safe: no $(), no select(...), no em-dash. See allowlist constraints above. + REMED_POLL_SUFFIX=" && grep -oE '01[A-Z0-9]+' /tmp/rem.log | tail -1 > /tmp/rid.txt && while true ; do cat /tmp/rid.txt | xargs -I RID atx ct remediation status --id RID > /tmp/status.txt ; grep -qE 'complete|completed|failed|cancelled' /tmp/status.txt && break ; sleep 30 ; done" + + REMED_UPLOAD_SUFFIX="" + if [ "$PROVIDER" = "local" ]; then + REMED_UPLOAD_SUFFIX=" && cat /tmp/rid.txt | xargs -I RID /app/upload-ct-artifacts.sh RID atx-ct-output-${ACCOUNT_ID}" + fi + + # Build the JOB_BODY (the "do the work" part of the command) + if [ "$JOB_TYPE" = "analysis" ]; then + JOB_BODY="atx ct analysis run --type ${ANALYSIS_TYPE} ${EXTRA_FLAGS} --source ${LOGICAL_SOURCE_NAME} ${REPO_FILTER} --wait --telemetry \"agent=${AGENT},executionMode=fargate\" 2>&1 | tee /tmp/run.log${ANALYSIS_UPLOAD_SUFFIX}" + elif [ "$JOB_TYPE" = "remediation" ]; then + # --local flag only for local provider + LOCAL_FLAG="" + [ "$PROVIDER" = "local" ] && LOCAL_FLAG=" --local" + + if [ "$REMEDIATION_MODE" = "findings" ]; then + [ -z "$FINDING_IDS" ] && { echo "ERROR: REMEDIATION_MODE=findings requires FINDING_IDS"; return 1; } + # Optional --transformation-name 
override (when findings don't have .fix populated, e.g. comprehensive) + REMED_TRANSFORM_FLAG="" + [ -n "$TRANSFORMATION_NAME" ] && REMED_TRANSFORM_FLAG=" --transformation-name $TRANSFORMATION_NAME" + # NOTE: remediation name uses <aws.scheduler.scheduled-time> for uniqueness on recurring schedules + JOB_BODY="atx ct remediation create --ids ${FINDING_IDS}${REMED_TRANSFORM_FLAG} --name \"${SCHEDULE_NAME}-rem-<aws.scheduler.scheduled-time>\"${LOCAL_FLAG} --telemetry \"agent=${AGENT},executionMode=fargate\" 2>&1 | tee /tmp/rem.log${REMED_POLL_SUFFIX}${REMED_UPLOAD_SUFFIX}" + elif [ "$REMEDIATION_MODE" = "transformation" ]; then + JOB_BODY="atx ct remediation create ${REMED_FLAGS} --name \"${SCHEDULE_NAME}-rem-<aws.scheduler.scheduled-time>\"${LOCAL_FLAG} --telemetry \"agent=${AGENT},executionMode=fargate\" 2>&1 | tee /tmp/rem.log${REMED_POLL_SUFFIX}${REMED_UPLOAD_SUFFIX}" + else + echo "ERROR: REMEDIATION_MODE must be 'findings' or 'transformation'" + return 1 + fi + else + echo "ERROR: JOB_TYPE must be 'analysis' or 'remediation' (got: '$JOB_TYPE')" + return 1 + fi + + # ───────────────────────────────────────────────────────────────── + # Provider-specific preamble (sets up source registration, tokens) + # ───────────────────────────────────────────────────────────────── + PREAMBLE_COMMON="atx ct --version > /dev/null 2>&1 ; set -o pipefail && source /home/atxuser/.bashrc && export PATH=/home/atxuser/.local/bin:/usr/local/bin:/usr/bin:/bin && source /home/atxuser/.nvm/nvm.sh && nvm use 22 ; mkdir -p /home/atxuser/.aws/atx/logs ; atx ct server > /home/atxuser/.aws/atx/logs/server.log 2>&1 & sleep 15" + + case "$PROVIDER" in + github) + PREAMBLE="${PREAMBLE_COMMON} ; mkdir -p /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME} && aws secretsmanager get-secret-value --secret-id atx/github-token --query SecretString --output text > /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/github_token" + ;; + gitlab) + PREAMBLE="${PREAMBLE_COMMON} ; mkdir -p 
/home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME} && aws secretsmanager get-secret-value --secret-id atx/gitlab-token --query SecretString --output text > /home/atxuser/.atxct/sources/${LOGICAL_SOURCE_NAME}/gitlab_token" + ;; + local) + [ -z "$ZIP_NAME" ] && { echo "ERROR: Batch + local provider requires ZIP_NAME"; return 1; } + PREAMBLE="${PREAMBLE_COMMON} ; mkdir -p /home/atxuser/repos && aws s3 cp s3://atx-source-code-${ACCOUNT_ID}/repos/${ZIP_NAME}.zip /tmp/${ZIP_NAME}.zip && unzip -q /tmp/${ZIP_NAME}.zip -d /home/atxuser/repos/ && atx ct discovery scan --source ${LOGICAL_SOURCE_NAME} --path /home/atxuser/repos" + ;; + *) + echo "ERROR: PROVIDER must be github, gitlab, or local (got: '$PROVIDER')" + return 1 + ;; + esac + + # Combine preamble + job body + JOB_COMMAND="${PREAMBLE} && ${JOB_BODY}" + + # Build Lambda payload -- the schema atx-trigger-batch-jobs expects. + # batchName uses <aws.scheduler.scheduled-time> for uniqueness on recurring fires. + LAMBDA_PAYLOAD=$(jq -nc \ + --arg cmd "$JOB_COMMAND" \ + --arg base "${SCHEDULE_NAME}" \ + '{batchName: ($base + "-<aws.scheduler.scheduled-time>"), jobs: [{command: $cmd, jobName: ($base + "-job")}]}') + + # Schedule's target -- direct Lambda Invoke + TARGET=$(jq -n \ + --arg arn "arn:aws:lambda:$REGION:$ACCOUNT_ID:function:atx-trigger-batch-jobs" \ + --arg role "arn:aws:iam::$ACCOUNT_ID:role/AtxSchedulerInvocationRole" \ + --arg input "$LAMBDA_PAYLOAD" \ + '{Arn: $arn, RoleArn: $role, Input: $input}') +fi +``` + +### Create the schedule (executor) + +`scheduler:CreateSchedule` is in the executor policy, scoped to the `atx-control-tower` group. `iam:PassRole` on `AtxSchedulerInvocationRole` is also in the executor policy (scoped to `iam:PassedToService=scheduler.amazonaws.com`). 
The agent runs this directly: + +```bash +aws --profile $PROFILE --region $REGION scheduler create-schedule \ + --name "$SCHEDULE_NAME" \ + --group-name atx-control-tower \ + --schedule-expression "$CRON_EXPR" \ + --schedule-expression-timezone "$TIMEZONE" \ + --flexible-time-window '{"Mode":"OFF"}' \ + --target "$TARGET" \ + --action-after-completion NONE +``` + +No admin handoff needed for routine scheduling -- the one-time IAM setup in Step 3 (admin handoff) provisioned `AtxSchedulerInvocationRole` and the `atx-control-tower` group, and the executor's `iam:PassRole` is bounded to that role only. The schedule's target can therefore only invoke what `AtxSchedulerInvocationRole` is allowed to invoke (which admin scoped to `ssm:SendCommand` on tagged instances or `lambda:Invoke` on `atx-trigger-batch-jobs`). Privilege surface is unchanged from what admin pre-vetted. + +**Why the upload step matters:** Without the trailing `/app/upload-ct-artifacts.sh` call, scheduled analyses leave findings in the backend (queryable via `atx ct findings list`) but don't upload `code.zip` artifacts to S3. For analysis types that produce working-tree changes (tech-debt-comprehensive, security, agentic-readiness, modernization-readiness), the customer typically wants the artifacts for `git diff` review -- so the upload step is essential. tech-debt-quick is read-only (no working-tree changes), so its upload is intentionally skipped. + +**EventBridge contextual variables:** `<aws.scheduler.scheduled-time>` in the Batch payload is replaced by EventBridge at fire time with the scheduled fire timestamp. This makes each batch's name unique, so `atx-get-batch-status` and `atx-terminate-batch-jobs` can target a specific firing if needed. See [AWS Scheduler context attributes](https://docs.aws.amazon.com/scheduler/latest/UserGuide/managing-schedule-context-attributes.html). 
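Because the Batch `JOB_COMMAND` (including the `<aws.scheduler.scheduled-time>` placeholder) is only checked against the allowlist when the Lambda actually runs, a local pre-flight check can catch a bad string before the schedule is created. The sketch below is hypothetical: `is_lambda_safe` is not part of the skill, and it mirrors only the reject list stated in this document (`$`, `^`, parentheses, braces, `*`, backticks, non-ASCII). The canonical rules live in the batch-execution skill and may differ.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check; mirrors only the reject list stated in this document.

# Return 0 if the command avoids every listed character class, 1 otherwise.
is_lambda_safe() {
  local cmd="$1"

  # Non-ASCII text (em-dashes, smart quotes, ...) appears as bytes outside
  # the printable ASCII range 0x20-0x7E. This also rejects tabs and other
  # control characters, which is stricter than the stated list.
  if printf '%s' "$cmd" | LC_ALL=C grep -q '[^ -~]'; then
    return 1
  fi

  # Assumption: a single (, ), {, or } is rejected, not only the paired forms.
  case "$cmd" in
    *'$'*|*'^'*|*'('*|*')'*|*'{'*|*'}'*|*'*'*|*'`'*) return 1 ;;
  esac

  return 0
}

# Example usage with literal strings (no expansion happens inside single quotes).
if is_lambda_safe 'atx ct remediation status --id RID > /tmp/status.txt'; then
  echo "safe: plain pipeline with xargs-style placeholder"
fi
if ! is_lambda_safe 'echo $(date)'; then
  echo "rejected: command substitution"
fi
```

Running the check on the fully expanded `JOB_COMMAND` (after `${ACCOUNT_ID}` and similar variables are substituted) is the useful moment, since that expanded string is what the Lambda receives.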
+ +## Step 6: Verify and Report + +```bash +# Confirm the schedule exists and show next firing time +aws --profile $PROFILE --region $REGION scheduler get-schedule \ + --name "$SCHEDULE_NAME" \ + --group-name atx-control-tower \ + --query '{Name:Name, State:State, Cron:ScheduleExpression, TZ:ScheduleExpressionTimezone}' \ + --output table + +echo "" +echo "Schedule '$SCHEDULE_NAME' created." +echo "" +echo "What it does:" +echo " Cron: $CRON_EXPR ($TIMEZONE)" +echo " Targets instance: $INSTANCE_ID" +echo " Runs: $COMMAND" +echo "" +echo "Manage your schedules:" +echo " List: aws --region $REGION scheduler list-schedules --group-name atx-control-tower" +echo " Get: aws --region $REGION scheduler get-schedule --name $SCHEDULE_NAME --group-name atx-control-tower" +echo " Disable: aws --region $REGION scheduler update-schedule --name $SCHEDULE_NAME --group-name atx-control-tower --state DISABLED [...]" +echo " Delete: aws --region $REGION scheduler delete-schedule --name $SCHEDULE_NAME --group-name atx-control-tower" +echo "" +echo "AWS Console: Amazon EventBridge → Schedules → atx-control-tower" +``` + +## Quick Test (Optional, Recommended) + +For first-time setup, fire a one-off SendCommand manually to confirm the SSM path works before relying on the schedule: + +```bash +TEST_COMMAND_ID=$(aws --profile $PROFILE --region $REGION ssm send-command \ + --instance-ids "$INSTANCE_ID" \ + --document-name "AWS-RunShellScript" \ + --parameters "commands=[\"sudo docker exec ${CONTAINER_NAME} date\"]" \ + --timeout-seconds 60 \ + --query 'Command.CommandId' --output text) + +sleep 10 + +aws --profile $PROFILE --region $REGION ssm get-command-invocation \ + --command-id "$TEST_COMMAND_ID" \ + --instance-id "$INSTANCE_ID" \ + --query '{Status:Status, Output:StandardOutputContent, Error:StandardErrorContent}' \ + --output table +``` + +If `Status: Success` with the current date in `Output`, the SSM path works and the schedule will fire correctly. 
We use plain `date` here (not `atx ct status`) so the test isolates the SSM mechanism from any backend auth or CT server issues -- those are validated separately when the actual scheduled `atx ct analysis run` fires. + +If `Status: Failed` or `TimedOut`, debug: + +- SSM agent reachable? `aws ssm describe-instance-information --filters "Key=InstanceIds,Values=$INSTANCE_ID"` +- Container running? `aws ssm send-command --instance-ids $INSTANCE_ID --document-name AWS-RunShellScript --parameters 'commands=["sudo docker ps | grep atx-ct"]'` +- Instance role has `AmazonSSMManagedInstanceCore`? `aws iam list-attached-role-policies --role-name "${ROLE_ARN##*/}"` (using the role ARN discovered in Step 1) + +## Schedule Management + +### List schedules + +```bash +# list-schedules returns summaries only (name, state, target ARN); use get-schedule for the full definition +aws --profile $PROFILE --region $REGION scheduler list-schedules \ + --group-name atx-control-tower \ + --query 'Schedules[].{Name:Name, State:State, TargetArn:Target.Arn}' \ + --output table +``` + +### Disable temporarily (executor) + +`scheduler:UpdateSchedule` is in the executor policy -- the agent runs this directly. + +```bash +# Get current target then disable +TARGET=$(aws --profile $PROFILE --region $REGION scheduler get-schedule \ + --name "$SCHEDULE_NAME" --group-name atx-control-tower \ + --query 'Target' --output json) + +aws --profile $PROFILE --region $REGION scheduler update-schedule \ + --name "$SCHEDULE_NAME" \ + --group-name atx-control-tower \ + --state DISABLED \ + --schedule-expression "$CRON_EXPR" \ + --schedule-expression-timezone "$TIMEZONE" \ + --flexible-time-window '{"Mode":"OFF"}' \ + --target "$TARGET" +``` + +### Re-enable (executor) + +Same as Disable but with `--state ENABLED`. + +### Delete permanently (executor) + +`scheduler:DeleteSchedule` is in the executor policy, scoped to the `atx-control-tower` group. 
The agent runs this directly: + +```bash +aws --profile $PROFILE --region $REGION scheduler delete-schedule \ + --name "$SCHEDULE_NAME" \ + --group-name atx-control-tower +``` + +### View invocation history (CloudWatch) + +EventBridge Scheduler logs invocations to CloudWatch. To inspect: + +```bash +aws --profile $PROFILE --region $REGION logs tail /aws/events/scheduler --since 7d +``` + +To inspect what the EC2 instance actually ran on a given schedule fire: + +```bash +# List recent SSM command invocations against the instance +aws --profile $PROFILE --region $REGION ssm list-command-invocations \ + --instance-id "$INSTANCE_ID" \ + --query 'CommandInvocations[].{Time:RequestedDateTime, Status:Status, CommandId:CommandId}' \ + --output table + +# Inspect the output of one +aws --profile $PROFILE --region $REGION ssm get-command-invocation \ + --command-id <CommandId> \ + --instance-id "$INSTANCE_ID" \ + --query '{Status:Status, Output:StandardOutputContent, Error:StandardErrorContent}' \ + --output table +``` + +## Edge Cases + +| Scenario | What Happens | Mitigation | +| ------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- | +| EC2 instance stopped at fire time | SSM SendCommand fails, schedule retries 2x with exponential backoff, then waits for next fire | Customer keeps instance running; or document instance-start step | +| Container `atx-ct` not running | `docker exec` fails inside the SSM command, returns non-zero | Step 10 of continuous-modernization-ec2-execution adds `--restart unless-stopped` so container survives reboot. Customer can verify with `docker ps` | +| Instance rebooted | If `--restart unless-stopped` set on docker run: container auto-starts. 
If not: container is gone, schedule fails | Always use `--restart unless-stopped` (now baked into Step 10) | +| Two schedules fire simultaneously | Both `docker exec` calls arrive; CT server may serialize internally | Avoid overlapping schedules during testing | +| Customer terminates instance, schedule remains | SSM SendCommand fails silently against the orphaned instance ID | Skill should warn during cleanup; or explicitly delete schedules before terminating instance | +| Schedule fires but command runs longer than `TimeoutSeconds` | Command is killed at the timeout; partial findings persist if CT pushed any | Bump `TimeoutSeconds` (default 14400 = 4h here, max 172800 = 48h) | +| SSM agent loses connection mid-run | Command may be marked Failed in SSM but actually ran to completion on the instance | Verify findings via `atx ct findings list` before assuming the schedule failed | + +## Why SSM Instead of SSH + +- SSH requires a key pair, public IP, and port 22 inbound -- Scheduler can't authenticate over SSH +- SSM uses outbound HTTPS only; agent is pre-installed on Amazon Linux 2023 +- SSM `SendCommand` is an AWS API call, which Scheduler natively supports +- Access is IAM-controlled (no key management) + +## Pricing + +EventBridge Scheduler: free tier covers 14 million invocations/month (so cron-style schedules are effectively free). Beyond that, see [AWS pricing](https://aws.amazon.com/eventbridge/pricing/). + +SSM SendCommand: no charge for the API call itself; you pay for whatever the EC2 instance does at runtime. + +The EC2 instance and other costs are unchanged from the manual analysis flow -- see [continuous-modernization-ec2-execution](workload-continuous-modernization-ec2-execution.md) for those. 
+ +## Related Skills + +- [continuous-modernization-ec2-execution](workload-continuous-modernization-ec2-execution.md) -- sets up the EC2 instance + container that this schedule reuses +- [continuous-modernization-analysis](workload-continuous-modernization-analysis.md) -- the underlying `atx ct analysis run` command details +- [continuous-modernization-status](workload-continuous-modernization-status.md) -- check what analyses exist after a schedule fires +- [continuous-modernization-findings](workload-continuous-modernization-findings.md) -- query findings produced by the scheduled analysis diff --git a/aws-transform/steering/workload-continuous-modernization-security-agent.md b/aws-transform/steering/workload-continuous-modernization-security-agent.md new file mode 100644 index 00000000..06ce74b8 --- /dev/null +++ b/aws-transform/steering/workload-continuous-modernization-security-agent.md @@ -0,0 +1,278 @@ +--- +name: security-agent-setup +description: Set up and use the security agent for vulnerability scanning. Covers admin setup (manual terminal commands) and executor runtime (agent-driven analysis). Replaces the inline security agent steps in EC2/Batch execution skills. +--- + +# Security Agent Setup + +This skill covers the security agent lifecycle with a clear split between **admin** (infrastructure provisioning) and **executor** (runtime analysis) roles. + +## ⚠️ MANDATORY: Permission Consent (MUST be first interaction) + +**CRITICAL: Before ANY security agent setup or analysis steps, present this consent message and wait for a response.** + +"To run security analysis, the executor role needs access to: SecurityAgent APIs (for code review and findings), the security agent S3 bucket (for uploading source code to scan), and iam:PassRole for the security agent role. Do you have these permissions configured?" + +- If the customer says **yes** → proceed with the executor flow. 
+- If the customer says **no** → respond with: "If you don't have sufficient permissions you may encounter errors during the flow. Your administrator can set up the required resources using the Admin Setup commands below." Then proceed with the workflow. + +**Record the customer's response** -- if they later file a bug about permission errors, we refer to their choice here. + +--- + +## Admin Setup (Manual Terminal Commands) + +**These commands create IAM roles and deploy CloudFormation stacks, so they require admin/role-creation permissions (`iam:CreateRole`, `iam:PutRolePolicy`, `iam:PassRole`, `cloudformation:CreateChangeSet`). Run them with an admin identity. Read-only or runtime credentials are enough for everything afterward.** + +**The agent MUST NOT execute these commands using agentic tools. Instead, present them as instructions for the customer or their administrator to copy and run.** + +The admin provisions the security agent infrastructure: an IAM role, a managed policy, and an S3 bucket, all deployed via a CloudFormation stack. + +Tell the customer: + +> "This deploys the security agent infrastructure (IAM role, S3 bucket, CloudFormation stack). It requires admin/role-creation permissions. Run it with an admin identity. Read-only or runtime credentials are enough for everything afterward." 
+ +```bash +# Ensure atx ct is installed and up to date +INSTALLED=$(atx ct --version 2>/dev/null | head -1) +LATEST=$(curl -fsSL "https://transform-cli.awsstatic.com/index.json" 2>/dev/null | grep -o '"latest"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/.*"latest"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/') +echo "Installed: ${INSTALLED:-not found}, Latest: ${LATEST:-unknown}" + +# If not installed or outdated: +curl -fsSL https://transform-cli.awsstatic.com/install.sh | bash +source ~/.bashrc + +# Start the server if not running +atx ct server & +sleep 5 + +# Deploy security agent infrastructure (creates IAM role, S3 bucket, CloudFormation stack) +atx ct setup security-agent +``` + +### What Admin Setup Creates + +| Resource | Name Pattern | Purpose | +|----------|-------------|---------| +| CloudFormation stack | `kct-security-agent-<suffix>` | Manages all resources atomically | +| IAM role | `security-agent-kct-agent-space-<suffix>` | Role the security agent service assumes | +| IAM managed policy | `kct-security-agent-<suffix>` | Permissions attached to the role | +| S3 bucket | `kct-security-agent-<suffix>` | Stores source code zips for scanning | + +### Admin Setup for EC2/Batch Job Roles + +When using security analysis on EC2 or Batch, the **admin** must also attach executor permissions to the compute role. Present these commands as instructions: + +> "The compute role needs security agent permissions added. This modifies IAM policies, so it requires admin/role-creation permissions. Run these with an admin identity:" + +**For Batch (ATXBatchJobRole):** + +```bash +# Get security agent config values +SEC_BUCKET=$(jq -r '.s3Bucket' ~/.atxct/shared/security_agent_config.json) +SEC_AGENT_ROLE_ARN=$(jq -r '.role_arn // .roleArn' ~/.atxct/shared/security_agent_config.json) +ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text) + +# 1. 
Security Agent API access +aws iam put-role-policy --role-name ATXBatchJobRole \ + --policy-name AtxCtSecurityAgentAPI \ + --policy-document "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Action\":[\"securityagent:ListAgentSpaces\",\"securityagent:CreateAgentSpace\",\"securityagent:CreateCodeReview\",\"securityagent:StartCodeReviewJob\",\"securityagent:ListCodeReviewJobsForCodeReview\",\"securityagent:ListFindings\",\"securityagent:BatchGetFindings\",\"securityagent:StartCodeRemediation\"],\"Resource\":\"arn:aws:securityagent:*:*:agent-space*\",\"Condition\":{\"StringEquals\":{\"aws:ResourceAccount\":\"${ACCOUNT_ID}\"}}}]}" + +# 2. S3 access for security agent bucket +aws iam put-role-policy --role-name ATXBatchJobRole \ + --policy-name AtxCtSecurityAgentS3Access \ + --policy-document "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Action\":[\"s3:PutObject\",\"s3:GetObject\",\"s3:ListBucket\"],\"Resource\":[\"arn:aws:s3:::${SEC_BUCKET}\",\"arn:aws:s3:::${SEC_BUCKET}/*\"]}]}" + +# 3. PassRole for security agent role +aws iam put-role-policy --role-name ATXBatchJobRole \ + --policy-name AtxCtSecurityAgentPassRole \ + --policy-document "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Action\":\"iam:PassRole\",\"Resource\":\"${SEC_AGENT_ROLE_ARN}\",\"Condition\":{\"StringEquals\":{\"iam:PassedToService\":\"securityagent.amazonaws.com\"}}}]}" +``` + +**For EC2 (stack-managed role):** + +```bash +SEC_BUCKET=$(jq -r '.s3Bucket' ~/.atxct/shared/security_agent_config.json) +SEC_AGENT_ROLE_ARN=$(jq -r '.role_arn // .roleArn' ~/.atxct/shared/security_agent_config.json) +STACK_NAME="<the-ec2-stack-name>" +REGION="${AWS_REGION:-us-east-1}" + +ROLE_NAME=$(aws cloudformation describe-stacks --stack-name "$STACK_NAME" --region $REGION \ + --query 'Stacks[0].Outputs[?OutputKey==`RoleArn`].OutputValue' --output text | awk -F/ '{print $NF}') + +# 1. 
Security Agent API access +ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text) +aws iam put-role-policy --role-name "$ROLE_NAME" \ + --policy-name AtxCtSecurityAgentAPI \ + --policy-document "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Action\":[\"securityagent:ListAgentSpaces\",\"securityagent:CreateAgentSpace\",\"securityagent:CreateCodeReview\",\"securityagent:StartCodeReviewJob\",\"securityagent:ListCodeReviewJobsForCodeReview\",\"securityagent:ListFindings\",\"securityagent:BatchGetFindings\",\"securityagent:StartCodeRemediation\"],\"Resource\":\"arn:aws:securityagent:*:*:agent-space*\",\"Condition\":{\"StringEquals\":{\"aws:ResourceAccount\":\"${ACCOUNT_ID}\"}}}]}" + +# 2. S3 access to the security agent bucket +aws iam put-role-policy --role-name "$ROLE_NAME" \ + --policy-name AtxCtSecurityAgentS3Access \ + --policy-document "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Action\":[\"s3:PutObject\",\"s3:GetObject\",\"s3:ListBucket\"],\"Resource\":[\"arn:aws:s3:::${SEC_BUCKET}\",\"arn:aws:s3:::${SEC_BUCKET}/*\"]}]}" + +# 3. PassRole for security agent role +aws iam put-role-policy --role-name "$ROLE_NAME" \ + --policy-name AtxCtSecurityAgentPassRole \ + --policy-document "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Action\":\"iam:PassRole\",\"Resource\":\"${SEC_AGENT_ROLE_ARN}\",\"Condition\":{\"StringEquals\":{\"iam:PassedToService\":\"securityagent.amazonaws.com\"}}}]}" +``` + +### Check Admin Setup Status + +```bash +atx ct setup security-agent --status +``` + +Returns: `configured`, `setup_in_progress`, `failed`, or `not_configured`. + +### Delete (Teardown) + +```bash +atx ct setup security-agent --delete +``` + +--- + +## Executor Flow (Agent-Driven) + +This is what the agent does at runtime after admin setup is complete. The agent MAY execute these steps using agentic tools. 
+ +### Step 1: Verify Security Agent is Configured + +Check that the security agent config file exists: + +```bash +cat ~/.atxct/shared/security_agent_config.json +``` + +**If the file does NOT exist**: Try to reconstruct it from the existing CloudFormation stack before asking the customer to re-run admin setup. This allows any team member with AWS account access to self-service without needing the original admin. + +#### Reconstruct Config from Existing Stack + +```bash +# Find the security agent stack (tagged during admin setup) +STACK_NAME=$(aws cloudformation describe-stacks \ + --query "Stacks[?Tags[?Key=='atx-remote-infra' && Value=='true']].StackName" \ + --output text --no-cli-pager --region us-east-1) + +# If not found by tag, try prefix match +if [ -z "$STACK_NAME" ]; then + STACK_NAME=$(aws cloudformation list-stacks \ + --stack-status-filter CREATE_COMPLETE UPDATE_COMPLETE \ + --query "StackSummaries[?starts_with(StackName,'kct-security-agent-')].StackName" \ + --output text --no-cli-pager --region us-east-1) +fi + +echo "Found stack: ${STACK_NAME:-none}" +``` + +**If a stack is found**, extract the config and write it locally: + +```bash +# Extract parameters and outputs from the stack +ACCOUNT_ID=$(aws cloudformation describe-stacks --stack-name "$STACK_NAME" --region us-east-1 \ + --query "Stacks[0].Parameters[?ParameterKey=='AccountId'].ParameterValue" --output text --no-cli-pager) +AGENT_SPACE_NAME=$(aws cloudformation describe-stacks --stack-name "$STACK_NAME" --region us-east-1 \ + --query "Stacks[0].Parameters[?ParameterKey=='AgentSpaceName'].ParameterValue" --output text --no-cli-pager) +S3_BUCKET=$(aws cloudformation describe-stacks --stack-name "$STACK_NAME" --region us-east-1 \ + --query "Stacks[0].Parameters[?ParameterKey=='S3Resource'].ParameterValue" --output text --no-cli-pager) +ROLE_ARN=$(aws cloudformation describe-stacks --stack-name "$STACK_NAME" --region us-east-1 \ + --query "Stacks[0].Outputs[?OutputKey=='RoleArn'].OutputValue" 
--output text --no-cli-pager) + +# Write the config file +mkdir -p ~/.atxct/shared +cat > ~/.atxct/shared/security_agent_config.json << EOF +{ + "agentSpaceId": "", + "agentSpaceName": "${AGENT_SPACE_NAME}", + "s3Bucket": "${S3_BUCKET}", + "roleArn": "${ROLE_ARN}", + "accountId": "${ACCOUNT_ID}", + "stackName": "${STACK_NAME}", + "configuredAt": "$(date -u +%Y-%m-%dT%H:%M:%S.000Z)" +} +EOF + +cat ~/.atxct/shared/security_agent_config.json +``` + +**If no stack is found**: Tell the customer: + +> "Security agent is not configured and no existing stack was found in this account. An administrator needs to run the initial setup:" +> +> ```bash +> atx ct setup security-agent +> ``` +> +> "Once complete, let me know and I'll continue." + +Do NOT proceed until the config file exists. + +### Step 2: Read Config Values + +```bash +SEC_BUCKET=$(jq -r '.s3Bucket' ~/.atxct/shared/security_agent_config.json) +SEC_AGENT_ROLE_ARN=$(jq -r '.role_arn // .roleArn' ~/.atxct/shared/security_agent_config.json) +AGENT_SPACE_NAME=$(jq -r '.agentSpaceName' ~/.atxct/shared/security_agent_config.json) +ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text) +``` + +### Step 3: Verify Executor Permissions (Read-Only Check) + +For EC2/Batch compute roles, verify the required inline policies exist: + +```bash +aws iam get-role-policy --role-name <ROLE_NAME> --policy-name AtxCtSecurityAgentAPI 2>&1 +aws iam get-role-policy --role-name <ROLE_NAME> --policy-name AtxCtSecurityAgentS3Access 2>&1 +aws iam get-role-policy --role-name <ROLE_NAME> --policy-name AtxCtSecurityAgentPassRole 2>&1 +``` + +**If any returns `NoSuchEntity`**: Do NOT add the policy. Instead, tell the customer: + +> "The compute role is missing security agent permissions. This requires admin/role-creation privileges to fix. Run the following with an admin identity:" + +Then show the relevant commands from the Admin Setup section above. 
+ +### Step 4: Sync Config to Compute (EC2 only) + +For EC2, sync the security agent config into the container(s): + +```bash +aws s3 cp ~/.atxct/shared/security_agent_config.json \ + s3://atx-source-code-${ACCOUNT_ID}/temp/security_agent_config.json + +ssm_run "aws s3 cp s3://atx-source-code-${ACCOUNT_ID}/temp/security_agent_config.json /tmp/sa.json && \ + for c in \$(sudo docker ps --filter name=atx-ct --format '{{.Names}}'); do \ + sudo docker cp /tmp/sa.json \$c:/home/atxuser/.atxct/shared/security_agent_config.json && \ + sudo docker exec \$c chown 1000:1000 /home/atxuser/.atxct/shared/security_agent_config.json; \ + done" + +aws s3 rm s3://atx-source-code-${ACCOUNT_ID}/temp/security_agent_config.json +``` + +### Step 5: Proceed with Analysis + +Once permissions are verified, proceed with the normal analysis flow using `--type security`. + +The executor IAM policy required for runtime is documented in `AWSTransformSecurityAgentExecutorAccess.json` in the ATXControlTowerPolicies package. 
+ +--- + +## Error Handling + +| Error | Cause | Resolution | +|-------|-------|------------| +| `Access denied calling Security Agent API` | Missing `AtxCtSecurityAgentAPI` policy on compute role | Admin must add the policy (see Admin Setup) | +| `s3:PutObject` access denied | Missing `AtxCtSecurityAgentS3Access` policy | Admin must add S3 policy | +| `iam:PassRole` denied | Missing `AtxCtSecurityAgentPassRole` policy | Admin must add PassRole policy | +| Config file not found | Admin setup never ran | Admin must run `atx ct setup security-agent` | +| `not_configured` status | Setup failed or never completed | Admin must re-run setup | + +--- + +## IAM Policy Reference + +| Policy | File | Purpose | Who Uses It | +|--------|------|---------|-------------| +| Full admin + executor | `AWSTransformSecurityAnalysisAccess.json` | All permissions including CFN, CreateRole, CreateBucket | Administrator (setup) | +| Executor only | `AWSTransformSecurityAgentExecutorAccess.json` | Runtime permissions only: SecurityAgent API, S3 read/upload, PassRole | Compute role (EC2/Batch job role) | diff --git a/aws-transform/steering/workload-continuous-modernization-server.md b/aws-transform/steering/workload-continuous-modernization-server.md new file mode 100644 index 00000000..98c2ff8b --- /dev/null +++ b/aws-transform/steering/workload-continuous-modernization-server.md @@ -0,0 +1,57 @@ +--- +name: server +description: Start, stop, or restart the AWS Transform - continuous modernization (continuous modernization) server (`atx ct server`). +--- + +# Server + +## Supported Regions + +AWS Transform - continuous modernization is available in these regions only: + +| Region | Code | +|--------|------| +| US East (N. 
Virginia) | `us-east-1` | +| Europe (Frankfurt) | `eu-central-1` | +| Asia Pacific (Mumbai) | `ap-south-1` | +| Asia Pacific (Sydney) | `ap-southeast-2` | +| Asia Pacific (Tokyo) | `ap-northeast-1` | +| Europe (London) | `eu-west-2` | +| Asia Pacific (Seoul) | `ap-northeast-2` | +| Canada (Central) | `ca-central-1` | + +## Region Selection + +Before starting the server, ask the user which region they want to use if they haven't already specified one. Render this menu as plain numbered markdown text in your response and wait for the user to type a choice; do NOT route it through any structured choice/picker tool like `AskUserQuestion` in Claude Code, or any equivalent multi-select/option UI in other harnesses: + +> "Which AWS region do you want to use? AWS Transform - continuous modernization supports: us-east-1, eu-central-1, ap-south-1, ap-southeast-2, ap-northeast-1, eu-west-2, ap-northeast-2, ca-central-1." + +If the user provides a region not in the supported list, let them know it isn't supported and suggest `us-east-1` as the default: + +> "That region isn't supported by AWS Transform - continuous modernization. Would you like to use `us-east-1` (US East, N. Virginia) instead?" + +Once confirmed, set `ATX_REGION` to the chosen region.
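
The region check described above could be sketched roughly as follows. The region codes are copied from the Supported Regions table; `is_supported_region` and `set_atx_region` are hypothetical helper names introduced here for illustration and are not part of the `atx` CLI.

```bash
# Region codes copied from the Supported Regions table.
SUPPORTED_REGIONS="us-east-1 eu-central-1 ap-south-1 ap-southeast-2 ap-northeast-1 eu-west-2 ap-northeast-2 ca-central-1"

# Hypothetical helper (not an atx command): returns 0 if the given region
# code is in the supported list, 1 otherwise.
is_supported_region() {
  local candidate="$1" region
  for region in $SUPPORTED_REGIONS; do
    if [ "$candidate" = "$region" ]; then
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: exports ATX_REGION only for a supported region.
# For an unsupported region it leaves ATX_REGION untouched and returns 1,
# so the caller can ask the user whether to use us-east-1 instead.
set_atx_region() {
  if is_supported_region "$1"; then
    export ATX_REGION="$1"
  else
    echo "Region '$1' is not supported" >&2
    return 1
  fi
}
```

The sketch deliberately does not fall back to `us-east-1` on its own, since the text above asks for the user's confirmation first.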
+ +## Start + +```bash +AWS_REGION=$ATX_REGION atx ct server & +``` + +## Stop + +```bash +pkill -f "atx ct server" +``` + +## Restart + +```bash +pkill -f "atx ct server"; AWS_REGION=$ATX_REGION atx ct server & +``` + +## Check if running + +```bash +atx ct status --health +``` diff --git a/aws-transform/steering/workload-continuous-modernization-setup.md b/aws-transform/steering/workload-continuous-modernization-setup.md new file mode 100644 index 00000000..9220555c --- /dev/null +++ b/aws-transform/steering/workload-continuous-modernization-setup.md @@ -0,0 +1,73 @@ +--- +name: setup +description: Set up/configure/provision AWS Transform - continuous modernization (continuous modernization) components — security agent, sources, infrastructure. Delegates to atx ct setup CLI. +--- + +# Setup + +## CRITICAL Prerequisites + +**Use `atx ct` (with a space) when invoking AWS Transform - continuous modernization (continuous modernization) commands.** `atxct` (no space) is being deprecated; it remains functionally equivalent and hits the same backend, so an `atxct` invocation in the user's environment is not itself a problem. Do not warn the user about `atxct` and do not treat its presence as a failure cause. + +### Step 1: Install or update `atx ct` + +Run this single command to check install status AND version in one shot: + +```bash +INSTALLED=$(atx ct --version 2>/dev/null | head -1) +LATEST=$(curl -fsSL "https://transform-cli.awsstatic.com/index.json" 2>/dev/null | grep -o '"latest"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/.*"latest"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/') +echo "Installed: ${INSTALLED:-not found}, Latest: ${LATEST:-unknown}" +``` + +If `INSTALLED` is empty OR `LATEST` is newer than `INSTALLED` → reinstall: + +```bash +curl -fsSL https://transform-cli.awsstatic.com/install.sh | bash +source ~/.bashrc # or ~/.zshrc +``` + +If both are the same → `atx ct` is up to date, proceed to Step 2. + +Verify: `atx ct --help` must show CT subcommands. 
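
The reinstall rule in Step 1 (reinstall when `INSTALLED` is empty or `LATEST` is newer) could be expressed as a small comparison helper. This is a minimal sketch under two assumptions: both values have already been reduced to bare version strings such as `1.4.2` (the raw `atx ct --version` output may need trimming first), and `sort -V` is available (GNU coreutils and recent BSD/macOS `sort`). `needs_reinstall` is a hypothetical name, not an `atx` command.

```bash
# Hypothetical helper: returns 0 when a (re)install is warranted, 1 otherwise.
# Assumes both arguments are bare version strings (e.g. "1.4.2").
needs_reinstall() {
  local installed="$1" latest="$2" newest

  # Nothing installed: install.
  if [ -z "$installed" ]; then
    return 0
  fi

  # Latest version unknown (e.g. index.json unreachable): nothing to compare.
  if [ -z "$latest" ]; then
    return 1
  fi

  # Same version: up to date.
  if [ "$installed" = "$latest" ]; then
    return 1
  fi

  # sort -V orders version strings numerically per component; if the highest
  # of the two is LATEST, the installed copy is older.
  newest=$(printf '%s\n%s\n' "$installed" "$latest" | sort -V | tail -n 1)
  [ "$newest" = "$latest" ]
}
```

A caller might use it as `if needs_reinstall "$INSTALLED" "$LATEST"; then ...; fi` and run the install command from Step 1 inside the branch. A plain string comparison would misorder versions such as `1.9.0` and `1.10.0`, which is why the sketch uses a version sort.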
+ +### Step 2: Start the server + +The `atx ct` CLI requires a running server. Before any other command, start it: + +```bash +atx ct server & +sleep 5 +atx ct status --health +``` + +If `atx ct status --health` returns a connection error, the server isn't running. Check `atx ct server` output for errors. + +After installation, restart your shell or run `source ~/.bashrc` (or `~/.zshrc`) to update PATH. + +## Security Agent + +See [workload-continuous-modernization-security-agent.md](workload-continuous-modernization-security-agent.md) for the full security agent setup (admin) and runtime verification (executor) flow. + +Quick reference (admin commands, run manually in terminal): + +```bash +# Set up security agent +atx ct setup security-agent + +# Check status +atx ct setup security-agent --status + +# Remove +atx ct setup security-agent --delete +``` + +## Behavior + +- If `atx ct` is not installed, install it using the curl command above before proceeding. +- If `atx ct` is installed but a newer version is available, reinstall it using the same curl command. +- If already configured, returns the existing config immediately. +- If not configured, kicks off async provisioning and returns immediately. Use `--status` to check progress. +- `--status` checks current state: `configured`, `setup_in_progress`, `failed`, or `not_configured`. +- `--delete` tears down AWS resources (CloudFormation stack, S3 bucket, config). +- Requires valid AWS credentials (`aws sts get-caller-identity` must succeed). +- If credentials are expired, ask the user to refresh them first (`ada credentials update`). 
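
The four status values listed above map onto a small decision table. The sketch below only illustrates that mapping; `describe_next_step` is a hypothetical helper, the action labels it prints are made up for this example, and how the status token is extracted from the real `--status` output is left open.

```bash
# Hypothetical helper: takes one of the four documented status values and
# prints an illustrative label for what to do next.
describe_next_step() {
  case "$1" in
    configured)
      echo "proceed"        # config exists, continue with the workflow
      ;;
    setup_in_progress)
      echo "wait"           # provisioning is async, check --status again later
      ;;
    failed|not_configured)
      echo "admin-setup"    # an administrator must (re-)run the setup
      ;;
    *)
      echo "unknown"        # not one of the documented values
      return 1
      ;;
  esac
}
```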
diff --git a/aws-transform/steering/workload-continuous-modernization-source.md b/aws-transform/steering/workload-continuous-modernization-source.md new file mode 100644 index 00000000..9737ceb3 --- /dev/null +++ b/aws-transform/steering/workload-continuous-modernization-source.md @@ -0,0 +1,144 @@ +--- +name: source +description: Add/list/remove source connections (GitHub org, GitLab group/user, Bitbucket workspace, local folder). List, get, update, and delete repos under sources. Filter and label groups of repos for targeted analysis. +--- + +# Source + +## Prerequisites + +Check if the server is running with `atx ct status --health`. If any command fails with a connection error, use the `server` skill to start the server. + +## Token handling + +**Never ask the user to paste or type a token into this chat.** Tokens entered into the chat are visible in the conversation transcript. + +When a source requires a token, give the user the exact one-liner to run in their own terminal — fill in the placeholders, then say: "Run this in your terminal, then tell me when it's done." + +`read -s` prompts silently in the terminal — the token is pasted directly into the terminal (not into this chat), nothing echoes, and the value is never captured in shell history. `unset TOKEN` clears it from the shell immediately after. + +After the user says "done", run `atx ct source list --json` to verify the source was added. If it appears, continue. If not, ask the user to retry. + +## Commands + +Supported provider types: `github`, `gitlab`, `bitbucket`, `local` + +When adding a source, the agent should inform the customer what PAT scopes are needed and why: + +"Your personal access token requires read access to list and scan your repositories for modernization findings, write access to push remediation branches, and pull request (or merge request) creation permissions to deliver the automated fixes for your review." 
+ +Then show the specific scopes for their provider: + +- **GitHub:** + - Classic token: `repo` scope + - Fine-grained token: Read access to metadata (default), Read and Write access to code and pull requests +- **GitLab:** `api` scope (covers project listing, merge request creation, and git push over HTTPS). +- **Bitbucket:** `read:repository:bitbucket`, `read:account`, `write:repository:bitbucket`, `read:pullrequest:bitbucket`, `write:pullrequest:bitbucket`. + +```bash +# Add a GitHub org +# The GitHub PAT requires the `repo` scope (classic token), or for fine-grained tokens: Read access to metadata (default), Read and Write access to code and pull requests. +read -s TOKEN && atx ct source add --name <name> --provider github --org <org> --token "$TOKEN"; unset TOKEN + +# Add a GitLab group or user (gitlab.com) +# The GitLab PAT requires the `api` scope. +read -s TOKEN && atx ct source add --name <name> --provider gitlab --org <group-or-username> --token "$TOKEN"; unset TOKEN + +# Add a GitLab group or user (self-hosted) +# The GitLab PAT requires the `api` scope. 
+read -s TOKEN && atx ct source add --name <name> --provider gitlab --org <group-or-username> --token "$TOKEN" --url https://gitlab.example.com; unset TOKEN + +# Add a Bitbucket workspace (Cloud -- API token with scopes) +# The Bitbucket PAT requires scopes: read:repository:bitbucket, read:account, write:repository:bitbucket, read:pullrequest:bitbucket, write:pullrequest:bitbucket +read -s TOKEN && atx ct source add --name <name> --provider bitbucket --org <workspace> --token "$TOKEN" --email <bitbucket-email> --username <bitbucket-username>; unset TOKEN + +# Add a Bitbucket project (Data Center / self-hosted) +# The Bitbucket PAT requires scopes: read:repository:bitbucket, read:account, write:repository:bitbucket, read:pullrequest:bitbucket, write:pullrequest:bitbucket +read -s TOKEN && atx ct source add --name <name> --provider bitbucket --org <project-key> --token "$TOKEN" --url https://bitbucket.example.com; unset TOKEN +``` + +Add a local folder source (no token required): +```bash +atx ct source add --name <name> --provider local --path <dir> +``` + +```bash +# List sources +atx ct source list + +# Remove +atx ct source remove --name <name> +``` + +After adding a source, run `atx ct discovery scan --source <name>` to discover repos. See [continuous-modernization-discovery](workload-continuous-modernization-discovery.md). Local sources also require `--path` at scan time. + +## Provider details + +- **github**: Scans a GitHub organization or user for repositories. Requires a PAT or GitHub App. During remediation, pushes a branch and creates a Pull Request automatically — this includes **security** remediation, where the Security Agent's diff is applied and opened as a PR (`pr_open`). GitHub is the only provider that gets an auto-opened PR from a security diff; gitlab/bitbucket/local stay diff-only. +- **gitlab**: Scans a GitLab group or user for projects. Requires a PAT with `api` scope. 
Supports self-hosted instances via `--url` (required for self-hosted; omit for gitlab.com). During remediation, pushes a branch and creates a Merge Request automatically. If `--org` is a user (not a group), falls back to listing the user's projects. +- **bitbucket**: Scans a Bitbucket workspace (Cloud) or project (Data Center) for repositories. Cloud requires an API token with scopes (created at https://id.atlassian.com/manage-profile/security/api-tokens → "Create API token with scopes"). Required scopes: `read:repository:bitbucket`, `write:repository:bitbucket`, `read:pullrequest:bitbucket`, `write:pullrequest:bitbucket`. Also requires `--email` (Bitbucket account email, for API auth) and `--username` (Bitbucket username, for git clone/push). Data Center requires an HTTP Access Token and `--url`. During remediation, pushes a branch and creates a Pull Request automatically. +- **local**: Scans a local directory for packages. The directory path is provided at `source add` time via `--path` and stored on the source. Subsequent `discovery scan --source <name>` calls reuse the stored path automatically; pass `--path <new-dir>` only to override and update the source's stored path. Supports analysis and remediation (remediation leaves changes on a new `atx/<transform>-<timestamp>` branch per run — previous branches are never overwritten, no remote push). **Important:** `--path` must point to a parent directory that *contains* git repos as subdirectories — not to a repo itself. The scanner looks for child directories with `.git` inside them. If `--path` points directly to a repo (e.g. `/home/user/my-app` which has `.git`), the scan returns 0 repos. Use the parent instead (e.g. `/home/user/repos` which contains `my-app/`, `my-service/`, etc.). 
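
Before adding a local source, the parent-directory rule described above can be checked with a quick pre-flight count. This is a sketch that mirrors the rule as stated (child directories with `.git` inside them); it is not the scanner's actual implementation, and `count_child_repos` is a hypothetical helper name.

```bash
# Hypothetical helper: counts immediate subdirectories of the given directory
# that contain a .git entry. Hidden subdirectories are not matched by the glob.
count_child_repos() {
  local parent="$1" count=0 child
  for child in "$parent"/*/; do
    # -e rather than -d: .git is a file in worktrees and submodules.
    if [ -e "${child}.git" ]; then
      count=$((count + 1))
    fi
  done
  printf '%s\n' "$count"
}
```

If the count is 0 for the intended `--path`, the path most likely points at a repo itself rather than at its parent, which is the case the note above warns about.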
+ +## Repository Commands + +```bash +# List all repos (shows slug, language, workflow status, labels) +atx ct repository list + +# Filter by source +atx ct repository list --source <name> + +# Filter by labels (AND-semantics: all specified labels must be present) +atx ct repository list --labels "team:frontend,priority:high" + +# Get a single repo +atx ct repository get --repo "<source>::<slug>" --source <source> + +# Set labels on a single repo (replace semantics) +atx ct repository update --source <source> --repo "<source>::<slug>" --labels "team:frontend,priority:high" + +# Clear all labels from a single repo +atx ct repository update --source <source> --repo "<source>::<slug>" --labels "" + +# Bulk update labels (set-union: merges with existing labels) +atx ct repository update --source <source> --repo "<slug1>,<slug2>" --labels "migration:v2" + +# Bulk update all repos under a source (set-union) +atx ct repository update --source <source> --labels "migration:v2" + +# Delete a repo +atx ct repository delete --repo "<source>::<slug>" --source <source> +``` + +## Labels + +Labels are user-defined identifiers for organizing and filtering groups of repositories. + +**Format:** Unicode letters, digits, `_./:-`. Max 63 chars per label, max 64 per repo. Colons are conventional for key:value grouping (e.g. `team:frontend`, `priority:high`). + +**Semantics:** +- `repository list --labels`: AND-filter (only repos with ALL specified labels are returned). +- `repository update` single repo: replace (new labels fully replace existing). +- `repository update` bulk (multiple repos or `--source` only): set-union (new labels merge with existing). Clearing is not supported in bulk mode. + +**Validation:** Invalid labels (bad characters, too long, duplicates, >64 count) return an error identifying the offending label and constraint. 
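
The label constraints above can be approximated client-side before calling `repository update`. This is a rough sketch, not the CLI's own validator: `validate_label` and `validate_labels` are hypothetical helpers, `[:alnum:]` only covers non-ASCII letters when the shell runs in a UTF-8 locale, and rejecting an empty individual label is an assumption.

```bash
# Hypothetical helper: checks one label against the documented format
# (letters, digits, and _./:- with at most 63 characters).
validate_label() {
  local label="$1"
  # Assumption: an individual label must not be empty.
  if [ -z "$label" ] || [ "${#label}" -gt 63 ]; then
    return 1
  fi
  case "$label" in
    *[![:alnum:]_./:-]*) return 1 ;;   # contains a character outside the set
  esac
  return 0
}

# Hypothetical helper: checks a comma-separated label list as passed to --labels.
validate_labels() {
  local csv="$1" label count=0 dup

  # An empty string is the documented way to clear labels on a single repo.
  if [ -z "$csv" ]; then
    return 0
  fi

  while IFS= read -r label; do
    if ! validate_label "$label"; then
      echo "invalid label: '$label'" >&2
      return 1
    fi
    count=$((count + 1))
  done <<EOF
$(printf '%s' "$csv" | tr ',' '\n')
EOF

  if [ "$count" -gt 64 ]; then
    echo "too many labels: $count (max 64)" >&2
    return 1
  fi

  dup=$(printf '%s' "$csv" | tr ',' '\n' | sort | uniq -d | head -n 1)
  if [ -n "$dup" ]; then
    echo "duplicate label: '$dup'" >&2
    return 1
  fi
}
```

The server-side validation remains authoritative; a pre-check like this only catches obvious mistakes before a bulk update is sent.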
+ +## Workflow: Label repos after adding a source for targeted analysis + +After adding a source and discovering repos, label a subset to scope analysis or remediation to just those repos: + +```bash +# 1. Add source and discover repos +read -s TOKEN && atx ct source add --name my-org --provider github --org acme-corp --token "$TOKEN"; unset TOKEN +atx ct discovery scan --source my-org + +# 2. Label the repos you want to analyze together +atx ct repository update --source my-org --repo "my-org::service-a,my-org::service-b" --labels "batch:java-upgrade" + +# 3. Verify the label took +atx ct repository list --labels "batch:java-upgrade" + +# 4. Use the label to scope analysis or remediation to just that group +``` + +This lets customers organize large orgs into manageable groups (by team, priority, migration wave, etc.) without creating separate sources. diff --git a/aws-transform/steering/workload-continuous-modernization-status.md b/aws-transform/steering/workload-continuous-modernization-status.md new file mode 100644 index 00000000..e9ba1a53 --- /dev/null +++ b/aws-transform/steering/workload-continuous-modernization-status.md @@ -0,0 +1,16 @@ +--- +name: status +description: Show AWS Transform - continuous modernization (continuous modernization) system overview — source count, repo count, analysis results, finding totals, remediation progress. +--- + +# Status + +## Prerequisites + +Check if the server is running with `atx ct status --health`. If any command fails with a connection error, use the `server` skill to start the server. 
+ +## Commands + +```bash +atx ct status [--source <name>] +``` diff --git a/aws-transform/steering/workload-custom-remote-execution.md b/aws-transform/steering/workload-custom-remote-execution.md index 423dce50..3b4c26fa 100644 --- a/aws-transform/steering/workload-custom-remote-execution.md +++ b/aws-transform/steering/workload-custom-remote-execution.md @@ -88,10 +88,24 @@ If `git pull` reports a merge conflict, resolve it by keeping both the upstream changes and the user's customizations in the `CUSTOM LANGUAGES AND TOOLS` section of the Dockerfile, then commit the merge. -Ensure `prebuiltImageUri` is set in `cdk.json` (it should be set to "public.ecr.aws/d9h8z6l7/aws-transform:latest" by default). Then deploy: +Ensure `prebuiltImageUri` is set in `cdk.json` (it should be set to "public.ecr.aws/d9h8z6l7/aws-transform:latest" by default). + +**Network configuration (MANDATORY):** Before deploying, you MUST collect VPC, subnet, +and security group IDs from the user and write them into `cdk.json` context. AWS +Transform does NOT create VPCs, subnets, or NAT gateways — the user provides those. +If no suitable VPC exists, direct the user to: + +``` +cd "$ATX_INFRA_DIR" && ./create-vpc.sh +``` + +(Run from another terminal with admin credentials.) + +You MUST NOT run `setup.sh` yourself — present it for the user to run from another +terminal with admin credentials: -```bash -cd "$ATX_INFRA_DIR" && ./setup.sh +``` +! cd "$HOME/.aws/atx/custom/remote-infra" && AWS_PROFILE=<your-admin-profile> ./setup.sh ``` The setup script skips the Docker prerequisite check and container build when @@ -118,7 +132,7 @@ cd "$ATX_INFRA_DIR" && sed -i.bak 's|"prebuiltImageUri": ".*"|"prebuiltImageUri" Customize the Dockerfile (see Container Customization below), then deploy: ```bash -cd "$ATX_INFRA_DIR" && ./setup.sh +! cd "$HOME/.aws/atx/custom/remote-infra" && AWS_PROFILE=<your-admin-profile> ./setup.sh ``` This path requires Docker installed and running. 
First deploy takes ~5-10 minutes @@ -141,62 +155,13 @@ The teardown script handles stacks in any state, including failed rollbacks. ### Attach IAM Policies -After deployment, generate and attach the runtime policy so the caller has -permissions to invoke Lambdas, upload/download from S3, use KMS, etc.: - -```bash -cd "$ATX_INFRA_DIR" && npx ts-node generate-caller-policy.ts -``` - -This produces two JSON files in `$ATX_INFRA_DIR`: - -- `atx-runtime-policy.json` — Day-to-day operations (Lambda invoke, S3, KMS, Secrets Manager, logs) -- `atx-deployment-policy.json` — One-time CDK deploy/destroy (CloudFormation, ECR, IAM, Batch, VPC) - -Attach the runtime policy to the caller: - -```bash -ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text) -CALLER_ARN=$(aws sts get-caller-identity --query Arn --output text) - -# Create the managed policy (ignore EntityAlreadyExists, fail on other errors) -if ! create_output=$(aws iam create-policy --policy-name ATXRuntimePolicy \ - --policy-document "file://$ATX_INFRA_DIR/atx-runtime-policy.json" 2>&1); then - echo "$create_output" | grep -q "EntityAlreadyExists" \ - || { echo "Failed to create policy: $create_output" >&2; exit 1; } -fi - -# Attach to the caller (handles IAM users, IAM roles, and SSO/assumed roles) -if echo "$CALLER_ARN" | grep -q ":user/"; then - IDENTITY_NAME=$(echo "$CALLER_ARN" | awk -F'/' '{print $NF}') - aws iam attach-user-policy --user-name "$IDENTITY_NAME" \ - --policy-arn "arn:aws:iam::${ACCOUNT_ID}:policy/ATXRuntimePolicy" -elif echo "$CALLER_ARN" | grep -Eq ":assumed-role/|:role/"; then - ROLE_NAME=$(echo "$CALLER_ARN" | sed 's/.*:\(assumed-\)\{0,1\}role\///' | cut -d'/' -f1) - aws iam attach-role-policy --role-name "$ROLE_NAME" \ - --policy-arn "arn:aws:iam::${ACCOUNT_ID}:policy/ATXRuntimePolicy" -fi -``` - -If the attachment fails (insufficient IAM permissions, or an SSO-managed role with -name starting with `AWSReservedSSO_`), inform the user: - -- The policy JSON is at 
`$ATX_INFRA_DIR/atx-runtime-policy.json` -- They need their AWS administrator to create and attach it to their identity -- For SSO users, it must be added to their IAM Identity Center permission set - -Verify the policy is working by invoking a Lambda: - -```bash -aws lambda invoke --function-name atx-list-jobs --payload '{}' \ - --cli-binary-format raw-in-base64-out /dev/stdout -``` - -If this succeeds, the runtime policy is active. If not, the attachment hasn't -taken effect yet — wait a few seconds and retry. +After deployment, direct the user to attach the executor policy to their IAM +role/user. You MUST NOT attach it yourself — tell the user to do it manually: -If the caller also needs to deploy/destroy infrastructure (not just run jobs), -repeat the above with `atx-deployment-policy.json` and policy name `ATXDeploymentPolicy`. +"Attach the executor policy at `$HOME/.aws/atx/custom/remote-infra/AWSTransformInfrastructureExecutorAccessBatch.json` +to your IAM role/user so day-to-day job submission needs only least-privilege. +This is a manual step — create an IAM policy from the JSON file and attach it +to the identity you use for running jobs." ## Lambda Function Names @@ -337,8 +302,8 @@ aws secretsmanager create-secret --name "atx/ssh-key" --secret-string "$(cat <pa Setup (requires user consent): 1. Explain which secrets will be created in their AWS account -2. Get explicit confirmation and credentials from the user -3. Create the secret(s) +2. Get explicit confirmation from the user, then give them the `create-secret` command to run in their own terminal — do not ask them to paste the credential value into this chat +3. Wait for the user to confirm they ran the command, then verify the secret exists with `aws secretsmanager describe-secret` 4. Container entrypoint auto-fetches at startup — no image rebuild needed 5. 
User can delete anytime: `aws secretsmanager delete-secret --secret-id "atx/github-token" --region "$REGION" --force-delete-without-recovery` @@ -393,12 +358,19 @@ Credentials are fetched from AWS Secrets Manager at container startup — never ```json [ - {"path": "/home/atxuser/.npmrc", "content": "//npm.company.com/:_authToken=TOKEN"}, - {"path": "/home/atxuser/.m2/settings.xml", "content": "<settings>...</settings>"}, - {"path": "/home/atxuser/.config/pip/pip.conf", "content": "[global]\nindex-url = https://pypi.company.com/simple"}, - {"path": "/home/atxuser/.gem/credentials", "content": "---\n:rubygems_api_key: KEY", "mode": "0600"}, - {"path": "/home/atxuser/.cargo/credentials.toml", "content": "[registry]\ntoken = \"TOKEN\""}, - {"path": "/home/atxuser/.nuget/NuGet.Config", "content": "<?xml version=\"1.0\"?>..."} + { "path": "/home/atxuser/.npmrc", "content": "//npm.company.com/:_authToken=TOKEN" }, + { "path": "/home/atxuser/.m2/settings.xml", "content": "<settings>...</settings>" }, + { + "path": "/home/atxuser/.config/pip/pip.conf", + "content": "[global]\nindex-url = https://pypi.company.com/simple" + }, + { + "path": "/home/atxuser/.gem/credentials", + "content": "---\n:rubygems_api_key: KEY", + "mode": "0600" + }, + { "path": "/home/atxuser/.cargo/credentials.toml", "content": "[registry]\ntoken = \"TOKEN\"" }, + { "path": "/home/atxuser/.nuget/NuGet.Config", "content": "<?xml version=\"1.0\"?>..." } ] ``` diff --git a/aws-transform/steering/workload-custom-troubleshooting.md b/aws-transform/steering/workload-custom-troubleshooting.md index c429639c..b55fcf2c 100644 --- a/aws-transform/steering/workload-custom-troubleshooting.md +++ b/aws-transform/steering/workload-custom-troubleshooting.md @@ -39,7 +39,7 @@ Ask: "Does your GitHub PAT have access to [repo name]? Fine-grained PATs need each repo explicitly listed." Resolution: the user updates their PAT on GitHub to include the new repo, then -updates the stored secret: +updates the stored secret. 
Give them this command to run in their own terminal — do not ask them to paste the token into this chat: ```bash aws secretsmanager put-secret-value --secret-id "atx/github-token" --region "$REGION" --secret-string "<updated-token>" @@ -47,7 +47,7 @@ aws secretsmanager put-secret-value --secret-id "atx/github-token" --region "$RE **3. Has the PAT expired?** GitHub PATs can have expiration dates. Ask: "When did you create this PAT? It may -have expired." Resolution: create a new PAT on GitHub, then update the secret: +have expired." Resolution: create a new PAT on GitHub, then give them this command to run in their own terminal — do not ask them to paste the token into this chat: ```bash aws secretsmanager put-secret-value --secret-id "atx/github-token" --region "$REGION" --secret-string "<new-token>" diff --git a/aws-transform/steering/workload-custom.md b/aws-transform/steering/workload-custom.md index b41a3e32..1dec8e60 100644 --- a/aws-transform/steering/workload-custom.md +++ b/aws-transform/steering/workload-custom.md @@ -336,7 +336,7 @@ If NOT_CONFIGURED, explain what's needed and tell the user to run the create com > > Delete anytime: `aws secretsmanager delete-secret --secret-id atx/github-token --region "$REGION" --force-delete-without-recovery`" -Do NOT ask the user to paste their token in chat. They run the command themselves. +Do NOT ask the user to paste their token in chat, or ask them to confirm or repeat the token value in any way. They run the command themselves. 
Wait for the user to confirm it's done, then verify: ```bash diff --git a/aws-transform/steering/workload-sql.md b/aws-transform/steering/workload-sql.md index 03ce438a..39422f9c 100644 --- a/aws-transform/steering/workload-sql.md +++ b/aws-transform/steering/workload-sql.md @@ -153,8 +153,7 @@ Use this workflow when starting a new MSSQL → PostgreSQL conversion entirely f ### Step 1: Authentication -- **Cookie auth requires the Transform app URL** — Ask for the prod tenant URL: - - Prod: `https://722368900496189c6.transform.us-east-1.on.aws` +- **Cookie auth requires the Transform app URL** — Ask the user for their Transform app URL (e.g., `https://xxxxxxxx.transform.us-east-1.on.aws`). - Do **not** use the SSO/IdC start URL (`https://d-xxx.awsapps.com/start`). - If the auth cookie has expired, ask the user for a new one. @@ -173,7 +172,7 @@ Use this workflow when starting a new MSSQL → PostgreSQL conversion entirely f - The source SQL file can be either a `.sql` file or a `.zip` archive containing SQL files. Both formats are accepted by AWS Transform. **Upload the file as-is — do NOT zip a `.sql` file before uploading.** - If the ActiveFile is a mssql file, use that. - If not, ask for user's permission and search locally for files — use `fileSearch` to find `.sql` or `.zip` files on the user's machine instead of asking for the full path. -- Upload the file as an artifact first using `upload_transform_task_artifact`. +- Upload the file as an artifact first using `upload_artifact`. 
- Then send the artifact reference in chat using the URI format: ``` @@ -184,7 +183,7 @@ Use this workflow when starting a new MSSQL → PostgreSQL conversion entirely f ### Step 4: Interacting with the Agent -- **Always use `send_transform_message` as the primary form of communication with the job agent.** +- **Always use `send_message` as the primary form of communication with the job agent.** - **Messages can have interaction buttons** — Agent messages include `SELECT` interactions with options. Respond by sending the option's `value` as a chat message. ### Step 5: Monitoring Job Progress