fix: install Claude Code CLI in Docker image#13
Open
dcschreiber wants to merge 93 commits into
Open
Conversation
fix: add releaserc file
fix: tag string for build checkout
Test Infrastructure: - Add 255 tests covering router, guardrails, models, serializers, tool executor, prompt service, and reason codes - Add test_settings.py with SQLite in-memory database for tests - Configure pytest-asyncio for async test support - Add pyproject.toml with pytest and ruff configuration Developer Experience: - Add CLAUDE.md with project overview and development standards - Add .pre-commit-config.yaml with ruff and eslint hooks - Add setup.sh for one-command environment setup - Add start.sh for launching backend + frontend Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create ARCHITECTURE.md with detailed system design - Document flows, components, data models, and API endpoints - Add system flow diagram and directory structure - Reference from CLAUDE.md Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Removed files: - server/chat/logging/ - Custom BraintrustLogger module that was never called in the current codebase (only used in claude_service_old.py) - server/chat/agent/claude_service_old.py - Old agent implementation, replaced by current claude_service.py The current implementation uses Braintrust's native @Traced decorator for tracing, which is simpler and provides automatic span management. Updated README.md directory structure and views.py docstring to reflect that Braintrust tracing uses the native SDK decorators. Note: BraintrustLog and ToolCallEvent database models are retained for potential future use with local logging backup. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Comprehensive plan for restructuring trace logging to follow Braintrust best practices for eval-ready data: - Structured input with query + messages array (enables "Try prompt" UI) - Structured output with response + tool_calls + was_refused - Proper metadata organization (session, model config, routing, context) - Tags for categorical filtering (flow, channel, environment) - Channel/site tracking for multi-platform analytics Implementation broken into 7 discrete tasks that can be executed independently. Plan includes verification checklist and migration notes. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
PR_DESCRIPTION.md: - Added Braintrust logging cleanup section BRAINTRUST_RESTRUCTURE_PLAN.md: - Added reference URLs at top for quick access - Added client_version to metadata (from context.clientVersion) - Clarified prompt versions are already tracked (no work needed) - Condensed plan from 414 to 290 lines for clarity - Added verification checklist item for client_version - Improved task descriptions with exact line numbers Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enables refusal logging by ensuring last_user_message is available before the early return. This is a prerequisite for Task 7 in the Braintrust restructure plan (adding logging to refused requests). Also updates the restructure plan with: - TL;DR explaining string→object change - Correct page_type extraction (subdomain, /texts=home) - OpenAI message format rationale - Fix for last_user_message ordering - Updated references (official cookbook vs provided example) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Restructures trace logging to follow Braintrust best practices:
- Structured input: {query, messages[]} instead of truncated string
- Structured output: {response, refs[], tool_calls[], was_refused}
- Tags: flow type + environment for filtering in Braintrust UI
- Page context: site, page_type, page_url for traffic segmentation
- Refusal logging: Previously invisible, now logged with full context
Adds helper functions:
- extract_page_type(): Parse Sefaria URLs to identify page types
- extract_refs(): Extract Sefaria refs from tool calls
Skipped tasks 1 & 8 (channel field) - Slack bot uses separate MCP
architecture and doesn't go through this API.
See docs/BRAINTRUST_RESTRUCTURE_PLAN.md for full implementation details.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Reorganized PR_DESCRIPTION.md with summary list at top - Simplified BRAINTRUST_RESTRUCTURE_PLAN.md to reflect what was implemented - Marked restructure plan as temp doc to remove after merge Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Consolidate redundant individual tests into parametrized tests - Move shared fixtures to module level to reduce duplication - Remove verbose docstrings that restated test names - Add explicit return type annotations - Replace imperative loops with list comprehensions All 297 tests pass with ~1,000 fewer lines of code. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add hooks/pre-commit to run ruff check + format on staged Python files - Update setup.sh to auto-install git hooks - Update start.sh to verify hooks are installed - Add ruff to requirements.txt - Apply ruff --fix and ruff format across all Python files - Configure ruff ignores: E501, E402 (Django setup), B023 (async false positives) - Fix bare except -> except Exception in tool_executor.py - Fix blind Exception -> IntegrityError in test_models.py Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add CI workflow running tests on push/PR to main/dev - SQLite job for fast feedback - PostgreSQL job (with service container) to catch DB-specific issues - Add test_settings_postgres.py with sensible local defaults Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove SQLite job from CI (redundant with PostgreSQL) - Add TESTING.md documenting local vs CI testing strategy - Reference TESTING.md from CLAUDE.md Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The refusal logging code was accessing route_result.safety.reason_codes, but SafetyResult only has 'allowed' and 'refusal_message' fields. The reason_codes field belongs to RouteResult. This would cause an AttributeError when any request was refused. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tests _create_refusal_response method to ensure: - Refusal responses are created correctly - Reason codes are extracted from route_result.reason_codes This covers a gap in test coverage that allowed the previous AttributeError bug to go undetected. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
These models were designed for local logging backup but were never wired up to production code - only used in tests. - Removed BraintrustLog model (was for eval-ready log storage) - Removed ToolCallEvent model (was for tool call tracking) - Added migration 0004 to drop both tables - Updated README.md to remove from Database Models table - Added test troubleshooting note to CLAUDE.md The current implementation uses Braintrust's native @Traced decorator for tracing, which sends data directly to Braintrust. Local persistence is handled by ChatMessage which captures similar metrics. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Prevents "port already in use" errors by automatically killing any processes using ports 8001 and 5173 before starting backend/frontend. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Excludes node_modules, dist, venv, caches, logs, and lock files to reduce token usage when Claude Code indexes the codebase. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Dev tooling, test infrastructure, and Braintrust logging restructure confirmed by Akiva
Braintrust sdk
Add /api/v2/chat/anthropic endpoint that accepts and returns Anthropic Messages API format. This enables calling the agent from Braintrust playground and running evaluations with datasets. - New endpoint reuses existing ClaudeAgentService - Transforms requests/responses to Anthropic format - Supports content blocks and multi-turn messages - Includes 24 tests covering helpers and endpoint Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add eval script that can test the Anthropic-compatible endpoint using Braintrust's Eval framework. Supports: - Running against local or remote endpoints - Using Braintrust datasets or sample data - AutoEvals scorers (Factuality, AnswerRelevance) when available - Custom scorers for keyword matching and content validation Usage: BRAINTRUST_API_KEY=<key> python -m braintrust eval chat/evals/eval_anthropic_endpoint.py Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add TestChatAnthropicHTTPIntegration class with 9 tests covering full HTTP request-response cycle (JSON serialization, URL routing, headers) - Tests include Hebrew/Unicode handling, large messages, error formats - Remove chat/evals/ directory (using Braintrust UI for evals) - Fix flaky test_prompt_service test that depended on env vars - Add testing guideline to CLAUDE.md Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The error handler was returning str(e) which could leak sensitive internal information. Now returns a generic "Internal server error" message while still logging the full exception for debugging. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add APIKey model for service authentication with SHA-256 hashed storage - Add service_id field to ChatSession and ChatMessage (nullable user_id) - Add database constraint ensuring at least one identity is set - Create auth module with Actor dataclass and authenticate_request() - Support API key auth (Authorization: Bearer) and user token auth - Extract shared services (chat_service, session_service) for code reuse - Add X-Session-ID header support for multi-turn conversations - Add session ownership validation for security - Update turn logging service to accept Actor instead of user_id - Add comprehensive test coverage (222 tests passing) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Support API key auth via Authorization: Bearer header - Support user token auth via userId query param - Services filter messages by service_id, users by user_id - Enforce session ownership (services only see their sessions) - Add comprehensive tests for auth, messages, and pagination Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Migration was created prematurely - removing until ready. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove API key authentication, keeping only user token auth. - Remove APIKey model and service_id fields from models - Simplify Actor to only support user_id - Remove InvalidAPIKey exception and related error handling - Update all views to use user token auth only - Update tests to use user token authentication Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Accept userId via X-User-Id header to maintain Anthropic API structure compatibility. Header takes precedence over body userId field. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Changed from X-User-Id to X-Api-Key header for Anthropic standard compliance - Reverted history endpoint security changes to minimize diff scope - Removed test_history_endpoint.py (was testing reverted feature) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add AnthropicRequestSerializer to validate request format consistently with chat_stream_v2. Replaces manual validation with DRF serializer. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Make singleton agent service thread-safe with double-check locking - Remove debug log that exposed user_id - Add stricter validation for content blocks in extract_user_message - Move hardcoded model default to settings.DEFAULT_MODEL - Clean up redundant `or ""` in metadata.get() Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Revert get_agent_service() to create new instance each call (original behavior) - Remove DEFAULT_MODEL setting (keep hardcoded default) - Keep: removed debug log, stricter content validation, redundant `or ""` cleanup Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove run_agent_turn() which was never called - Remove unused imports (Callable, AgentResponse, ConversationMessage, get_agent_service) - Remove validate_session_ownership from exports (only used internally) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The ownership validation was ineffective because update_or_create overwrites the user_id before we check it. Now we query the existing session first and validate ownership before any update occurs. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add Anthropic-compatible endpoint for Braintrust integration
The claude-agent-sdk Python package is a wrapper around the Claude Code CLI - it spawns `claude` as a subprocess. The deployed server was failing with "Claude Code not found" because the Dockerfile only installed Python dependencies. This adds Node.js and the @anthropic-ai/claude-code npm package to the server stage of the multi-stage build. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
claude-agent-sdkPython package spawns the Claude CLI as a subprocess@anthropic-ai/claude-codeRoot Cause
The
claude-agent-sdkis a Python wrapper around the Claude Code CLI. It usesshutil.which("claude")to find the CLI and spawns it with--output-format stream-json. Without the CLI installed, the agent service fails.Changes
Added to the server stage of the multi-stage Dockerfile:
RUN apk add --no-cache nodejs npm \ && npm install -g @anthropic-ai/claude-codeTest plan
/usr/local/bin/claude)claude --version→2.1.31)🤖 Generated with Claude Code