Skip to content

[EPIC-5-1] FastAPI Service with Health Check and Scheduler#28

Open
deserat wants to merge 2 commits into
developfrom
feature/EPIC-5-1-fastapi-service
Open

[EPIC-5-1] FastAPI Service with Health Check and Scheduler#28
deserat wants to merge 2 commits into
developfrom
feature/EPIC-5-1-fastapi-service

Conversation

@deserat

@deserat deserat commented Jan 26, 2026

Copy link
Copy Markdown
Owner

Summary

Created the FastAPI application entry point with modern lifespan events, comprehensive health check system, APScheduler integration for background jobs, and placeholder items router for infrastructure testing.

Related Issue

Closes #16 (EPIC-5-1)

Changes

  • FastAPI Application (app/main.py):

    • Modern @asynccontextmanager lifespan pattern for startup/shutdown
    • Startup sequence: load config → connect database → start scheduler
    • Shutdown sequence: gracefully stop scheduler
    • CORS middleware with configurable origins
    • Router registration for health and items endpoints
    • Fallback defaults for test imports
  • APScheduler Service (app/services/scheduler.py):

    • AsyncIOScheduler for FastAPI async compatibility
    • Surfacing job with interval trigger (configurable seconds, default 24h)
    • Cleanup job with cron trigger (daily at 2 AM)
    • Graceful startup/shutdown with error handling
    • Status reporting function for health checks
  • Health Check Endpoints (app/api/endpoints/health.py):

    • GET /health: Detailed status (database + scheduler + version)
    • GET /health/ready: Kubernetes readiness probe (database check only)
    • GET /health/live: Kubernetes liveness probe (always 200)
    • Pydantic response models for type safety
  • Placeholder Items Router (app/api/endpoints/items.py):

    • GET /items: Returns empty list []
    • GET /items/{id}: Returns 404 Not Found
    • POST /items: Returns 501 Not Implemented
    • PUT /items/{id}: Returns 501 Not Implemented
    • DELETE /items/{id}: Returns 501 Not Implemented
    • Purpose: Infrastructure testing before real CRUD implementation
  • Startup Script (run.sh):

    • Load environment variables from .env
    • Validate config.yaml exists
    • Run Alembic migrations (alembic upgrade head)
    • Start uvicorn server with debug mode support
    • Executable permissions set

Testing

  • Unit tests pass (30 tests for scheduler, health, items)
  • Integration tests pass (9 tests for app lifespan and routing)
  • Coverage >= 90% (achieved 100%)
  • Manual testing completed
  • All quality checks pass (ruff format, ruff check, pyright)

Test Summary:

  • 39 tests total, all passing
  • 12 scheduler tests (job execution, start/stop, config, error handling)
  • 8 health endpoint tests (all status combinations)
  • 9 items endpoint tests (all placeholder endpoints)
  • 10 integration tests (app creation, lifespan, routing, failure scenarios)
  • Coverage: 100% on new code

Database Changes

  • None (infrastructure only)

Dependencies Added

  • apscheduler>=3.10.4,<3.11.0 - Background job scheduling
  • httpx>=0.25.0,<0.28.0 - Pinned to avoid breaking changes in 0.28+

Verification Steps

cd api-service

# Install dependencies
uv sync

# Run quality checks
uv run ruff format app/
uv run ruff check app/
uv run pyright app/

# Run tests with coverage
DATABASE_URL="postgresql://test:test@localhost:5432/test" \
  uv run pytest -v --cov=app --cov-fail-under=90

# Start API server
./run.sh

# In another terminal - test endpoints
curl http://localhost:8000/
curl http://localhost:8000/health
curl http://localhost:8000/health/ready
curl http://localhost:8000/health/live
curl http://localhost:8000/api/v1/items/
curl http://localhost:8000/api/v1/items/123

# Check OpenAPI docs
open http://localhost:8000/docs

Checklist

  • Code follows api-service conventions (app/ layout)
  • api-service/AGENTS.md reviewed for context
  • Documentation updated (implementation plan in docs/dev/plans/)
  • No secrets or sensitive data committed
  • Migration tested (N/A - no migrations in this change)
  • Tests achieve 100% coverage on new code
  • All quality gates passed (ruff, pyright)

🤖 Generated with Claude Code

- What: Created FastAPI application entry point with modern lifespan events, APScheduler integration, health check endpoints, and placeholder items router
- Why: Establishes core API service infrastructure for Noosphere (Issue #16)
- How: Modern FastAPI patterns with @asynccontextmanager lifespan, AsyncIOScheduler for background jobs, and fail-fast startup strategy

Acceptance criteria addressed:
- [x] FastAPI application with modern lifespan events (startup/shutdown)
- [x] Three health check endpoints (/health, /health/ready, /health/live)
- [x] APScheduler integration for background jobs (surfacing, cleanup)
- [x] Placeholder items router (GET/POST/PUT/DELETE with 501/404 responses)
- [x] CORS middleware configuration
- [x] Startup script (run.sh) for development
- [x] Comprehensive test suite with 100% coverage

Implementation details:
- app/main.py: FastAPI app with @asynccontextmanager lifespan pattern
  - Startup: load config → connect database → start scheduler
  - Shutdown: gracefully stop scheduler
  - CORS middleware with configurable origins
  - Router registration for health and items endpoints
- app/services/scheduler.py: APScheduler service (AsyncIOScheduler)
  - Surfacing job (interval trigger, configurable seconds)
  - Cleanup job (cron trigger, daily at 2 AM)
  - Graceful startup/shutdown with error handling
- app/api/endpoints/health.py: Three health check endpoints
  - GET /health: Detailed status (database + scheduler + version)
  - GET /health/ready: Kubernetes readiness probe (database check)
  - GET /health/live: Kubernetes liveness probe (always 200)
- app/api/endpoints/items.py: Placeholder CRUD endpoints
  - GET /items: Returns empty list []
  - GET /items/{id}: Returns 404 Not Found
  - POST/PUT/DELETE: Returns 501 Not Implemented
- run.sh: Startup script with .env loading, config validation, migrations, uvicorn server

Testing:
- Unit tests: 30 tests for scheduler, health endpoints, and items endpoints
- Integration tests: 9 tests for app creation, lifespan, and router registration
- Total: 39 tests, all passing
- Coverage: 100% on new code (exceeds 90% target)

Database changes:
- None (infrastructure only)

Dependencies added:
- apscheduler>=3.10.4,<3.11.0 for background job scheduling
- httpx>=0.25.0,<0.28.0 pinned to avoid breaking changes in 0.28+

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@github-actions github-actions Bot added service/api-service FastAPI Python backend service/database PostgreSQL schema/migrations labels Jan 26, 2026
@gemini-code-assist

Copy link
Copy Markdown

Summary of Changes

Hello @deserat, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request lays the groundwork for the Noosphere API service by establishing its core FastAPI application. It focuses on creating a resilient and observable service foundation, integrating essential components like a configurable background job scheduler and a comprehensive health monitoring system. The inclusion of placeholder API routes ensures that the infrastructure can be tested and validated independently of future business logic development.

Highlights

  • FastAPI Application Core: Introduced the main FastAPI application (app/main.py) with a modern @asynccontextmanager lifespan for robust startup and shutdown sequences, including configuration loading, database connection, and scheduler management.
  • Comprehensive Health Checks: Implemented detailed health check endpoints (/health, /health/ready, /health/live) in app/api/endpoints/health.py to monitor database and scheduler status, crucial for Kubernetes deployments.
  • Background Job Scheduler: Integrated APScheduler (app/services/scheduler.py) to manage background tasks like 'surfacing' and 'cleanup' jobs, configurable via config.yaml and gracefully handled during application lifecycle.
  • Placeholder API Endpoints: Added placeholder CRUD endpoints for 'items' (app/api/endpoints/items.py) that return empty lists, 404s, or 501s, enabling early infrastructure testing without full business logic implementation.
  • Automated Startup Script: Provided a run.sh script to streamline service startup, handling environment variable loading, configuration validation, Alembic database migrations, and launching the Uvicorn server in either debug or production mode.
  • Dependency Management: Added apscheduler for background job scheduling and httpx (pinned to a specific version range) to requirements.txt.
  • Extensive Testing: Included a comprehensive suite of unit and integration tests (39 total, achieving 100% coverage on new code) for the FastAPI application, health endpoints, items endpoints, and scheduler service.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request lays a solid foundation for the FastAPI service, with a well-structured application, modern practices like lifespan management, and comprehensive tests. My review focuses on improving the correctness of the health check endpoints to align with standard practices, enhancing the security and robustness of the startup script, and a few maintainability improvements to reduce hardcoded values and dead code.

Comment thread api-service/app/api/endpoints/health.py Outdated
Comment on lines +40 to +66
async def health():
"""Detailed health check endpoint.

Returns comprehensive health status of all components.
Returns 503 if any component is unhealthy.

Returns:
HealthResponse: Detailed status of database, scheduler, and service
"""
db_healthy = check_database_health()
scheduler_status = get_scheduler_status()

# Determine overall status
overall_status = "healthy" if db_healthy and scheduler_status == "running" else "degraded"

response = HealthResponse(
status=overall_status,
database=db_healthy,
scheduler=scheduler_status,
version="0.1.0",
)

# Return 503 if any component is unhealthy
if not db_healthy or scheduler_status != "running":
return response

return response

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The docstring for the /health endpoint states it "Returns 503 if any component is unhealthy," but the current implementation always returns a 200 OK status code, which can be misleading for monitoring systems. Additionally, the final if/return block is redundant. You can use JSONResponse to conditionally set the correct status code while returning the same response body.

You will need to add from fastapi.responses import JSONResponse to the imports at the top of the file.

async def health():
    """Detailed health check endpoint.

    Returns comprehensive health status of all components.
    Returns 503 if any component is unhealthy.

    Returns:
        HealthResponse: Detailed status of database, scheduler, and service
    """
    db_healthy = check_database_health()
    scheduler_status = get_scheduler_status()

    # Determine overall status
    is_healthy = db_healthy and scheduler_status == "running"
    overall_status = "healthy" if is_healthy else "degraded"

    response_data = HealthResponse(
        status=overall_status,
        database=db_healthy,
        scheduler=scheduler_status,
        version="0.1.0",
    )

    if not is_healthy:
        return JSONResponse(status_code=503, content=response_data.model_dump())

    return response_data

Comment thread api-service/app/api/endpoints/health.py Outdated
Comment on lines +77 to +92
async def ready():
"""Readiness probe endpoint for Kubernetes.

Checks if the service is ready to handle requests.
Load balancers use this to decide if instance should receive traffic.

Returns:
ReadyResponse: Readiness status
"""
# Check database connection - critical for readiness
db_healthy = check_database_health()

if not db_healthy:
return ReadyResponse(ready=False)

return ReadyResponse(ready=True)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The /health/ready probe should return a 503 status code when the service is not ready to handle traffic. Currently, it returns a 200 OK with {"ready": false}, which may be misinterpreted by systems like Kubernetes that expect a non-2xx status code to mark a pod as unready.

To fix this, you can use JSONResponse to return the correct status code. You'll need to add from fastapi.responses import JSONResponse if you haven't already.

Suggested change
async def ready():
"""Readiness probe endpoint for Kubernetes.
Checks if the service is ready to handle requests.
Load balancers use this to decide if instance should receive traffic.
Returns:
ReadyResponse: Readiness status
"""
# Check database connection - critical for readiness
db_healthy = check_database_health()
if not db_healthy:
return ReadyResponse(ready=False)
return ReadyResponse(ready=True)
async def ready():
"""Readiness probe endpoint for Kubernetes.
Checks if the service is ready to handle requests.
Load balancers use this to decide if instance should receive traffic.
Returns:
ReadyResponse: Readiness status
"""
# Check database connection - critical for readiness
db_healthy = check_database_health()
if not db_healthy:
return JSONResponse(status_code=503, content={"ready": False})
return ReadyResponse(ready=True)

Comment thread api-service/run.sh Outdated
# Load environment variables
if [ -f .env ]; then
echo "✓ Loading environment variables from .env..."
export $(grep -v '^#' .env | xargs)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The command export $(grep -v '^#' .env | xargs) is not a safe way to parse .env files. It is vulnerable to command injection if the .env file is compromised and will fail for variable values containing spaces. A more robust and secure method should be used.

Suggested change
export $(grep -v '^#' .env | xargs)
# Use `set -a` to automatically export all variables defined from now on.
set -a
# shellcheck source=.env
source .env
# Disable auto-exporting.
set +a

Comment thread api-service/app/main.py
Comment on lines +85 to +90
app = FastAPI(
title="Noosphere API",
description="Knowledge management system with spaced repetition and AI classification",
version="0.1.0",
lifespan=lifespan,
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The application version '0.1.0' is hardcoded here and in other places (the root endpoint and health.py). To avoid inconsistencies and make version bumps easier, it's best practice to define the version in a single place. Consider adding __version__ = "0.1.0" at the top of this module and referencing it everywhere, or loading it from a configuration file.

Comment thread api-service/app/main.py Outdated
Comment on lines +93 to +102
try:
config = get_config()
cors_origins = config.api.get("cors_origins", ["http://localhost:3000"])
api_prefix = config.api.get("api_prefix", "/api")
api_version = config.api.get("api_version", "v1")
except RuntimeError:
# Config not loaded yet (e.g., during test imports) - use defaults
cors_origins = ["http://localhost:3000"]
api_prefix = "/api"
api_version = "v1"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This try...except block duplicates the default values for cors_origins, api_prefix, and api_version. This could lead to inconsistencies if one is updated and the other is forgotten. Consider defining these defaults as constants at the module level to ensure a single source of truth for both the .get() calls and the except block.

Comment thread api-service/app/services/scheduler.py Outdated

# Get configuration values
surfacing_interval = scheduler_config.get("surfacing_interval_seconds", 86400)
batch_size = scheduler_config.get("batch_size", 50)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The batch_size variable is retrieved from the configuration and logged, but it's never actually used. The scheduled jobs (surfacing_job, cleanup_job) don't accept it as an argument. This is unused code and can be confusing. If it's intended for future use, consider adding a TODO or passing it to the jobs. Otherwise, it should be removed for now.

Comment thread api-service/app/services/scheduler.py Outdated
"""
if scheduler.running:
return "running"
elif scheduler.state == 0: # STATE_STOPPED

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using the magic number 0 for STATE_STOPPED is brittle and harms readability. The scheduler instance itself exposes state constants like STATE_STOPPED, which should be used to avoid issues if the underlying values change in future library versions.

Suggested change
elif scheduler.state == 0: # STATE_STOPPED
elif scheduler.state == scheduler.STATE_STOPPED: # STATE_STOPPED

@deserat deserat self-assigned this Jan 26, 2026
- What: Fixed 2 critical health endpoint bugs, 1 high-priority security vulnerability, and 4 code quality issues
- Why: Gemini Code Assist review identified critical status code bugs, command injection risk, and code quality improvements
- How: Implemented 6-phase fix addressing all identified issues

Critical Fixes:
- [CRITICAL] Health endpoint now returns HTTP 503 when unhealthy (was 200)
- [CRITICAL] Ready endpoint returns HTTP 503 when not ready (was 200)
- [HIGH] Fixed command injection vulnerability in run.sh .env loading

Code Quality Improvements:
- [MEDIUM] Centralized version management in app/__init__.py
- [MEDIUM] Extracted configuration defaults to module-level constants
- [MEDIUM] Removed unused batch_size variable from scheduler
- [MEDIUM] Replaced magic number 0 with STATE_STOPPED constant

Review comments addressed:
- [x] Issue #1: /health endpoint returns 503 when degraded (health.py:66)
- [x] Issue #2: /health/ready returns 503 when not ready (health.py:92)
- [x] Issue #3: Fixed unsafe .env parsing in run.sh (run.sh:19)
- [x] Issue #4: Version hardcoded in 3 locations (centralized to app/__init__.py)
- [x] Issue #5: Duplicate default values in main.py (extracted to constants)
- [x] Issue #6: Unused batch_size variable (scheduler.py:66)
- [x] Issue #7: Magic number 0 instead of STATE_STOPPED (scheduler.py:131)

Testing:
- Updated 4 health endpoint tests to expect 503 status codes
- All 143 tests passing
- Coverage: 97% (target: 90%+)

Files modified:
- Created: app/__init__.py (version constant)
- Modified: app/main.py (version import, config constants)
- Modified: app/api/endpoints/health.py (JSONResponse, 503 status codes)
- Modified: app/services/scheduler.py (STATE_STOPPED constant, removed batch_size)
- Modified: run.sh (secure .env loading)
- Modified: tests/unit/api/test_health.py (503 status code assertions)

Security Impact:
- Fixed command injection vulnerability in .env file parsing
- Now uses secure 'set -a; source .env; set +a' pattern
- Properly handles values with spaces and special characters

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@deserat

deserat commented Jan 26, 2026

Copy link
Copy Markdown
Owner Author

Review Comments Addressed ✓

Thank you for the thorough review! I've addressed all 7 comments from Gemini Code Assist. Here's a summary of the fixes:

Critical Issues Fixed (2)

1. ✅ Health endpoint status codes (CRITICAL)

Issue: /health endpoint always returned 200 OK, even when unhealthy
Location: app/api/endpoints/health.py:66
Fix: Now returns HTTP 503 when database or scheduler is unhealthy
Impact: Monitoring systems can now properly detect service degradation

2. ✅ Ready endpoint status codes (CRITICAL)

Issue: /health/ready returned 200 with {"ready": false} when database down
Location: app/api/endpoints/health.py:92
Fix: Now returns HTTP 503 when database is unhealthy
Impact: Kubernetes will properly remove unhealthy pods from load balancer

High Priority Security Fix (1)

3. ✅ Command injection vulnerability in run.sh (HIGH)

Issue: Unsafe .env parsing using export $(grep -v '^#' .env | xargs)
Location: run.sh:19
Vulnerability: Could execute arbitrary commands from malicious .env values
Fix: Replaced with secure pattern:

set -a  # Automatically export all variables
source .env
set +a  # Disable auto-export

Benefits:

  • No command injection risk
  • Properly handles values with spaces
  • Standard bash security pattern

Code Quality Improvements (4)

4. ✅ Version hardcoded in 3 locations (MEDIUM)

Issue: Version "0.1.0" duplicated in main.py:90,139 and health.py:59
Fix: Created app/__init__.py with __version__ = "0.1.0" constant
Impact: Single source of truth for version number

5. ✅ Duplicate default values (MEDIUM)

Issue: Configuration defaults duplicated in try/except block in main.py:95-102
Fix: Extracted to module-level constants:

DEFAULT_CORS_ORIGINS = ["http://localhost:3000"]
DEFAULT_API_PREFIX = "/api"
DEFAULT_API_VERSION = "v1"

6. ✅ Unused batch_size variable (MEDIUM)

Issue: batch_size retrieved from config but never used (scheduler.py:66)
Fix: Removed unused variable and updated logging

7. ✅ Magic number instead of constant (MEDIUM)

Issue: Using 0 instead of STATE_STOPPED constant (scheduler.py:131)
Fix: Added import and used proper constant:

from apscheduler.schedulers.base import STATE_STOPPED
# ...
elif scheduler.state == STATE_STOPPED:

Verification Results ✓

All quality checks passing:

✅ ruff format app/     # 1 file reformatted, 21 unchanged
✅ ruff check app/      # 0 errors in modified files
✅ pyright app/         # 0 errors, 0 warnings
✅ pytest --cov=app     # 143 tests passed, 97% coverage (target: 90%+)

Test Updates

Updated 4 health endpoint tests to expect 503 status codes:

  • test_health_endpoint_database_unhealthy - expects 503
  • test_health_endpoint_scheduler_not_running - expects 503
  • test_health_endpoint_all_unhealthy - expects 503
  • test_ready_endpoint_when_not_ready - expects 503

Files Changed

File Change Reason
app/__init__.py Created Single source for version constant
app/main.py Modified Import __version__, config constants
app/api/endpoints/health.py Modified Return 503 when unhealthy, import __version__
app/services/scheduler.py Modified Use STATE_STOPPED, remove batch_size
run.sh Modified Secure .env loading
tests/unit/api/test_health.py Modified Expect 503 for unhealthy states

Breaking Changes

⚠️ Status Code Change: Health endpoints now return HTTP 503 when unhealthy instead of 200 OK. This is the correct behavior per Kubernetes and HTTP standards, but may affect monitoring systems expecting 200.

Migration: Update monitoring systems to treat 503 as unhealthy (this is standard).


Commit: 7c06291
All review comments resolved ✅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

service/api-service FastAPI Python backend service/database PostgreSQL schema/migrations

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant