A PostgreSQL proxy with query analysis and data masking capabilities.
- JWT/OIDC Authentication: Authenticate clients via Keycloak or Azure Entra ID JWT tokens instead of database passwords
- Query Analysis: AI-powered query efficiency analysis via Portkey LLM gateway (Claude on AWS Bedrock)
- Data Masking: Automatic masking of sensitive columns in SELECT query results
- Three Proxy Modes:
blocking- Block inefficient queriesnon-blocking- Log issues but allow all queries throughshunt- Route bad queries to a separate "slow" database node
- Config Generator: Auto-generate masking configuration by scanning database schema with AI
- Audit Trail: JWT user identity injected into
application_namefor backend visibility - RDS / Aurora IAM Authentication: Token-based auth with automatic refresh and cross-account STS support
- ElastiCache / Valkey IAM Authentication: AWS SigV4 token auth with automatic refresh
- Client TLS: Optional TLS termination with configurable enforcement; filters
SCRAM-SHA-256-PLUSchannel binding appropriately - Redshift Support: Specialized AI agent with Redshift system catalogs (
stv_*,svv_*) - Health Checks: HTTP
/healthand/readyendpoints for Kubernetes / NLB probes - Metrics: Per-query LLM, backend, and total latency recorded to
query_metricstable - TCP Keepalives: Prevents silent connection drops through NAT gateways and load balancers
- Go 1.26+
- PostgreSQL database
- Docker (for local development)
- AWS credentials (for AI features)
docker compose up -d postgres-fast postgres-shunt postgres-reporter valkeyThis starts:
- PostgreSQL "fast" node on port 5433 (primary database)
- PostgreSQL "shunt" node on port 5434 (for rejected queries in shunt mode)
- PostgreSQL "reporter" node on port 5435 (stores
query_issues+query_metrics) - Valkey on port 6379 (query decision cache)
Copy and edit the environment file:
cp .env.example .env
# Edit .env with your settingsKey configuration options:
| Variable | Description | Default |
|---|---|---|
LISTEN_ADDR |
Proxy listen address | :5432 |
HEALTH_CHECK_ADDR |
HTTP health check endpoint address | :8080 |
PG_BACKEND_URL |
Backend PostgreSQL URL | required |
SHUNT_NODE_URL |
Slow node URL (required for shunt mode) | - |
PROXY_MODE |
blocking, non-blocking, or shunt |
blocking |
MASKING_ENABLED |
Enable data masking | false |
MASKING_CONFIG_PATH |
Path to masking config | ./masking_config.yaml |
ANALYSIS_ENABLED |
Enable AI query analysis | false |
CONFIG_GENERATOR_MODE |
Run config generator instead of proxy | false |
OIDC_AUTH |
Enable JWT/OIDC authentication | false |
OIDC_DISCOVERY_URL |
OIDC discovery endpoint URL | - |
OIDC_ISSUER_OVERRIDE |
Override expected issuer claim (for EntraID) | - |
OIDC_AUDIENCE |
Expected JWT audience (comma-separated) | - |
OIDC_ALLOWED_ROLES |
Required roles (comma-separated, empty = no role check) | - |
OIDC_CERT_PATH |
Custom CA certificate for OIDC provider | - |
TLS_ENABLED |
Enable TLS for client connections | false |
TLS_CERT_PATH |
Path to TLS certificate | - |
TLS_KEY_PATH |
Path to TLS private key | - |
REQUIRE_CLIENT_TLS |
Reject non-TLS client connections | false |
PG_IAM_AUTH |
Enable RDS IAM token authentication | false |
PG_IAM_USERNAME |
Database user with rds_iam role |
- |
PG_AWS_REGION |
AWS region for RDS IAM tokens | - |
VALKEY_IAM_AUTH |
Enable ElastiCache/Valkey IAM authentication | false |
VALKEY_URL |
Valkey/Redis URL for query cache | - |
METADATA_CACHE_TTL |
TTL for schema metadata cache | 1h |
PORTKEY_BASE_URL |
Portkey LLM gateway URL | - |
PORTKEY_API_KEY |
Portkey API key | - |
PORTKEY_VIRTUAL_KEY |
Portkey virtual key | - |
BEDROCK_MODEL_ID |
Claude model ID via Bedrock | eu.anthropic.claude-sonnet-4-20250514-v1:0 |
PG_BACKEND_CA_PATH |
PEM CA bundle for backend TLS verification (verify-full/verify-ca). Use AWS RDS CA bundle for RDS IAM deployments. |
- |
MAX_CONNS_PER_IP_PER_MIN |
Max new connections per source IP per minute (0 = disabled) | 60 |
MAX_MESSAGE_SIZE_MB |
Max PostgreSQL protocol message size in MiB | 4 |
ALLOW_MD5_AUTH |
Allow MD5 password authentication (deprecated since PostgreSQL 10) | false |
STORE_RAW_QUERY |
Store raw SQL text in the reporter (default: fingerprint only) | false |
VALKEY_TLS_SKIP_VERIFY |
Disable Valkey TLS certificate verification (dev only) | false |
OIDC_ALLOWED_DATABASES |
Comma-separated list of databases JWT users may access (empty = allow all — not recommended) | - |
# Build
go build -o pgproxy ./cmd/pgproxy
# Run proxy
./pgproxyPGPASSWORD=testpass psql -h localhost -p 5432 -U testuser -d testdbPGProxy exposes HTTP endpoints on port 8080 (configurable via HEALTH_CHECK_ADDR) for load balancer health checks:
GET /health- Liveness probe. Returns200 OKif the process is running.GET /ready- Readiness probe. Returns200 OKif the database connection is healthy,503 Service Unavailableotherwise.
For AWS Network Load Balancer health checks:
resource "aws_lb_target_group" "pgproxy" {
port = 5432
protocol = "TCP"
target_type = "ip"
vpc_id = var.vpc_id
health_check {
enabled = true
protocol = "HTTP"
port = 8080
path = "/ready"
interval = 30
timeout = 5
healthy_threshold = 2
unhealthy_threshold = 2
}
}# Liveness check
curl http://localhost:8080/health
# Readiness check
curl http://localhost:8080/readyPGProxy supports JWT/OIDC authentication via Keycloak (or any OIDC-compliant provider). When enabled, clients authenticate using a JWT token instead of the database password.
- Client connects to the proxy and sends a startup message (with any username)
- Proxy requests a password (cleartext)
- Client sends the JWT token as the password
- Proxy validates the token against the OIDC provider (signature, expiry, audience, roles)
- If valid, proxy replaces the client's username/database with the service account credentials from
PG_BACKEND_URL - Proxy authenticates to the backend using the service account credentials
- The JWT user's email is injected into
application_namefor audit visibility
# Enable authentication
OIDC_AUTH=true
# OIDC discovery endpoint
# Keycloak: https://keycloak.example.com/auth/realms/myrealm/.well-known/openid-configuration
# EntraID: https://login.microsoftonline.com/{tenant-id}/v2.0/.well-known/openid-configuration
OIDC_DISCOVERY_URL=https://keycloak.example.com/auth/realms/myrealm/.well-known/openid-configuration
# Override issuer validation (optional)
# Required for EntraID where discovery URL differs from token issuer
# EntraID example: https://sts.windows.net/{tenant-id}/
OIDC_ISSUER_OVERRIDE=
# Expected audience in the JWT
OIDC_AUDIENCE=my-app
# Required roles (comma-separated, leave empty to skip role check)
OIDC_ALLOWED_ROLES=db_reader,analyst
# Custom CA certificate (if using internal CA)
OIDC_CERT_PATH=./certs/ca-bundle.pemUse the JWT token as the password when connecting:
# Using psql
PGPASSWORD="eyJhbGciOiJSUzI1NiIs..." psql -h localhost -p 5432 -U anyuser -d anydb
# Using connection string
psql "postgres://anyuser:eyJhbGciOiJSUzI1NiIs...@localhost:5432/anydb"Important: The username and database in the client connection are ignored when JWT auth is enabled. The proxy replaces them with the service account credentials from PG_BACKEND_URL (e.g., postgres://pgproxy_service:secret@backend:5433/production). The actual user identity comes from the JWT token's email or upn claim.
When JWT authentication is enabled, the proxy automatically sets the backend connection's application_name to the authenticated user's email, prefixed with jwt::
application_name = jwt:user@example.com
This provides visibility into who is running queries:
PostgreSQL:
SELECT usename, application_name, query
FROM pg_stat_activity
WHERE application_name LIKE 'jwt:%';Redshift:
SELECT userid, application_name, querytxt
FROM stv_recents
WHERE application_name LIKE 'jwt:%';This works with both PostgreSQL and Redshift backends, enabling DBAs to see which JWT user is executing each query in real-time.
The proxy handles backend authentication automatically using credentials from PG_BACKEND_URL:
| Backend Auth Method | Supported |
|---|---|
| Trust | Yes |
| Cleartext Password | Yes |
| MD5 | Yes (client-side requires ALLOW_MD5_AUTH=true) |
| SCRAM-SHA-256 | Yes |
| SCRAM-SHA-256-PLUS | Yes (via pgx v5.9+) |
| RDS IAM Token | Yes (PG_IAM_AUTH=true) |
Create a masking_config.yaml file:
version: "1.0"
default_schema: public
unknown_table_behavior: passthrough # or llm_inference
tables:
- schema: public
table: users
columns:
- name: email
mask_type: email
- name: password_hash
mask_type: full
- name: username
mask_type: partial
params:
visible_chars: 3| Type | Description | Example |
|---|---|---|
full |
Complete redaction | ***MASKED*** |
partial |
Show first N characters | joh*** |
email |
Mask email preserving domain | j***@example.com |
hash |
SHA-256 hash (deterministic) | a1b2c3... |
Use the config generator to scan your database and create a masking config:
# Set environment
export CONFIG_GENERATOR_MODE=true
export CONFIG_GENERATOR_SCHEMAS=public # or "all" for all schemas
export MASKING_CONFIG_PATH=./masking_config.yaml
# Run generator
./pgproxyThe generator:
- Scans all tables in specified schemas
- Queries a sample row from each table
- Uses AI to identify sensitive columns
- Outputs YAML config with confidence comments for review
Build and run the load generator:
# Build
go build -o loadgen ./cmd/loadgen
# Run against proxy (port 5432)
./loadgen -conn "postgres://testuser:testpass@localhost:5432/testdb?sslmode=disable" \
-workers 10 -duration 30s
# Run against direct database (port 5433) for comparison
./loadgen -conn "postgres://testuser:testpass@localhost:5433/testdb?sslmode=disable" \
-workers 10 -duration 30sOptions:
-conn: PostgreSQL connection string-workers: Number of concurrent workers (default: 10)-duration: Test duration (default: 30s)-rampup: Ramp-up time (default: 5s)
The project includes a Makefile for common development tasks:
# Run all tests (unit + BDD, excludes @real-bedrock tests)
make test
# Run only unit tests (no Docker needed)
make test-unit
# Run only BDD integration tests
make test-bdd
# Run ALL tests including real AWS Bedrock tests (requires AWS creds, costs money)
make test-all
# Generate combined test coverage report (unit + BDD)
make coverage
# Start test infrastructure (PostgreSQL + Valkey)
make infra-up
# Stop and remove test infrastructure
make infra-down
# Clean up coverage files and test cache
make clean
# Show all available targets
make helpPre-commit workflow:
# Before committing, run the full test suite
make test
# Or run coverage to see code coverage
make coveragepgproxy/
├── cmd/
│ ├── pgproxy/ # Main proxy binary
│ └── loadgen/ # Load testing tool
├── internal/
│ ├── agent/ # AI/Bedrock client
│ ├── auth/ # JWT/OIDC authentication (Keycloak)
│ ├── cache/ # Query approval cache
│ ├── config/ # Configuration
│ ├── masking/ # Data masking logic
│ ├── metadata/ # Database metadata cache
│ ├── proxy/ # PostgreSQL proxy
│ └── reporter/ # Issue reporting (query_issues) + metrics (query_metrics)
├── docker-compose.yml # Local PostgreSQL
├── init.sql # Test data
└── masking_config.yaml
# Run all tests
go test ./...
# Run tests with verbose output
go test -v ./...
# Run tests for a specific package
go test -v ./internal/masking/...
go test -v ./internal/proxy/...
go test -v ./internal/config/...
# Run specific test by name
go test -v ./internal/masking/... -run TestQuoteIdentifierWe use godog for BDD-style integration tests. These tests spin up the actual proxy binary and run real queries against PostgreSQL.
Prerequisites:
- Docker (for test databases)
- The test databases running
1. Start the test infrastructure:
docker compose up -d postgres-fast postgres-shunt postgres-reporter valkeyThis starts:
postgres-fast(port 5433) - Primary "fast" databasepostgres-shunt(port 5434) - "Penalty box" for bad queriespostgres-reporter(port 5435) - Stores query issues & performance metricsvalkey(port 6379) - Cache for query decisions
Wait ~10 seconds for containers to be healthy:
docker compose ps2. Run the tests:
# Run all tests EXCEPT real Bedrock (no AWS costs)
INTEGRATION_TESTS=true GODOG_TAGS=~@real-bedrock go test ./tests/... -v
# Run ALL tests including real Bedrock (requires AWS creds, costs money)
INTEGRATION_TESTS=true go test ./tests/... -v3. Run specific test suites:
# Just blocking mode tests
INTEGRATION_TESTS=true GODOG_TAGS=@blocking go test ./tests/... -v
# Just shunt mode tests
INTEGRATION_TESTS=true GODOG_TAGS=@shunt go test ./tests/... -v
# Just masking tests
INTEGRATION_TESTS=true GODOG_TAGS=@masking go test ./tests/... -v
# Just real Bedrock tests (verifies AI analysis works)
INTEGRATION_TESTS=true GODOG_TAGS=@real-bedrock go test ./tests/... -v4. Generate test reports:
# JUnit XML report (for CI systems)
INTEGRATION_TESTS=true GODOG_FORMAT=junit GODOG_TAGS=~@real-bedrock go test ./tests/... -v 2>&1 | tee test-report.xml
# Cucumber JSON report
INTEGRATION_TESTS=true GODOG_FORMAT=cucumber GODOG_TAGS=~@real-bedrock go test ./tests/... -v > test-report.json5. Cleanup:
docker compose down -vTest structure:
tests/
├── features/
│ ├── analysis/
│ │ ├── blocking.feature # Blocking mode scenarios
│ │ ├── non_blocking.feature # Non-blocking mode scenarios
│ │ ├── shunt.feature # Shunt mode scenarios
│ │ └── real_bedrock.feature # Real AWS Bedrock tests
│ ├── masking/
│ │ └── masking.feature # Data masking scenarios
│ └── config_generation/
│ └── config.feature # Config validation
├── steps/
│ └── common_steps.go # Step definitions
├── support/
│ └── context.go # Test helpers & mock Bedrock
└── integration_test.go # Test runner
Writing new tests:
Tests use Gherkin syntax. Example:
@shunt @cartesian
Scenario: Cartesian product goes to shunt node
Given the proxy is running in "shunt" mode
And I am connected to the database through the proxy
And the AI will reject queries containing "users, orders" with reason "Cartesian product"
When I execute the query "SELECT * FROM users, orders"
Then the query should succeed
And the query should be routed to the shunt nodeWe use gremlins for mutation testing to verify test quality.
# Install gremlins
go install github.com/go-gremlins/gremlins/cmd/gremlins@latest
# Dry run (shows what mutations would be tested)
~/go/bin/gremlins unleash --dry-run ./internal/masking
# Run mutation tests (with longer timeout for reliability)
~/go/bin/gremlins unleash --timeout-coefficient 10 ./internal/masking
~/go/bin/gremlins unleash --timeout-coefficient 10 ./internal/config
~/go/bin/gremlins unleash --timeout-coefficient 10 ./internal/proxyMutation Testing Results Explained:
- Killed: Tests caught the mutation (good)
- Lived: Tests didn't catch the mutation (test gap)
- Not Covered: Code not covered by tests
- Test Efficacy: Percentage of covered mutations caught by tests
Target: >75% test efficacy, >50% mutator coverage.
# Generate unit test coverage report
go test -coverprofile=coverage.out ./...
# View coverage in browser
go tool cover -html=coverage.out
# View coverage by function
go tool cover -func=coverage.outCombined Coverage (Unit + BDD):
The BDD tests exercise the proxy binary end-to-end. To measure coverage from BDD tests:
# Create coverage directory
mkdir -p coverage/bdd
# Run BDD tests with instrumented binary
GOCOVERDIR=$(pwd)/coverage/bdd INTEGRATION_TESTS=true GODOG_TAGS=~@real-bedrock go test ./tests/... -v
# Convert to text format
go tool covdata textfmt -i=coverage/bdd -o=coverage/bdd.out
# View BDD coverage
go tool cover -func=coverage/bdd.outThe unit tests and BDD tests provide complementary coverage:
- Unit tests: Test pure functions, mocked agent loops, auth validation
- BDD tests: Test proxy startup, wire protocol, masking pipeline, query routing
See .env for all available configuration options.
Client -> PGProxy (port 5432) -> PostgreSQL (port 5433)
|
├── JWT Authentication (optional)
├── Query Analysis (SELECT only)
├── Data Masking (optional)
└── Query Blocking (blocking mode only)
Client PGProxy Backend
| | |
|--- Startup Message ----->| |
|<-- Password Request -----| |
|--- JWT Token ----------->| |
| |-- Validate JWT (Keycloak) --> |
| |<- Token Valid ----------------|
| | |
| |-- Startup (user from URL) --->|
| | application_name=jwt:email |
| |<- Auth Request (SCRAM/MD5) ---|
| |-- Auth Response ------------->|
| |<- Auth OK --------------------|
|<-- Auth OK --------------| |
| | |
|--- Query --------------->|--- Query -------------------->|
|<-- Results --------------|<-- Results -------------------|
Client -> PGProxy (port 5432) --[good queries]--> Fast Node (port 5433)
| |
| └-[bad queries]--> Shunt Node (port 5434)
|
├── Query Analysis (SELECT only)
├── Route decision cached
└── Writes/transactions blocked
In shunt mode:
- Good SELECT queries → routed to the fast (primary) node
- Bad SELECT queries → routed to the shunt (slow) node
- INSERT/UPDATE/DELETE → blocked entirely
- Transactions (BEGIN) → blocked entirely
The proxy intercepts PostgreSQL wire protocol messages, analyzes/transforms queries, and forwards them to the appropriate backend database.
- ARCHITECTURE.md - Detailed system architecture with diagrams
- CODE_REVIEW.md - Code review and quality analysis
- docs/authentication.md - JWT/OIDC authentication setup
- docs/masking-design.md - Data masking architecture
- docs/tls-design.md - TLS implementation details
- docs/metrics-guide.md - Query performance metrics and monitoring
- examples/masking_config.yaml - Masking configuration
- .env.example - Environment variables template