PGProxy

A PostgreSQL proxy with query analysis and data masking capabilities.

Features

JWT/OIDC Authentication: Authenticate clients via Keycloak or Azure Entra ID JWT tokens instead of database passwords
Query Analysis: AI-powered query efficiency analysis via Portkey LLM gateway (Claude on AWS Bedrock)
Data Masking: Automatic masking of sensitive columns in SELECT query results
Three Proxy Modes:
- blocking - Block inefficient queries
- non-blocking - Log issues but allow all queries through
- shunt - Route bad queries to a separate "slow" database node
Config Generator: Auto-generate masking configuration by scanning database schema with AI
Audit Trail: JWT user identity injected into application_name for backend visibility
RDS / Aurora IAM Authentication: Token-based auth with automatic refresh and cross-account STS support
ElastiCache / Valkey IAM Authentication: AWS SigV4 token auth with automatic refresh
Client TLS: Optional TLS termination with configurable enforcement; filters SCRAM-SHA-256-PLUS channel binding appropriately
Redshift Support: Specialized AI agent with Redshift system catalogs (stv_*, svv_*)
Health Checks: HTTP /health and /ready endpoints for Kubernetes / NLB probes
Metrics: Per-query LLM, backend, and total latency recorded to query_metrics table
TCP Keepalives: Prevents silent connection drops through NAT gateways and load balancers

Prerequisites

Go 1.26+
PostgreSQL database
Docker (for local development)
AWS credentials (for AI features)

Quick Start

1. Start the test infrastructure

docker compose up -d postgres-fast postgres-shunt postgres-reporter valkey

This starts:

PostgreSQL "fast" node on port 5433 (primary database)
PostgreSQL "shunt" node on port 5434 (for rejected queries in shunt mode)
PostgreSQL "reporter" node on port 5435 (stores query_issues + query_metrics)
Valkey on port 6379 (query decision cache)

2. Configure environment

Copy and edit the environment file:

cp .env.example .env
# Edit .env with your settings

Key configuration options:

Variable	Description	Default
`LISTEN_ADDR`	Proxy listen address	`:5432`
`HEALTH_CHECK_ADDR`	HTTP health check endpoint address	`:8080`
`PG_BACKEND_URL`	Backend PostgreSQL URL	required
`SHUNT_NODE_URL`	Slow node URL (required for shunt mode)	-
`PROXY_MODE`	`blocking`, `non-blocking`, or `shunt`	`blocking`
`MASKING_ENABLED`	Enable data masking	`false`
`MASKING_CONFIG_PATH`	Path to masking config	`./masking_config.yaml`
`ANALYSIS_ENABLED`	Enable AI query analysis	`false`
`CONFIG_GENERATOR_MODE`	Run config generator instead of proxy	`false`
`OIDC_AUTH`	Enable JWT/OIDC authentication	`false`
`OIDC_DISCOVERY_URL`	OIDC discovery endpoint URL	-
`OIDC_ISSUER_OVERRIDE`	Override expected issuer claim (for EntraID)	-
`OIDC_AUDIENCE`	Expected JWT audience (comma-separated)	-
`OIDC_ALLOWED_ROLES`	Required roles (comma-separated, empty = no role check)	-
`OIDC_CERT_PATH`	Custom CA certificate for OIDC provider	-
`TLS_ENABLED`	Enable TLS for client connections	`false`
`TLS_CERT_PATH`	Path to TLS certificate	-
`TLS_KEY_PATH`	Path to TLS private key	-
`REQUIRE_CLIENT_TLS`	Reject non-TLS client connections	`false`
`PG_IAM_AUTH`	Enable RDS IAM token authentication	`false`
`PG_IAM_USERNAME`	Database user with `rds_iam` role	-
`PG_AWS_REGION`	AWS region for RDS IAM tokens	-
`VALKEY_IAM_AUTH`	Enable ElastiCache/Valkey IAM authentication	`false`
`VALKEY_URL`	Valkey/Redis URL for query cache	-
`METADATA_CACHE_TTL`	TTL for schema metadata cache	`1h`
`PORTKEY_BASE_URL`	Portkey LLM gateway URL	-
`PORTKEY_API_KEY`	Portkey API key	-
`PORTKEY_VIRTUAL_KEY`	Portkey virtual key	-
`BEDROCK_MODEL_ID`	Claude model ID via Bedrock	`eu.anthropic.claude-sonnet-4-20250514-v1:0`
`PG_BACKEND_CA_PATH`	PEM CA bundle for backend TLS verification (`verify-full`/`verify-ca`). Use AWS RDS CA bundle for RDS IAM deployments.	-
`MAX_CONNS_PER_IP_PER_MIN`	Max new connections per source IP per minute (0 = disabled)	`60`
`MAX_MESSAGE_SIZE_MB`	Max PostgreSQL protocol message size in MiB	`4`
`ALLOW_MD5_AUTH`	Allow MD5 password authentication (deprecated since PostgreSQL 10)	`false`
`STORE_RAW_QUERY`	Store raw SQL text in the reporter (default: fingerprint only)	`false`
`VALKEY_TLS_SKIP_VERIFY`	Disable Valkey TLS certificate verification (dev only)	`false`
`OIDC_ALLOWED_DATABASES`	Comma-separated list of databases JWT users may access (empty = allow all — not recommended)	-

3. Build and run

# Build
go build -o pgproxy ./cmd/pgproxy

# Run proxy
./pgproxy

4. Connect through the proxy

PGPASSWORD=testpass psql -h localhost -p 5432 -U testuser -d testdb

Health Checks

PGProxy exposes HTTP endpoints on port 8080 (configurable via HEALTH_CHECK_ADDR) for load balancer health checks:

GET /health - Liveness probe. Returns 200 OK if the process is running.
GET /ready - Readiness probe. Returns 200 OK if the database connection is healthy, 503 Service Unavailable otherwise.

AWS NLB Configuration

For AWS Network Load Balancer health checks:

resource "aws_lb_target_group" "pgproxy" {
  port        = 5432
  protocol    = "TCP"
  target_type = "ip"
  vpc_id      = var.vpc_id

  health_check {
    enabled             = true
    protocol            = "HTTP"
    port                = 8080
    path                = "/ready"
    interval            = 30
    timeout             = 5
    healthy_threshold   = 2
    unhealthy_threshold = 2
  }
}

Testing Health Endpoints

# Liveness check
curl http://localhost:8080/health

# Readiness check
curl http://localhost:8080/ready

JWT Authentication

PGProxy supports JWT/OIDC authentication via Keycloak (or any OIDC-compliant provider). When enabled, clients authenticate using a JWT token instead of the database password.

How It Works

Client connects to the proxy and sends a startup message (with any username)
Proxy requests a password (cleartext)
Client sends the JWT token as the password
Proxy validates the token against the OIDC provider (signature, expiry, audience, roles)
If valid, proxy replaces the client's username/database with the service account credentials from PG_BACKEND_URL
Proxy authenticates to the backend using the service account credentials
The JWT user's email is injected into application_name for audit visibility

Configuration

# Enable authentication
OIDC_AUTH=true

# OIDC discovery endpoint
# Keycloak: https://keycloak.example.com/auth/realms/myrealm/.well-known/openid-configuration
# EntraID:  https://login.microsoftonline.com/{tenant-id}/v2.0/.well-known/openid-configuration
OIDC_DISCOVERY_URL=https://keycloak.example.com/auth/realms/myrealm/.well-known/openid-configuration

# Override issuer validation (optional)
# Required for EntraID where discovery URL differs from token issuer
# EntraID example: https://sts.windows.net/{tenant-id}/
OIDC_ISSUER_OVERRIDE=

# Expected audience in the JWT
OIDC_AUDIENCE=my-app

# Required roles (comma-separated, leave empty to skip role check)
OIDC_ALLOWED_ROLES=db_reader,analyst

# Custom CA certificate (if using internal CA)
OIDC_CERT_PATH=./certs/ca-bundle.pem

Connecting with JWT

Use the JWT token as the password when connecting:

# Using psql
PGPASSWORD="eyJhbGciOiJSUzI1NiIs..." psql -h localhost -p 5432 -U anyuser -d anydb

# Using connection string
psql "postgres://anyuser:eyJhbGciOiJSUzI1NiIs...@localhost:5432/anydb"

Important: The username and database in the client connection are ignored when JWT auth is enabled. The proxy replaces them with the service account credentials from PG_BACKEND_URL (e.g., postgres://pgproxy_service:secret@backend:5433/production). The actual user identity comes from the JWT token's email or upn claim.

Audit Trail (application_name)

When JWT authentication is enabled, the proxy automatically sets the backend connection's application_name to the authenticated user's email, prefixed with jwt::

application_name = jwt:user@example.com

This provides visibility into who is running queries:

PostgreSQL:

SELECT usename, application_name, query
FROM pg_stat_activity
WHERE application_name LIKE 'jwt:%';

Redshift:

SELECT userid, application_name, querytxt
FROM stv_recents
WHERE application_name LIKE 'jwt:%';

This works with both PostgreSQL and Redshift backends, enabling DBAs to see which JWT user is executing each query in real-time.

Supported Authentication Methods

The proxy handles backend authentication automatically using credentials from PG_BACKEND_URL:

Backend Auth Method	Supported
Trust	Yes
Cleartext Password	Yes
MD5	Yes (client-side requires `ALLOW_MD5_AUTH=true`)
SCRAM-SHA-256	Yes
SCRAM-SHA-256-PLUS	Yes (via pgx v5.9+)
RDS IAM Token	Yes (`PG_IAM_AUTH=true`)

Data Masking

Configuration

Create a masking_config.yaml file:

version: "1.0"
default_schema: public
unknown_table_behavior: passthrough  # or llm_inference

tables:
  - schema: public
    table: users
    columns:
      - name: email
        mask_type: email
      - name: password_hash
        mask_type: full
      - name: username
        mask_type: partial
        params:
          visible_chars: 3

Mask Types

Type	Description	Example
`full`	Complete redaction	`*MASKED*`
`partial`	Show first N characters	`joh***`
`email`	Mask email preserving domain	`j***@example.com`
`hash`	SHA-256 hash (deterministic)	`a1b2c3...`

Auto-Generate Config

Use the config generator to scan your database and create a masking config:

# Set environment
export CONFIG_GENERATOR_MODE=true
export CONFIG_GENERATOR_SCHEMAS=public  # or "all" for all schemas
export MASKING_CONFIG_PATH=./masking_config.yaml

# Run generator
./pgproxy

The generator:

Scans all tables in specified schemas
Queries a sample row from each table
Uses AI to identify sensitive columns
Outputs YAML config with confidence comments for review

Load Testing

Build and run the load generator:

# Build
go build -o loadgen ./cmd/loadgen

# Run against proxy (port 5432)
./loadgen -conn "postgres://testuser:testpass@localhost:5432/testdb?sslmode=disable" \
  -workers 10 -duration 30s

# Run against direct database (port 5433) for comparison
./loadgen -conn "postgres://testuser:testpass@localhost:5433/testdb?sslmode=disable" \
  -workers 10 -duration 30s

Options:

-conn: PostgreSQL connection string
-workers: Number of concurrent workers (default: 10)
-duration: Test duration (default: 30s)
-rampup: Ramp-up time (default: 5s)

Development

Quick Commands (Makefile)

The project includes a Makefile for common development tasks:

# Run all tests (unit + BDD, excludes @real-bedrock tests)
make test

# Run only unit tests (no Docker needed)
make test-unit

# Run only BDD integration tests
make test-bdd

# Run ALL tests including real AWS Bedrock tests (requires AWS creds, costs money)
make test-all

# Generate combined test coverage report (unit + BDD)
make coverage

# Start test infrastructure (PostgreSQL + Valkey)
make infra-up

# Stop and remove test infrastructure
make infra-down

# Clean up coverage files and test cache
make clean

# Show all available targets
make help

Pre-commit workflow:

# Before committing, run the full test suite
make test

# Or run coverage to see code coverage
make coverage

Project Structure

pgproxy/
├── cmd/
│   ├── pgproxy/       # Main proxy binary
│   └── loadgen/       # Load testing tool
├── internal/
│   ├── agent/         # AI/Bedrock client
│   ├── auth/          # JWT/OIDC authentication (Keycloak)
│   ├── cache/         # Query approval cache
│   ├── config/        # Configuration
│   ├── masking/       # Data masking logic
│   ├── metadata/      # Database metadata cache
│   ├── proxy/         # PostgreSQL proxy
│   └── reporter/      # Issue reporting (query_issues) + metrics (query_metrics)
├── docker-compose.yml # Local PostgreSQL
├── init.sql           # Test data
└── masking_config.yaml

Running Tests

Unit Tests

# Run all tests
go test ./...

# Run tests with verbose output
go test -v ./...

# Run tests for a specific package
go test -v ./internal/masking/...
go test -v ./internal/proxy/...
go test -v ./internal/config/...

# Run specific test by name
go test -v ./internal/masking/... -run TestQuoteIdentifier

Integration Tests (BDD)

We use godog for BDD-style integration tests. These tests spin up the actual proxy binary and run real queries against PostgreSQL.

Prerequisites:

Docker (for test databases)
The test databases running

1. Start the test infrastructure:

docker compose up -d postgres-fast postgres-shunt postgres-reporter valkey

This starts:

postgres-fast (port 5433) - Primary "fast" database
postgres-shunt (port 5434) - "Penalty box" for bad queries
postgres-reporter (port 5435) - Stores query issues & performance metrics
valkey (port 6379) - Cache for query decisions

Wait ~10 seconds for containers to be healthy:

docker compose ps

2. Run the tests:

# Run all tests EXCEPT real Bedrock (no AWS costs)
INTEGRATION_TESTS=true GODOG_TAGS=~@real-bedrock go test ./tests/... -v

# Run ALL tests including real Bedrock (requires AWS creds, costs money)
INTEGRATION_TESTS=true go test ./tests/... -v

3. Run specific test suites:

# Just blocking mode tests
INTEGRATION_TESTS=true GODOG_TAGS=@blocking go test ./tests/... -v

# Just shunt mode tests
INTEGRATION_TESTS=true GODOG_TAGS=@shunt go test ./tests/... -v

# Just masking tests
INTEGRATION_TESTS=true GODOG_TAGS=@masking go test ./tests/... -v

# Just real Bedrock tests (verifies AI analysis works)
INTEGRATION_TESTS=true GODOG_TAGS=@real-bedrock go test ./tests/... -v

4. Generate test reports:

# JUnit XML report (for CI systems)
INTEGRATION_TESTS=true GODOG_FORMAT=junit GODOG_TAGS=~@real-bedrock go test ./tests/... -v 2>&1 | tee test-report.xml

# Cucumber JSON report
INTEGRATION_TESTS=true GODOG_FORMAT=cucumber GODOG_TAGS=~@real-bedrock go test ./tests/... -v > test-report.json

5. Cleanup:

docker compose down -v

Test structure:

tests/
├── features/
│   ├── analysis/
│   │   ├── blocking.feature      # Blocking mode scenarios
│   │   ├── non_blocking.feature  # Non-blocking mode scenarios
│   │   ├── shunt.feature         # Shunt mode scenarios
│   │   └── real_bedrock.feature  # Real AWS Bedrock tests
│   ├── masking/
│   │   └── masking.feature       # Data masking scenarios
│   └── config_generation/
│       └── config.feature        # Config validation
├── steps/
│   └── common_steps.go           # Step definitions
├── support/
│   └── context.go                # Test helpers & mock Bedrock
└── integration_test.go           # Test runner

Writing new tests:

Tests use Gherkin syntax. Example:

@shunt @cartesian
Scenario: Cartesian product goes to shunt node
  Given the proxy is running in "shunt" mode
  And I am connected to the database through the proxy
  And the AI will reject queries containing "users, orders" with reason "Cartesian product"
  When I execute the query "SELECT * FROM users, orders"
  Then the query should succeed
  And the query should be routed to the shunt node

Mutation Testing

We use gremlins for mutation testing to verify test quality.

# Install gremlins
go install github.com/go-gremlins/gremlins/cmd/gremlins@latest

# Dry run (shows what mutations would be tested)
~/go/bin/gremlins unleash --dry-run ./internal/masking

# Run mutation tests (with longer timeout for reliability)
~/go/bin/gremlins unleash --timeout-coefficient 10 ./internal/masking
~/go/bin/gremlins unleash --timeout-coefficient 10 ./internal/config
~/go/bin/gremlins unleash --timeout-coefficient 10 ./internal/proxy

Mutation Testing Results Explained:

Killed: Tests caught the mutation (good)
Lived: Tests didn't catch the mutation (test gap)
Not Covered: Code not covered by tests
Test Efficacy: Percentage of covered mutations caught by tests

Target: >75% test efficacy, >50% mutator coverage.

Test Coverage

# Generate unit test coverage report
go test -coverprofile=coverage.out ./...

# View coverage in browser
go tool cover -html=coverage.out

# View coverage by function
go tool cover -func=coverage.out

Combined Coverage (Unit + BDD):

The BDD tests exercise the proxy binary end-to-end. To measure coverage from BDD tests:

# Create coverage directory
mkdir -p coverage/bdd

# Run BDD tests with instrumented binary
GOCOVERDIR=$(pwd)/coverage/bdd INTEGRATION_TESTS=true GODOG_TAGS=~@real-bedrock go test ./tests/... -v

# Convert to text format
go tool covdata textfmt -i=coverage/bdd -o=coverage/bdd.out

# View BDD coverage
go tool cover -func=coverage/bdd.out

The unit tests and BDD tests provide complementary coverage:

Unit tests: Test pure functions, mocked agent loops, auth validation
BDD tests: Test proxy startup, wire protocol, masking pipeline, query routing

Environment Variables

See .env for all available configuration options.

Architecture

Standard Mode (blocking/non-blocking)

Client -> PGProxy (port 5432) -> PostgreSQL (port 5433)
              |
              ├── JWT Authentication (optional)
              ├── Query Analysis (SELECT only)
              ├── Data Masking (optional)
              └── Query Blocking (blocking mode only)

With JWT Authentication

Client                    PGProxy                         Backend
  |                          |                               |
  |--- Startup Message ----->|                               |
  |<-- Password Request -----|                               |
  |--- JWT Token ----------->|                               |
  |                          |-- Validate JWT (Keycloak) --> |
  |                          |<- Token Valid ----------------|
  |                          |                               |
  |                          |-- Startup (user from URL) --->|
  |                          |   application_name=jwt:email  |
  |                          |<- Auth Request (SCRAM/MD5) ---|
  |                          |-- Auth Response ------------->|
  |                          |<- Auth OK --------------------|
  |<-- Auth OK --------------|                               |
  |                          |                               |
  |--- Query --------------->|--- Query -------------------->|
  |<-- Results --------------|<-- Results -------------------|

Shunt Mode

Client -> PGProxy (port 5432) --[good queries]--> Fast Node (port 5433)
              |                |
              |                └-[bad queries]--> Shunt Node (port 5434)
              |
              ├── Query Analysis (SELECT only)
              ├── Route decision cached
              └── Writes/transactions blocked

In shunt mode:

Good SELECT queries → routed to the fast (primary) node
Bad SELECT queries → routed to the shunt (slow) node
INSERT/UPDATE/DELETE → blocked entirely
Transactions (BEGIN) → blocked entirely

The proxy intercepts PostgreSQL wire protocol messages, analyzes/transforms queries, and forwards them to the appropriate backend database.

Documentation

Core Documentation

ARCHITECTURE.md - Detailed system architecture with diagrams
CODE_REVIEW.md - Code review and quality analysis

Technical Guides

docs/authentication.md - JWT/OIDC authentication setup
docs/masking-design.md - Data masking architecture
docs/tls-design.md - TLS implementation details
docs/metrics-guide.md - Query performance metrics and monitoring

Configuration Examples

examples/masking_config.yaml - Masking configuration
.env.example - Environment variables template

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.github/workflows		.github/workflows
cmd		cmd
demo		demo
docs		docs
examples		examples
internal		internal
migrations		migrations
tests		tests
.env.example		.env.example
.gitignore		.gitignore
.golangci.yml		.golangci.yml
ARCHITECTURE.md		ARCHITECTURE.md
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
SECURITY.md		SECURITY.md
VERSION		VERSION
docker-compose.yml		docker-compose.yml
go.mod		go.mod
go.sum		go.sum
init.sql		init.sql
masking_config.yaml		masking_config.yaml

Folders and files

Latest commit

History

Repository files navigation

PGProxy

Features

Prerequisites

Quick Start

1. Start the test infrastructure

2. Configure environment

3. Build and run

4. Connect through the proxy

Health Checks

AWS NLB Configuration

Testing Health Endpoints

JWT Authentication

How It Works

Configuration

Connecting with JWT

Audit Trail (application_name)

Supported Authentication Methods

Data Masking

Configuration

Mask Types

Auto-Generate Config

Load Testing

Development

Quick Commands (Makefile)

Project Structure

Running Tests

Unit Tests

Integration Tests (BDD)

Mutation Testing

Test Coverage

Environment Variables

Architecture

Standard Mode (blocking/non-blocking)

With JWT Authentication

Shunt Mode

Documentation

Core Documentation

Technical Guides

Configuration Examples

About

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages