pgvector-rag

Lightweight RAG toolkit for PostgreSQL + pgvector. Zero runtime dependencies.

Extracted from a production RAG pipeline serving thousands of queries. Provides the algorithms and SQL you need without the framework lock-in.

Why this exists

| Feature | pgvector-rag | LangChain | LlamaIndex |
| --- | --- | --- | --- |
| Runtime deps | 0 | 50+ | 40+ |
| Bundle size | ~15 KB | ~2 MB | ~1.5 MB |
| DB lock-in | pgvector only | Many adapters | Many adapters |
| Chunking | Built-in | Built-in | Built-in |
| Hybrid search SQL | Yes | No (needs driver) | No |
| MMR | Yes | Yes | Yes |
| RRF fusion | Yes | No | No |
| Bring your own DB client | Yes | No | No |

Install

npm install pgvector-rag

Quick Start

1. Chunk a document

import { chunk } from 'pgvector-rag';

const chunks = chunk(documentText, {
  maxChunkChars: 1200,
  overlapChars: 250,
});

// chunks = [{ index: 0, content: '...', type: 'heading' }, ...]

2. Create the table

import { createChunksTableSQL, createIndexesSQL } from 'pgvector-rag/sql';
import pg from 'pg';

const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });

const { text: createTable } = createChunksTableSQL({ dimensions: 1536 });
await pool.query(createTable);

for (const { text } of createIndexesSQL()) {
  await pool.query(text);
}

3. Upsert chunks with embeddings

import { randomUUID } from 'node:crypto';
import { upsertChunksSQL } from 'pgvector-rag/sql';

const records = chunks.map((c, i) => ({
  id: randomUUID(),
  documentId: 'doc-123',
  chunkIndex: c.index,
  content: c.content,
  embedding: embeddings[i], // from your embedding API
  metadata: { chunk_type: c.type },
}));

const { text, params } = upsertChunksSQL(records);
await pool.query(text, params);

4. Search with hybrid SQL

import { hybridSearchSQL } from 'pgvector-rag/sql';
import { selectMMR, buildContext, normalizeScores } from 'pgvector-rag';

// Generate the search SQL
const { text, params } = hybridSearchSQL({
  documentId: 'doc-123',
  queryText: 'How does photosynthesis work?',
  embedding: queryEmbedding, // from your embedding API
  limit: 50,
});

// Execute with your DB client
const { rows } = await pool.query(text, params);

// Map to ScoredChunks
const scored = normalizeScores(rows.map(r => ({
  rrfScore: r.rrf_score,
  id: r.id,
  chunkIndex: r.chunk_index,
  content: r.content,
  embedding: r.embedding, // if you fetched it
})));

// Diversify with MMR
const selected = selectMMR(scored, 10, 0.7);

// Build context string for your LLM
const context = buildContext(selected, 5000);

API Reference

Chunking

`chunk(text, options?)`

Split text into chunks with section-awareness, sentence boundaries, and overlap.

chunk(text: string, options?: {
  maxChunkChars?: number;  // default: 1200
  overlapChars?: number;   // default: 250
  maxChunks?: number;      // default: Infinity
}): Chunk[]
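
To make the interaction between maxChunkChars and overlapChars concrete, here is a minimal sketch of sentence-boundary chunking with overlap. It is not the library's implementation: the function name chunkBySentence and the SketchChunk type are hypothetical, and section-awareness and maxChunks are left out.

```typescript
interface SketchChunk {
  index: number;
  content: string;
}

// Hypothetical helper for illustration; not the library's chunk() implementation.
function chunkBySentence(
  text: string,
  maxChunkChars = 1200,
  overlapChars = 250,
): SketchChunk[] {
  // Split into sentences: a run of non-terminators followed by its punctuation.
  const sentences = (text.match(/[^.!?]+[.!?]*/g) || [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const chunks: SketchChunk[] = [];
  let current = '';

  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (current && candidate.length > maxChunkChars) {
      // The chunk is full: emit it and start the next one.
      chunks.push({ index: chunks.length, content: current });
      // Seed the next chunk with the tail of the previous one as overlap.
      const tail =
        overlapChars > 0 ? current.slice(-overlapChars).replace(/^\s+/, '') : '';
      current = tail ? `${tail} ${sentence}` : sentence;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push({ index: chunks.length, content: current });
  return chunks;
}
```

In this sketch a single sentence longer than maxChunkChars is kept whole, and the overlap tail may start mid-word. A production chunker would likely handle both cases more carefully.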

`sanitizeText(text)`

Strip null bytes and control characters.
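
Sanitizing matters because PostgreSQL text columns reject null bytes. The sketch below shows one way such a filter could look. The name stripControlChars is hypothetical, and keeping tabs and line breaks is an assumption, not documented behavior of sanitizeText.

```typescript
// Hypothetical sketch; not the library's sanitizeText implementation.
function stripControlChars(text: string): string {
  // Remove NUL and other C0 control characters plus DEL, but keep
  // tab (\u0009), line feed (\u000A), and carriage return (\u000D).
  return text.replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g, '');
}
```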

`detectChunkType(content)`

Classify a chunk as 'heading', 'list', or 'paragraph'.

Vector Math

`cosineSimilarity(a, b)`

Cosine similarity between two vectors. Returns a value in the range [-1, 1].

`l2Normalize(vector)`

L2-normalize a vector to unit length. Returns a new array.
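
For reference, both operations fit in a few lines. This is a sketch under assumptions, not the library's source: the names cosine and normalize are hypothetical, and returning 0 for zero-length vectors is a choice made here.

```typescript
// Hypothetical sketch of cosine similarity: dot(a, b) / (|a| * |b|).
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // Assumption: treat a zero vector as having no similarity to anything.
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical sketch of L2 normalization; returns a new array.
function normalize(vector: number[]): number[] {
  const norm = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? vector.slice() : vector.map((x) => x / norm);
}
```

Once vectors are L2-normalized, their cosine similarity equals their dot product, which is why normalizing once at insert time can be useful.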

MMR (Maximal Marginal Relevance)

`selectMMR(candidates, k, lambda)`

Select k chunks balancing relevance and diversity.

lambda = 1.0 → pure relevance (no diversity)
lambda = 0.0 → pure diversity (ignore scores)
lambda = 0.7 → good default for QA
lambda = 0.5 → good default for summaries

Falls back to Jaccard token similarity when embeddings are absent.
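
The lambda trade-off is easiest to see in code. The sketch below implements the standard greedy MMR rule (pick the candidate maximizing lambda * relevance - (1 - lambda) * max similarity to anything already selected) with a Jaccard fallback. The MmrCandidate type, the field names, and the function names are hypothetical, and scores are assumed to be normalized to [0, 1].

```typescript
interface MmrCandidate {
  id: string;
  score: number; // relevance, assumed to be in [0, 1]
  content: string;
  embedding?: number[];
}

function cosineSim(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  const denom = Math.sqrt(na) * Math.sqrt(nb);
  return denom === 0 ? 0 : dot / denom;
}

// Jaccard similarity over lowercase word tokens: |A ∩ B| / |A ∪ B|.
function jaccard(a: string, b: string): number {
  const ta = new Set(a.toLowerCase().split(/\W+/).filter(Boolean));
  const tb = new Set(b.toLowerCase().split(/\W+/).filter(Boolean));
  if (ta.size === 0 && tb.size === 0) return 0;
  let shared = 0;
  ta.forEach((t) => { if (tb.has(t)) shared++; });
  return shared / (ta.size + tb.size - shared);
}

function similarity(a: MmrCandidate, b: MmrCandidate): number {
  // Use embeddings when both sides have them, otherwise fall back to tokens.
  return a.embedding && b.embedding
    ? cosineSim(a.embedding, b.embedding)
    : jaccard(a.content, b.content);
}

// Hypothetical sketch of greedy MMR; not the library's selectMMR.
function mmrSelect(candidates: MmrCandidate[], k: number, lambda: number): MmrCandidate[] {
  const remaining = candidates.slice();
  const selected: MmrCandidate[] = [];
  while (selected.length < k && remaining.length > 0) {
    let bestIdx = 0;
    let bestScore = -Infinity;
    remaining.forEach((c, i) => {
      // Redundancy is the closest match among already selected chunks.
      const redundancy =
        selected.length === 0 ? 0 : Math.max(...selected.map((s) => similarity(c, s)));
      const mmr = lambda * c.score - (1 - lambda) * redundancy;
      if (mmr > bestScore) {
        bestScore = mmr;
        bestIdx = i;
      }
    });
    selected.push(remaining.splice(bestIdx, 1)[0]);
  }
  return selected;
}
```

With lambda = 1.0 the redundancy term vanishes and the result is a plain top-k by score. Lower values increasingly penalize near-duplicates of chunks that were already picked.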

Context Building

`buildContext(chunks, maxChars)`

Format chunks into an LLM context string: sorts by chunk index, adds `---` gap separators, and respects the character budget.
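
A rough sketch of how such a formatter could work is shown below. It assumes that a gap means two consecutive output chunks whose indexes are not adjacent, and that chunks past the budget are dropped whole. Both are assumptions, as are the names ContextChunk and buildContextSketch.

```typescript
interface ContextChunk {
  chunkIndex: number;
  content: string;
}

// Hypothetical sketch; not the library's buildContext implementation.
function buildContextSketch(chunks: ContextChunk[], maxChars: number): string {
  // Restore document order so the LLM reads passages in sequence.
  const ordered = chunks.slice().sort((a, b) => a.chunkIndex - b.chunkIndex);

  let context = '';
  let prevIndex: number | null = null;

  for (const chunk of ordered) {
    // Mark skipped material with '---'; adjacent chunks get a blank line.
    const isGap = prevIndex !== null && chunk.chunkIndex !== prevIndex + 1;
    const separator = prevIndex === null ? '' : isGap ? '\n---\n' : '\n\n';
    const piece = separator + chunk.content;

    // Stop before exceeding the character budget.
    if (context.length + piece.length > maxChars) break;

    context += piece;
    prevIndex = chunk.chunkIndex;
  }
  return context;
}
```

One consequence of this simplified design is that the budget cuts off chunks from the end of the document rather than the lowest-scoring ones. Trimming the candidate list by score before formatting avoids that.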

Scoring

`normalizeScores(chunkRows)`

Convert raw ChunkRow objects (from hybrid search) into ScoredChunk objects.

`deduplicateByIndex(items)`

Keep highest-scoring entry per chunkIndex.
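
Deduplication is useful when the same chunk arrives from several retrieval signals. A minimal sketch follows, assuming each item carries chunkIndex and score fields. Those field names and the function name dedupeByIndex are assumptions.

```typescript
interface IndexedScore {
  chunkIndex: number;
  score: number;
}

// Hypothetical sketch; not the library's deduplicateByIndex implementation.
function dedupeByIndex<T extends IndexedScore>(items: T[]): T[] {
  const best = new Map<number, T>();
  for (const item of items) {
    const existing = best.get(item.chunkIndex);
    // Keep the first entry seen, or replace it with a higher-scoring one.
    if (!existing || item.score > existing.score) {
      best.set(item.chunkIndex, item);
    }
  }
  // A Map iterates in first-insertion order of its keys.
  return Array.from(best.values());
}
```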

Summary Sampling

`selectSummaryRepresentatives(candidates, bucketSize?, maxReps?)`

Pick one representative per document section for broad coverage.
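
One plausible reading of the bucketSize and maxReps parameters is sketched below: chunk indexes are grouped into fixed-size buckets that stand in for sections, and the best-scoring chunk per bucket is kept. The default values, field names, and the function name pickRepresentatives are all assumptions, not the library's documented behavior.

```typescript
interface SectionCandidate {
  chunkIndex: number;
  score: number;
}

// Hypothetical sketch; not the library's selectSummaryRepresentatives.
function pickRepresentatives<T extends SectionCandidate>(
  candidates: T[],
  bucketSize = 5, // assumed default
  maxReps = 10,   // assumed default
): T[] {
  if (bucketSize < 1) throw new RangeError('bucketSize must be at least 1');

  const bestPerBucket = new Map<number, T>();
  for (const c of candidates) {
    // Chunks 0..bucketSize-1 fall in bucket 0, the next run in bucket 1, etc.
    const bucket = Math.floor(c.chunkIndex / bucketSize);
    const current = bestPerBucket.get(bucket);
    if (!current || c.score > current.score) bestPerBucket.set(bucket, c);
  }

  return Array.from(bestPerBucket.values())
    .sort((a, b) => b.score - a.score) // keep the strongest representatives
    .slice(0, maxReps)
    .sort((a, b) => a.chunkIndex - b.chunkIndex); // return in document order
}
```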

Query Classification

`classifyQueryType(query)`

Regex-based classification: 'instructional', 'informational', or 'definitional'.

`isInstructionalQuery(query)`

Boolean shortcut: returns true when the query is classified as 'instructional'.
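
To illustrate what regex-based classification can look like, here is a small sketch. The patterns are invented for this example and are not the library's rules. The names classifySketch and isInstructionalSketch are hypothetical.

```typescript
type SketchQueryType = 'instructional' | 'informational' | 'definitional';

// Illustrative patterns only; the library's actual regexes may differ.
const INSTRUCTIONAL = /^(how (do|can|to|should)|steps? (to|for)|guide to)\b/;
const DEFINITIONAL = /^(what (is|are)|define|definition of|meaning of)\b/;

function classifySketch(query: string): SketchQueryType {
  const q = query.trim().toLowerCase();
  if (INSTRUCTIONAL.test(q)) return 'instructional';
  if (DEFINITIONAL.test(q)) return 'definitional';
  // Anything that matches neither pattern is treated as a general lookup.
  return 'informational';
}

function isInstructionalSketch(query: string): boolean {
  return classifySketch(query) === 'instructional';
}
```

Anchoring the patterns at the start of the query keeps the check cheap and predictable, at the cost of missing phrasing such as "Can you explain how to ...".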

RRF (Reciprocal Rank Fusion)

`getRRFWeights(query, queryType?, config?)`

Get RRF signal weights tuned for the query type.

`getThresholds(queryType?, config?)`

Get similarity/BM25 thresholds for the query type.
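
The weights feed into a weighted form of Reciprocal Rank Fusion, where each signal contributes weight / (k + rank) for every document it ranks. The sketch below shows that computation in application code. In the library the fusion happens inside the generated SQL, so the function name fuseRankings and the data shapes here are purely illustrative.

```typescript
// Hypothetical sketch of weighted RRF over ranked id lists (best first).
function fuseRankings(
  rankings: Record<string, string[]>,
  weights: Partial<Record<string, number>> = {},
  k = 60, // RRF constant; 60 matches the rrfK default mentioned in this README
): Map<string, number> {
  const scores = new Map<string, number>();
  for (const signal of Object.keys(rankings)) {
    const weight = weights[signal] ?? 1; // unweighted signals count once
    rankings[signal].forEach((id, i) => {
      const rank = i + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
    });
  }
  return scores;
}

// Example: 'b' appears in both lists, so it accumulates two contributions.
const fused = fuseRankings(
  { vector: ['a', 'b'], bm25: ['b', 'c'] },
  { vector: 1, bm25: 1 },
);
const ranked = Array.from(fused.entries()).sort((x, y) => y[1] - x[1]);
```

A larger k flattens the difference between adjacent ranks, while a larger weight lets one signal dominate. That is the lever query-type tuning adjusts.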

Legacy Reranker

`legacyRerank(query, candidates, topN)`

Term-frequency + proximity reranker. Use as a fallback when a cross-encoder (Cohere, etc.) is unavailable.

Configuration

`createConfig(overrides?)`

Create a RAGConfig with sensible production defaults, optionally overriding specific values.

`DEFAULT_CONFIG`

Frozen default config with 25+ tuning knobs. See src/core/config.ts.

Concurrency

`new Semaphore(max)`

Counting semaphore for rate-limiting concurrent operations (e.g., embedding API calls).

const sem = new Semaphore(4);
await sem.acquire();
try { /* work */ } finally { sem.release(); }
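
For readers curious how a counting semaphore works, a minimal sketch follows, together with a helper that caps concurrency over a list of async tasks. The class name SimpleSemaphore and the helper mapWithLimit are hypothetical and not part of the package.

```typescript
// Hypothetical sketch of a counting semaphore; not the library's Semaphore.
class SimpleSemaphore {
  private available: number;
  private waiters: Array<() => void> = [];

  constructor(max: number) {
    if (!Number.isInteger(max) || max < 1) {
      throw new RangeError('max must be a positive integer');
    }
    this.available = max;
  }

  acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--;
      return Promise.resolve();
    }
    // No permit free: park the caller until release() wakes it.
    return new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // hand the permit directly to the oldest waiter
    } else {
      this.available++;
    }
  }
}

// Run fn over all items with at most `limit` calls in flight.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const sem = new SimpleSemaphore(limit);
  return Promise.all(
    items.map(async (item) => {
      await sem.acquire();
      try {
        return await fn(item);
      } finally {
        sem.release(); // always return the permit, even on failure
      }
    }),
  );
}
```

Handing the permit straight to the next waiter in release() keeps the queue first-in, first-out and prevents a newly arriving caller from jumping ahead of parked ones.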

SQL Generators (`pgvector-rag/sql`)

`hybridSearchSQL(options)`

3-CTE query combining vector similarity + BM25 + phrase matching via RRF.

`createChunksTableSQL(options?)`

CREATE TABLE with vector column, tsvector, and unique constraint.

`createIndexesSQL(options?)`

HNSW vector index + GIN text search index + document_id index.

`upsertChunksSQL(chunks, tableName?)`

Batch INSERT … ON CONFLICT with vector and jsonb casting.
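
The core of a batch upsert generator is numbering the placeholders row by row. The sketch below shows the idea. The column names, the default table name, and the (document_id, chunk_index) conflict target are assumptions made for this example, as is the function name buildUpsertSketch.

```typescript
interface UpsertRecord {
  id: string;
  documentId: string;
  chunkIndex: number;
  content: string;
  embedding: number[];
  metadata: Record<string, unknown>;
}

// Hypothetical sketch; not the library's upsertChunksSQL implementation.
function buildUpsertSketch(
  records: UpsertRecord[],
  tableName = 'chunks', // assumed default
): { text: string; params: unknown[] } {
  if (records.length === 0) throw new Error('No records to upsert');
  // Identifiers cannot be bound as parameters, so validate the table name.
  if (!/^[a-z_][a-z0-9_]*$/i.test(tableName)) {
    throw new Error(`Invalid table name: ${tableName}`);
  }

  const params: unknown[] = [];
  const tuples = records.map((r) => {
    const base = params.length; // placeholders continue from the last row
    params.push(
      r.id,
      r.documentId,
      r.chunkIndex,
      r.content,
      `[${r.embedding.join(',')}]`, // pgvector's text input format
      JSON.stringify(r.metadata),
    );
    return `($${base + 1}, $${base + 2}, $${base + 3}, $${base + 4}, $${base + 5}::vector, $${base + 6}::jsonb)`;
  });

  const text = [
    `INSERT INTO ${tableName} (id, document_id, chunk_index, content, embedding, metadata)`,
    `VALUES ${tuples.join(', ')}`,
    'ON CONFLICT (document_id, chunk_index) DO UPDATE',
    'SET content = EXCLUDED.content, embedding = EXCLUDED.embedding, metadata = EXCLUDED.metadata',
  ].join('\n');

  return { text, params };
}
```

Note that PostgreSQL rejects an ON CONFLICT DO UPDATE statement that touches the same row twice, so a batch must not contain two records with the same conflict key.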

`deleteChunksSQL(documentId, tableName?)`

DELETE all chunks for a document.

Using with ORMs

Knex

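import { hybridSearchSQL } from 'pgvector-rag/sql';

const { text, params } = hybridSearchSQL({ ... });
// On PostgreSQL, knex.raw() resolves to the node-postgres result, so read `.rows`.
// Note: knex.raw() binds `?` placeholders; check the generated SQL's placeholder style first.
const { rows } = await knex.raw(text, params);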

Drizzle

import { hybridSearchSQL } from 'pgvector-rag/sql';

const { text, params } = hybridSearchSQL({ ... });
// sql.raw() accepts only a SQL string and does not bind parameters,
// so run the query on the pg Pool that backs your Drizzle instance.
const { rows } = await pool.query(text, params);

Prisma

import { hybridSearchSQL } from 'pgvector-rag/sql';

const { text, params } = hybridSearchSQL({ ... });
const rows = await prisma.$queryRawUnsafe(text, ...params);

Configuration

Every algorithm is configurable via createConfig():

import { createConfig } from 'pgvector-rag';

const config = createConfig({
  rrfK: 100,          // RRF constant (default: 60)
  kQA: 15,            // Final chunks for QA (default: 10)
  kSummary: 20,       // Final chunks for summaries (default: 14)
  mmrLambdaQA: 0.8,   // MMR trade-off for QA (default: 0.7)
  simThreshold: 0.2,  // Minimum cosine similarity (default: 0.15)
});

Pass config to getRRFWeights() and getThresholds().
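
Conceptually, an override-style config is a shallow merge over frozen defaults. The sketch below uses only the five knobs and default values listed above. The real RAGConfig has more fields, and the names SketchConfig, SKETCH_DEFAULTS, and makeConfig are hypothetical.

```typescript
interface SketchConfig {
  rrfK: number;
  kQA: number;
  kSummary: number;
  mmrLambdaQA: number;
  simThreshold: number;
}

// Defaults taken from the comments in the example above.
const SKETCH_DEFAULTS: Readonly<SketchConfig> = Object.freeze({
  rrfK: 60,
  kQA: 10,
  kSummary: 14,
  mmrLambdaQA: 0.7,
  simThreshold: 0.15,
});

// Hypothetical sketch; not the library's createConfig implementation.
function makeConfig(overrides: Partial<SketchConfig> = {}): Readonly<SketchConfig> {
  // Later spreads win, so overrides replace matching defaults.
  return Object.freeze({ ...SKETCH_DEFAULTS, ...overrides });
}

const tuned = makeConfig({ rrfK: 100, kQA: 15 });
```

Freezing the result keeps a shared config from being mutated mid-request. One caveat of a plain spread merge is that an explicit undefined override replaces the default.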

Coming Soon

Pipeline builder (createPipeline({ embedder, db }))
Embedder adapters (OpenAI, Cohere, HuggingFace)
Reranker adapters (Cohere cross-encoder, BGE)
Streaming chunk insertion
Chunk overlap deduplication

License

MIT
