Lightweight RAG toolkit for PostgreSQL + pgvector. Zero runtime dependencies.
Extracted from a production RAG pipeline serving thousands of queries. Provides the algorithms and SQL you need without the framework lock-in.
| pgvector-rag | LangChain | LlamaIndex | |
|---|---|---|---|
| Runtime deps | 0 | 50+ | 40+ |
| Bundle size | ~15 KB | ~2 MB | ~1.5 MB |
| DB lock-in | pgvector only | Many adapters | Many adapters |
| Chunking | Built-in | Built-in | Built-in |
| Hybrid search SQL | Yes | No (needs driver) | No |
| MMR | Yes | Yes | Yes |
| RRF fusion | Yes | No | No |
| Bring your own DB client | Yes | No | No |
npm install pgvector-ragimport { chunk } from 'pgvector-rag';
const chunks = chunk(documentText, {
maxChunkChars: 1200,
overlapChars: 250,
});
// chunks = [{ index: 0, content: '...', type: 'heading' }, ...]import { createChunksTableSQL, createIndexesSQL } from 'pgvector-rag/sql';
import pg from 'pg';
const pool = new pg.Pool({ connectionString: DATABASE_URL });
const { text: createTable } = createChunksTableSQL({ dimensions: 1536 });
await pool.query(createTable);
for (const { text } of createIndexesSQL()) {
await pool.query(text);
}import { upsertChunksSQL } from 'pgvector-rag/sql';
const records = chunks.map((c, i) => ({
id: crypto.randomUUID(),
documentId: 'doc-123',
chunkIndex: c.index,
content: c.content,
embedding: embeddings[i], // from your embedding API
metadata: { chunk_type: c.type },
}));
const { text, params } = upsertChunksSQL(records);
await pool.query(text, params);import { hybridSearchSQL } from 'pgvector-rag/sql';
import { selectMMR, buildContext, normalizeScores } from 'pgvector-rag';
// Generate the search SQL
const { text, params } = hybridSearchSQL({
documentId: 'doc-123',
queryText: 'How does photosynthesis work?',
embedding: queryEmbedding, // from your embedding API
limit: 50,
});
// Execute with your DB client
const { rows } = await pool.query(text, params);
// Map to ScoredChunks
const scored = normalizeScores(rows.map(r => ({
rrfScore: r.rrf_score,
id: r.id,
chunkIndex: r.chunk_index,
content: r.content,
embedding: r.embedding, // if you fetched it
})));
// Diversify with MMR
const selected = selectMMR(scored, 10, 0.7);
// Build context string for your LLM
const context = buildContext(selected, 5000);Split text into chunks with section-awareness, sentence boundaries, and overlap.
chunk(text: string, options?: {
maxChunkChars?: number; // default: 1200
overlapChars?: number; // default: 250
maxChunks?: number; // default: Infinity
}): Chunk[]Strip null bytes and control characters.
Classify a chunk as 'heading', 'list', or 'paragraph'.
Cosine similarity between two vectors. Returns [-1, 1].
L2-normalize a vector to unit length. Returns a new array.
Select k chunks balancing relevance and diversity.
lambda = 1.0→ pure relevance (no diversity)lambda = 0.0→ pure diversity (ignore scores)lambda = 0.7→ good default for QAlambda = 0.5→ good default for summaries
Falls back to Jaccard token similarity when embeddings are absent.
Format chunks into an LLM context string. Sorts by index, adds --- gap separators, respects character budget.
Convert raw ChunkRow objects (from hybrid search) into ScoredChunk objects.
Keep highest-scoring entry per chunkIndex.
Pick one representative per document section for broad coverage.
Regex-based classification: 'instructional', 'informational', or 'definitional'.
Quick boolean check.
Get RRF signal weights tuned for the query type.
Get similarity/BM25 thresholds for the query type.
Term-frequency + proximity reranker. Use as a fallback when a cross-encoder (Cohere, etc.) is unavailable.
Create a RAGConfig with sensible production defaults, optionally overriding specific values.
Frozen default config with 25+ tuning knobs. See src/core/config.ts.
Counting semaphore for rate-limiting concurrent operations (e.g., embedding API calls).
const sem = new Semaphore(4);
await sem.acquire();
try { /* work */ } finally { sem.release(); }3-CTE query combining vector similarity + BM25 + phrase matching via RRF.
CREATE TABLE with vector column, tsvector, and unique constraint.
HNSW vector index + GIN text search index + document_id index.
Batch INSERT … ON CONFLICT with vector and jsonb casting.
DELETE all chunks for a document.
import { hybridSearchSQL } from 'pgvector-rag/sql';
const { text, params } = hybridSearchSQL({ ... });
const rows = await knex.raw(text, params);import { sql } from 'drizzle-orm';
import { hybridSearchSQL } from 'pgvector-rag/sql';
const { text, params } = hybridSearchSQL({ ... });
const rows = await db.execute(sql.raw(text, ...params));import { hybridSearchSQL } from 'pgvector-rag/sql';
const { text, params } = hybridSearchSQL({ ... });
const rows = await prisma.$queryRawUnsafe(text, ...params);Every algorithm is configurable via createConfig():
import { createConfig } from 'pgvector-rag';
const config = createConfig({
rrfK: 100, // RRF constant (default: 60)
kQA: 15, // Final chunks for QA (default: 10)
kSummary: 20, // Final chunks for summaries (default: 14)
mmrLambdaQA: 0.8, // MMR trade-off for QA (default: 0.7)
simThreshold: 0.2, // Minimum cosine similarity (default: 0.15)
});Pass config to getRRFWeights() and getThresholds().
- Pipeline builder (
createPipeline({ embedder, db })) - Embedder adapters (OpenAI, Cohere, HuggingFace)
- Reranker adapters (Cohere cross-encoder, BGE)
- Streaming chunk insertion
- Chunk overlap deduplication
MIT