codewiser-io/pgvector-rag

pgvector-rag

Lightweight RAG toolkit for PostgreSQL + pgvector. Zero runtime dependencies.

Extracted from a production RAG pipeline serving thousands of queries. Provides the algorithms and SQL you need without the framework lock-in.

Why this exists

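| Feature                  | pgvector-rag  | LangChain         | LlamaIndex    |
|--------------------------|---------------|-------------------|---------------|
| Runtime deps             | 0             | 50+               | 40+           |
| Bundle size              | ~15 KB        | ~2 MB             | ~1.5 MB       |
| DB lock-in               | pgvector only | Many adapters     | Many adapters |
| Chunking                 | Built-in      | Built-in          | Built-in      |
| Hybrid search SQL        | Yes           | No (needs driver) | No            |
| MMR                      | Yes           | Yes               | Yes           |
| RRF fusion               | Yes           | No                | No            |
| Bring your own DB client | Yes           | No                | No            |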

Install

npm install pgvector-rag

Quick Start

1. Chunk a document

import { chunk } from 'pgvector-rag';

const chunks = chunk(documentText, {
  maxChunkChars: 1200,
  overlapChars: 250,
});

// chunks = [{ index: 0, content: '...', type: 'heading' }, ...]

2. Create the table

import { createChunksTableSQL, createIndexesSQL } from 'pgvector-rag/sql';
import pg from 'pg';

const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });

const { text: createTable } = createChunksTableSQL({ dimensions: 1536 });
await pool.query(createTable);

for (const { text } of createIndexesSQL()) {
  await pool.query(text);
}

3. Upsert chunks with embeddings

import { randomUUID } from 'node:crypto';
import { upsertChunksSQL } from 'pgvector-rag/sql';

const records = chunks.map((c, i) => ({
  id: randomUUID(),
  documentId: 'doc-123',
  chunkIndex: c.index,
  content: c.content,
  embedding: embeddings[i], // from your embedding API
  metadata: { chunk_type: c.type },
}));

const { text, params } = upsertChunksSQL(records);
await pool.query(text, params);

4. Search with hybrid SQL

import { hybridSearchSQL } from 'pgvector-rag/sql';
import { selectMMR, buildContext, normalizeScores } from 'pgvector-rag';

// Generate the search SQL
const { text, params } = hybridSearchSQL({
  documentId: 'doc-123',
  queryText: 'How does photosynthesis work?',
  embedding: queryEmbedding, // from your embedding API
  limit: 50,
});

// Execute with your DB client
const { rows } = await pool.query(text, params);

// Map to ScoredChunks
const scored = normalizeScores(rows.map(r => ({
  rrfScore: r.rrf_score,
  id: r.id,
  chunkIndex: r.chunk_index,
  content: r.content,
  embedding: r.embedding, // if you fetched it
})));

// Diversify with MMR
const selected = selectMMR(scored, 10, 0.7);

// Build context string for your LLM
const context = buildContext(selected, 5000);

API Reference

Chunking

chunk(text, options?)

Split text into overlapping chunks, respecting section and sentence boundaries.

chunk(text: string, options?: {
  maxChunkChars?: number;  // default: 1200
  overlapChars?: number;   // default: 250
  maxChunks?: number;      // default: Infinity
}): Chunk[]
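
To make the maxChunkChars and overlapChars options concrete, here is a minimal sketch of sentence-aware chunking with character overlap. It is not the library's implementation: chunkBySentences and SketchChunk are hypothetical names, section awareness is omitted, and a single long sentence (or an overlap tail plus a sentence) can still exceed the limit.

```typescript
interface SketchChunk {
  index: number;
  content: string;
}

// Hypothetical helper: pack whole sentences greedily into chunks and
// seed each new chunk with the tail of the previous one.
function chunkBySentences(
  input: string,
  maxChunkChars = 1200,
  overlapChars = 250,
): SketchChunk[] {
  // A "sentence" here is a run of text ending in ., ! or ? (a simplification).
  const sentences = (input.match(/[^.!?]+[.!?]*/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const chunks: SketchChunk[] = [];
  let current = '';

  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (current && candidate.length > maxChunkChars) {
      chunks.push({ index: chunks.length, content: current });
      // Carry over the last overlapChars characters (may cut mid-word).
      current =
        overlapChars > 0
          ? `${current.slice(-overlapChars)} ${sentence}`
          : sentence;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push({ index: chunks.length, content: current });
  return chunks;
}
```

The overlap means the start of each chunk repeats the end of the previous one, so a fact that straddles a boundary still appears whole in at least one chunk.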

sanitizeText(text)

Strip null bytes and control characters.
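
Sanitizing matters because PostgreSQL text columns reject null bytes. A minimal sketch of the idea follows; stripControlChars is a hypothetical name, and keeping tab, newline, and carriage return is an assumption about what a sanitizer should preserve, not a statement about sanitizeText.

```typescript
// Hypothetical helper: remove C0 control characters (including the null
// byte) and DEL, but keep tab (\t), newline (\n), and carriage return (\r).
function stripControlChars(input: string): string {
  return input.replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g, '');
}
```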

detectChunkType(content)

Classify a chunk as 'heading', 'list', or 'paragraph'.

Vector Math

cosineSimilarity(a, b)

Cosine similarity between two vectors. Returns a value in the range [-1, 1].

l2Normalize(vector)

L2-normalize a vector to unit length. Returns a new array.
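
For reference, both operations can be sketched in a few lines. The names cosineSim and unitVector are hypothetical, and returning 0 (or an unchanged copy) for a zero vector is an assumption made here to avoid dividing by zero.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSim(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// L2 normalization: divide every component by the Euclidean length.
// Returns a new array and leaves the input untouched.
function unitVector(v: readonly number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? [...v] : v.map((x) => x / norm);
}
```

Once vectors are L2-normalized, their cosine similarity equals their dot product.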

MMR (Maximal Marginal Relevance)

selectMMR(candidates, k, lambda)

Select k chunks balancing relevance and diversity.

  • lambda = 1.0 → pure relevance (no diversity)
  • lambda = 0.0 → pure diversity (ignore scores)
  • lambda = 0.7 → good default for QA
  • lambda = 0.5 → good default for summaries

Falls back to Jaccard token similarity when embeddings are absent.
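
The trade-off controlled by lambda can be sketched as the standard greedy MMR loop: each step picks the candidate maximizing lambda * relevance - (1 - lambda) * (maximum similarity to anything already selected). The code below is an illustration under assumptions, not selectMMR itself: MmrCandidate, mmrSelect, and the tokenization used for the Jaccard fallback are hypothetical, and scores are assumed to be normalized to [0, 1].

```typescript
interface MmrCandidate {
  id: string;
  score: number; // relevance, assumed normalized to [0, 1]
  content: string;
  embedding?: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    na += x * x;
    nb += y * y;
  }
  const denom = Math.sqrt(na * nb);
  return denom === 0 ? 0 : dot / denom;
}

// Jaccard similarity over lowercase alphanumeric tokens.
function jaccard(a: string, b: string): number {
  const ta = new Set<string>(a.toLowerCase().match(/[a-z0-9]+/g) ?? []);
  const tb = new Set<string>(b.toLowerCase().match(/[a-z0-9]+/g) ?? []);
  if (ta.size === 0 && tb.size === 0) return 0;
  let shared = 0;
  ta.forEach((t) => {
    if (tb.has(t)) shared += 1;
  });
  return shared / (ta.size + tb.size - shared);
}

// Use embeddings when both sides have them, otherwise fall back to tokens.
function similarity(a: MmrCandidate, b: MmrCandidate): number {
  return a.embedding && b.embedding
    ? cosine(a.embedding, b.embedding)
    : jaccard(a.content, b.content);
}

function mmrSelect(
  candidates: MmrCandidate[],
  k: number,
  lambda: number,
): MmrCandidate[] {
  const remaining = [...candidates];
  const selected: MmrCandidate[] = [];
  while (selected.length < k && remaining.length > 0) {
    let bestIdx = 0;
    let bestValue = -Infinity;
    for (let i = 0; i < remaining.length; i++) {
      const cand = remaining[i];
      if (!cand) continue;
      // Redundancy is the closest match among already selected chunks.
      const redundancy =
        selected.length === 0
          ? 0
          : Math.max(...selected.map((s) => similarity(cand, s)));
      const value = lambda * cand.score - (1 - lambda) * redundancy;
      if (value > bestValue) {
        bestValue = value;
        bestIdx = i;
      }
    }
    selected.push(...remaining.splice(bestIdx, 1));
  }
  return selected;
}
```

With lambda = 1.0 the redundancy term vanishes and the result is simply the top k by score; lowering lambda makes near-duplicates of already selected chunks progressively less attractive.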

Context Building

buildContext(chunks, maxChars)

Format chunks into an LLM context string. Sorts by index, adds --- gap separators, respects character budget.
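
One plausible reading of that description, sketched under assumptions: chunks are sorted by index, adjacent chunks are joined with a blank line, a --- separator marks a gap where intermediate chunks were not selected, and assembly stops once the next piece would exceed the budget. The names ContextChunk and assembleContext are hypothetical, and the exact separator and truncation rules of buildContext may differ.

```typescript
interface ContextChunk {
  chunkIndex: number;
  content: string;
}

function assembleContext(chunks: ContextChunk[], maxChars: number): string {
  // Restore document order without mutating the caller's array.
  const ordered = [...chunks].sort((a, b) => a.chunkIndex - b.chunkIndex);
  let context = '';
  let previousIndex: number | null = null;

  for (const chunk of ordered) {
    let separator = '';
    if (previousIndex !== null) {
      // Consecutive chunks read as continuous text; a gap gets a marker.
      separator =
        chunk.chunkIndex === previousIndex + 1 ? '\n\n' : '\n\n---\n\n';
    }
    const piece = separator + chunk.content;
    if (context.length + piece.length > maxChars) break;
    context += piece;
    previousIndex = chunk.chunkIndex;
  }
  return context;
}
```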

Scoring

normalizeScores(chunkRows)

Convert raw ChunkRow objects (from hybrid search) into ScoredChunk objects.

deduplicateByIndex(items)

Keep highest-scoring entry per chunkIndex.
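
The same chunk can be returned by more than one retrieval signal, so deduplication usually reduces to a map keyed by chunkIndex. A minimal sketch, with ScoredItem and dedupeByChunkIndex as hypothetical names and a numeric score field assumed:

```typescript
interface ScoredItem {
  chunkIndex: number;
  score: number;
}

function dedupeByChunkIndex<T extends ScoredItem>(items: T[]): T[] {
  const best = new Map<number, T>();
  for (const item of items) {
    const existing = best.get(item.chunkIndex);
    // Keep the first entry seen, or replace it with a higher-scoring one.
    if (existing === undefined || item.score > existing.score) {
      best.set(item.chunkIndex, item);
    }
  }
  return Array.from(best.values());
}
```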

Summary Sampling

selectSummaryRepresentatives(candidates, bucketSize?, maxReps?)

Pick one representative per document section for broad coverage.
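
A simple way to get this kind of coverage is to bucket chunks by position and keep the best-scoring chunk per bucket. The sketch below assumes that a "section" can be approximated by a fixed-size window of chunk indexes; pickBucketRepresentatives, SummaryCandidate, and the default values shown are hypothetical.

```typescript
interface SummaryCandidate {
  chunkIndex: number;
  score: number;
}

function pickBucketRepresentatives(
  candidates: SummaryCandidate[],
  bucketSize = 5, // assumed default, for illustration only
  maxReps = 10, // assumed default, for illustration only
): SummaryCandidate[] {
  const bestPerBucket = new Map<number, SummaryCandidate>();
  for (const candidate of candidates) {
    const bucket = Math.floor(candidate.chunkIndex / bucketSize);
    const current = bestPerBucket.get(bucket);
    if (current === undefined || candidate.score > current.score) {
      bestPerBucket.set(bucket, candidate);
    }
  }
  return Array.from(bestPerBucket.values())
    .sort((a, b) => b.score - a.score) // strongest buckets first
    .slice(0, maxReps) // cap the number of representatives
    .sort((a, b) => a.chunkIndex - b.chunkIndex); // back to document order
}
```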

Query Classification

classifyQueryType(query)

Regex-based classification: 'instructional', 'informational', or 'definitional'.

isInstructionalQuery(query)

Returns true when the query is classified as 'instructional'.
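
To illustrate what regex-based classification can look like, here is a small sketch. The patterns are invented for this example and are almost certainly not the ones the library ships; classifyQuery and SketchQueryType are hypothetical names.

```typescript
type SketchQueryType = 'instructional' | 'informational' | 'definitional';

// Illustrative patterns only: "how to ..." phrasing and task verbs.
const INSTRUCTIONAL_PATTERN =
  /^(how (do|can|to|should)\b|steps? to\b)|\b(tutorial|step[- ]by[- ]step|install|configure)\b/i;

// Illustrative patterns only: requests for a definition.
const DEFINITIONAL_PATTERN = /^(what (is|are)\b|define\b|definition of\b|meaning of\b)/i;

function classifyQuery(query: string): SketchQueryType {
  const q = query.trim();
  if (INSTRUCTIONAL_PATTERN.test(q)) return 'instructional';
  if (DEFINITIONAL_PATTERN.test(q)) return 'definitional';
  return 'informational'; // fallback when nothing matches
}

function looksInstructional(query: string): boolean {
  return classifyQuery(query) === 'instructional';
}
```

Regex classification is cheap and deterministic, which suits a per-request decision such as choosing fusion weights, at the cost of missing phrasings the patterns do not anticipate.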

RRF (Reciprocal Rank Fusion)

getRRFWeights(query, queryType?, config?)

Get RRF signal weights tuned for the query type.

getThresholds(queryType?, config?)

Get similarity/BM25 thresholds for the query type.
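
For intuition about how such weights enter the fusion, weighted Reciprocal Rank Fusion scores each item as the sum over signals of weight / (k + rank), with ranks starting at 1. The sketch below is an assumption about how the pieces fit together, not the library's code: fuseRRF is a hypothetical name, the signal names mirror the three signals mentioned for hybrid search, and k = 60 matches the documented rrfK default.

```typescript
interface FusedResult {
  id: string;
  rrfScore: number;
}

function fuseRRF(
  rankedLists: Record<string, string[]>, // signal name -> ids, best first
  weights: Record<string, number>, // signal name -> weight (default 1)
  k = 60,
): FusedResult[] {
  const scores = new Map<string, number>();
  for (const [signal, ids] of Object.entries(rankedLists)) {
    const weight = weights[signal] ?? 1;
    ids.forEach((id, position) => {
      const rank = position + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
    });
  }
  return Array.from(scores, ([id, rrfScore]) => ({ id, rrfScore })).sort(
    (a, b) => b.rrfScore - a.rrfScore,
  );
}
```

Because only ranks are used, signals with incomparable score scales (cosine similarity versus text-search rank) can be combined without normalization; a larger k flattens the difference between top and lower ranks.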

Legacy Reranker

legacyRerank(query, candidates, topN)

Term-frequency + proximity reranker. Use as a fallback when a cross-encoder (Cohere, etc.) is unavailable.

Configuration

createConfig(overrides?)

Create a RAGConfig with sensible production defaults, optionally overriding specific values.

DEFAULT_CONFIG

Frozen default config with 25+ tuning knobs. See src/core/config.ts.

Concurrency

new Semaphore(max)

Counting semaphore for rate-limiting concurrent operations (e.g., embedding API calls).

const sem = new Semaphore(4);
await sem.acquire();
try { /* work */ } finally { sem.release(); }
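
A counting semaphore is small enough to sketch in full. The version below is an illustration rather than the library's class: CountingSemaphore and mapWithLimit are hypothetical names, and waiters are served in FIFO order by assumption.

```typescript
class CountingSemaphore {
  private available: number;
  private readonly waiters: Array<() => void> = [];

  constructor(max: number) {
    if (!Number.isInteger(max) || max < 1) {
      throw new RangeError('max must be a positive integer');
    }
    this.available = max;
  }

  acquire(): Promise<void> {
    if (this.available > 0) {
      this.available -= 1;
      return Promise.resolve();
    }
    // No permit free: park the caller until release() wakes it.
    return new Promise<void>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // hand the permit straight to the oldest waiter
    } else {
      this.available += 1;
    }
  }
}

// Run worker over items with at most `limit` calls in flight.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const sem = new CountingSemaphore(limit);
  return Promise.all(
    items.map(async (item) => {
      await sem.acquire();
      try {
        return await worker(item);
      } finally {
        sem.release();
      }
    }),
  );
}
```

Releasing in a finally block, as in the usage above, ensures a failed embedding call does not leak a permit and stall the remaining work.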

SQL Generators (pgvector-rag/sql)

hybridSearchSQL(options)

3-CTE query combining vector similarity + BM25 + phrase matching via RRF.

createChunksTableSQL(options?)

CREATE TABLE with vector column, tsvector, and unique constraint.

createIndexesSQL(options?)

HNSW vector index + GIN text search index + document_id index.

upsertChunksSQL(chunks, tableName?)

Batch INSERT … ON CONFLICT with vector and jsonb casting.
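
The shape of such a statement can be sketched as follows. Everything schema-related here is an assumption for illustration: the column names, the default table name, and the (document_id, chunk_index) conflict target are hypothetical, as are the names buildUpsert and UpsertRecord. The casts rely on pgvector accepting a '[1,2,3]' text literal for ::vector and PostgreSQL accepting a JSON string for ::jsonb.

```typescript
interface UpsertRecord {
  id: string;
  documentId: string;
  chunkIndex: number;
  content: string;
  embedding: number[];
  metadata: Record<string, unknown>;
}

function buildUpsert(
  records: UpsertRecord[],
  tableName = 'chunks', // hypothetical default
): { text: string; params: unknown[] } {
  if (records.length === 0) throw new Error('No records to upsert');
  // Identifiers cannot be bound as parameters, so validate before interpolating.
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(tableName)) {
    throw new Error('Invalid table name');
  }

  const params: unknown[] = [];
  const tuples = records.map((r) => {
    const base = params.length; // placeholders continue from here
    params.push(
      r.id,
      r.documentId,
      r.chunkIndex,
      r.content,
      `[${r.embedding.join(',')}]`, // pgvector text literal
      JSON.stringify(r.metadata),
    );
    return `($${base + 1}, $${base + 2}, $${base + 3}, $${base + 4}, $${base + 5}::vector, $${base + 6}::jsonb)`;
  });

  const text = [
    `INSERT INTO ${tableName} (id, document_id, chunk_index, content, embedding, metadata)`,
    `VALUES ${tuples.join(', ')}`,
    'ON CONFLICT (document_id, chunk_index) DO UPDATE SET',
    '  content = EXCLUDED.content,',
    '  embedding = EXCLUDED.embedding,',
    '  metadata = EXCLUDED.metadata',
  ].join('\n');

  return { text, params };
}
```

Sending one multi-row statement keeps the round trips low, but PostgreSQL limits a single statement to 65535 bind parameters, so very large documents need to be split into batches.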

deleteChunksSQL(documentId, tableName?)

DELETE all chunks for a document.

Using with ORMs

Knex

import { hybridSearchSQL } from 'pgvector-rag/sql';

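const { text, params } = hybridSearchSQL({ ... });
// Caution: knex.raw() binds `?` placeholders, not pg-style $1, $2, ...
// With the pg driver, the result object exposes the rows as `.rows`.
const { rows } = await knex.raw(text, params);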

Drizzle

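import { drizzle } from 'drizzle-orm/node-postgres';
import { hybridSearchSQL } from 'pgvector-rag/sql';

// sql.raw() takes only a SQL string and cannot bind separate parameters,
// so run the generated query on the pg Pool that backs the Drizzle instance.
const db = drizzle(pool);
const { text, params } = hybridSearchSQL({ ... });
const { rows } = await pool.query(text, params);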

Prisma

import { hybridSearchSQL } from 'pgvector-rag/sql';

const { text, params } = hybridSearchSQL({ ... });
const rows = await prisma.$queryRawUnsafe(text, ...params);

Configuration

Every algorithm is configurable via createConfig():

import { createConfig } from 'pgvector-rag';

const config = createConfig({
  rrfK: 100,          // RRF constant (default: 60)
  kQA: 15,            // Final chunks for QA (default: 10)
  kSummary: 20,       // Final chunks for summaries (default: 14)
  mmrLambdaQA: 0.8,   // MMR trade-off for QA (default: 0.7)
  simThreshold: 0.2,  // Minimum cosine similarity (default: 0.15)
});

Pass config to getRRFWeights() and getThresholds().
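
The override behavior can be pictured as a shallow merge over frozen defaults. This sketch covers only the five fields shown above, with default values taken from the comments in that example; SketchConfig, SKETCH_DEFAULTS, and makeConfig are hypothetical names, and the real RAGConfig has many more fields.

```typescript
interface SketchConfig {
  rrfK: number;
  kQA: number;
  kSummary: number;
  mmrLambdaQA: number;
  simThreshold: number;
}

// Frozen so that no caller can change the shared defaults by accident.
const SKETCH_DEFAULTS: Readonly<SketchConfig> = Object.freeze({
  rrfK: 60,
  kQA: 10,
  kSummary: 14,
  mmrLambdaQA: 0.7,
  simThreshold: 0.15,
});

// Later spreads win, so overrides replace only the keys they mention.
function makeConfig(overrides: Partial<SketchConfig> = {}): SketchConfig {
  return { ...SKETCH_DEFAULTS, ...overrides };
}
```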

Coming Soon

  • Pipeline builder (createPipeline({ embedder, db }))
  • Embedder adapters (OpenAI, Cohere, HuggingFace)
  • Reranker adapters (Cohere cross-encoder, BGE)
  • Streaming chunk insertion
  • Chunk overlap deduplication

License

MIT
