strfry/brief

brief

Document-OCR toolkit: convert PDFs to layout-true text.

Two rendering paths, same output:

pdfminer — fast, perfect, for digital PDFs with embedded text:

.venv/bin/pdf2txt.py -t xml file.pdf | ./xml2txt

Tesseract — for scanned/image-based PDFs:

./pdf2txt file.pdf
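To illustrate what "layout-true" could mean on the pdfminer path, here is a minimal sketch that places each character from the `-t xml` output on a fixed-pitch grid. It is not the actual implementation of xml2txt; the function name `layout_text` and the `char_width` and `line_height` defaults are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def layout_text(xml_source, char_width=6.0, line_height=12.0):
    """Place characters from pdfminer XML on a fixed-pitch text grid.

    Hypothetical helper: char_width and line_height (in PDF points)
    are assumed values, not settings taken from xml2txt.
    """
    root = ET.fromstring(xml_source)
    pages = []
    for page in root.iter("page"):
        rows = defaultdict(dict)  # row index -> {column index: character}
        # A bbox is "x0,y0,x1,y1"; PDF coordinates grow upward from the bottom.
        page_top = float(page.get("bbox").split(",")[3])
        for char in page.iter("text"):
            bbox = char.get("bbox")
            if bbox is None or not char.text:
                continue  # spacing and newline nodes carry no bbox
            x0, _y0, _x1, y1 = (float(v) for v in bbox.split(","))
            row = round((page_top - y1) / line_height)
            col = round(x0 / char_width)
            rows[row][col] = char.text
        lines = []
        for row in range(max(rows, default=-1) + 1):
            cells = rows.get(row, {})
            width = max(cells, default=-1) + 1
            lines.append("".join(cells.get(c, " ") for c in range(width)))
        pages.append("\n".join(lines))
    return "\f".join(pages)  # form feed between pages
```

Characters that round to the same grid cell overwrite each other in this sketch, so proportional fonts would need a smarter column assignment.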

Install

git clone <repo>
cd brief
uv sync

System dependencies for the Tesseract path: pdftocairo, tesseract (with the deu German language data), and parallel.
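A quick pre-flight check for these dependencies might look like the sketch below. The helper names `missing_tools` and `has_language` are hypothetical and not part of this repository; the check relies only on `shutil.which` and on the text printed by `tesseract --list-langs`.

```python
import shutil

# Tools named in the README as requirements for the Tesseract path.
REQUIRED_TOOLS = ("pdftocairo", "tesseract", "parallel")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def has_language(list_langs_output, lang="deu"):
    """Check whether `lang` appears in the output of `tesseract --list-langs`.

    The output is expected to be a header line followed by one
    language code per line; the header never equals a language code.
    """
    return lang in {line.strip() for line in list_langs_output.splitlines()}
```

To use it, capture the output of `tesseract --list-langs` (for example with `subprocess.run(..., capture_output=True, text=True)`) and pass the text to `has_language`. Some Tesseract versions may print the list to stderr rather than stdout, so checking both streams is the safer assumption.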

Commands

OCR:

| Tool | Input | Output | Use case |
| --- | --- | --- | --- |
| xml2txt | pdfminer XML (stdin/file) | layout-true text | Digital PDFs, instant |
| pdf2txt | PDF file | layout-true text | Scanned PDFs via OCR |
| pdf2tsv.sh | PDF file | Tesseract TSV cache | Bulk OCR, reuse results |
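As an illustration of how a cached Tesseract TSV could be reused, the sketch below rebuilds indented text lines from the word rows. It assumes the standard Tesseract TSV columns (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text); the function name `tsv_to_text` and the `char_width` value are hypothetical, and this is not how pdf2txt itself is known to work.

```python
import csv
import io
from collections import defaultdict


def tsv_to_text(tsv, char_width=10):
    """Rebuild text lines from Tesseract TSV, keeping horizontal offsets.

    char_width (pixels per output column) is an assumed value.
    """
    reader = csv.DictReader(io.StringIO(tsv), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    lines = defaultdict(list)  # (page, block, par, line) -> [(left, word)]
    for row in reader:
        text = (row.get("text") or "").strip()
        if row["level"] != "5" or not text:
            continue  # level 5 rows are words; other levels are containers
        key = tuple(int(row[k]) for k in
                    ("page_num", "block_num", "par_num", "line_num"))
        lines[key].append((int(row["left"]), text))
    out = []
    for key in sorted(lines):
        line = ""
        for left, word in sorted(lines[key]):
            col = left // char_width
            # Pad to the word's column, keeping at least one space between words.
            pad = max(col - len(line), 1 if line else 0)
            line += " " * pad + word
        out.append(line)
    return "\n".join(out)
```

Lines are ordered by Tesseract's block and line numbers here, which is a simplification: a multi-column page would need sorting by the `top` coordinate instead.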

Sending letters:

| Command | Description |
| --- | --- |
| lxp send file.pdf | Send a letter via LetterXpress |
| lxp balance | Show the account balance |
| lxp jobs | List jobs |
| lxp status <id> | Show the status of a job |
| lxp cancel <id> | Cancel a job |
| lxp price --pages 2 | Calculate the price |

Authentication is read from .env (LETTERXPRESS_USERNAME, LETTERXPRESS_MODE) and .secret.lxp (API key).
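A minimal sketch of loading these two files is shown below. It assumes that .env holds plain KEY=VALUE lines and that .secret.lxp contains nothing but the API key; both are guesses about the file layout, and `parse_env` and `load_credentials` are hypothetical names, not functions from the lxp CLI.

```python
from pathlib import Path


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_credentials(directory="."):
    """Collect the LetterXpress settings from .env and .secret.lxp.

    Assumes .secret.lxp contains only the API key (an assumption
    about the file layout, not confirmed by the README).
    """
    base = Path(directory)
    env = parse_env((base / ".env").read_text(encoding="utf-8"))
    return {
        "username": env["LETTERXPRESS_USERNAME"],
        "mode": env["LETTERXPRESS_MODE"],
        "api_key": (base / ".secret.lxp").read_text(encoding="utf-8").strip(),
    }
```

Keeping the API key in its own file means .env can be shared or committed without exposing the secret, which may be the reason for the split.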

Skills

OpenCode skills in skills/:

  • dokument-ocr — Full OCR pipeline documentation
  • brief-scannen — Brother ADF scanner instructions

License

MIT
