Ghost writes a copy of a PDF with metadata values removed.
The default output path is the original filename with _SANITIZED appended before the
.pdf extension.
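
The naming rule above can be sketched with pathlib. This is a minimal illustration, not Ghost's actual code; the function name default_output_path is hypothetical.

```python
from pathlib import Path


def default_output_path(source: Path) -> Path:
    """Insert _SANITIZED between the file stem and its extension."""
    # Hypothetical helper: keeps the original directory and suffix.
    return source.with_name(f"{source.stem}_SANITIZED{source.suffix}")


print(default_output_path(Path("docs/report.pdf")))
```

Under this sketch the sanitized copy lands in the same directory as the input.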
Ghost removes values for the following common metadata fields:
- /Author
- /Producer
- /Title
- /Subject
- /Creator
- /Keywords
- /CreationDate
- /ModDate
- /Trapped
- /PTEX.Fullbanner
Any other metadata keys found in the PDF are also blanked in the sanitized copy.
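
The blanking rule described above could look roughly like the sketch below. It is an illustration under assumptions, not Ghost's implementation: the names COMMON_FIELDS and blanked_metadata are hypothetical, and the sketch assumes the common fields are set to empty strings even when the PDF does not define them, which the description above does not confirm.

```python
# Hypothetical constant: the common fields listed above.
COMMON_FIELDS = (
    "/Author", "/Producer", "/Title", "/Subject", "/Creator", "/Keywords",
    "/CreationDate", "/ModDate", "/Trapped", "/PTEX.Fullbanner",
)


def blanked_metadata(existing):
    """Map every common field, plus any other key in `existing`, to ""."""
    keys = list(COMMON_FIELDS)
    # Any key outside the common list is blanked as well.
    keys += [key for key in existing if key not in COMMON_FIELDS]
    return {key: "" for key in keys}


blank = blanked_metadata({"/Author": "Jane Doe", "/Company": "Example Corp"})
print(sorted(blank))
```

With PyPDF2, a mapping like this could plausibly be passed to PdfWriter.add_metadata before writing the copy; whether Ghost does it that way is an assumption.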
This project was written for Python 3.12 and uses PyPDF2.
Install dependencies with:
    python3 -m pip install -r requirements.txt

Sanitize a PDF:

    python3 ghost.py path/to/file.pdf

Choose an explicit output path:

    python3 ghost.py path/to/file.pdf --output path/to/clean.pdf

Print metadata before sanitizing:

    python3 ghost.py path/to/file.pdf --verbose

Ghost will not overwrite an existing output file unless --overwrite is supplied.
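
As a rough sketch of how the flags shown above and the overwrite rule might fit together, assuming an argparse-based command line (the README does not say how ghost.py parses its arguments). The names build_parser and check_output are hypothetical.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering the flags shown in this README."""
    parser = argparse.ArgumentParser(prog="ghost.py")
    parser.add_argument("input", type=Path, help="PDF to sanitize")
    parser.add_argument("--output", type=Path, default=None,
                        help="explicit output path")
    parser.add_argument("--verbose", action="store_true",
                        help="print metadata before sanitizing")
    parser.add_argument("--overwrite", action="store_true",
                        help="replace an existing output file")
    return parser


def check_output(path: Path, overwrite: bool) -> Path:
    """Refuse to reuse an existing path unless overwrite is set."""
    if path.exists() and not overwrite:
        raise FileExistsError(f"{path} already exists; pass --overwrite to replace it")
    return path


args = build_parser().parse_args(["path/to/file.pdf", "--verbose"])
print(args.input, args.verbose, args.overwrite)
```

Checking for the existing file before any writing starts keeps a failed run from leaving a partial output behind.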
