Clean a messy CSV / Excel file in one command — and get a report of exactly what changed.
Real spreadsheets are full of stray spaces, inconsistent headers, blank rows, duplicates,
and numbers stored as text. spreadsheet-cleaner fixes all of that in one pass and writes
a Changes sheet so you can see what it touched.
python -m spreadsheet_cleaner messy_contacts.csv -o clean.xlsx✓ Wrote clean.xlsx (4 rows kept)
• headers normalized: 5
• whitespace cells trimmed: 7
• empty rows dropped: 1
• empty columns dropped: 0
• duplicate rows dropped: 1
• columns retyped: 1
📚 Built to go with the step-by-step guides at python-excel-automation.com — practical Python + Excel tutorials covering pandas, openpyxl, data cleaning and reporting.
- 🏷️ Headers — trimmed, lower-cased,
snake_cased, and de-duplicated. - ✂️ Whitespace — leading/trailing spaces removed and runs of spaces collapsed.
- 🗑️ Empty rows & columns — fully-blank rows and columns dropped.
- 👯 Duplicate rows — exact duplicates removed.
- 🔢 Types — columns that are entirely numbers or dates stored as text are converted (mixed columns are left alone, so nothing is silently mangled).
- 🩹 Missing values — optionally filled with
zero, the columnmean, orffill/bfill. - 🧾 Change log — every Excel output gets a
Changessheet tallying the above.
Runs straight from a clone of the repository — there's nothing to install from a package index.
git clone https://github.com/python-excel-automation/spreadsheet-cleaner.git
cd spreadsheet-cleanerIt needs Python 3.9+ and two libraries — pandas and openpyxl (listed in
requirements.txt). Make them available in your Python environment using whatever package
manager you use, then run the tool from the repo folder.
# Clean a file (writes clean.xlsx with a Changes sheet)
python -m spreadsheet_cleaner messy.xlsx -o clean.xlsx
# Fill missing numbers with the column mean
python -m spreadsheet_cleaner data.csv --fill mean -o clean.xlsx
# Keep duplicates and original headers, write CSV out
python -m spreadsheet_cleaner data.csv --no-dedupe --no-normalize-headers -o clean.csv| Flag | Description |
|---|---|
-o, --output |
Output path (default: cleaned.xlsx). CSV output skips the change log. |
--fill |
Missing-value strategy: none, zero, mean, ffill, bfill (default: none). |
--no-normalize-headers |
Leave column names as-is. |
--no-strip-whitespace |
Don't trim/collapse whitespace. |
--no-drop-empty |
Keep fully-empty rows and columns. |
--no-dedupe |
Keep duplicate rows. |
--no-coerce-types |
Don't convert text-numbers/dates. |
from spreadsheet_cleaner import clean_frame
import pandas as pd
result = clean_frame(pd.read_csv("messy.csv"))
print(result.changes) # {'headers_normalized': 5, 'duplicate_rows_dropped': 1, ...}
result.frame.to_excel("clean.xlsx", index=False)python -m spreadsheet_cleaner examples/messy_contacts.csv -o demo.xlsxWith the dependencies and pytest available, run the tests from the repo folder:
python -m pytest