spreadsheet-cleaner

Clean a messy CSV / Excel file in one command — and get a report of exactly what changed.

Real spreadsheets are full of stray spaces, inconsistent headers, blank rows, duplicates, and numbers stored as text. spreadsheet-cleaner fixes all of that in one pass and writes a Changes sheet so you can see what it touched.

python -m spreadsheet_cleaner messy_contacts.csv -o clean.xlsx

✓ Wrote clean.xlsx  (4 rows kept)
  • headers normalized: 5
  • whitespace cells trimmed: 7
  • empty rows dropped: 1
  • empty columns dropped: 0
  • duplicate rows dropped: 1
  • columns retyped: 1

📚 Built to go with the step-by-step guides at python-excel-automation.com — practical Python + Excel tutorials covering pandas, openpyxl, data cleaning and reporting.

What it cleans

🏷️ Headers — trimmed, lower-cased, snake_cased, and de-duplicated.
✂️ Whitespace — leading/trailing spaces removed and runs of spaces collapsed.
🗑️ Empty rows & columns — fully-blank rows and columns dropped.
👯 Duplicate rows — exact duplicates removed.
🔢 Types — columns that are entirely numbers or dates stored as text are converted (mixed columns are left alone, so nothing is silently mangled).
🩹 Missing values — optionally filled with zero, the column mean, or ffill/bfill.
🧾 Change log — every Excel output gets a Changes sheet tallying the above.

Setup

Runs straight from a clone of the repository — there's nothing to install from a package index.

git clone https://github.com/python-excel-automation/spreadsheet-cleaner.git
cd spreadsheet-cleaner

It needs Python 3.9+ and two libraries — pandas and openpyxl (listed in requirements.txt). Make them available in your Python environment using whatever package manager you use, then run the tool from the repo folder.

Usage

# Clean a file (writes clean.xlsx with a Changes sheet)
python -m spreadsheet_cleaner messy.xlsx -o clean.xlsx

# Fill missing numbers with the column mean
python -m spreadsheet_cleaner data.csv --fill mean -o clean.xlsx

# Keep duplicates and original headers, write CSV out
python -m spreadsheet_cleaner data.csv --no-dedupe --no-normalize-headers -o clean.csv

Options

Flag	Description
`-o`, `--output`	Output path (default: `cleaned.xlsx`). CSV output skips the change log.
`--fill`	Missing-value strategy: `none`, `zero`, `mean`, `ffill`, `bfill` (default: `none`).
`--no-normalize-headers`	Leave column names as-is.
`--no-strip-whitespace`	Don't trim/collapse whitespace.
`--no-drop-empty`	Keep fully-empty rows and columns.
`--no-dedupe`	Keep duplicate rows.
`--no-coerce-types`	Don't convert text-numbers/dates.

Use it from Python

from spreadsheet_cleaner import clean_frame
import pandas as pd

result = clean_frame(pd.read_csv("messy.csv"))
print(result.changes)        # {'headers_normalized': 5, 'duplicate_rows_dropped': 1, ...}
result.frame.to_excel("clean.xlsx", index=False)

Try it

python -m spreadsheet_cleaner examples/messy_contacts.csv -o demo.xlsx

Develop

With the dependencies and pytest available, run the tests from the repo folder:

python -m pytest

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
examples		examples
spreadsheet_cleaner		spreadsheet_cleaner
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
conftest.py		conftest.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

spreadsheet-cleaner

What it cleans

Setup

Usage

Options

Use it from Python

Try it

Develop

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

spreadsheet-cleaner

What it cleans

Setup

Usage

Options

Use it from Python

Try it

Develop

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages