ruminate

Chew over your tweet archive.

A toolkit for building, slicing, and analysing a Twitter/X archive across multiple input formats. Format-agnostic by design — drop in a new ingest source and the rest of the pipeline works unchanged.

What it does

Pipeline — turn raw archive data into useful files:

collate — read an input source, dedupe by tweet ID, sort chronologically, and write tweets.json, tweets.md, and tweets-llm.txt
slice — split tweets-llm.txt into per-month files and bundle into tweets-llm.zip
pipeline — do both in one step
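
As a rough illustration of what the collate step describes, the sketch below dedupes records by tweet ID and sorts them chronologically. The dictionary keys (id, created_at, text), the ISO timestamp format, and the helper name collate_tweets are assumptions made for this example, not ruminate's actual schema or API.

```python
from datetime import datetime


def collate_tweets(tweets):
    """Dedupe by tweet ID (first occurrence wins), then sort oldest-first."""
    seen = {}
    for tweet in tweets:
        # setdefault keeps the first record seen for each ID
        seen.setdefault(tweet["id"], tweet)
    return sorted(
        seen.values(),
        key=lambda t: datetime.fromisoformat(t["created_at"]),
    )


# Hypothetical input, e.g. the merged rows of two overlapping exports
raw = [
    {"id": "3", "created_at": "2021-03-01T09:00:00", "text": "third"},
    {"id": "1", "created_at": "2020-01-15T12:30:00", "text": "first"},
    {"id": "3", "created_at": "2021-03-01T09:00:00", "text": "third (duplicate)"},
    {"id": "2", "created_at": "2020-06-10T18:45:00", "text": "second"},
]

collated = collate_tweets(raw)
print([t["id"] for t in collated])
```

Keeping the first occurrence is one possible policy; a real merge might prefer the most complete record instead.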

Analysis — explore the archive:

cluster — classify tweets into topic clusters (regex + fuzzy keyword matching, configurable via clusters.py)
search — find tweets by plain text, fuzzy match, or /regex/
month — read a single month with topic breakdown
stats — totals by year/month, tweet types, top mentions, top hashtags
debug — explain why a specific tweet matched (or didn't)
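
To show how a single query string could select between plain text and /regex/ modes, here is a minimal sketch. The helper name make_matcher and the case-insensitive matching are assumptions for illustration; fuzzy matching is left out, and the real search command may parse queries differently.

```python
import re


def make_matcher(query):
    """Return a predicate for query: /.../ means regex, anything else is a substring."""
    if len(query) > 2 and query.startswith("/") and query.endswith("/"):
        # Strip the surrounding slashes and compile the rest as a pattern
        pattern = re.compile(query[1:-1], re.IGNORECASE)
        return lambda text: pattern.search(text) is not None
    needle = query.lower()
    return lambda text: needle in text.lower()


tweets = ["Shipping the new API today", "Coffee first", "apis are hard"]

plain = [t for t in tweets if make_matcher("coffee")(t)]
regex = [t for t in tweets if make_matcher(r"/\bapis?\b/")(t)]
print(plain)
print(regex)
```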

Supported input sources

Currently supported:

IFTTT xlsx archives (ifttt-xlsx) — The most common format. Reads tweets archived to Google Sheets / Google Drive via IFTTT recipes. Works with single .xlsx files, folders of them, or a .zip containing .xlsx files.

This tool was specifically built to merge outputs from multiple IFTTT recipes and has been tested on archives of ~80k tweets.

Recommended IFTTT applets:

Planned (not yet implemented):

Official Twitter/X archive export (data/tweets.js)
community-archive.org JSON format

Adding a new format is intended to be straightforward — see docs/adding-a-source.md.

Quick start

git clone https://github.com/hivesong/ruminate
cd ruminate
bash setup.sh

# Launch the interactive menu
./ruminate

# Or run individual commands
./ruminate pipeline path/to/archive.zip
./ruminate cluster
./ruminate search "your search term"
./ruminate stats
./ruminate --help

setup.sh creates a local ./venv and installs dependencies there — your system Python and global packages are not touched.

If you plan to work on this repo, clone it over SSH rather than HTTPS. Assuming your SSH keys are set up, use this command instead:

git clone git@github.com:hivesong/ruminate.git

Requirements

Python 3.10+
openpyxl (for xlsx ingest)
rapidfuzz (for fuzzy keyword matching)

Both are installed automatically into the local venv by setup.sh.

Project layout

ruminate/
├── ruminate                       # bash launcher (top-level command)
├── setup.sh                       # one-time installer
├── clusters.py                    # ← edit this to customise topic clusters
├── pyproject.toml
├── requirements.txt
├── src/
│   └── ruminate/
│       ├── cli.py                 # menu + argparse dispatch
│       ├── tweet.py               # canonical Tweet dataclass
│       ├── parser.py              # tweets-llm.txt reader
│       ├── classifier.py          # regex + fuzzy
│       ├── output.py              # all file writers
│       ├── ingest/                # pluggable input formats
│       │   ├── base.py            # abstract Source class
│       │   └── ifttt_xlsx.py      # IFTTT Google Drive xlsx archives
│       └── commands/              # one module per subcommand
│           ├── collate.py
│           ├── slice.py
│           ├── cluster.py
│           ├── search.py
│           ├── month.py
│           ├── stats.py
│           └── debug.py
└── docs/
    ├── starter-prompt-template.md # for kicking off LLM analysis sessions
    └── adding-a-source.md         # how to wire in a new input format

Customising topic clusters

Open clusters.py and edit the CLUSTERS list. An LLM is well suited to drafting these entries for you.

Each cluster is a 4-tuple:

(
    "my-topic",                # slug — used in output filenames
    "My Topic Name",           # display name in reports
    [r"\bAPI\b", r"\bSDK\b"],  # regex patterns (case-insensitive)
    ["framework", "library"]   # fuzzy keywords (tolerate typos)
)

Regex patterns are best for short acronyms or specific tokens.
Fuzzy keywords tolerate spelling errors — the edit-distance threshold scales with word length:

Keyword length	Allowed edits
≤ 3 chars	0 (exact)
4–9 chars	1
10–14 chars	2
15+ chars	3
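
The table above can be expressed as a small function. The sketch below pairs it with a plain-Python Levenshtein distance so it runs without dependencies; ruminate itself uses rapidfuzz, and whether it compares whole words exactly this way is an assumption. The names allowed_edits, edit_distance, and fuzzy_match are illustrative only.

```python
def allowed_edits(keyword):
    """Map keyword length to the edit budget from the table above."""
    n = len(keyword)
    if n <= 3:
        return 0
    if n <= 9:
        return 1
    if n <= 14:
        return 2
    return 3


def edit_distance(a, b):
    """Classic Levenshtein distance, computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            # deletion, insertion, substitution
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def fuzzy_match(word, keyword):
    """True if word is within the keyword's edit budget (case-insensitive)."""
    return edit_distance(word.lower(), keyword.lower()) <= allowed_edits(keyword)


print(fuzzy_match("libary", "library"))  # one missing letter, 7-char keyword
print(fuzzy_match("apu", "api"))         # 3-char keywords must match exactly
```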

After editing, run ./ruminate cluster to regenerate cluster files, or ./ruminate debug "tweet text fragment" to see exactly which patterns fired.
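
To make the 4-tuple format concrete, the sketch below applies the regex half of a cluster definition to a piece of text and reports which patterns fired. The function name regex_hits and the example cluster are hypothetical; the actual classifier in classifier.py may be structured differently.

```python
import re

# One example entry in the documented (slug, name, regexes, keywords) shape
CLUSTERS = [
    (
        "dev-tools",
        "Developer Tools",
        [r"\bAPI\b", r"\bSDK\b"],
        ["framework", "library"],
    ),
]


def regex_hits(text, clusters):
    """Return (slug, pattern) pairs for every regex that matches text."""
    hits = []
    for slug, _name, patterns, _keywords in clusters:
        for pattern in patterns:
            # Patterns are documented as case-insensitive
            if re.search(pattern, text, re.IGNORECASE):
                hits.append((slug, pattern))
    return hits


hits = regex_hits("Just published the new api docs", CLUSTERS)
misses = regex_hits("Rapid prototyping is fun", CLUSTERS)
print(hits)
print(misses)
```

The word boundaries matter here: without \b, the pattern would also fire on the "api" inside "Rapid".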

Adding a new input format

See docs/adding-a-source.md. The short version:

Write a new module in src/ruminate/ingest/ that subclasses Source.
Implement detect() (cheap check: does this look like my format?) and iter_tweets() (yield canonical Tweet objects).
Register the class in src/ruminate/ingest/__init__.py.

No other code changes needed — every downstream command consumes canonical Tweet objects.
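
The steps above might look roughly like the following. Everything here is a stand-in: the Tweet fields, the Source constructor, and the method signatures are assumptions made so the example runs on its own, and the JSON Lines format is hypothetical. Check base.py and docs/adding-a-source.md for the real interface. Registration in __init__.py is not shown because its mechanism is not described here.

```python
import json
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass
class Tweet:
    # Stand-in for ruminate's canonical Tweet; the real fields may differ
    id: str
    created_at: str
    text: str


class Source(ABC):
    # Stand-in for the abstract base class; signatures are assumed
    def __init__(self, path):
        self.path = Path(path)

    @classmethod
    @abstractmethod
    def detect(cls, path) -> bool: ...

    @abstractmethod
    def iter_tweets(self) -> Iterator[Tweet]: ...


class JsonLinesSource(Source):
    """Hypothetical format: one JSON object per line in a .jsonl file."""

    @classmethod
    def detect(cls, path) -> bool:
        # Cheap check: look at the extension only, never read the file
        return Path(path).suffix == ".jsonl"

    def iter_tweets(self) -> Iterator[Tweet]:
        with self.path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    row = json.loads(line)
                    yield Tweet(str(row["id"]), row["created_at"], row["text"])


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "archive.jsonl"
    sample.write_text(
        '{"id": 1, "created_at": "2020-01-15T12:30:00", "text": "hello"}\n',
        encoding="utf-8",
    )
    if JsonLinesSource.detect(sample):
        tweets = list(JsonLinesSource(sample).iter_tweets())

print(tweets)
```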

Output locations

All generated files go to ./output/:

output/tweets.json / tweets.md / tweets-llm.txt — collate outputs
output/months/YYYY-MM.txt — slice outputs
output/tweets-llm.zip — bundled archive for LLM sessions
output/tweet-clusters/cluster_*.txt — per-topic files
output/tweet-clusters.zip — bundled cluster archive
output/search_*.txt — saved search results
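
The output/months/YYYY-MM.txt layout implies grouping tweets by year and month. A minimal sketch of that grouping follows; the created_at key, its ISO format, and the helper name group_by_month are assumptions for illustration, and the example only builds the target paths rather than writing files.

```python
from collections import defaultdict
from datetime import datetime


def group_by_month(tweets):
    """Bucket tweets under 'YYYY-MM' keys, preserving input order within a month."""
    months = defaultdict(list)
    for tweet in tweets:
        key = datetime.fromisoformat(tweet["created_at"]).strftime("%Y-%m")
        months[key].append(tweet)
    return dict(months)


# Hypothetical, already-sorted input
tweets = [
    {"id": "1", "created_at": "2020-01-15T12:30:00"},
    {"id": "2", "created_at": "2020-01-28T08:00:00"},
    {"id": "3", "created_at": "2021-03-01T09:00:00"},
]

buckets = group_by_month(tweets)
filenames = [f"output/months/{key}.txt" for key in sorted(buckets)]
print(filenames)
```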

Roadmap

For planned features — including local LLM integration (Ollama), network & graph analysis of quote tweets and replies, advanced analysis tools, web UI, and more — see ROADMAP.md.

Contributing

See CONTRIBUTING.md for how to get involved.

License

MIT — see LICENSE.
