Chew over your tweet archive.
A toolkit for building, slicing, and analysing a Twitter/X archive across multiple input formats. Format-agnostic by design — drop in a new ingest source and the rest of the pipeline works unchanged.
Pipeline — turn raw archive data into useful files:
- collate: read an input source, dedupe by tweet ID, sort chronologically, and write tweets.json, tweets.md, and tweets-llm.txt
- slice: split tweets-llm.txt into per-month files and bundle into tweets-llm.zip
- pipeline: do both in one step
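The collate and slice steps above amount to a dedupe-then-sort pass followed by grouping on a year-month key. A minimal sketch, assuming each tweet is a plain dict with hypothetical `id`, `created_at` (ISO 8601), and `text` keys; the real Tweet dataclass in tweet.py may use different field names, and keeping the first occurrence of a duplicate ID is an assumption:

```python
from collections import defaultdict
from datetime import datetime


def collate(tweets):
    """Dedupe by tweet ID (first occurrence wins), then sort oldest-first."""
    seen = {}
    for tweet in tweets:
        # setdefault keeps whichever tweet was seen first for a given ID
        seen.setdefault(tweet["id"], tweet)
    return sorted(seen.values(), key=lambda t: datetime.fromisoformat(t["created_at"]))


def slice_by_month(tweets):
    """Group tweets under a YYYY-MM key taken from the timestamp prefix."""
    months = defaultdict(list)
    for tweet in tweets:
        months[tweet["created_at"][:7]].append(tweet)
    return dict(months)


rows = [
    {"id": "2", "created_at": "2021-03-05T10:00:00", "text": "second"},
    {"id": "1", "created_at": "2020-12-31T23:59:00", "text": "first"},
    {"id": "2", "created_at": "2021-03-05T10:00:00", "text": "second (duplicate)"},
]
collated = collate(rows)
by_month = slice_by_month(collated)
print([t["text"] for t in collated])
print(sorted(by_month))
```

Merging several IFTTT exports tends to produce overlapping rows, which is why deduplication happens before the chronological sort.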
Analysis — explore the archive:
- cluster: classify tweets into topic clusters (regex + fuzzy keyword matching, configurable via clusters.py)
- search: find tweets by plain text, fuzzy match, or /regex/
- month: read a single month with topic breakdown
- stats: totals by year/month, tweet types, top mentions, top hashtags
- debug: explain why a specific tweet matched (or didn't)
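The plain-text and /regex/ search modes could be told apart from the query string alone. A rough sketch, assuming a query wrapped in slashes is treated as a regex and anything else as a case-insensitive substring; `build_matcher` is a hypothetical name, case-insensitive regex matching is an assumption, and the fuzzy mode is left out to keep the example dependency-free:

```python
import re


def build_matcher(query):
    """Return a predicate (text -> bool) for a plain or /regex/ query."""
    if len(query) >= 2 and query.startswith("/") and query.endswith("/"):
        # Strip the surrounding slashes and compile what is left
        pattern = re.compile(query[1:-1], re.IGNORECASE)
        return lambda text: pattern.search(text) is not None
    needle = query.lower()
    return lambda text: needle in text.lower()


tweets = ["Shipping the new API today", "Coffee first", "apis everywhere"]
plain = build_matcher("api")
regex = build_matcher(r"/\bapi\b/")
plain_hits = [t for t in tweets if plain(t)]
regex_hits = [t for t in tweets if regex(t)]
print(plain_hits)
print(regex_hits)
```

The substring query also picks up "apis", while the word-boundary regex only matches the standalone token.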
Currently supported:
- IFTTT xlsx archives (ifttt-xlsx): the most common format. Reads tweets archived to Google Sheets / Google Drive via IFTTT recipes. Works with single .xlsx files, folders of them, or a .zip containing .xlsx files.
  This tool was specifically built to merge outputs from multiple IFTTT recipes and has been tested on archives of ~80k tweets.
Recommended IFTTT applets:
Planned (not yet implemented):
- Official Twitter/X archive export (data/tweets.js)
- community-archive.org JSON format
Adding a new format is intended to be straightforward — see docs/adding-a-source.md.
git clone https://github.com/hivesong/ruminate
cd ruminate
bash setup.sh
# Launch the interactive menu
./ruminate
# Or run individual commands
./ruminate pipeline path/to/archive.zip
./ruminate cluster
./ruminate search "your search term"
./ruminate stats
./ruminate --help
setup.sh creates a local ./venv and installs dependencies there, so your system Python and global packages are not touched.
If you want to work on this repo, clone it over SSH instead. Assuming your SSH keys are set up properly, use this line:
git clone git@github.com:hivesong/ruminate.git
- Python 3.10+
- openpyxl (for xlsx ingest)
- rapidfuzz (for fuzzy keyword matching)
Both are installed automatically into the local venv by setup.sh.
ruminate/
├── ruminate # bash launcher (top-level command)
├── setup.sh # one-time installer
├── clusters.py # ← edit this to customise topic clusters
├── pyproject.toml
├── requirements.txt
├── src/
│   └── ruminate/
│       ├── cli.py               # menu + argparse dispatch
│       ├── tweet.py             # canonical Tweet dataclass
│       ├── parser.py            # tweets-llm.txt reader
│       ├── classifier.py        # regex + fuzzy
│       ├── output.py            # all file writers
│       ├── ingest/              # pluggable input formats
│       │   ├── base.py          # abstract Source class
│       │   └── ifttt_xlsx.py    # IFTTT Google Drive xlsx archives
│       └── commands/            # one module per subcommand
│           ├── collate.py
│           ├── slice.py
│           ├── cluster.py
│           ├── search.py
│           ├── month.py
│           ├── stats.py
│           └── debug.py
└── docs/
    ├── starter-prompt-template.md   # for kicking off LLM analysis sessions
    └── adding-a-source.md           # how to wire in a new input format
Open clusters.py and edit the CLUSTERS list.
(An LLM is very good at generating this list, by the way.)
Each cluster is a 4-tuple:
(
    "my-topic",                  # slug, used in output filenames
    "My Topic Name",             # display name in reports
    [r"\bAPI\b", r"\bSDK\b"],    # regex patterns (case-insensitive)
    ["framework", "library"]     # fuzzy keywords (tolerate typos)
)
Regex patterns are best for short acronyms or specific tokens.
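To illustrate why word boundaries matter for short tokens, here is a sketch of checking a tweet against a cluster's regex patterns. `matches_regex` is a hypothetical helper, not the function in classifier.py; the only thing it takes from the description above is that patterns are applied case-insensitively:

```python
import re

cluster = (
    "my-topic",
    "My Topic Name",
    [r"\bAPI\b", r"\bSDK\b"],
    ["framework", "library"],
)


def matches_regex(text, cluster):
    """Return the cluster's regex patterns that fire on the given text."""
    _slug, _name, patterns, _keywords = cluster
    return [p for p in patterns if re.search(p, text, re.IGNORECASE)]


hit = matches_regex("the new api is out", cluster)
miss = matches_regex("a capital idea", cluster)
print(hit)
print(miss)
```

"capital" contains the letters "api", but the `\b` anchors keep the pattern from firing inside a longer word.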
Fuzzy keywords tolerate spelling errors — the edit-distance threshold scales with word length:
| Keyword length | Allowed edits |
|---|---|
| ≤ 3 chars | 0 (exact) |
| 4–9 chars | 1 |
| 10–14 chars | 2 |
| 15+ chars | 3 |
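The table above can be read as a small lookup function. The sketch below pairs it with a plain Levenshtein distance so it runs without dependencies; the project itself uses rapidfuzz, so treat this as an illustration of the thresholds only. Comparing the keyword against whitespace-separated words is an assumption, and the names `allowed_edits`, `edit_distance`, and `fuzzy_hit` are hypothetical:

```python
def allowed_edits(keyword):
    """Edit-distance budget for a keyword, following the table above."""
    n = len(keyword)
    if n <= 3:
        return 0
    if n <= 9:
        return 1
    if n <= 14:
        return 2
    return 3


def edit_distance(a, b):
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def fuzzy_hit(keyword, text):
    """True if any word in the text is within the keyword's edit budget."""
    keyword = keyword.lower()
    budget = allowed_edits(keyword)
    return any(edit_distance(keyword, word) <= budget for word in text.lower().split())


print(allowed_edits("library"), fuzzy_hit("library", "my favourite libary"))
print(allowed_edits("api"), fuzzy_hit("api", "apis everywhere"))
```

Short keywords get no slack, which is why the three-letter "api" does not match "apis" here while the misspelled "libary" still counts as "library".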
After editing, run ./ruminate cluster to regenerate cluster files, or ./ruminate debug "tweet text fragment" to see exactly which patterns fired.
See docs/adding-a-source.md. The short version:
- Write a new module in src/ruminate/ingest/ that subclasses Source.
- Implement detect() (cheap check: does this look like my format?) and iter_tweets() (yield canonical Tweet objects).
- Register the class in src/ruminate/ingest/__init__.py.
No other code changes needed — every downstream command consumes canonical Tweet objects.
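The first two steps might look like this for a made-up JSON Lines format. `Tweet` and `Source` below are stand-in definitions so the example runs by itself; the real classes in tweet.py and ingest/base.py will likely have different fields and signatures, `JsonLinesSource` is hypothetical, and the registration step is omitted because its mechanism is not described here:

```python
import json
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass
class Tweet:  # stand-in for the canonical Tweet dataclass
    id: str
    created_at: str
    text: str


class Source(ABC):  # stand-in for the abstract Source class
    def __init__(self, path: Path):
        self.path = path

    @classmethod
    @abstractmethod
    def detect(cls, path: Path) -> bool: ...

    @abstractmethod
    def iter_tweets(self) -> Iterator[Tweet]: ...


class JsonLinesSource(Source):
    """Hypothetical format: one JSON object per line in a .jsonl file."""

    @classmethod
    def detect(cls, path: Path) -> bool:
        # Cheap check: look at the extension only, without opening the file
        return path.suffix == ".jsonl"

    def iter_tweets(self) -> Iterator[Tweet]:
        with self.path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    row = json.loads(line)
                    yield Tweet(str(row["id"]), row["created_at"], row["text"])


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.jsonl"
    sample.write_text(
        '{"id": 1, "created_at": "2024-01-02T03:04:05", "text": "hello"}\n',
        encoding="utf-8",
    )
    loaded = list(JsonLinesSource(sample).iter_tweets()) if JsonLinesSource.detect(sample) else []
print(loaded)
```

Because the source only yields Tweet objects, nothing downstream needs to know which file format they came from.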
All generated files go to ./output/:
- output/tweets.json / tweets.md / tweets-llm.txt: collate outputs
- output/months/YYYY-MM.txt: slice outputs
- output/tweets-llm.zip: bundled archive for LLM sessions
- output/tweet-clusters/cluster_*.txt: per-topic files
- output/tweet-clusters.zip: bundled cluster archive
- output/search_*.txt: saved search results
For planned features — including local LLM integration (Ollama), network & graph analysis of quote tweets and replies, advanced analysis tools, web UI, and more — see ROADMAP.md.
See CONTRIBUTING.md for how to get involved.
MIT — see LICENSE.