yomail (読メール)

yomail extracts the message content from Japanese business emails. It uses a CRF (Conditional Random Field) model to classify each line, then assembles the content from labeled lines.

Features

Handles formal and informal Japanese business emails
Extracts greeting, body, and closing as unified message content
Excludes signatures and leading or trailing quoted content
Works with forwarded emails, replies, and inline quotes
Returns confidence scores for quality control
Small model size (12 KB)
Fast inference (~10-30 ms)

Installation

pip install yomail

Requires Python 3.12+.

Usage

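from yomail import EmailBodyExtractor

# Raw plain-text email to process (placeholder; supply your own message)
email_text = "お世話になっております。\n山田です。\n\n資料を添付いたします。\n\n以上\n"

extractor = EmailBodyExtractor()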

# Raises on failure
body = extractor.extract(email_text)

# Returns None on failure
body = extractor.extract_safe(email_text)

# Full result with metadata
result = extractor.extract_with_metadata(email_text)
print(result.body)
print(result.confidence)
print(result.signature_detected)
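
The confidence score can drive a simple quality-control gate. The sketch below is illustrative and relies only on the `body` and `confidence` attributes shown above. It assumes `confidence` is a float where higher means more certain. The `route` helper and the 0.8 review threshold are hypothetical, not part of yomail, and the stand-in objects exist only so the snippet runs without a model.

```python
from types import SimpleNamespace

REVIEW_THRESHOLD = 0.8  # hypothetical cut-off, not a yomail default


def route(result, review_threshold=REVIEW_THRESHOLD):
    """Return ("accept" | "review", body) based on result.confidence."""
    if result.confidence >= review_threshold:
        return "accept", result.body
    return "review", result.body


# Stand-ins exposing the same two attributes as the metadata result.
high = SimpleNamespace(body="お世話になっております。", confidence=0.95)
low = SimpleNamespace(body="以上", confidence=0.42)

print(route(high))
print(route(low))
```

In real use, the object returned by `extract_with_metadata` would be passed to `route` in place of the stand-ins.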

Example

Input:

株式会社サンプル
田中様

お世話になっております。
山田です。

先日ご依頼いただいた資料を添付いたします。
ご確認のほどよろしくお願いいたします。

以上

--
山田太郎
株式会社テスト
TEL: 03-1234-5678

Output:

お世話になっております。
山田です。

先日ご依頼いただいた資料を添付いたします。
ご確認のほどよろしくお願いいたします。

以上

How It Works

The extraction pipeline:

Normalize — Line endings, neologdn normalization, NFKC
Analyze structure — Quote depth, forward/reply headers, delimiters
Extract features — Position, character ratios, pattern matches
Label with CRF — GREETING, BODY, CLOSING, SIGNATURE, QUOTE, OTHER
Assemble body — Find signature boundary, handle inline quotes, merge blocks
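
The first three steps can be sketched with the standard library alone. This is a simplified illustration, not yomail's implementation: the neologdn pass is omitted, and the function names, the feature names, and the delimiter pattern are all assumptions made for the example.

```python
import re
import unicodedata

QUOTE_PREFIX = re.compile(r"^(?:\s*>)+")   # leading '>' markers
DELIMITER = re.compile(r"[-=_*]{2,}")      # assumed delimiter shapes


def normalize(text: str) -> list[str]:
    """Unify line endings, apply NFKC, and split into lines."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = unicodedata.normalize("NFKC", text)
    return text.split("\n")


def quote_depth(line: str) -> int:
    """Count leading '>' markers, so '> > text' has depth 2."""
    match = QUOTE_PREFIX.match(line)
    return match.group(0).count(">") if match else 0


def line_features(lines: list[str], index: int) -> dict[str, float]:
    """Compute a few position, ratio, and pattern features for one line."""
    line = lines[index]
    stripped = line.strip()
    n = len(stripped) or 1  # avoid division by zero on blank lines
    japanese = sum(
        1 for ch in stripped
        if "\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff"
    )
    digits = sum(ch.isdigit() for ch in stripped)
    return {
        "position": index / max(len(lines) - 1, 1),
        "japanese_ratio": japanese / n,
        "digit_ratio": digits / n,
        "quote_depth": float(quote_depth(line)),
        "is_delimiter": float(DELIMITER.fullmatch(stripped) is not None),
    }


raw = "ＴＥＬ：０３\r\n> > 引用\r\n--"
lines = normalize(raw)
for i in range(len(lines)):
    print(repr(lines[i]), line_features(lines, i))
```

NFKC folds full-width letters, digits, and punctuation to their ASCII forms, which is why it helps for pattern features to be computed after normalization.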

See ARCHITECTURE.md for details and API.md for the full API reference.

Label Scheme

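| Label | Description |
| --- | --- |
| GREETING | Opening phrase (e.g. お世話になっております, "Thank you for your continued support") |
| BODY | Main content |
| CLOSING | Closing phrase (e.g. よろしくお願いいたします, "Thank you in advance") |
| SIGNATURE | Sender information |
| QUOTE | Quoted content |
| OTHER | Separators, blank lines |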
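
To show how per-line labels could turn into message content, here is a minimal sketch of an assembly step. It is a simplification under stated assumptions, not yomail's actual logic: the `assemble_body` helper is hypothetical, everything from the first SIGNATURE line onward is dropped, and lines between the first and last content line are kept as they are. The labels in the sample were assigned by hand for illustration, not produced by the model.

```python
CONTENT_LABELS = {"GREETING", "BODY", "CLOSING"}


def assemble_body(labeled_lines):
    """labeled_lines: list of (text, label) pairs in document order."""
    # Treat the first SIGNATURE line as the signature boundary.
    boundary = next(
        (i for i, (_, label) in enumerate(labeled_lines) if label == "SIGNATURE"),
        len(labeled_lines),
    )
    candidates = labeled_lines[:boundary]
    content = [i for i, (_, label) in enumerate(candidates) if label in CONTENT_LABELS]
    if not content:
        return None  # no content lines found
    first, last = content[0], content[-1]
    # Keep interior blank lines and inline quotes; trim leading/trailing non-content.
    return "\n".join(text for text, _ in candidates[first:last + 1])


labeled = [
    ("株式会社サンプル", "OTHER"),
    ("田中様", "OTHER"),
    ("", "OTHER"),
    ("お世話になっております。", "GREETING"),
    ("山田です。", "BODY"),
    ("", "OTHER"),
    ("先日ご依頼いただいた資料を添付いたします。", "BODY"),
    ("ご確認のほどよろしくお願いいたします。", "CLOSING"),
    ("", "OTHER"),
    ("以上", "CLOSING"),
    ("", "OTHER"),
    ("--", "OTHER"),
    ("山田太郎", "SIGNATURE"),
    ("株式会社テスト", "SIGNATURE"),
    ("TEL: 03-1234-5678", "SIGNATURE"),
]

print(assemble_body(labeled))
```

With these hand-assigned labels, the result matches the Output shown in the Example section.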

Performance

Evaluated on 19,642 synthetic test emails:

| Metric | Value |
| --- | --- |
| Content match | 97.9% |
| Acceptable rate | 98.0% |
| Confident wrong | 0.14% |

See PERFORMANCE.md for details.

Exceptions

from yomail import (
    ExtractionError,      # Base class
    InvalidInputError,    # Empty or invalid input
    NoBodyDetectedError,  # No body found
    LowConfidenceError,   # Confidence below threshold
)
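
One way to use this hierarchy is to catch the specific errors first and the base class last. The sketch below assumes that `extract` raises these exceptions and that the three specific errors subclass `ExtractionError`, as the "Base class" comment suggests. The `extract_or_reason` helper and its reason strings are hypothetical, and the fallback class definitions exist only so the snippet runs where yomail is not installed.

```python
try:
    from yomail import (
        ExtractionError,
        InvalidInputError,
        LowConfidenceError,
        NoBodyDetectedError,
    )
except ImportError:
    # Stand-ins mirroring the hierarchy listed above (assumed, for illustration).
    class ExtractionError(Exception):
        pass

    class InvalidInputError(ExtractionError):
        pass

    class NoBodyDetectedError(ExtractionError):
        pass

    class LowConfidenceError(ExtractionError):
        pass


def extract_or_reason(extractor, email_text):
    """Return (body, None) on success or (None, reason) on failure."""
    try:
        return extractor.extract(email_text), None
    except InvalidInputError:
        return None, "invalid_input"
    except NoBodyDetectedError:
        return None, "no_body"
    except LowConfidenceError:
        return None, "low_confidence"
    except ExtractionError:
        # Any other extraction failure derived from the base class.
        return None, "extraction_error"
```

When the distinction between failure modes does not matter, `extract_safe` is the shorter route.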

Configuration

extractor = EmailBodyExtractor(
    model_path="path/to/model.crfsuite",  # Custom model
    confidence_threshold=0.5,              # Minimum confidence
)

Development

# Setup
uv sync

# Run tests
uv run pytest

# Type check
uv run ty check

# Lint
uv run ruff check .

Training

Training data is generated by the yasumail project.

# Train model
python scripts/train.py data/training.jsonl -o models/email_body.crfsuite

# Evaluate
python scripts/evaluate.py data/test.jsonl

Dependencies

neologdn — Japanese text normalization
python-crfsuite — CRF implementation
PyYAML — Name data loading
