Add ability to read document headers#173

Merged
satoryu merged 1 commit into master from add-header-extraction on May 31, 2026

@satoryu (Member) commented May 31, 2026

Summary

Adds the ability to read document headers (word/header*.xml) via a new Docx::Document#headers accessor. Headers are exposed as a Hash keyed by the header file name (e.g. "header1"), with each value being a parsed Nokogiri::XML document.

doc = Docx::Document.open("with_header.docx")
doc.headers["header1"].text  # => "Hello from the header."
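Here is a rough sketch of how archive entry names might map to the Hash keys described above. The helper name `header_keys` and the `header\d*` pattern are assumptions (they presume header parts are named like `word/header1.xml`), and the gem's actual filtering may differ. It is plain Ruby and needs no gems.

```ruby
# Hypothetical helper (not part of the docx gem): select the header parts of a
# .docx archive and derive Hash keys like the ones Docx::Document#headers uses.
# Assumes header parts are named word/header<N>.xml.
HEADER_ENTRY = %r{\Aword/header\d*\.xml\z}

def header_keys(entry_names)
  # Keep only header parts; relationship files, footers and the body are skipped.
  selected = entry_names.select { |name| name.match?(HEADER_ENTRY) }
  # "word/header1.xml" -> key "header1"
  selected.to_h { |name| [File.basename(name, ".xml"), name] }
end

entries = [
  "word/document.xml",
  "word/header1.xml",
  "word/header3.xml",
  "word/_rels/header1.xml.rels",
  "word/footer1.xml"
]

header_keys(entries)
# returns {"header1" => "word/header1.xml", "header3" => "word/header3.xml"}
```

Note that the keys keep the original, possibly non-sequential, numbers rather than being renumbered.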

Background / Credit

This is a rebased version of #137 by @FeminismIsAwesome (Ian Norris). That PR had grown stale and conflicted with master after the SimpleInspect and ZIP64 (#168 / #172) changes. The original commit and authorship are preserved here; only the merge conflicts in lib/docx/document.rb were resolved. It addresses #32.

The implementation follows the existing load_styles convention (a private load_headers method called from initialize), keeping it consistent with the current codebase.
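A minimal sketch of that convention is shown here: a private loader invoked from `initialize` fills `@headers`. It is illustrative only, not the gem's code. `SketchDocument`, `header_text` and the Hash-of-strings "archive" are hypothetical, and REXML (Ruby's bundled XML library) is swapped in for Nokogiri so the example runs without extra gems.

```ruby
require "rexml/document"

# Hypothetical stand-in for Docx::Document. The archive is faked as a Hash of
# entry name => XML string; REXML replaces Nokogiri for this sketch.
class SketchDocument
  W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

  attr_reader :headers

  def initialize(entries)
    @entries = entries
    load_headers # same shape as a private load_styles called from initialize
  end

  # Hypothetical helper: concatenate the text of every <w:t> run in a header.
  def header_text(name)
    REXML::XPath.match(@headers.fetch(name), "//w:t", { "w" => W_NS })
                .map(&:text).join
  end

  private

  # Parse each word/header*.xml entry, keyed by its file name without ".xml".
  def load_headers
    @headers = {}
    @entries.each do |name, xml|
      next unless name.match?(%r{\Aword/header\d*\.xml\z})

      @headers[File.basename(name, ".xml")] = REXML::Document.new(xml)
    end
  end
end

header_xml = <<~XML
  <w:hdr xmlns:w="#{SketchDocument::W_NS}">
    <w:p><w:r><w:t>Hello from the header.</w:t></w:r></w:p>
  </w:hdr>
XML

entries = { "word/document.xml" => "<body/>", "word/header1.xml" => header_xml }
doc = SketchDocument.new(entries)
doc.header_text("header1") # returns "Hello from the header."
```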

Scope

Intentionally minimal: header reading only. Footers and write-back are out of scope and left for a follow-up (this mirrors the deliberate scope of #137, and avoids the riskier round-trip/file-name-mapping concerns raised against the alternative #153).

Known limitations

  • headers returns mutable Nokogiri documents, but update (called on save/stream) does not write headers back, so an edited header is not persisted when the document is saved. Header support is read-only.
  • Footers (word/footer*.xml) are not handled.

Tests

  • Adds spec/fixtures/multi_doc.docx and a spec for reading headers.
  • Full suite green locally (139 examples, 0 failures).

Closes #137

🤖 Generated with Claude Code

Inspired by #73, but stripped down to just the header to see whether that might be more amenable to getting merged.

Also, because of the TODO note in the update function, this only supports reading these files, not updating them.
@satoryu satoryu closed this May 31, 2026
@satoryu satoryu deleted the add-header-extraction branch May 31, 2026 10:25
@satoryu satoryu restored the add-header-extraction branch May 31, 2026 10:26
@satoryu satoryu reopened this May 31, 2026
@satoryu satoryu merged commit c39e9c8 into master May 31, 2026
10 checks passed
@satoryu satoryu deleted the add-header-extraction branch May 31, 2026 10:27
satoryu added a commit that referenced this pull request May 31, 2026
Adds a Docx::Document#footers accessor that exposes word/footer*.xml as a
Hash keyed by footer file name, mirroring the headers reading feature (#173).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Jawad Ahmad <Jawad79Ahmad@users.noreply.github.com>
pull Bot pushed a commit to NeatNerdPrime/docx that referenced this pull request May 31, 2026
Extends Document#update to write modified headers and footers back into the
archive on save/stream, keyed by their original file name (so multiple and
non-sequential header/footer files round-trip correctly). Builds on the
header/footer reading support (ruby-docx#173, ruby-docx#174).

Based on the original read/write implementation by @aashish in ruby-docx#42.

Co-authored-by: aashish <aashish@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
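To illustrate why write-back is keyed by the original file name, here is a hypothetical sketch (not the actual `Document#update`). Parts named `header1` and `header3` are written back to `word/header1.xml` and `word/header3.xml`; renumbering them sequentially would instead produce a `word/header2.xml` that likely no longer matches the targets in the document's relationship file. The archive is faked as a Hash, `update_entries` is a made-up name, and REXML stands in for Nokogiri.

```ruby
require "rexml/document"

# Hypothetical write-back step: each parsed part goes back to the entry it was
# read from, so non-sequential names (header1, header3) survive a round trip.
def update_entries(entries, parts)
  updated = entries.dup
  parts.each do |name, xml_doc|
    # "header3" -> "word/header3.xml", never a renumbered slot.
    updated["word/#{name}.xml"] = xml_doc.to_s
  end
  updated
end

entries = {
  "word/header1.xml" => "<hdr><t>first</t></hdr>",
  "word/header3.xml" => "<hdr><t>third</t></hdr>"
}

# Read: key each part by its file name without ".xml", then edit one of them.
parts = entries.to_h { |name, xml| [File.basename(name, ".xml"), REXML::Document.new(xml)] }
parts["header3"].root.elements["t"].text = "third (edited)"

saved = update_entries(entries, parts)
saved.keys # returns ["word/header1.xml", "word/header3.xml"]
```

Footer parts would follow the same rule (`footer1` back to `word/footer1.xml`).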
