Allow access to other XML docs in docx file like the header and footer#73

Open
yjukaku wants to merge 1 commit into ruby-docx:master from yjukaku:add-other-doc-access

@yjukaku yjukaku commented Oct 16, 2019

This adds support for retrieving all of the header and footer documents embedded in the docx file, as well as the numbering docs.

This is based on the work in #22 and #42.

It also closes #49 and #32.
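
As a rough illustration of what retrieving the header and footer documents involves, the sketch below picks header and footer parts out of a list of zip entry names. The helper name `header_footer_entries` and the `word/headerN.xml` / `word/footerN.xml` naming pattern are assumptions made for illustration; this is not the code from the PR.

```ruby
# Hypothetical helper, not part of the docx gem: groups header and footer
# parts found in a list of zip entry names, keyed by base name ("header1").
# Assumes the conventional word/headerN.xml and word/footerN.xml names.
HEADER_FOOTER_PATTERN = %r{\Aword/(header|footer)\d*\.xml\z}

def header_footer_entries(entry_names)
  result = { headers: {}, footers: {} }
  entry_names.each do |name|
    match = HEADER_FOOTER_PATTERN.match(name)
    next unless match

    kind = match[1] == 'header' ? :headers : :footers
    result[kind][File.basename(name, '.xml')] = name
  end
  result
end

entries = ['word/document.xml', 'word/header1.xml', 'word/footer1.xml', 'word/styles.xml']
p header_footer_entries(entries)
```

Keying each part by its base name keeps the original file name available, which matters later when a modified part has to be written back to the archive.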

@yjukaku yjukaku force-pushed the add-other-doc-access branch from ad67fc8 to 990de9c Compare October 22, 2019 20:21
@yjukaku yjukaku force-pushed the add-other-doc-access branch from 990de9c to 46beee1 Compare October 22, 2019 20:24
@fercreek

@chrahunt we need this solution from @yjukaku

@yjukaku
Author

yjukaku commented Oct 8, 2020

👋 Is there anything holding up this PR from merging? Anything we can do to help?

@nathanvda
Contributor

This PR would solve a problem I am currently encountering (namely, setting a bookmark in a header). I am willing to help get this PR merged. What is holding it back?

@satoryu
Member

satoryu commented Jun 29, 2021

There is a file with a merge conflict now.

@yjukaku do you have time to resolve the conflict?

@nathanvda
Contributor

I tried to get this working. The main difference I see now is that for Office 365 files we have to try document.xml first and, if that does not exist, fall back to document2.xml.

So I created a local version that is more in line with the current code: instead of iterating over DOCUMENT_PATHS, I added explicit methods to load_headers and footers and numbering, as we already have a load_styles too.

But when adapting the update method accordingly, I noticed that we only update word/document.xml regardless of the source (leaving document2.xml as is?), and I am not sure whether that is OK or a problem. Can I ignore that for now?
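
The fallback described above could be sketched roughly as follows. The constant and method names are hypothetical, and the candidate list only covers the two file names mentioned in this comment.

```ruby
# Candidate names for the main document part, in order of preference.
# (Taken from the comment above; other names may exist in the wild.)
CANDIDATE_DOCUMENT_PATHS = ['word/document.xml', 'word/document2.xml'].freeze

# Hypothetical helper: returns the first candidate present in the archive,
# or nil if none of them is found.
def main_document_path(entry_names)
  CANDIDATE_DOCUMENT_PATHS.find { |path| entry_names.include?(path) }
end

path = main_document_path(['word/document2.xml', 'word/styles.xml'])
# Keeping `path` around lets an update write the XML back under the same name
# instead of always writing to word/document.xml.
puts path
```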

@yjukaku
Author

yjukaku commented Jun 29, 2021

> I added explicit methods to load_headers and footers and numbering, as we already have a load_styles too.

I was trying to DRY up the code with the DOCUMENT_PATHS hash, but if that's not needed 🤷‍♂️.

> Can I ignore that for now?

I personally would expect the document file name to be the same as the original when updated. It appears the better way to find the proper document name would be to check the [Content_Types].xml file in the zip, then look for an Override element in that XML file whose ContentType attribute has the value application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml. Its PartName attribute tells us exactly which file is the "main" one, and a similar method can be used for the headers, footers, numbering, styles, etc.

See http://officeopenxml.com/anatomyofOOXML.php under Content Types

Here's a sample [Content_Types].xml:

<?xml version="1.0" encoding="UTF-8"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Override PartName="/_rels/.rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Override PartName="/word/_rels/document.xml.rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Override PartName="/word/settings.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.settings+xml"/>
  <Override PartName="/word/fontTable.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.fontTable+xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
  <Override PartName="/word/numbering.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.numbering+xml"/>
  <Override PartName="/word/footer1.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.footer+xml"/>
  <Override PartName="/word/theme/theme1.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/>
  <Override PartName="/word/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml"/>
  <Override PartName="/customXml/_rels/item1.xml.rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Override PartName="/customXml/itemProps1.xml" ContentType="application/vnd.openxmlformats-officedocument.customXmlProperties+xml"/>
  <Override PartName="/customXml/item1.xml" ContentType="application/xml"/>
  <Override PartName="/docProps/custom.xml" ContentType="application/vnd.openxmlformats-officedocument.custom-properties+xml"/>
  <Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/>
  <Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/>
</Types>
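
A minimal sketch of that lookup, assuming the content-types XML has already been read from the zip into a string. The helper `part_names_for` is hypothetical and not part of the docx gem, and REXML is used here only to keep the example self-contained.

```ruby
require 'rexml/document'

# Content type that marks the main document part of a WordprocessingML package.
MAIN_DOCUMENT_TYPE =
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml'

# Hypothetical helper: returns the zip entry names of all parts that an
# Override element declares with the given content type.
def part_names_for(content_types_xml, content_type)
  root = REXML::Document.new(content_types_xml).root
  names = []
  root.elements.each do |element|
    next unless element.name == 'Override'
    next unless element.attributes['ContentType'] == content_type

    # PartName is absolute ("/word/document.xml"); zip entries have no leading slash.
    names << element.attributes['PartName'].sub(%r{\A/}, '')
  end
  names
end

sample = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
    <Override PartName="/word/document.xml" ContentType="#{MAIN_DOCUMENT_TYPE}"/>
    <Override PartName="/word/footer1.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.footer+xml"/>
  </Types>
XML

p part_names_for(sample, MAIN_DOCUMENT_TYPE)
```

The same helper can be called with the footer, header, numbering, or styles content type to locate those parts without hard-coding their file names.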

nathanvda added a commit to nathanvda/docx that referenced this pull request Jul 1, 2021
Inspired by PR ruby-docx#73, we adapted the code to work on top of the current state.
@aunghtain

Can we merge this PR as well? I need access to numbering and header/footer. Thanks.

@panozzaj

Thanks, the proposed change seems good at a high level to me. (I'm not affiliated with the project, just someone who has started using the library.) This would be helpful for one case I saw today where the important text information we wanted was in the document footer. Right now that information is inaccessible.

I wouldn't want to delay this PR, but what do you think about adding the header or footer contents to methods like .text on documents? Maybe it could take the contents of any headers and put that at the top of the document text, and the contents of the footers at the end. That way document.text would truly give you all of the text of the document.
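
The ordering suggested here (headers first, then the body, then footers) could look roughly like the sketch below. It works on plain strings so that it makes no assumption about the gem's API; `full_text` is a hypothetical name.

```ruby
# Hypothetical helper: joins header texts, body text, and footer texts,
# in that order, skipping pieces that are empty after stripping whitespace.
def full_text(header_texts, body_text, footer_texts)
  (header_texts + [body_text] + footer_texts)
    .map(&:strip)
    .reject(&:empty?)
    .join("\n")
end

puts full_text(['Confidential'], 'Main body of the document.', ['Page 1'])
```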

FeminismIsAwesome pushed a commit to FeminismIsAwesome/docx that referenced this pull request Jul 30, 2023
inspired from ruby-docx#73 but stripped down to just the header to see if that might be more amenable to get in.

Also because of the TODO note in the update function, only supports reading these files, not updating them.
@aunghtain

Any update on this? I've been waiting for it for more than a year now.

pull Bot pushed a commit to NeatNerdPrime/docx that referenced this pull request May 31, 2026
inspired from ruby-docx#73 but stripped down to just the header to see if that might be more amenable to get in.

Also because of the TODO note in the update function, only supports reading these files, not updating them.
@satoryu
Member

satoryu commented May 31, 2026

Hi everyone — a long-overdue update, and an apology. This PR has been open since 2019, and several of you have been waiting a very long time for it. Thank you for your patience and persistence. 🙇

The good news: as of v0.12.0 (released today), the header/footer functionality this PR set out to provide is now in the gem. Rather than merging this PR directly (it had drifted and conflicted with master), we shipped it as a series of small, focused PRs:

Now, to each of you individually:

@nathanvda — your exact use case (setting a bookmark in a header) works now:

require 'docx'

# 'header_bookmark' is a bookmark defined in the document's header.
doc = Docx::Document.open('example.docx')
doc.bookmarks['header_bookmark'].insert_text_after('Hello from the header')
doc.save('edited.docx')

And we followed your design instinct: instead of the DOCUMENT_PATHS hash, we added explicit load_headers / load_footers methods mirroring load_styles, exactly as you described in your comment. Thank you for that steer — it shaped the final implementation.

@yjukaku, thank you for the original PR, and especially for the [Content_Types].xml write-up. For headers and footers, update now writes each part back to its original file name (we key them by their real name), so they round-trip correctly, which matches your "I'd expect the file name to be the same as the original when updated." Full disclosure: we have not yet implemented the [Content_Types].xml-based resolution for the main document (e.g. Office 365 document2.xml) or for numbering. That deeper part of your comment is still open, and your write-up will be the reference when we get to it.

@aunghtain — sorry for the very long wait. Header/footer access (read + write) is available now in v0.12.0. Being honest: numbering is still not implemented, so that half of your need isn't covered yet. I'll make sure it's tracked separately so it doesn't get lost again.

@panozzaj — good suggestion about folding header/footer text into document.text. For now we've kept them as separate accessors (doc.headers['header1'].text, doc.footers['footer1'].text) rather than changing .text, to avoid surprising existing callers. If you'd still like .text to optionally include them, a small follow-up issue/PR would be very welcome.

@fercreek — thanks for the early nudge way back. 🙂 It's finally here.

Since the header/footer goal of this PR is now shipped, we'll likely close it in favor of the released versions, while tracking the remaining numbering / content-types work separately. Thank you all again — and sorry it took so long. 🍵

@panozzaj

> For now we've kept them as separate accessors (doc.headers['header1'].text, doc.footers['footer1'].text) rather than changing .text, to avoid surprising existing callers.

I think this is a good approach. Thanks! Great update and hope you are doing well.

Development

Successfully merging this pull request may close these issues.

Replace Header/Footer bookmarks doesn't work

6 participants