Recover Broken Word 2007 Documents by Hacking the XML

I like Word 2007, but like all Microsoft products, it comes with the affliction of complication. Word (and the Office Open XML format behind it) is so complicated that there are a zillion things that could go wrong, and chances are one of them will happen to you at some point. The moral of the story: follow the motto “Save Early. Save Often.” and you might avoid such a catastrophe. But if you’re lazy like me and forget to back up, this might happen.

Word is acting funny: you’re messing around with equations that aren’t behaving right. You try to save, and for some reason it won’t let you. You think, “Oh, I’ll just restart and Word will be all better again.” You restart, and your effing document won’t open, failing with an error along the lines of:

The Office Open XML file FileName.docx cannot be opened because there are problems with the contents.

WTF!? This was your only copy of the document! OMG! After you have taken a Valium, do the following:

Click Details on the error dialog and note down the location of the error (e.g. /word/document.xml, Line: 2, Column: 65946).
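If you plan to script any of the later steps, it helps to pull those three values out of the details text. Here is a minimal Python sketch, assuming the details contain a “part, Line: N, Column: M” sequence like the example above; parse_error_location and ErrorLocation are names I made up for illustration:

```python
import re
from typing import NamedTuple


class ErrorLocation(NamedTuple):
    part: str    # XML part inside the package, e.g. "/word/document.xml"
    line: int
    column: int


# Assumption: the details text contains "<part>, Line: N, Column: M",
# as in the example location shown in the article.
_LOCATION_RE = re.compile(
    r"(?P<part>/\S+?),\s*Line:\s*(?P<line>\d+),\s*Column:\s*(?P<column>\d+)"
)


def parse_error_location(details: str) -> ErrorLocation:
    """Extract the part name, line and column from Word's error details."""
    m = _LOCATION_RE.search(details)
    if m is None:
        raise ValueError("no 'Line: N, Column: M' location found in details text")
    return ErrorLocation(m.group("part"), int(m.group("line")), int(m.group("column")))
```

For the example above, parse_error_location("/word/document.xml, Line: 2, Column: 65946") gives the part, line 2 and column 65946.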

Rename your faulty document to FileName.docx.zip and extract it into some folder.
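The same step can be done with Python’s standard zipfile module, which opens a .docx directly because the file is a ZIP archive underneath; the rename only matters for tools that go by the file extension. This is a sketch, and extract_docx and the folder name are hypothetical:

```python
import zipfile
from pathlib import Path


def extract_docx(docx_path: str, out_dir: str) -> list[str]:
    """Unpack a .docx (a ZIP archive) into out_dir and return the member names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(docx_path) as zf:
        names = zf.namelist()
        zf.extractall(out)
    return names
```

For example, extract_docx("FileName.docx", "FileName_extracted") unpacks everything, including word/document.xml, into FileName_extracted.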

Open the XML file named in the error message and locate the offending column. For this you might want to download XML Copy Editor, or any other editor that shows the current column number and can pretty-print XML. In XML Copy Editor, you can see the column number in the status bar.
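If you just want to peek at the spot without a special editor, a few lines of Python will print the text around the reported position, marking it with *. This sketch assumes Word’s line and column numbers are 1-based character counts; the window of surrounding text means an off-by-one there won’t hide the tag. show_error_context is a made-up helper name:

```python
from pathlib import Path


def show_error_context(xml_path: str, line: int, column: int, width: int = 80) -> str:
    """Return up to `width` characters either side of (line, column), with '*' at the spot.

    Assumes line and column are 1-based character positions.
    """
    text = Path(xml_path).read_text(encoding="utf-8")
    # Split on "\n" only; splitlines() would also break on characters that can
    # legitimately appear inside document text.
    target = text.split("\n")[line - 1]
    pos = column - 1
    start = max(0, pos - width)
    return target[start:pos] + "*" + target[pos:pos + width]
```

For example: print(show_error_context("FileName_extracted/word/document.xml", 2, 65946)).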

The error will probably point at an XML tag. In my case, column 65946 was at the * below (I added the * myself):

... </w:t></m:r><m:ctrlPr*><w:rPr><w:rFonts w:ascii="Cambria Math" ...

So the error is in the <m:ctrlPr> tag. Now add a fake attribute to that tag, such as a="aaaaa", or anything else easily identifiable that won’t occur naturally in the Office XML. This marker lets you find the tag again after the next step, when the column number no longer applies. Your tag should look like this:

<m:ctrlPr a="aaaaa">

By default, the XML is all on one line to save space, which makes it hard to see where this element ends. We need to spread it out before we can delete it.

In XML Copy Editor, select Pretty print from the XML menu. Now search for “aaaaa” (or whatever you called your fake attribute). You should now be able to see the entire element, from its opening tag to its closing tag:

<m:ctrlPr a="aaaaa">
  <w:rPr>
    <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
    <w:i/>
    <w:lang w:val="en-AU"/>
  </w:rPr>
</m:ctrlPr>

Now delete the whole <m:ctrlPr> element, from the opening tag through </m:ctrlPr>, and save the file. You can leave it pretty-printed; it will still be valid XML.
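If you’d rather not pretty-print and edit by hand, the deletion can be scripted. Note this sketch uses a different technique from the fake-attribute trick: it finds the last <m:ctrlPr start tag at or before the reported position, then walks forward counting nested opening and closing tags of that name until it reaches the matching close tag. It edits the raw text rather than going through an XML library, because re-serialising with something like ElementTree can rename namespace prefixes, which Word may not tolerate. The function names are mine, and it assumes a 1-based column and no '>' inside attribute values:

```python
import re


def line_col_to_offset(xml: str, line: int, column: int) -> int:
    """Convert a 1-based (line, column) pair into a 0-based offset into xml."""
    lines = xml.split("\n")
    return sum(len(l) + 1 for l in lines[: line - 1]) + (column - 1)


def remove_element_at(xml: str, offset: int, tag: str) -> str:
    """Delete the <tag>...</tag> element whose start tag is the last one at or before offset."""
    name = re.escape(tag)
    # A start tag is '<tag' followed by whitespace, '/' or '>' (so '<m:ctrlPrX' is skipped).
    open_re = re.compile(r"<" + name + r"(?=[\s/>])")
    start = None
    for m in open_re.finditer(xml):
        if m.start() > offset:
            break
        start = m.start()
    if start is None:
        raise ValueError(f"no <{tag}> start tag at or before offset {offset}")

    # Walk over <tag ...>, <tag .../> and </tag>, tracking nesting depth.
    # Assumption: no attribute value contains '>', which holds for typical Word XML.
    token_re = re.compile(r"<(/?)" + name + r"(?=[\s/>])[^>]*?(/?)>")
    depth = 0
    for m in token_re.finditer(xml, start):
        is_close, is_self_closing = m.group(1), m.group(2)
        if is_close:
            depth -= 1
        elif not is_self_closing:
            depth += 1
        if depth == 0:
            return xml[:start] + xml[m.end():]
    raise ValueError(f"no matching </{tag}> after offset {start}")


def fix_file(path: str, line: int, column: int, tag: str) -> None:
    """Remove the element reported at (line, column) and rewrite the file in place."""
    # newline="" keeps the file's original line endings untouched.
    with open(path, encoding="utf-8", newline="") as f:
        xml = f.read()
    fixed = remove_element_at(xml, line_col_to_offset(xml, line, column), tag)
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(fixed)
```

For my error that would be fix_file("FileName_extracted/word/document.xml", 2, 65946, "m:ctrlPr"). Work on a copy of the extracted folder so a wrong guess costs nothing.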

Re-zip the contents of the folder you extracted to (the files and subfolders themselves, not the folder that holds them) and rename the ZIP file back to FileName.docx.
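Here is a Python sketch of the re-zip step using the standard zipfile module. The important detail is that archive paths are relative to the extracted folder, so word/document.xml sits at word/document.xml inside the ZIP rather than under an extra top-level folder. Writing [Content_Types].xml first mirrors how Word lays out its own files; I’m treating that as a cautious choice, not a confirmed requirement. rezip_docx is a hypothetical name:

```python
import os
import zipfile
from pathlib import Path


def rezip_docx(src_dir: str, docx_path: str) -> None:
    """Zip the *contents* of src_dir (not the folder itself) into docx_path."""
    root = Path(src_dir)
    # os.walk also picks up dot-files such as _rels/.rels, which the package needs.
    arcnames = sorted(
        (Path(dirpath) / name).relative_to(root).as_posix()
        for dirpath, _dirs, names in os.walk(root)
        for name in names
    )
    # Stable sort: [Content_Types].xml first, everything else keeps its order.
    arcnames.sort(key=lambda a: a != "[Content_Types].xml")
    with zipfile.ZipFile(docx_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for arcname in arcnames:
            zf.write(root / arcname, arcname=arcname)
```

For example, rezip_docx("FileName_extracted", "FileName_fixed.docx"). Writing to a new file name, outside the extracted folder, keeps the broken original around in case you need another attempt.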

Try to open it in Office 2007. If it works, hooray! If you get another error, repeat this process for the new location. If all else fails, retype your 10,000-word document!