Generating Excel XML, avoiding “found unreadable content”

In theory, changing the content of an Excel file is easy:

* Parse XML from the zip-file
* Change XML
* Save XML into the zip
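
The three steps can be sketched with Python's standard library. This is only a minimal sketch under assumptions: the function name rewrite_part and the callback modify are made up for illustration. It is also the naive version, so it does nothing about the namespace problem described in the rest of this post.

```python
import xml.etree.ElementTree as ET
import zipfile


def rewrite_part(src_path, dst_path, member, modify):
    """Copy the workbook at src_path to dst_path, passing one XML part through modify().

    rewrite_part and modify are hypothetical names used for illustration.
    """
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == member:
                # Step 1: parse the XML from the zip file.
                root = ET.fromstring(data)
                # Step 2: change the XML (the callback edits the tree in place).
                modify(root)
                # Step 3: serialize it again so it can be saved into the zip.
                data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
            # All other parts are copied unchanged.
            dst.writestr(info, data)
```

A call might look like rewrite_part('in.xlsx', 'out.xlsx', 'xl/worksheets/sheet1.xml', my_change), where the file names and my_change are placeholders. The xml_declaration argument of ET.tostring needs Python 3.8 or later.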

In practice, I got this error (translated from the German Excel message): "Excel found unreadable content in ... Do you want to recover the contents of this workbook?"

After a long debugging session, I found that my code was actually correct. The problem was caused by:

A misunderstanding of XML namespaces by the developers of the Excel XML format. They use:

<worksheet
  xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
  xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
  xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
  xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac"

  mc:Ignorable="x14ac">

But the correct use would be:

<worksheet
  ...
  mc:Ignorable="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac">

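One should not rely on namespace prefixes in XML: the prefixes are arbitrary, and only the namespace URIs are significant. An XML library such as ElementTree may write different prefixes (for example ns0, ns1) when it saves the document, and then the value "x14ac" no longer refers to any declared prefix.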
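
The effect can be reproduced in a few lines. The following is a small sketch, assuming Python's xml.etree.ElementTree with no prefixes registered beforehand. The shortened worksheet and the helper name roundtrip are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened worksheet, modelled on the header shown above (illustrative only).
SAMPLE = (
    '<worksheet'
    ' xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"'
    ' xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"'
    ' xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac"'
    ' mc:Ignorable="x14ac">'
    '<sheetData><row r="1" x14ac:dyDescent="0.25"/></sheetData>'
    '</worksheet>'
)


def roundtrip(xml_text):
    """Parse and re-serialize xml_text without changing anything."""
    return ET.tostring(ET.fromstring(xml_text), encoding="unicode")


output = roundtrip(SAMPLE)
# The attribute value is plain text for the serializer, so it is copied as-is.
print('Ignorable="x14ac"' in output)
# The prefix itself is only declared if it was registered beforehand.
print('xmlns:x14ac=' in output)
```

The namespace URIs survive the round trip, so the output is still equivalent XML. Only the text "x14ac" inside the attribute value has lost its meaning.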

Once the problem is clear, it is easy to fix. Or rather, to work around: the correct fix would be to change the format, but that is not possible.

Solution 1.

Before saving the XML, find out which prefix corresponds to "http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" and set it as the value of the mc:Ignorable attribute.
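
With ElementTree, the prefixes are only chosen during serialization, so one way to do this is to serialize twice. The following is a rough sketch under assumptions: the function name serialize_with_fixed_ignorable is made up, and it assumes that mc:Ignorable sits on the root element and lists only this one namespace.

```python
import io
import xml.etree.ElementTree as ET

MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006"
X14AC_NS = "http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac"


def serialize_with_fixed_ignorable(root):
    """Serialize root so that mc:Ignorable names the prefix actually written.

    Hypothetical helper; assumes a single ignorable namespace on the root element.
    """
    # First pass: serialize only to learn which prefixes the serializer chose.
    draft = ET.tostring(root)
    prefix_for_uri = {}
    for _event, (prefix, uri) in ET.iterparse(io.BytesIO(draft), events=("start-ns",)):
        prefix_for_uri[uri] = prefix

    # Point the existing attribute at the prefix that is really declared.
    key = "{%s}Ignorable" % MC_NS
    prefix = prefix_for_uri.get(X14AC_NS)
    if prefix and key in root.attrib:
        root.set(key, prefix)

    # Second pass: changing an attribute value does not change the chosen prefixes.
    return ET.tostring(root)
```

If the namespace is not used anywhere in the tree, ElementTree does not declare it at all. In that case the sketch leaves the attribute untouched.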

Solution 2.

I use ElementTree in Python, which lets me force the use of the desired namespace prefixes.

import xml.etree.ElementTree as ET

# Register the prefixes used by Excel before serializing the tree,
# so that mc:Ignorable="x14ac" still refers to a declared prefix.
ET.register_namespace('r', 'http://schemas.openxmlformats.org/officeDocument/2006/relationships')
ET.register_namespace('mc', 'http://schemas.openxmlformats.org/markup-compatibility/2006')
ET.register_namespace('x14ac', 'http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac')
ET.register_namespace('', 'http://schemas.openxmlformats.org/spreadsheetml/2006/main')
