Generating Excel XML, avoiding “found unreadable content”
In theory, changing the content of an Excel file is easy:
* Parse XML from the zip-file
* Change XML
* Save XML into the zip
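The three steps above can be sketched with the Python standard library roughly as follows. This is only a minimal illustration: the helper name `rewrite_member` and its `change` callback are made up for this example, and the `xml_declaration` argument needs Python 3.8 or later. As the rest of this post shows, this naive round trip alone is not enough for Excel to accept the result.

```python
import zipfile
import xml.etree.ElementTree as ET


def rewrite_member(src, dst, member, change):
    """Copy zip archive src to dst, passing one XML member through change().

    Hypothetical helper for illustration; src and dst may be paths or
    file-like objects.
    """
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename == member:
                root = ET.fromstring(data)   # 1. parse XML from the zip
                change(root)                 # 2. change XML
                data = ET.tostring(root, encoding="UTF-8",
                                   xml_declaration=True)
            zout.writestr(info, data)        # 3. save into the new zip
```

All other members of the archive are copied byte for byte, so only the one XML part is affected by the serializer.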
In practice I got the error: "Von Excel wurde unlesbarer Inhalt in ... gefunden. Möchten Sie den Inhalt dieser Arbeitsmappe wiederherstellen?" (English: "Excel found unreadable content in ... Do you want to recover the contents of this workbook?")
After a long debugging session, I found that my code was actually correct. The problem was caused by:
A misunderstanding of XML namespaces by the developers of the Excel XML format. They use:
<worksheet
xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac"
mc:Ignorable="x14ac">
But the correct use is:
<worksheet
...
mc:Ignorable="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac">
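One should not rely on namespace prefixes in XML. They are arbitrary, and an XML library is free to change them when it writes the document. ElementTree, for example, writes ns0, ns1, ... by default, and then the value x14ac no longer names a declared prefix. One should use the namespace URIs instead.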
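To make this concrete, here is a small sketch of what ElementTree does to such a document when no prefixes have been registered. The worksheet fragment is a cut-down, hypothetical example, not a complete sheet.

```python
import xml.etree.ElementTree as ET

MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006"

# Hypothetical, cut-down worksheet used only for this demonstration.
SHEET = (
    '<worksheet'
    ' xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"'
    ' xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"'
    ' xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac"'
    ' mc:Ignorable="x14ac">'
    '<sheetData><row r="1" x14ac:dyDescent="0.25"/></sheetData>'
    '</worksheet>'
)

root = ET.fromstring(SHEET)

# The parser resolves prefixes to URIs in tag and attribute *names*,
# but the attribute *value* stays the plain string "x14ac".
value = root.get("{%s}Ignorable" % MC_NS)

# Without registered prefixes the writer invents its own (ns0, ns1, ...),
# so the output no longer declares a prefix called x14ac.
output = ET.tostring(root, encoding="unicode")
print(output)
```

The printed document still carries the attribute value x14ac, but its namespace declarations now use generated prefixes, which is the mismatch Excel complains about.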
Once the problem is clear, it is easy to fix. Or rather, to work around: the correct fix would be to change the format, which is not possible.
Solution 1.
Before saving XML, find out which prefix corresponds to "http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" and set it as the value of the attribute mc:Ignorable.
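A minimal sketch of Solution 1 with ElementTree, under a few assumptions. ElementTree only chooses its prefixes while serializing, so this version serializes first and then patches the attribute in the resulting text. The function name `tostring_with_fixed_ignorable` is made up for this example, and it assumes that x14ac is the only namespace listed in mc:Ignorable.

```python
import re
import xml.etree.ElementTree as ET

X14AC_NS = "http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac"


def tostring_with_fixed_ignorable(root):
    """Serialize root and point the Ignorable attribute at the prefix in use.

    Hypothetical helper; assumes Ignorable lists only the x14ac namespace.
    """
    text = ET.tostring(root, encoding="unicode")
    # Which prefix did the writer choose for the x14ac namespace URI?
    decl = re.search(r'xmlns:([\w.\-]+)="%s"' % re.escape(X14AC_NS), text)
    if decl is None:
        # The namespace is not declared in the output. The text is returned
        # unchanged; a fuller version would have to drop the attribute.
        return text
    prefix = decl.group(1)
    # ElementTree writes attribute values in double quotes, so the first
    # ...:Ignorable="..." occurrence can be replaced directly.
    return re.sub(r'(:Ignorable=")[^"]*"',
                  lambda m: m.group(1) + prefix + '"',
                  text, count=1)
```

Patching the serialized text is a pragmatic choice here, because the public ElementTree API does not expose the prefixes before writing.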
Solution 2.
I use ElementTree in Python and can force it to use the desired namespace prefixes.
import xml.etree.ElementTree as ET

# Register the original prefixes so that ElementTree writes them back unchanged.
ET.register_namespace('r', 'http://schemas.openxmlformats.org/officeDocument/2006/relationships')
ET.register_namespace('mc', 'http://schemas.openxmlformats.org/markup-compatibility/2006')
ET.register_namespace('x14ac', 'http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac')
ET.register_namespace('', 'http://schemas.openxmlformats.org/spreadsheetml/2006/main')
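A short round trip shows the effect of these registrations. The worksheet fragment below is a cut-down, hypothetical example. Two caveats apply: register_namespace changes a process-wide table, and ElementTree only writes declarations for namespaces that actually occur in the tree, so the x14ac declaration survives here only because the row carries an x14ac attribute.

```python
import xml.etree.ElementTree as ET

ET.register_namespace('mc', 'http://schemas.openxmlformats.org/markup-compatibility/2006')
ET.register_namespace('x14ac', 'http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac')
ET.register_namespace('', 'http://schemas.openxmlformats.org/spreadsheetml/2006/main')

# Hypothetical, cut-down worksheet used only for this demonstration.
SHEET = (
    '<worksheet'
    ' xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"'
    ' xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"'
    ' xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac"'
    ' mc:Ignorable="x14ac">'
    '<sheetData><row r="1" x14ac:dyDescent="0.25"/></sheetData>'
    '</worksheet>'
)

root = ET.fromstring(SHEET)
output = ET.tostring(root, encoding="unicode")

# The registered prefixes are reused, so mc:Ignorable="x14ac" again
# refers to a prefix that the document declares.
print(output)
```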