python libxml2: save XML as HTML
HTML is the main output format for XML transformations, and every XSLT processor, including libxslt/libxml2, supports it. But if you transform a libxml2 tree by hand, you are in trouble: you can save the XML tree only as XML, not as HTML. A workaround is required. Mine is not elegant, but it works.
By the way, the plain C library does provide the desired functionality. I think the options parameter of the xmlSaveToXXX functions should accept the flag XML_SAVE_AS_HTML, "force HTML serialization on XML doc". Another approach would be to change the type of the document node from XML_DOCUMENT_NODE to XML_HTML_DOCUMENT_NODE.
Unfortunately, the Python bindings can't access this functionality. Fortunately, based on the "node type" idea, the following does work:
* Create an empty HTML document
* Move nodes from the XML tree into the new HTML tree
* Save the HTML tree
Proof-of-concept code:
import libxml2
doc = libxml2.parseDoc('''
<html>
<head>
<title>I'm a title</title>
<link rel="stylesheet" type="text/css" href="style/style.css"></link>
</head>
<body>
<h1>Test</h1>
<img src="#none" width="32" height="32"/>
<p>Test</p>
</body>
</html>
''')
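# Serialize the root element of the XML tree (XML rules apply)
node = doc.getRootElement()
print(node.serialize())
# Create an empty HTML document to receive the nodes
html_doc = libxml2.htmlParseDoc('<html></html>', None)
html_root = html_doc.getRootElement()
# Move every child of the XML root into the HTML root
while node.children:
    kid = node.children
    kid.unlinkNode()
    html_root.addChild(kid)
print('------------------')
# The same nodes now serialize by HTML rules
print(html_root.serialize())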
The output, first as an XML tree, then as an HTML tree:
<html>
<head>
<title>I'm a title</title>
<link rel="stylesheet" type="text/css" href="style/style.css"/>
</head>
<body>
<h1>Test</h1>
<img src="#none" width="32" height="32"/>
<p>Test</p>
</body>
</html>
------------------
<html>
<head>
<title>I'm a title</title>
<link rel="stylesheet" type="text/css" href="style/style.css">
</head>
<body>
<h1>Test</h1>
<img src="#none" width="32" height="32">
<p>Test</p>
</body>
</html>
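Note the difference in how the link and img elements end: the XML serialization closes them with "/>", while the HTML serialization writes them as plain opening tags with no closing slash.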
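For readers who want to see the XML-versus-HTML serialization difference without installing libxml2, here is a minimal sketch using a different library, the standard library's xml.etree.ElementTree, under Python 3. It is not the libxml2 workaround described above. It only illustrates that the same tree can be written out by XML rules or by HTML rules, and that the empty link and img elements are the ones that change. The variable names are my own.

```python
import xml.etree.ElementTree as ET

# A shortened version of the test document used above.
root = ET.fromstring(
    '<html><head>'
    '<link rel="stylesheet" type="text/css" href="style/style.css"></link>'
    '</head><body>'
    '<img src="#none" width="32" height="32"/>'
    '<p>Test</p>'
    '</body></html>'
)

# XML rules: empty elements are written as self-closing tags.
as_xml = ET.tostring(root, encoding="unicode", method="xml")

# HTML rules: void elements such as link and img get no closing tag.
as_html = ET.tostring(root, encoding="unicode", method="html")

print(as_xml)
print('------------------')
print(as_html)
```

ElementTree is not libxml2, so details such as whitespace and attribute escaping may differ from the output shown above.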