python libxml2 dita
python libxml2 dita
For correct transformation of DITA files (XML-standard for modular documentation), it is necessary to pull information from DTD (document type definition). In my python code, sometimes I did get this information and sometimes not. Now I've tracked the source of instability and corrected the code.
Sample python code:
import libxml2
import libxsltmod
s = """<!DOCTYPE map PUBLIC "-//OASIS//DTD XDITA Map//EN"
"file://.../dita-ot-2.2.1/plugins/org.oasis-open.dita.v1
_2/dtd/technicalContent/dtd/map.dtd">
<map title="Empty map">
</map>"""
libxml2.substituteEntitiesDefault(1)
xmldoc = libxml2.parseDoc(s)
print xmldoc
The result as expected:
<?xml version="1.0"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD XDITA Map//EN"
"file://.../dita-ot-2.2.1/plugins/org.oasis-open.dita.v1
_2/dtd/technicalContent/dtd/map.dtd">
<map xmlns:ditaarch="http://dita.oasis-open.org/architecture/2005/"
title="Empty map" ditaarch:DITAArchVersion="1.2" domains="(topic delay-d)
(map mapgroup-d) (topic indexing-d)
(map glossref-d) (topic hi-d)
(topic ut-d) (topic hazard-d)
(topic abbrev-d) (topic pr-d)
(topic sw-d) (topic ui-d)
" class="- map/map ">
</map>
If I comment-out import libxsltmod, the result is wrong for me:
<?xml version="1.0"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD XDITA Map//EN"
"file://.../dita-ot-2.2.1/plugins/org.oasis-open.dita.v
1_2/dtd/technicalContent/dtd/map.dtd">
<map title="Empty map">
</map>
The explanation and two solutions is on the stackoverflow site: Expand default (dita) attributes. My verbose solution:
ctxt = libxml2.createDocParserCtxt(s)
opts = libxml2.XML_PARSE_NOENT | libxml2.XML_PARSE_DTDATTR
ctxt.ctxtUseOptions(opts)
ctxt.parseDocument()
xmldoc = ctxt.doc()
del ctxt
The short easy solution:
xmldoc = libxml2.readDoc(s, None, None, libxml2.XML_PARSE_NOENT | libxml2.XML_PARSE_DTDATTR)