python libxml2 dita

To transform DITA files correctly (DITA is an XML standard for modular documentation), the parser has to pull information from the DTD (document type definition), in particular the default attribute values declared there. In my Python code, this information was sometimes loaded and sometimes not. I have now tracked down the source of the instability and corrected the code.

Sample python code:

import libxml2
import libxsltmod

s = """<!DOCTYPE map PUBLIC "-//OASIS//DTD XDITA Map//EN"
"file://.../dita-ot-2.2.1/plugins/org.oasis-open.dita.v1_2/dtd/technicalContent/dtd/map.dtd">

<map title="Empty map">
</map>"""

libxml2.substituteEntitiesDefault(1)
xmldoc = libxml2.parseDoc(s)
print(xmldoc)

The result is as expected, with the default attributes from the DTD filled in:

<?xml version="1.0"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD XDITA Map//EN"
"file://.../dita-ot-2.2.1/plugins/org.oasis-open.dita.v1_2/dtd/technicalContent/dtd/map.dtd">
<map xmlns:ditaarch="http://dita.oasis-open.org/architecture/2005/"
  title="Empty map" ditaarch:DITAArchVersion="1.2" domains="(topic delay-d)
  (map mapgroup-d)                           (topic indexing-d)
  (map glossref-d)                          (topic hi-d)
  (topic ut-d)                           (topic hazard-d)
  (topic abbrev-d)                          (topic pr-d)
  (topic sw-d)                          (topic ui-d)
  " class="- map/map ">
</map>

If I comment out the import libxsltmod line, the result is wrong for me because the default attributes are missing:

<?xml version="1.0"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD XDITA Map//EN"
"file://.../dita-ot-2.2.1/plugins/org.oasis-open.dita.v1_2/dtd/technicalContent/dtd/map.dtd">
<map title="Empty map">
</map>

The explanation and two solutions are on Stack Overflow: "Expand default (dita) attributes". My verbose solution:

ctxt = libxml2.createDocParserCtxt(s)
opts = libxml2.XML_PARSE_NOENT | libxml2.XML_PARSE_DTDATTR
ctxt.ctxtUseOptions(opts)
ctxt.parseDocument()
xmldoc = ctxt.doc()
del ctxt

The short, easy solution:

xmldoc = libxml2.readDoc(s, None, None, libxml2.XML_PARSE_NOENT | libxml2.XML_PARSE_DTDATTR)
