TeXML: any encoding as ASCII

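TeXML development version 1.27 adds an essential new feature: the “--ascii” parameter. It is now possible to generate plain ASCII TeX files in any desired encoding. The text is converted to the chosen encoding, and every non-ASCII byte is written as “^^XX”, where XX is the byte's value as two hexadecimal digits.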

The “tests” folder contains the file “chinese1.xml”, a working example of a Chinese TeXML/LaTeX document:

<TeXML>
	<cmd name="documentclass" nl2="1">
		<parm>article</parm>
	</cmd>
	<cmd name="usepackage" nl2="1">
		<opt>encapsulated</opt>
		<parm>CJK</parm>
	</cmd>
	<cmd name="usepackage" nl2="1">
		<parm>ucs</parm>
	</cmd>
	<cmd name="usepackage" nl2="1">
		<opt>utf8x</opt>
		<parm>inputenc</parm>
	</cmd>
	<env name="document">
		<env name="CJK">
			<parm>UTF8</parm>
			<parm>cyberbit</parm>
			&#x4E16;&#x754C;&#xFF0C;&#x4F60;&#x597D;&#xFF01;
		</env>
	</env>
</TeXML>

(“世界，你好！” should mean “Hello, World!”, but I’m not sure)

After processing it with TeXML (options “--encoding utf8 --ascii”), you get the following result:

\documentclass{article}
\usepackage[encapsulated]{CJK}
\usepackage{ucs}
\usepackage[utf8x]{inputenc}
\begin{document}
\begin{CJK}{UTF8}{cyberbit}
^^e4^^b8^^96^^e7^^95^^8c^^ef^^bc^^8c^^e4^^bd^^a0^^e5^^a5^^bd^^ef^^bc^^81
\end{CJK}
\end{document}
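
The escaped line in the output can be reproduced with a few lines of Python. This is only a sketch of the “^^XX” notation, not TeXML's actual code; the function name to_tex_ascii is made up for this illustration, and it assumes Python 3.

```python
def to_tex_ascii(text, encoding="utf-8"):
    """Encode text, then write every non-ASCII byte as ^^xx (two hex digits)."""
    out = []
    for byte in text.encode(encoding):
        if byte < 0x80:
            # Plain ASCII bytes are passed through unchanged.
            out.append(chr(byte))
        else:
            # Bytes 0x80..0xFF become the TeX ^^xx notation.
            out.append("^^%02x" % byte)
    return "".join(out)


# The same six characters as in chinese1.xml, given as code points.
print(to_tex_ascii("\u4e16\u754c\uff0c\u4f60\u597d\uff01"))
# prints ^^e4^^b8^^96^^e7^^95^^8c^^ef^^bc^^8c^^e4^^bd^^a0^^e5^^a5^^bd^^ef^^bc^^81
```

Each Chinese character takes three bytes in UTF-8, so the six characters become eighteen “^^XX” groups. TeX reads each group back as a single byte, which is why the CJK package still sees valid UTF-8 input.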

There are also minor improvements in the new version:

* TeXML now issues a warning if an XML character can’t be converted to TeX and is printed as ‘&#xNNN;’.

* Refactoring: the code for tuning the output stream has moved from “handler.py” to “texmlwr.py”.
