TeXML and Unicode characters for math operators

Bug (?) report: “After switching from LaTeX (pdflatex) to XeTeX (xelatex) as the PDF generator, something has ceased to work. Mathematical operators given as Unicode characters in math formulas do not show up in the final document.”

The letter continues:

I’ve diagnosed the problem. The cause is a change in the encoding option of the TeXML converter, from -eISO-8859-1 to -eUTF-8, as expected by XeTeX. Previously, Unicode characters that could not be represented in the output encoding were automagically converted to the corresponding LaTeX mathematical symbols, but now they are just copied to the generated LaTeX code.

I’ve solved the problem myself by a quick-and-dirty hack of texmlwr.py. A new conditional section in writech(...) before the chained tests for TeXML modes forces the conversion inside math mode, regardless of the particular output encoding.

 <     # First attempt to write symbol as-is
 >     # First attempt to write symbol as-is  **** IF NOT IN MATH MODE ****
 >     if self.mode == MATH:
 >       #
 >       # Math mode, lookup math map
 >       #
 >       try:
 >         chord = ord(ch)
 >         if chord > 255:                            # no Latin-1
 >           self.stream.write(unimap.mathmap[chord])
 >           return                                             # return
 >       except:
 >         pass
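The idea behind the patch can be sketched in isolation. The snippet below is only an illustration, not the actual texmlwr.py code: `MATH_MAP` is a hypothetical three-entry stand-in for `unimap.mathmap`, `write_char` and its `patched` flag are invented for the example, and the "?" fallback is a placeholder. It assumes the writer first tries to emit a character as-is and falls back to the map only when the output encoding cannot represent it, which is what the comment in the diff suggests.

```python
# Hypothetical stand-in for unimap.mathmap: code point -> LaTeX command.
MATH_MAP = {
    0x2211: r"\sum",   # N-ARY SUMMATION
    0x222B: r"\int",   # INTEGRAL
    0x2264: r"\leq",   # LESS-THAN OR EQUAL TO
}

def write_char(ch, encoding, in_math, patched=True):
    """Return the text a writer would emit for a single character."""
    code = ord(ch)
    # Patched behaviour: in math mode, map non-Latin-1 characters first,
    # regardless of the output encoding.
    if patched and in_math and code > 255 and code in MATH_MAP:
        return MATH_MAP[code]
    # Assumed original behaviour: emit the character as-is if the
    # output encoding can represent it.
    try:
        ch.encode(encoding)
        return ch
    except UnicodeEncodeError:
        # Not representable: fall back to the map ("?" is a placeholder).
        return MATH_MAP.get(code, "?")
```

With ISO-8859-1 the summation sign cannot be encoded, so the fallback produces `\sum` even without the patch. With UTF-8 every character can be encoded, so the unpatched writer copies the raw character, and only the math-mode check restores the conversion.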

My comment

So XeTeX (and, I think, probably other TeX engines) doesn’t interpret Unicode math characters correctly. The solution is to avoid Unicode math characters and use the corresponding escape sequences instead.

I’m afraid it’s the only available solution right now. However, I vaguely remember discussions on the XeTeX mailing list and articles in TUGboat, so I’d like to spend some time looking for an alternative. Perhaps some magic option for XeTeX cures the problem.

Finally, I asked for a few examples of TeXML, generated TeX and PDF files (both correct and not). Hopefully, I’ll have a bit of time to find a better solution.

26.03.2010, solution

unicode math in xelatex
