escape a TeX string in Python

Do you generate [La]TeX files using Python? If so, do you escape the special characters? And haven't you forgotten to escape empty lines as well?

To avoid reinventing the wheel, I tried to use the TeXML library. This is the right choice, but it takes a little trick.

The problem is that TeXML doesn’t provide a simple function like:

s = tex_escape(s)

Instead, TeXML assumes that it converts a whole XML document. The escaping functions are part of an object that writes to an output stream.
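For comparison, here is roughly what a hand-rolled tex_escape() would have to do. This is a minimal sketch: the names tex_escape and TEX_SPECIALS are illustrative and are not part of TeXML. It covers only the ten ASCII special characters and empty lines, while TeXML's writer does considerably more, which is exactly why reusing TeXML is preferable.

```python
import re

# Hypothetical hand-rolled escaper, for comparison only (not TeXML's API).
TEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "#": r"\#",
    "$": r"\$",
    "%": r"\%",
    "&": r"\&",
    "_": r"\_",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def tex_escape(s):
    # Map every character in a single pass, so the backslashes and braces
    # inserted by one replacement are never escaped a second time.
    escaped = "".join(TEX_SPECIALS.get(ch, ch) for ch in s)
    # An empty line ends a paragraph in TeX. Put a lone '%' on every
    # empty line, i.e. after each newline that is followed by another one.
    return re.sub(r"\n(?=\n)", "\n%", escaped)
```

For the string used later in this post, tex_escape("foo&bar\n\n\nbar&foo") gives the same four lines that TeXML produces.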

Well, can we use StringIO to get a string as the output buffer?

Yes. But then comes the next subproblem: the stream is assumed to be created once and used forever. Unfortunately, it seems we can't reset the content of a string buffer, and re-creating and re-initializing a string buffer and a TeXML writer every time we need to escape a string is inefficient.

Wait! We can reset the string! The truncate() method does work.
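If this reset trick is carried over to Python 3, there is one caveat worth showing: io.StringIO.truncate() empties the buffer but does not move the stream position, so the buffer also has to be rewound with seek(0). The Python 2 code in this post uses cStringIO, whose truncate() appears to move the position as well, so it gets away without the extra call. A minimal demonstration:

```python
import io

# Pitfall: truncate(0) alone leaves the position at 5, so the next write
# starts at offset 5 and Python fills the gap with '\x00' characters.
stale = io.StringIO()
stale.write("first")
stale.truncate(0)
stale.write("second")
print(repr(stale.getvalue()))

# Fix: rewind as well as truncate before reusing the buffer.
fresh = io.StringIO()
fresh.write("first")
fresh.truncate(0)
fresh.seek(0)
fresh.write("second")
print(repr(fresh.getvalue()))
```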

So, here is the solution:

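import cStringIO, Texml.texmlwr

class TexEscape:
  def __init__(self):
    # One string buffer and one TeXML writer, reused for every call
    self.stream = cStringIO.StringIO()
    self.texmlwr = Texml.texmlwr.texmlwr(self.stream, 'utf-8', '72')
  def escape(self, s):
    # Write a dummy line to reset the writer's internal state flags
    self.texmlwr.write("x\n")
    # Discard the dummy output; seek(0) is redundant for cStringIO,
    # but keeps the code correct if the buffer type is ever changed
    self.stream.truncate(0)
    self.stream.seek(0)
    # Escape the real string and return the escaped text
    self.texmlwr.write(s)
    return self.stream.getvalue()
  def free(self):
    self.stream.close()

e = TexEscape()
s = "foo&bar\n\n\nbar&foo"
s = e.escape(s)
print s
e.free()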

The last trick is the call self.texmlwr.write("x\n"). I use it to reset the writer's internal state flags before the buffer is emptied.

Running the code produces the desired result:


$ python test.py
foo\&bar
%
%
bar\&foo
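cStringIO and the print statement exist only in Python 2. Below is a minimal Python 3 sketch of the same reuse pattern. So that it runs without TeXML installed, ToyTexWriter is a hypothetical stand-in that escapes only a few characters and marks empty lines with %; it is not TeXML's texmlwr. The part that matters is the TexEscape wrapper: one buffer and one writer, a dummy write to put the writer's state flags into a known state, then truncate() plus seek(0).

```python
import io


class ToyTexWriter:
    """Hypothetical stand-in for TeXML's texmlwr (NOT its real API).

    Like a streaming writer, it keeps state between write() calls: it
    remembers whether it is at the start of a line, so that an empty
    line can be turned into a lone '%'.
    """

    SPECIALS = {"&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_"}

    def __init__(self, stream):
        self.stream = stream
        self.at_line_start = True  # internal state flag

    def write(self, text):
        for ch in text:
            if ch == "\n":
                if self.at_line_start:
                    # An empty line would end the TeX paragraph; comment it out.
                    self.stream.write("%")
                self.stream.write("\n")
                self.at_line_start = True
            else:
                self.stream.write(self.SPECIALS.get(ch, ch))
                self.at_line_start = False


class TexEscape:
    """Reuse one buffer and one writer for many escape() calls."""

    def __init__(self):
        self.stream = io.StringIO()
        self.writer = ToyTexWriter(self.stream)

    def escape(self, s):
        # Put the writer into a known state: "just after a newline".
        self.writer.write("x\n")
        # Discard the dummy output. io.StringIO.truncate() does not move
        # the position, so rewind explicitly.
        self.stream.truncate(0)
        self.stream.seek(0)
        self.writer.write(s)
        return self.stream.getvalue()

    def close(self):
        self.stream.close()


if __name__ == "__main__":
    e = TexEscape()
    print(e.escape("foo&bar\n\n\nbar&foo"))
    e.close()
```

Run as a script, this prints the same four lines as the TeXML version above. Because of the dummy write, the result of escape() does not depend on what the previous call left behind: a string starting with an empty line always gets its leading %. Plugging in the real TeXML writer would mean swapping ToyTexWriter for the writer class of the installed TeXML version; its constructor arguments under Python 3 are not verified here.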

One Response to “escape a TeX string in Python”

  1. olpa Says:

    It seems that

    return self.stream.getvalue()

    should be changed to

    s = self.stream.getvalue()
    return unicode(s, 'utf-8')
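    In Python 3 terms, this fix corresponds to decoding the bytes the writer produced. A minimal sketch, assuming the writer emits UTF-8 bytes into an io.BytesIO buffer (decoded_value is a hypothetical helper name):

```python
import io


def decoded_value(stream, encoding="utf-8"):
    """Return a byte buffer's contents as text.

    Python 3 analogue of unicode(s, 'utf-8') in Python 2.
    """
    return stream.getvalue().decode(encoding)


buf = io.BytesIO()
buf.write("Gödel & Escher".encode("utf-8"))
print(decoded_value(buf))
```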
