Escape a TeX string in Python
Do you generate [La]TeX files from Python? If so, do you escape the special characters? And have you remembered to escape empty lines as well? In TeX, a blank line starts a new paragraph.
To avoid reinventing the wheel, I turned to the TeXML library. It is the right choice, but using it this way takes a little trick.
The problem is that TeXML doesn't provide a simple function like:
s = tex_escape(s)
Instead, TeXML assumes it is converting a whole XML document. Its escaping functions belong to a writer object that sends its output to a stream.
Well, can we use StringIO to make a string act as that output stream?
Yes. But then comes the next subproblem: the stream is assumed to be created once and used forever. Unfortunately, we can't reset the contents of a string buffer, and re-creating and re-initializing a string buffer and a TeXML writer every time we need to escape a string is inefficient.
Wait! We can reset the buffer after all: its truncate() method does the job.
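For comparison, here is a minimal hand-rolled sketch of the kind of tex_escape() function the post wishes for. It is not TeXML code and its output may differ from TeXML's: it assumes only the ten standard LaTeX special characters and blank lines matter, which is exactly the sort of wheel TeXML saves you from maintaining.

```python
import re

# Hypothetical mapping chosen for this sketch; NOT taken from TeXML.
_SPECIALS = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "$": r"\$",
    "&": r"\&",
    "#": r"\#",
    "%": r"\%",
    "_": r"\_",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}
_SPECIAL_RE = re.compile("|".join(re.escape(c) for c in _SPECIALS))


def tex_escape(s: str) -> str:
    """Escape LaTeX special characters and neutralize blank lines."""
    # Replace all specials in a single pass, so the backslashes added by
    # one replacement are never escaped a second time.
    escaped = _SPECIAL_RE.sub(lambda m: _SPECIALS[m.group()], s)
    lines = escaped.split("\n")
    # Every piece except the last is followed by a newline; if such a line
    # is empty (or only whitespace) TeX would end the paragraph there, so
    # replace it with a lone '%' comment. The last piece is left untouched.
    body = [line if line.strip() else "%" for line in lines[:-1]]
    return "\n".join(body + [lines[-1]])


if __name__ == "__main__":
    print(tex_escape("foo&bar\n\n\nbar&foo"))
```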
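The post's Python 2 code relies on cStringIO's truncate(0) alone. If you port it to Python 3's io.StringIO, be aware that truncate() does not change the current stream position, so rewind with seek(0) first; otherwise the next write starts past the new end of the buffer. A minimal sketch, where reset() is a hypothetical helper name:

```python
import io


def reset(buf: io.StringIO) -> None:
    """Empty a StringIO so it can be reused as a fresh output buffer."""
    # Rewind first: in Python 3, truncate() keeps the current position.
    buf.seek(0)
    # With no argument, truncate() cuts at the current position (now 0).
    buf.truncate()


buf = io.StringIO()
buf.write("first call")
reset(buf)
buf.write("second")
print(buf.getvalue())
```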
So, here is the solution:
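import cStringIO
import Texml.texmlwr


class TexEscape:
    def __init__(self):
        # String buffer used as the writer's output stream
        self.stream = cStringIO.StringIO()
        # TeXML writer that outputs UTF-8 into the string buffer
        self.texmlwr = Texml.texmlwr.texmlwr(self.stream, 'utf-8', '72')

    def escape(self, s):
        # Dummy write to reset the writer's internal state flags
        self.texmlwr.write("x\n")
        # Throw away everything written so far, the dummy line included
        self.stream.truncate(0)
        self.texmlwr.write(s)
        return self.stream.getvalue()

    def free(self):
        self.stream.close()


e = TexEscape()
s = "foo&bar\n\n\nbar&foo"
s = e.escape(s)
print s
e.free()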
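The last trick is 'self.texmlwr.write("x\n")'. Writing this dummy line resets the writer's internal state flags left over from the previous call, and the truncate(0) that follows discards the dummy output itself.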
Running the code produces the desired result:
$ python test.py
foo\&bar
%
%
bar\&foo
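To see why the dummy write matters, here is a self-contained Python 3 sketch of the same TexEscape pattern. It does not use TeXML: ToyTexWriter is a hypothetical, deliberately tiny stand-in that escapes a few characters and, like the output above, turns blank lines into '%'. Its only state is a line_empty flag carried from one write() call to the next, which is enough to show how a leftover flag can change the next result and how writing "x\n" puts the writer back into a known state. Note the seek(0) before truncate(), which Python 3's io.StringIO needs.

```python
import io


class ToyTexWriter:
    """Hypothetical stand-in for TeXML's writer, for illustration only."""

    # Deliberately incomplete: no backslash, ~ or ^ handling.
    SPECIALS = {"&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
                "_": r"\_", "{": r"\{", "}": r"\}"}

    def __init__(self, stream):
        self.stream = stream
        self.line_empty = True  # state that survives between write() calls

    def write(self, text):
        for ch in text:
            if ch == "\n":
                # An empty line would end the TeX paragraph: comment it out.
                self.stream.write("%\n" if self.line_empty else "\n")
                self.line_empty = True
            else:
                self.stream.write(self.SPECIALS.get(ch, ch))
                self.line_empty = False


class TexEscape:
    def __init__(self):
        self.stream = io.StringIO()
        self.writer = ToyTexWriter(self.stream)

    def escape(self, s):
        # Dummy write: leaves the writer at the start of a fresh line,
        # whatever the previous call ended with.
        self.writer.write("x\n")
        # Discard the buffer (dummy output included); rewind before truncating.
        self.stream.seek(0)
        self.stream.truncate()
        self.writer.write(s)
        return self.stream.getvalue()


e = TexEscape()
print(e.escape("foo&bar\n\n\nbar&foo"))
```

Without the dummy write, e.escape("\nx") would return "\nx" right after a call ending in text such as "abc", but "%\nx" after a call ending in a newline; with it, the result is always "%\nx".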