Escape a TeX string in Python
Do you generate [La]TeX files using Python? If so, do you escape the special characters? And have you remembered to escape empty lines as well?
To avoid reinventing the wheel, I tried the TeXML library. It is the right choice, but it takes a little trick.
The problem is that TeXML doesn't provide a simple function like:
s = tex_escape(s)
Instead, TeXML assumes that it is converting one large XML document. The escaping functions are part of an object that writes to an output stream.
So, can we use StringIO to make a string serve as the output buffer?
Yes. But that leads to the next subproblem: the stream is assumed to be created once and used forever. At first glance, we can't reset the contents of a string buffer, and re-creating and re-initializing the string buffer and the TeXML writer every time we need to escape a string is inefficient.
Wait! We can reset the buffer after all: the truncate method does the job.
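To show the reset idea on its own, here is a minimal sketch. It assumes the standard io.StringIO as a stand-in for the cStringIO buffer used in this post, and the helper name reset is made up for the illustration. One difference worth knowing: io.StringIO.truncate does not move the stream position, so the sketch seeks back to the start first.

```python
import io

# Assumption: io.StringIO stands in for the cStringIO buffer used in this post.
buf = io.StringIO()

def reset(stream):
    """Empty the buffer so the same object can be reused (hypothetical helper)."""
    stream.seek(0)      # io.StringIO.truncate() leaves the position unchanged
    stream.truncate(0)  # drop everything from the start of the buffer

buf.write("first")
first = buf.getvalue()

reset(buf)              # reuse the same buffer instead of creating a new one
buf.write("second")
second = buf.getvalue()

print(first)
print(second)
```

After the reset, the second getvalue call returns only what was written since the reset, which is the behavior the escaper relies on.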
So, here is the solution:
import cStringIO, Texml.texmlwr

class TexEscape:
    def __init__(self):
        # Create the buffer and the TeXML writer once and reuse them.
        self.stream = cStringIO.StringIO()
        self.texmlwr = Texml.texmlwr.texmlwr(self.stream, 'utf-8', '72')

    def escape(self, s):
        # Write a dummy line to reset the writer's internal state flags.
        self.texmlwr.write("x\n")
        # Discard everything written so far, including the dummy line.
        self.stream.truncate(0)
        self.texmlwr.write(s)
        return self.stream.getvalue()

    def free(self):
        self.stream.close()

e = TexEscape()
s = "foo&bar\n\n\nbar&foo"
s = e.escape(s)
print s
e.free()
The one remaining trick is 'self.texmlwr.write("x\n")'. I use it to reset the writer's internal state flags before each call.
Running the code produces the desired result:
$ python test.py
foo\&bar
%
%
bar\&foo
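Note what happened to the empty lines: each one came out as a lone '%'. In TeX a blank line ends the paragraph, while a line holding only a comment character does not. The sketch below merely reproduces this output for this one input. It is not how TeXML works internally, it handles only '&' and empty lines, and the function name illustrate_escape is made up for the example.

```python
def illustrate_escape(s):
    """Mimic the output above for '&' and empty lines only (illustration)."""
    out = []
    for line in s.split("\n"):
        line = line.replace("&", r"\&")    # escape the alignment character
        out.append(line if line else "%")  # keep empty lines from ending the paragraph
    return "\n".join(out)

result = illustrate_escape("foo&bar\n\n\nbar&foo")
print(result)
```

A real escaper has to cover many more characters, which is the reason to rely on TeXML rather than on a hand-written function like this one.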