Archive for March, 2005

WordML2LaTeX

Thursday, March 10th, 2005

New software was recently announced in comp.text.tex:

WordML2LaTeX is a meeting point between two titans in word processing: Microsoft Word 2003 and LaTeX2e. It is a XSL stylesheet that transforms a Word document (WordML) in a LaTeX2e source. With it You can use Word as a front end for LaTeX.

It converts XML to LaTeX, yet it doesn’t use TeXML! What a pity. I’ve written a letter to the author:

To:   ruggdam@....it
Subj: WordML2LaTeX with TeXML?

Hello Ruggero,

I noticed the announcement of WordML2LaTeX in comp.text.tex and took a look at it. I can’t check how it works, as I have no WordML documents. Anyway, the stylesheet looks impressive and interesting to me.

I noticed that you create TeX markup directly from XSLT. This is the reason why I’m writing to you: I’d like to advocate TeXML:

http://getfo.sourceforge.net/texml/

It was created specially for XML-to-TeX translation, and I am totally satisfied with it. I’m not alone. For example, see:

http://contextgarden.net/Getting_Started_with_XML_and_ConTeXt_using_TEXML

So I’d like to convince you to use TeXML in the next version of WordML2LaTeX.

Regards, Oleg

By the way, the stylesheet looks good and covers many formatting issues. The software is available on CTAN in the folder “/support/WordML2LaTeX/” (for example, here).

Skeleton is forming

Thursday, March 10th, 2005

Unexpectedly, I had very little time for programming this evening. Still, the new code can now convert elements and attributes.

XSLT:

<x:stylesheet
  xmlns:x = "http://www.w3.org/1999/XSL/Transform"
  xmlns:s = "http://uucode.com/xslt/scheme"
  x:extension-element-prefixes="s"
  version = "1.0">

<x:output indent="yes"/>

<x:template match="/">
  <s:scheme>
    '(article  (@ (id "hw"))
      (title "Hello")
      (para
        "Hello, "
        (object "World")
        "!"))
  </s:scheme>
</x:template>

</x:stylesheet>

Result:

<article id="hw">
  <title>Hello</title>
  <para>Hello, <object>World</object>!</para>
</article>

Implementing the conversion of attributes was hard. I used a small trick that is not guaranteed to work, because I misused the API a bit. I hope to write about it soon.
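To make the mapping from the quoted list to the XML result easier to follow, here is a minimal Python sketch of the same idea. It is only an illustration and not the extension’s code: the function name sxml_to_element and the use of Python lists in place of Scheme lists are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

def sxml_to_element(expr):
    """Convert a nested list such as ["para", "Hello, ", ["object", "World"], "!"]
    into an element. An item of the form ["@", [name, value], ...] holds attributes."""
    tag, *items = expr
    elem = ET.Element(tag)
    last_child = None  # most recently appended child element, if any
    for item in items:
        if isinstance(item, str):
            # Text goes before the first child or after the latest child.
            if last_child is None:
                elem.text = (elem.text or "") + item
            else:
                last_child.tail = (last_child.tail or "") + item
        elif item and item[0] == "@":
            for name, value in item[1:]:
                elem.set(name, value)
        else:
            last_child = sxml_to_element(item)
            elem.append(last_child)
    return elem

doc = ["article", ["@", ["id", "hw"]],
       ["title", "Hello"],
       ["para", "Hello, ", ["object", "World"], "!"]]

xml_text = ET.tostring(sxml_to_element(doc), encoding="unicode")
print(xml_text)
```

The printed document has the same structure as the result shown above, only without indentation.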

Paul Tremblay improves TeXML

Wednesday, March 9th, 2005

Nice news from Paul Tremblay: he has submitted patches to TeXML. The first one allows TeXML to be installed in the normal Pythonic way:

python setup.py build
python setup.py install

The patch description.

Other patches deal with the vertical bar symbol (“|”). I haven’t looked at them yet.

I hope to find a bit of free time to integrate everything into CVS.

testing quotes

Wednesday, March 9th, 2005

WordPress (this blog engine) changes quotes (") to (\"). Let’s see if

php_flag magic_quotes_gpc off
php_flag magic_quotes_runtime off

have fixed the problem.

Update: good, it seems so.

————–

Updated, 10 March

Unfortunately, it didn’t help. For some reason, quotes inside the “pre” tag get a backslash, and quotes in the text are turned into smart quotes. I hate it!

————-

Updated, 12 March

Finally fixed by hacking the code. I’ve commented out the smart-quotes code and corrected the following line in “wp-includes/functions-formatting.php”:

$pee = preg_replace('!(<pre.*?>)(.*?)</pre>!ise',
  " stripslashes('$1') .  clean_pre('$2')  . '</pre>' ", $pee);

I changed

clean_pre('$2')

to

clean_pre(stripslashes('$2'))

the road to hell is paved with good intentions

Wednesday, March 9th, 2005

I’ve spent the evening investigating why I was getting a core dump. It was not an easy task, because the code was in a plugin (so it was hard to set a breakpoint) and the data structures were either full of links (libxml) or simply a mess (Guile).

Well, suppose you have an empty element node par and two text nodes, node1 and node2, with the content text1 and text2. You add node1 and node2 as children of par. What structure do you get? A node with two text children? No, that would be too boring. An excerpt from the xmlAddChild description:

Add a new node to @parent, at the end of the child (or property) list merging adjacent TEXT nodes (in which case @cur is freed)

That was my problem: I was reusing a freed node. It was a big problem because I had to reuse it, and the automatic merging broke the mapping between SCM values and XML nodes. Well, the solution is trivial: I don’t map text nodes anymore, and now I see that this is the right way.
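The effect is easier to see in a small model. The following Python sketch is not libxml: the Node class, the add_child function and the freed flag are hypothetical stand-ins that only imitate the behaviour quoted from the xmlAddChild description, and the mapping dictionary stands in for the table from Scheme values to nodes.

```python
class Node:
    def __init__(self, kind, content=""):
        self.kind = kind          # "element" or "text"
        self.content = content
        self.children = []
        self.freed = False        # stands in for memory released by the library

def add_child(parent, cur):
    """Append cur to parent, merging adjacent text nodes as the quoted
    description says. Returns the node that actually holds the content."""
    last = parent.children[-1] if parent.children else None
    if cur.kind == "text" and last is not None and last.kind == "text":
        last.content += cur.content
        cur.freed = True          # in C, using cur after this point is an error
        return last
    parent.children.append(cur)
    return cur

par = Node("element")
node1 = Node("text", "text1")
node2 = Node("text", "text2")

# The caller keeps a table from (hypothetical) Scheme value names to nodes.
mapping = {"v1": node1, "v2": node2}
add_child(par, node1)
kept = add_child(par, node2)

print(len(par.children), par.children[0].content)  # 1 text1text2
print(mapping["v2"].freed)                         # True: the table points at a dead node
```

After the second call the table still refers to node2, although its content now lives in node1. Not keeping text nodes in the table at all removes the problem.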

Conclusion: sometimes even the road to hell is useful.

By the way, I’ve implemented conversion of Scheme lists to XML nodesets. The problem above arose while working on the conversion.

links, trees, forest, infinity

Tuesday, March 8th, 2005

A note on mapping Scheme values to libxml nodes. It seems reasonable to map the same physical value to the same physical node, but this causes unexpected results. Consider the stylesheet:

<x:stylesheet
  xmlns:x = "http://www.w3.org/1999/XSL/Transform"
  xmlns:s = "http://uucode.com/xslt/scheme"
  x:extension-element-prefixes="s"
  version     = "1.0">

<s:init>
  (define foo 777)
</s:init>

<x:template match="/">
  <x>
    <s:scheme>foo</s:scheme>
    <s:scheme>foo</s:scheme>
    <!-- <y><s:scheme>foo</s:scheme></y> -->
  </x>
</x:template>

</x:stylesheet>

One might expect to get the following result:

<x>777777</x>

But the actual result is:

<x>777</x>

As I found out, libxml performs a set of checks while adding a child. One of the checks is that the same node isn’t inserted twice.

And what happens when the commented-out part of the stylesheet is enabled? It produces a tree structure that libxml doesn’t expect. As a result, the serializer enters an infinite loop and runs until it dumps core.
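One plausible way for such a loop to arise can be shown with a toy model. The Python sketch below is an assumption-laden illustration, not libxml code: it only assumes that nodes carry parent, children, last and next links, as libxml’s nodes do, and that appending a node does not first unlink it from its old place. The names Node, add_child and serialize are invented for the example.

```python
class Node:
    def __init__(self, name, text=None):
        self.name, self.text = name, text
        self.parent = self.children = self.last = self.next = None

def add_child(parent, cur):
    """Append cur to parent's sibling chain (simplified, hypothetical model)."""
    if cur.parent is parent:      # the "same node twice" check: nothing to do
        return cur
    cur.parent = parent
    if parent.children is None:
        parent.children = cur
    else:
        parent.last.next = cur
    parent.last = cur
    return cur

def serialize(node, budget):
    """Serialize by following children/next links; give up after `budget` nodes."""
    out = []
    def walk(n):
        nonlocal budget
        budget -= 1
        if budget < 0:
            raise RuntimeError("node budget exhausted: the links form a cycle")
        if n.text is not None:
            out.append(n.text)
            return
        out.append("<%s>" % n.name)
        child = n.children
        while child is not None:
            walk(child)
            child = child.next
        out.append("</%s>" % n.name)
    walk(node)
    return "".join(out)

x, foo = Node("x"), Node("#text", "777")
add_child(x, foo)
add_child(x, foo)                 # same node, same parent: ignored
first = serialize(x, budget=100)
print(first)                      # <x>777</x>

y = Node("y")
add_child(x, y)                   # now foo.next is y
add_child(y, foo)                 # y's first child is foo, whose next is still y
try:
    serialize(x, budget=100)
    looped = False
except RuntimeError:
    looped = True
print(looped)                     # True
```

In this model the shared text node ends up as a child of y while its next link still points to y, so a walker that follows the links never terminates. Whether libxml fails in exactly this way was not checked.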

from Scheme values to XML nodes

Tuesday, March 8th, 2005

Conversion of some basic Scheme types (string, boolean, char, number) to XML nodes now works. The stylesheet

<x:stylesheet
  xmlns:x = "http://www.w3.org/1999/XSL/Transform"
  xmlns:s = "http://uucode.com/xslt/scheme"
  x:extension-element-prefixes="s"
  version     = "1.0">

<x:template match="/">
  <x>
    <x1><s:scheme>"string value"</s:scheme></x1>
    <x2><s:scheme>(> 2 3)</s:scheme></x2>
    <x3><s:scheme>(/ 777 2)</s:scheme></x3>
    <x4><s:scheme>#\A</s:scheme></x4>
  </x>
</x:template>

</x:stylesheet>

produces, as expected

<x>
  <x1>string value</x1>
  <x2>false</x2>
  <x3>388.5</x3>
  <x4>A</x4>
</x>
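The conversion rules visible in this output can be summarized in a few lines. This Python sketch is only an illustration with an invented function name, and Python values stand in for the Scheme ones; the actual conversion is done in C against the Guile and libxml APIs.

```python
def scheme_value_to_text(value):
    """Render a basic value as the text of an XML node (illustrative rules only)."""
    if isinstance(value, bool):
        # Booleans are written as the words "true" and "false".
        return "true" if value else "false"
    # Strings and characters are used as they are; numbers use their printed form.
    return str(value)

results = [scheme_value_to_text(v) for v in ("string value", 2 > 3, 777 / 2, "A")]
print(results)  # ['string value', 'false', '388.5', 'A']
```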

goto considered

Tuesday, March 8th, 2005

It’s well-known that goto is considered harmful. But… I’ve just written a small C function of approximately 70 lines with 5 gotos and two goto targets. And I like the code.

xfind is not the standard find

Monday, March 7th, 2005

The title “Find with XPath over file system” is a bit misleading. This is not the first time I’ve received a question like:

When you say “The standard UNIX utility find now supports XPath”, does that mean that this will be included in the standard find?

This time I’ve been asked on Lambda the Ultimate (and I like being noticed there).

The answer is:

No, it will never be included in the standard find: http://lists.gnu.org/archive/html/bug-findutils/2005-01/msg00107.html, and I completely agree.

So the title really is a bit misleading. I wanted to emphasize that I hadn’t written a program from scratch, but had added an XPath facility to legacy code, and that legacy code is the standard find.
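For readers who wonder what “XPath over a file system” means in practice, here is a rough Python sketch of the idea. It does not reflect how xfind works internally: it simply mirrors a directory tree as XML elements and queries it with the XPath subset of the standard library. The element names dir and file and the function tree_to_xml are assumptions made for this example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def tree_to_xml(path):
    """Mirror a directory tree as <dir name=...> and <file name=...> elements."""
    elem = ET.Element("dir", name=os.path.basename(path))
    for entry in sorted(os.listdir(path)):
        full = os.path.join(path, entry)
        if os.path.isdir(full):
            elem.append(tree_to_xml(full))
        else:
            ET.SubElement(elem, "file", name=entry)
    return elem

# Build a small throwaway tree so that the example is self-contained.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "src", "doc"))
    for rel in ("README", "src/main.c", "src/doc/README"):
        open(os.path.join(root, *rel.split("/")), "w").close()
    fs = tree_to_xml(root)

# ElementTree supports only a subset of XPath, which is enough here.
readmes = fs.findall(".//file[@name='README']")
in_src = [f.get("name") for f in fs.findall("./dir[@name='src']/file")]
print(len(readmes), in_src)  # 2 ['main.c']
```

The first query finds every file named README at any depth; the second lists only the files directly inside the top-level src directory.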

XSLT element “scheme:init” works

Monday, March 7th, 2005

I’m working on embedding Guile (a Scheme interpreter) into xsltproc. I’ve introduced the element “init” to contain Scheme initialization code and added error checks (some of them, I suppose, redundant). Well, it works now:

<x:stylesheet
  xmlns:x = "http://www.w3.org/1999/XSL/Transform"
  xmlns:s = "http://uucode.com/xslt/scheme"
  x:extension-element-prefixes="s"
  version = "1.0">

<s:init>
  (display "SCHEME: Initialization")
  (define greeting "SCHEME: Hello from scheme.xsl!")
  (newline)
</s:init>

<x:template match="/">
  <x>
    <s:scheme>
      (display greeting)
      (newline)
    </s:scheme>
  </x>
</x:template>

</x:stylesheet>

Getting Started with XML and ConTeXt

Monday, March 7th, 2005

Paul Tremblay has published an article for XML authors who want to use open source software to produce high-quality PDF documents. He suggests using ConTeXt, a TeX macro package, and advocates using TeXML for the conversion of XML to ConTeXt:

http://contextgarden.net/Getting_Started_with_XML_and_ConTeXt

And I’ve added some comments on the article:

http://contextgarden.net/Talk:Getting_Started_with_XML_and_ConTeXt

I’m happy to see that TeXML is becoming popular, not only because I’m the author, but also because I really believe in the benefits of TeXML.

Meanwhile, I’ve released a development version, TeXML 1.23, with ConTeXt support. Unfortunately, the documentation has not been updated yet.

TeXML

Monday, March 7th, 2005

First of all, let me introduce TeXML:

TeXML is an XML vocabulary for TeX. The processor transforms TeXML markup into the TeX markup, escaping special and out-of-encoding characters. The intended audience is developers who automatically generate TeX files.

* home page: http://getfo.sourceforge.net/texml/
* SourceForge page: http://sourceforge.net/projects/getfo/
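To give an idea of what escaping special characters involves, here is a small Python sketch. It is not TeXML’s implementation: the table TEX_SPECIALS and the function escape_tex are invented for this illustration, the replacements are ordinary LaTeX commands, and out-of-encoding characters are not handled at all.

```python
# Replacements for the characters TeX treats specially in ordinary text.
TEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "$": r"\$",
    "&": r"\&",
    "#": r"\#",
    "%": r"\%",
    "_": r"\_",
    "^": r"\textasciicircum{}",
    "~": r"\textasciitilde{}",
}

def escape_tex(text):
    """Escape TeX special characters one character at a time, so that braces
    introduced by a replacement are never escaped a second time."""
    return "".join(TEX_SPECIALS.get(ch, ch) for ch in text)

print(escape_tex("50% of $10 & more_{x}"))  # 50\% of \$10 \& more\_\{x\}
```

A generator that writes TeX markup directly has to remember to apply such a function to every piece of text; with TeXML the processor takes care of it.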

introduction

Sunday, March 6th, 2005

Why have I started this blog? What do I plan to write about?

The answer is simple: it’s all about the marketing of my software.

I work on several open source projects which are quite good. Unfortunately, they are not popular yet. I hope that if the blog becomes popular, the projects will become popular too.

Second, my projects are interrelated, but users sometimes miss that. The blog can make them aware of my other products.

The next point is feedback. I can write about development plans and possible features, and you can comment on them or suggest ideas.

There are some other reasons, but I can’t remember them.

The most comprehensive list of my projects is on my home page: http://uucode.com/
Some projects are listed on SourceForge: http://sourceforge.net/users/olpa/

By the way, I also have a blog in Russian: http://www.livejournal.com/users/olpa/

And the last note: don’t hesitate to correct my English. I’d be happy to fix errors.