Archive for March, 2005

default namespaces force new changes

Monday, March 21st, 2005

I decided to make sure that default namespaces are not a problem for the converter, and found that they actually are a problem.

(more…)

redefinition of namespaces works

Sunday, March 20th, 2005

In one of the previous entries I wrote that conversion of namespaces mostly works, but in some cases doesn’t. I’ve traced the reason, and it is neither a bug in my code nor a bug in libxml2. It’s just an interference between features.

(more…)

GPCE

Sunday, March 20th, 2005

Homepage of the International Conference on
Generative Programming and Component Engineering (GPCE): http://gpce.org/

internet outage

Sunday, March 20th, 2005

Yesterday evening I wanted to have a nice time after the working week. Unfortunately, the etherlink link was down and there was no Internet. To make things worse, it’s unknown when I’ll get the line back.

(more…)

namespaces are complete at first glance, but then not

Sunday, March 20th, 2005

Yesterday I succeeded in writing code for processing an auxiliary SXML list and reporting namespace definition pairs (prefix and URI). I thought I was still far from finalizing namespaces, but today I finished it in less than half an hour. Unfortunately, the code has a bug.

(more…)

Google AdSense

Friday, March 18th, 2005

I’ve decided to join the Google AdSense program. Now you can see advertisement blocks in the sidebar. I hope they are relevant and that they bring me some revenue.

(more…)

the namespaces problems are solved

Friday, March 18th, 2005

In the previous entry I wrote about the two faults in the conversion code. Now they are fixed. And the code has become a bit better.

Attributes are now also namespace aware.

I’m afraid that PIs, entities and other similar things might be namespace aware too. I hope not. Looking at the Namespaces in XML specification… Well, all is OK. Namespaces apply only to elements and attributes.

While coding, I looked into the file “tree.c” from libxml2. The non-trivial bodies of “xmlNewProp” and “xmlNewNsProp” are nearly identical, but differ a bit. I’m going to report this to the developers’ mailing list.

I’m not alone in TeXML development

Thursday, March 17th, 2005

After making a lot of contributions, Paul Tremblay has joined the TeXML project. Welcome!

first attempt at namespace conversion

Thursday, March 17th, 2005

The code splits an element’s full name into two parts, looks up the tree for the ns prefix or ns URI, and creates a ns definition if none is found. There is also draft code to check that ns definitions are complete.

I specified the conversion algorithm in previous entries, but the simplest example has revealed two weak points.

Error 1.

SXML: ... (a:b:c:d:doc "data") ...
XML: <x><a:b:c:d:doc>data</a:b:c:d:doc></x>

There is no namespace definition in the result and no warning on the console. The error is that I check for ns completeness on returning from attribute processing. In this case there are no attributes at all, so I got no warning.

Error 2.

A fake (empty) attribute set is added.

SXML: ... (a:b:c:d:doc (@) "data") ...
XML: <x xmlns:a:b:c:d=""><a:b:c:d:doc>data</a:b:c:d:doc></x>

The “xmlns” is attached to “x”, not to “a:b:c:d”. The reason is that a ns definition can be created only together with attaching it to some node. I attach it to the context node, and at the moment when I’m about to create “doc”, the context node is “x”.

The funny thing is that “xmlNewNode” expects an already created ns definition, and “xmlNewNs” expects an already created node. It’s a sort of circular reference. Fortunately, it seems that this loop can be broken by using “xmlSetNs” a posteriori. We’ll see later.

one year of a bug

Thursday, March 17th, 2005

A year ago I wrote a script for managing reusable components in FrameMaker. It has been used extensively. What’s more, it’s the core of a new workflow.

Recently users reported a bug. Their comments for components were not displayed.

After some exploration, I realized that comments were never used by the code. When I wrote the script there were no comments, and not even an agreement on how to make them, so I implemented a fallback behaviour (using the first paragraph of a component as the comment).

All year long the users were getting the fallback behaviour, and it was fine!

namespaces in TeXML

Wednesday, March 16th, 2005

I’ve got a feature request to add namespaces to TeXML. The reasons are very strong, so I can’t reject the proposal. Here is my answer:

I thought about namespaces at the very beginning, and decided to avoid them to simplify TeXML. Namespaces are a dark corner of XML technologies for many developers.

> the problem of trying to reconcile XML with a namespace and XML without a namespace.

Unbeatable point! Well, let’s assign a namespace URI for TeXML:

http://getfo.sourceforge.net/texml/

The namespace prefix is not important, but I think the best variant is “texml”. So a TeXML document may look like this:

<texml:TeXML xmlns:texml="http://getfo.sourceforge.net/texml/">...</texml:TeXML>

I don’t want to force users to use the TeXML namespace, so I’m going to introduce two modes of processing:

* namespace-aware, and
* no namespaces, as now.

About the technical implementation. I’d like to update “glue_handler” in the file “handler.py”: let the handler “startElementNS” validate “uri” (it should be None or TeXML’s URI), create a new attribute map as if there were no namespaces (maybe that’s not required; I need to read the documentation), and delegate processing to “startElement”. The same goes for “endElementNS”.
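A minimal sketch of that delegation, assuming Python’s standard “xml.sax” interfaces. The class name “NamespaceGlue”, the helper “parse_texml” and the recorded event list are made up for illustration and are not the real “glue_handler”; the “cmd” element in the usage note is just sample input.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces
from xml.sax.xmlreader import AttributesImpl

TEXML_URI = "http://getfo.sourceforge.net/texml/"


class NamespaceGlue(ContentHandler):
    """Hypothetical stand-in for glue_handler: drops namespaces, then delegates."""

    def __init__(self):
        super().__init__()
        self.events = []  # recorded only to make the sketch observable

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def _check_uri(self, uri):
        # Only "no namespace" and the TeXML namespace are accepted.
        if uri not in (None, TEXML_URI):
            raise ValueError("unexpected namespace URI: %r" % (uri,))

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        self._check_uri(uri)
        # Rebuild the attribute map keyed by local names only.
        plain = {attr_local: value for (_, attr_local), value in attrs.items()}
        self.startElement(local, AttributesImpl(plain))

    def endElementNS(self, name, qname):
        uri, local = name
        self._check_uri(uri)
        self.endElement(local)


def parse_texml(text, namespaces=True):
    """Parse a string and return the events seen by the plain handlers."""
    handler = NamespaceGlue()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, namespaces)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(text.encode("utf-8")))
    return handler.events
```

With namespace processing switched on, a document such as <texml:TeXML xmlns:texml="http://getfo.sourceforge.net/texml/"><texml:cmd name="x"/></texml:TeXML> reaches “startElement” with the plain names “TeXML” and “cmd”, so the rest of the handler does not have to know about namespaces.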

I suppose all these actions slow down the conversion, so I’d like to enable the namespace feature only on demand (for example, with a “-ns” command line flag).

What do you think about it?

sTeXme at EuroTeX 2005

Wednesday, March 16th, 2005

It seems sTeXme was mentioned at EuroTeX 2005. I’ve just stumbled upon the preprint (PDF) of Jonathan Fine’s talk “TeX Forever!”. He mentions sTeXme several times, and I’m very pleased with his writing. I couldn’t have written it better! Thank you, Jonathan.

Algorithm for SXML to libxml2 namespaces

Wednesday, March 16th, 2005

The previous entry “preparation for namespaces” was about the representation of namespaces in libxml2 and SXML. Now I’m going to describe the algorithm of translation from SXML to libxml2.

The general conversion algorithm uses top-down tree traversal, and XML nodes come in the following order:

1. Element name (possibly with a namespace prefix)
2. Attributes (possibly with namespace prefixes)
3. Definition of namespaces
4. Children

The use of a namespace prefix can precede the definition of that namespace, which is a big problem. Ideally, the converter should look ahead to (3) at step (1). But I decided to “cheat”: if an element or an attribute has a namespace, I use the following approach:

A. A name has a namespace if the name contains the character “:”. The string after the last “:” is the element name, and the string before the last “:” is the ns-id (the namespace ID).
B. Suppose that the ns-id is a namespace prefix and call the libxml2 function “xmlSearchNs”.
C. If not found, suppose that the ns-id is a namespace URI and use the function “xmlSearchNsByHref”.
D. If still not found, create a dummy “xmlNs” with a NULL “href” (an incorrect value) and add it to the list of ns definitions attached to the current element (even when processing an attribute; then use its element).
E. Continue the traversal.
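Steps A to D can be sketched in a few lines of Python. This is only a model, not the converter’s code: the real code calls “xmlSearchNs” and “xmlSearchNsByHref” on the libxml2 tree, while here the ns definitions in scope are modeled as a list of dictionaries (outermost element first). The names “split_name”, “resolve_ns” and “scopes” are invented for the illustration.

```python
def split_name(full_name):
    """Step A: split on the last ':' into (ns_id, local_name)."""
    ns_id, sep, local = full_name.rpartition(":")
    # No ':' at all means the name has no namespace.
    return (ns_id if sep else None), local


def resolve_ns(ns_id, scopes):
    """Steps B-D. `scopes` is a list of {prefix: href} dicts, innermost last.

    Returns (prefix, href); href is None for a dummy definition.
    """
    # Step B: suppose ns_id is a prefix (plays the role of xmlSearchNs).
    for scope in reversed(scopes):
        if ns_id in scope:
            return ns_id, scope[ns_id]
    # Step C: suppose ns_id is a URI (plays the role of xmlSearchNsByHref).
    for scope in reversed(scopes):
        for prefix, href in scope.items():
            if href == ns_id:
                return prefix, href
    # Step D: record a dummy definition on the current element.
    scopes[-1][ns_id] = None
    return ns_id, None


# Usage: "w" is defined on an ancestor, the current element has no definitions.
scopes = [{"w": "a:b:c:d"}, {}]
ns_id, local = split_name("a:b:c:d:doc")
print(local, resolve_ns(ns_id, scopes))
```

The dummy entry with a missing “href” is what the later consistency check has to find and repair.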

When a namespace definition is found, we add the namespace node to, or replace it in, the ns definition list attached to the element node in context.

There is a moment when an element’s attributes and namespace definitions have been processed and the children have not been started yet (technical detail: before returning from processing of the special node “@”). It’s a good chance to check the consistency of namespace definitions. The converter should warn about NULL “href”s and set “href=prefix” for them. Then it should check namespace usage in the attributes and update the “href”s according to the namespace definitions.
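The first half of that check (warn about missing “href”s and fall back to “href=prefix”) might look like the Python sketch below. It again models the ns definition list as a dictionary from ns-id to href, with None standing for NULL; the function name “finalize_ns_definitions” is hypothetical, and the attribute pass is not shown.

```python
import warnings


def finalize_ns_definitions(ns_defs):
    """Return a copy of {ns_id: href} with every missing href repaired.

    A missing href (None) is reported and replaced by the ns-id itself,
    which mirrors the "href=prefix" fallback described in the text.
    """
    fixed = {}
    for ns_id, href in ns_defs.items():
        if href is None:
            warnings.warn("namespace %r has no definition" % (ns_id,))
            href = ns_id
        fixed[ns_id] = href
    return fixed


# Usage: "z" was used in a name but never defined.
print(finalize_ns_definitions({"w": "a:b:c:d", "z": None}))
```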

All this should work well for the examples of the SXML namespaces usage (see the previous article). The only uncovered thing is the “original prefix”. I’m ignoring it because it needs some effort to implement, but it should not appear in normal SXML<->libxml2 mapping.

preparation for namespaces

Tuesday, March 15th, 2005

Namespaces are one of the dark corners of the XML technologies. For SXML to XML conversion, I’ve investigated the technical details of how namespaces are represented in libxml2 and SXML. It seems that I’m not going to support all SXML features, and the conversion will cheat a bit.

Namespaces in libxml2 are quite simple, and they are documented.

There are several pre-docs on namespaces in SXML, with opinions:

* http://pair.com/lisovsky/xml/ns/
* http://sourceforge.net/mailarchive/forum.php?thread_id=759249&forum_id=599
* http://sourceforge.net/mailarchive/forum.php?thread_id=789156&forum_id=599

But the main source of information is the SXML specification. I’m not going to cite it. Instead, I’m going to demonstrate options on a simple example. The document:

<z:doc xmlns:z="a:b:c:d">data</z:doc>

It seems (I can’t check at the moment) that the SSAX parser should produce:

(a:b:c:d:doc "data")

Although such names (“a:b:c:d:doc”) pose no problem for Scheme, they are not so good for human beings. So it is possible to define a map from IDs to URIs and force the parser to return namespace IDs instead of URIs. For example, with the ID “w” for the URI “a:b:c:d”, the result is:

(w:doc (@ (@ w "a:b:c:d")) "data")

It’s much better, but it’s not good for the reverse transformation because the original prefix is lost. So there is a format to save the original prefix:

(w:doc (@ (@ w "a:b:c:d") z) "data")

Now it’s time to think about the algorithm for the SXML to libxml2 transformation.

entities appear

Saturday, March 12th, 2005

The SXML specification allows only one sort of entities, namely, unexpanded entities. I can’t find how to map them to the libxml2 tree, so the converter just throws away *ENTITY* nodes.

On the other hand, there is no support for “normal” entities. So I decided to extend the SXML specification and invented a *REF* node, but finally switched to Neil W. Van Dyke’s form:

The second element of the “&” form can be a string, symbol, or (for character ordinal values) a nonnegative integer:

(& "rArr")
(& rArr)
(& 151)

I don’t think that allowing a symbol is a good idea, but decided to accept it too.
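As a rough illustration of how the three variants could map to XML references, here is a small Python sketch. It is not the converter’s code (that is C on top of libxml2). The function name “entity_ref_to_xml” is made up, an SXML list is modeled as a Python tuple, and both Scheme strings and symbols are modeled as Python strings.

```python
def entity_ref_to_xml(form):
    """Serialize an ("&", value) pair as an XML reference.

    A string (or symbol, modeled as a string) becomes a named entity
    reference; a nonnegative integer becomes a numeric character reference.
    """
    head, value = form
    if head != "&":
        raise ValueError("not an entity reference form: %r" % (form,))
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise TypeError("unsupported entity value: %r" % (value,))
    if isinstance(value, int):
        if value < 0:
            raise ValueError("character ordinal must be nonnegative")
        return "&#%d;" % value
    return "&%s;" % value


# Usage: the named and the numeric variants.
print(entity_ref_to_xml(("&", "rArr")))
print(entity_ref_to_xml(("&", 151)))
```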

Conversion from SXML to XML for entities works now. Example:

<x:stylesheet
  xmlns:x = "http://www.w3.org/1999/XSL/Transform"
  xmlns:s = "http://uucode.com/xslt/scheme"
  x:extension-element-prefixes="s"
  version = "1.0">

<x:output indent="yes"/>

<x:template match="/">
  <s:scheme>
    '(article  (@ (id "hw" (&amp; 777)))
      (title "Hello")
      (para
        "Hello, "
        (&amp; entityI)
        (object "World")
        (&amp; entutyII)
        "!"))
  </s:scheme>
</x:template>

</x:stylesheet>

This stylesheet produces:

<article id="hw&#777;">
  <title>Hello</title>
  <para>Hello, &entityI;<object>World</object>&entutyII;!</para>
</article>

Making FM menus

Friday, March 11th, 2005

It’s hard to create menu scripts for FrameMaker, even with the help of FrameScript, because there are a lot of details. So several months ago I wrote a FrameScript generator (fslgen). Now I create an XML file describing the desired menu structure, describe the handlers and properties, and the generator does all the dirty work.

When the software first appeared, it was a kind of experiment and just a tool to simplify my work. But now the generator is such an essential part of the development process that I can’t understand how I programmed without it.

I’ve just finished an alpha version of a folio maker. Many PDFs are possible that differ only in variants. In my solution, the possible variants and the differences are described in FrameMaker itself. fslgen takes the description and produces the appropriate menus and scripts. A screenshot:

[screenshot: generated FrameMaker menu]

Quite impressive, isn’t it? And what’s more important, users can change the menu structure without a programmer!

entities in SXML

Friday, March 11th, 2005

My letter to the ssax-sxml mailing list:

Hello,

the SXML specification allows only one sort of entities, namely, unexpanded entities. The only use for them is in XMLs like this:


But I want to use “normal” entities:

[example not preserved in this archive]

Unfortunately, this sort of entities doesn’t fit the SXML specification. So I’m going to use my own SXML dialect for them (*REF* is for “reference”):

[example not preserved in this archive]

Any objections to this approach? Would you use the same method if you needed such entities?

Notes.

1. I found a letter in archives:

http://sourceforge.net/mailarchive/message.php?msg_id=7743681

It suggests trying the existing SXML notation *ENTITY*. But, in my opinion, it’s just an ugly workaround, not a clean solution.

2. Please look at the sample XML above. One can’t convert it to an SXML tree without losing the information that the entity name was “figure1” (but I don’t know whether that is a problem or not).

PIs and comments

Friday, March 11th, 2005

My code now converts the special nodes *TOP*, *PI* and *COMMENT* to XML.

The *TOP* is just a mark for the tree root. The code just skips it and descends to children.

Conversion of *PI* was painful. The neat trick I used for attributes didn’t work. So I had to write a function to extract text content from SXML. But with that function in place, conversion of *COMMENT* was trivial.

Unfortunately, I’ve left a bug. At the last moment I noticed that I had to escape special XML characters manually. I’ll do it later.

The only SXML special nodes left are *ENTITY* and *NAMESPACES*. I’m frightened by the latter in advance. I’m afraid it will take a lot of time to implement all the gory details.

And, unexpectedly, *ENTITY* turned out to be a non-trivial task. It is used for unexpanded entities, but I want normal entities! I’m going to complain on the ssax-sxml mailing list.

libxml2 tree: NULL vs empty string

Friday, March 11th, 2005

In libxml2, C calls

xmlNewPI("name", NULL);
xmlNewPI("name", "");

produce the following processing instructions

<?name?>
<?name ?>

(notice the space in the second case). At first I thought that I shouldn’t allow the first variant. But then I remembered that in XML the absence of a node and an empty node are sometimes very different things. So let both variants be.

Then I tested two variants for another node type:

xmlNewComment(NULL);
xmlNewComment("");

The second case produced the expected result

<!---->

but the first variant seems broken. In serialized XML, I have no comments at all!

Anyway, I decided to keep the difference between nothing and an empty string. I feel it can be used somehow.

following bad practises

Friday, March 11th, 2005

Some days ago I used “goto”s in my C code. Now that C function has got a new bad feature: a set of local variables named “scm”, “scm2”, “scm3”, “scm4”. Anyway, I still think that the code is good.