Archive for the ‘Generative XML’ Category

XML to SXML probably works

Thursday, March 31st, 2005

(This post was written yesterday, but due to internet outage I’m posting it only now.) Exactly as I expected, implementation of conversion of attributes and namespace definitions has taken one working block. The second milestone is now near.

bugfixing 2, coredump

Tuesday, March 29th, 2005

The other two bugs (see “bug list” and “bugfixing 1”) are interrelated. Use of the map is powerful, but dangerous.

bugfixing 1, NULLs

Tuesday, March 29th, 2005

In the previous entry I mentioned three bugs to be fixed. In this post I describe the second issue.

XML to SXML coredumped

Tuesday, March 29th, 2005

Surprisingly, conversion from libxml2 tree to SXML seems a trivial task now. Only attributes and namespaces are left.

more “bad style” programming

Monday, March 28th, 2005

Using a global variable is bad programming style, isn’t it? But I’ve just used one, and I’m sure it’s the right thing to do.

XML to SXML conversion is taking off

Monday, March 28th, 2005

Several days ago I passed the first milestone in the generative XML work (conversion from XML to SXML). The reverse conversion looks simpler, but I just haven’t been able to start working on it.

going to participate in GTTSE2005

Saturday, March 26th, 2005

SXML to libxml2 conversion is completed. what’s next?

Wednesday, March 23rd, 2005

All tests pass, and I now think that the conversion from SXML to libxml2 is working and stable.

namespaces are finished

Wednesday, March 23rd, 2005

Namespace support in the SXML to libxml2 conversion is completed.

make test

Wednesday, March 23rd, 2005

Before starting to fix the default namespace problem, I’ve created a test framework and added several tests. The results are as usual.

default namespaces force new changes

Monday, March 21st, 2005

I decided to make sure that default namespaces are not a problem for the converter, and I found that they actually are a problem.

redefinition of namespaces works

Sunday, March 20th, 2005

In one of the previous entries I wrote that the conversion of namespaces mostly works, but in some cases it doesn’t. I’ve traced the reason, and it is neither a bug in my code nor a bug in libxml2. It’s just an interference of features.

namespaces are completed at the first look, but then not

Sunday, March 20th, 2005

Yesterday I succeeded in writing code that processes an auxiliary SXML list and reports namespace definition pairs (prefix and URI). I thought I was still far from finishing namespaces, but today I finished them in less than half an hour. Unfortunately, the code has a bug.

the namespaces problems are solved

Friday, March 18th, 2005

In the previous entry I wrote about the two faults in the conversion code. Now they are fixed, and the code has become a bit better.

Attributes are now also namespace aware.

I’m afraid that PIs, entities and other similar things may be namespace aware as well. I hope not. Looking at the Namespaces in XML specification… Well, all is OK. Namespaces are only for elements and attributes.

While coding, I looked into the file “tree.c” from libxml2. The non-trivial bodies of “xmlNewProp” and “xmlNewNsProp” are nearly identical, but differ slightly. I’m going to report this to the developers’ mailing list.

first attempt on namespaces conversion

Thursday, March 17th, 2005

The code splits an element’s full name into two parts, looks up the tree for the ns prefix or ns URI, and creates an ns definition if none is found. There is also draft code to check that the ns definitions are complete.

I specified the conversion algorithm in previous entries, but the simplest example has revealed two weak points.

Error 1.

SXML: ... (a:b:c:d:doc "data") ...
XML: <x><a:b:c:d:doc>data</a:b:c:d:doc></x>

There is no namespace definition in the result and no warning on the console. The error is that I check for ns completeness only on returning from the attributes processing. In this case there are no attributes at all, so I got no warning.

Error 2.

A fake (empty) attribute set is added.

SXML: ... (a:b:c:d:doc (@) "data") ...
XML: <x xmlns:a:b:c:d=""><a:b:c:d:doc>data</a:b:c:d:doc></x>

The “xmlns” is attached to “x”, not to “a:b:c:d”. The reason is that an ns definition can be created only together with attaching it to some node. I attach it to the context node, and at the moment when I’m about to create “doc”, the context node is “x”.

The funny thing is that “xmlNewNode” expects an already created ns definition, and “xmlNewNs” expects an already created node. Just a sort of circular reference. Fortunately, it seems that this loop can be broken by using “xmlSetNs” a posteriori. We’ll see later.

Algorithm for SXML to libxml2 namespaces

Wednesday, March 16th, 2005

The previous entry “preparation for namespaces” was about the representation of namespaces in libxml2 and SXML. Now I’m going to describe the algorithm for translating SXML to libxml2.

The general conversion algorithm uses top-down tree traversal, and the XML nodes are processed in the following order:

1. Element name (probably with namespace prefix)
2. Attributes (probably with namespace prefixes)
3. Definition of namespaces
4. Children

The use of a namespace prefix can precede the definition of that namespace, which is a big problem. Ideally, the converter should look ahead for (3) at step (1). But I decided to “cheat”: if an element or an attribute has a namespace, I use the following approach:

A. A name has a namespace if the name contains the character “:”. The string after the last “:” is the element name, and the string before the last “:” is the ns-id (the namespace ID).
B. Suppose that the ns-id is a namespace prefix and call the libxml2 function “xmlSearchNs”.
C. If not found, suppose that the ns-id is a namespace URI and use the function “xmlSearchNsByHref”.
D. If still not found, create a dummy “xmlNs” with a NULL “href” (an incorrect value) and add it to the list of ns definitions attached to the current element (when processing an attribute, use the element that owns it).
E. Continue the traversal.
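
Steps A to D can be sketched roughly as follows. This is a small Python model, not the converter’s C code: the dictionaries `by_prefix` and `by_uri` are hypothetical stand-ins for what “xmlSearchNs” and “xmlSearchNsByHref” would find in scope, and the names `split_name` and `resolve_ns` are made up for this illustration.

```python
def split_name(full_name):
    """Step A: split at the last ':' into (ns_id, local_name)."""
    ns_id, sep, local = full_name.rpartition(":")
    if not sep:
        return None, full_name  # no ':' means the name has no namespace
    return ns_id, local


def resolve_ns(ns_id, by_prefix, by_uri, pending):
    """Steps B-D: try ns_id as a prefix, then as a URI, else record a dummy.

    by_prefix maps prefix -> URI and by_uri maps URI -> prefix (assumed
    stand-ins for the in-scope definitions). pending collects dummy
    entries whose href is still unknown (None).
    Returns a (prefix, href) pair.
    """
    if ns_id in by_prefix:             # step B: ns_id is a known prefix
        return ns_id, by_prefix[ns_id]
    if ns_id in by_uri:                # step C: ns_id is a known URI
        return by_uri[ns_id], ns_id
    pending.setdefault(ns_id, None)    # step D: dummy definition, href unknown
    return ns_id, None
```

With the name “a:b:c:d:doc”, `split_name` yields the ns-id “a:b:c:d” and the local name “doc”; whether that ns-id is then treated as a prefix or as a URI depends only on what is already in scope.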

When a namespace definition is found, we add the namespace node to (or replace it in) the ns definition list attached to the element node in context.

There is a moment when an element’s attributes and namespace definitions have been processed and the children have not been started yet (technical detail: just before returning from the processing of the special node “@”). It’s a good chance to check the namespace definitions for consistency. The converter should warn about NULL “href”s and set “href=prefix” for them. Then it should check namespace usage in the attributes and update the “href”s according to the namespace definitions.
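
The first half of that check (warn about a missing “href”, then fall back to the prefix) might look like the sketch below. It is a Python model only: the dictionary `ns_defs` (prefix to href, with `None` playing the role of NULL) and the function name `finalize_ns` are assumptions made for illustration, and the second half, updating attribute namespaces, is not covered.

```python
def finalize_ns(ns_defs, warn=print):
    """Return a copy of ns_defs in which every missing href is filled in.

    ns_defs maps prefix -> href; None models a NULL href on a dummy
    definition. For each missing href a warning is reported through
    warn() and the prefix itself is used as the href.
    """
    fixed = {}
    for prefix, href in ns_defs.items():
        if href is None:
            warn("namespace '%s' has no definition; using the prefix as URI" % prefix)
            href = prefix
        fixed[prefix] = href
    return fixed
```

Passing the reporting function as a parameter keeps the sketch easy to test: a list’s `append` can collect the warnings instead of printing them.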

All this should work well for the examples of the SXML namespaces usage (see the previous article). The only uncovered thing is the “original prefix”. I’m ignoring it because it needs some effort to implement, but it should not appear in normal SXML<->libxml2 mapping.

preparation for namespaces

Tuesday, March 15th, 2005

Namespaces are one of the dark corners of the XML technologies. For SXML to XML conversion, I’ve investigated the technical details of namespace representation in libxml2 and SXML. It seems that I’m not going to support all SXML features, and the conversion will cheat a bit.

Namespaces in libxml2 are quite simple, and they are documented.

There are several pre-docs on namespaces in SXML, with opinions:

* http://pair.com/lisovsky/xml/ns/
* http://sourceforge.net/mailarchive/forum.php?thread_id=759249&forum_id=599
* http://sourceforge.net/mailarchive/forum.php?thread_id=789156&forum_id=599

But the main source of information is the SXML specification. I’m not going to cite it. Instead, I’m going to demonstrate options on a simple example. The document:

<z:doc xmlns:z="a:b:c:d">data</z:doc>

It seems (I can’t check at the moment) that the SSAX parser should produce:

(a:b:c:d:doc "data")

Although such names (“a:b:c:d:doc”) pose no problem for Scheme, they are not so good for human beings. So it is possible to define a map from IDs to URIs and force the parser to return namespace IDs instead of URIs. For example, with the ID “w” for the URI “a:b:c:d”, the result is:

(w:doc (@ (@ w "a:b:c:d")) "data")

It’s much better, but it’s not good for the reverse transformation because the original prefix is lost. So there is a format to save the original prefix:

(w:doc (@ (@ w "a:b:c:d") z) "data")

Now it’s time to think about the algorithm for the SXML to libxml2 transformation.

entities appear

Saturday, March 12th, 2005

The SXML specification allows only one sort of entities, namely, unexpanded entities. I can’t find a way to map them to the libxml2 tree, so the converter just throws away *ENTITY* nodes.

On the other hand, there is no support for “normal” entities. So I decided to extend the SXML specification and invented a *REF* node, but finally switched to Neil W. Van Dyke’s form:

The second element of the “&” form can be a string, symbol, or (for character ordinal values) a nonnegative integer:

(& "rArr")
(& rArr)
(& 151)

I don’t think that allowing a symbol is a good idea, but decided to accept it too.
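
As a rough illustration of how the three variants of the “&” form map to XML reference text, here is a small Python sketch. It is not the converter’s code: Scheme symbols are modelled as plain Python strings (so a symbol and a string look the same here), and the function name `entity_ref` is made up.

```python
def entity_ref(value):
    """Render the second element of an (& ...) form as XML reference text.

    A nonnegative integer becomes a numeric character reference;
    a string (or a symbol, modelled here as a string) becomes a
    named entity reference.
    """
    if isinstance(value, bool):
        raise TypeError("booleans are not valid entity values")
    if isinstance(value, int):
        if value < 0:
            raise ValueError("character ordinal must be nonnegative")
        return "&#%d;" % value
    if isinstance(value, str):
        return "&%s;" % value
    raise TypeError("unsupported entity value: %r" % (value,))
```

For example, `entity_ref("rArr")` gives `&rArr;` and `entity_ref(151)` gives `&#151;`.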

Conversion from SXML to XML for entities works now. Example:

<x:stylesheet
  xmlns:x = "http://www.w3.org/1999/XSL/Transform"
  xmlns:s = "http://uucode.com/xslt/scheme"
  x:extension-element-prefixes="s"
  version = "1.0">

<x:output indent="yes"/>

<x:template match="/">
  <s:scheme>
    '(article  (@ (id "hw" (&amp; 777)))
      (title "Hello")
      (para
        "Hello, "
        (&amp; entityI)
        (object "World")
        (&amp; entutyII)
        "!"))
  </s:scheme>
</x:template>

</x:stylesheet>

This stylesheet produces:

<article id="hw&#777;">
  <title>Hello</title>
  <para>Hello, &entityI;<object>World</object>&entutyII;!</para>
</article>

entities in SXML

Friday, March 11th, 2005

My letter to the ssax-sxml mailing list:

Hello,

the SXML specification allows only one sort of entities, namely, unexpanded entities. The only use for them is in XMLs like this:


But I want to use “normal” entities:

[example markup not preserved]

Unfortunately, this sort of entities doesn’t fit the SXML specification. So I’m going to use my own SXML dialect for them (*REF* is for “reference”):

[example markup not preserved]

Any objections to this approach? Would you use the same method if you needed such entities?

Notes.

1. I found a letter in archives:

http://sourceforge.net/mailarchive/message.php?msg_id=7743681

It suggests trying the existing SXML notation *ENTITY*. But, in my opinion, it’s just an ugly workaround, not a clean solution.

2. Please look at the sample XML above. One can’t convert it to an SXML tree without losing the information that the entity name was “figure1” (but I don’t know whether that is a problem or not).

PIs and comments

Friday, March 11th, 2005

My code now converts the special nodes *TOP*, *PI* and *COMMENT* to XML.

The *TOP* is just a mark for the tree root. The code just skips it and descends to children.

Conversion of *PI* was painful. The neat trick I used for attributes didn’t work, so I had to write a function to extract text content from SXML. But with that function in place, conversion of *COMMENT* was trivial.
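
A text-extraction helper of that kind could look roughly like this. The sketch is in Python rather than the converter’s C, SXML nodes are modelled as nested lists whose first item is the node name, and both the name `text_content` and the decision to skip “@” attribute lists are assumptions, not a description of the actual function.

```python
def text_content(node):
    """Concatenate all strings found below an SXML node, in document order.

    A node is either a string (text) or a list [name, child, ...].
    Attribute lists (children whose name is "@") are skipped, which is
    an assumption of this sketch.
    """
    if isinstance(node, str):
        return node
    parts = []
    for child in node[1:]:  # node[0] is the node name, not text
        if isinstance(child, list) and child and child[0] == "@":
            continue
        parts.append(text_content(child))
    return "".join(parts)
```

For a comment modelled as `["*COMMENT*", "a ", "comment"]` the helper returns the single string “a comment”, which is what the XML comment node needs.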

Unfortunately, I’ve left a bug. At the last moment I noticed that I had to escape special XML characters manually. I’ll do it later.
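
Manual escaping of text content usually comes down to something like the following Python sketch. Which characters the converter really has to escape, and where, is an assumption here; the function name `escape_xml_text` is hypothetical.

```python
def escape_xml_text(text):
    """Escape the characters that are special in XML text content.

    '&' must be replaced first; otherwise the '&' introduced by
    '&lt;' and '&gt;' would be escaped a second time.
    """
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;"))
```

Attribute values would additionally need quote characters escaped, which this sketch leaves out.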

The only SXML special nodes left are *ENTITY* and *NAMESPACES*. I’m frightened by the latter in advance. I’m afraid it will take a lot of time to implement all the gory details.

And, unexpectedly, *ENTITY* turned out to be a non-trivial task. It is used for unexpanded entities, but I want to have normal entities! I’m going to complain on the ssax-sxml mailing list.