draft text on XML, XPath, S-expressions
I'm writing a paper (not for GTTSE2005 as the previous entry may suggest). Here is a bit of the text written today.
XML
Extensible Markup Language (XML, ref) is a text format
for representing the logical structure of documents.
An example of an XML document:
<article id="hw">
  <title>Hello</title>
  <para>Hello <object>World</object>!</para>
</article>
Ignoring details, one can say that XML documents consist
of elements and attributes. Elements mark named extents of
text, and attributes carry properties. Elements must be
properly nested, so any XML document can be mapped to a tree.
One such mapping is described in the Document Object Model
(DOM, ref) standard.
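The mapping from the example document to a tree can be sketched with Python's standard xml.dom.minidom module, which implements a subset of the DOM interfaces. This is only an illustration: the helper name outline is hypothetical, and the whitespace between elements has been removed from the document so that no whitespace-only text nodes appear in the tree.

```python
from xml.dom import minidom

# The example document, written without whitespace between elements
# (an assumption made here to keep the resulting tree small).
DOC = ('<article id="hw"><title>Hello</title>'
       '<para>Hello <object>World</object>!</para></article>')

def outline(node, depth=0):
    """Return indented lines describing the DOM subtree rooted at node."""
    if node.nodeType == node.ELEMENT_NODE:
        attrs = " ".join(f'{k}="{v}"' for k, v in node.attributes.items())
        label = f"element {node.tagName}" + (f" [{attrs}]" if attrs else "")
    elif node.nodeType == node.TEXT_NODE:
        label = f"text {node.data!r}"
    else:
        label = node.nodeName
    lines = ["  " * depth + label]
    for child in node.childNodes:
        lines.extend(outline(child, depth + 1))
    return lines

dom = minidom.parseString(DOC)
tree_lines = outline(dom.documentElement)
print("\n".join(tree_lines))
```

Each element becomes an inner node of the tree, and each run of text becomes a leaf below the element that contains it.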
One of the keys to the success of XML is interoperability.
When different legacy systems talk to each other, they often
speak in XML. Any data can be mapped to XML, but XML is
most natural for representing hierarchical data.
Another important aspect of XML is the abundance of related
standards. They cover most issues related to data presentation,
validation, transformation and exchange.
There are tools designed to work with XML, and sometimes
instead of processing data inside an application itself,
it's simpler to export the data to XML, process it using
XML tools and import the result back.
Among such tools are those that implement the XPath (ref) and
XSLT (ref) standards. XPath is a language for navigating
XML trees, and XSLT is a template language for XML tree
transformations.
In this paper we concentrate mainly on XPath and leave
XSLT for future work.
XPath
XPath (ref) is a language for navigating, matching and querying
XML data.
An XPath expression is a series of location steps separated by
the slash symbol (/). A location step has three parts:
- an axis, which specifies the tree relationship between the selected nodes and the context node,
- a node test, which specifies the node type or name, and
- zero or more predicates, which use arbitrary expressions to filter nodes.
Example of an XPath expression:
/child::article[attribute::id='hw']/child::para
This XPath expression consists of two location steps. The first
step uses the child axis to select all children of the root node,
the node test keeps those that are named article, and the predicate
keeps those whose id attribute is set to the value hw.
The second step selects all the paragraphs (para elements) of
the selected article.
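The two steps described above can be spelled out by hand. The following Python sketch is a deliberately simplified model of a location step, not an XPath implementation: it handles only the child axis over elements, ignores node ordering rules and positional predicates, and the names child_axis and location_step are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = ('<article id="hw"><title>Hello</title>'
       '<para>Hello <object>World</object>!</para></article>')

def child_axis(node):
    """Axis: the element children of the context node."""
    return list(node)

def location_step(context_nodes, axis, node_test, predicates=()):
    """Apply one simplified location step to every context node."""
    selected = []
    for context in context_nodes:
        for candidate in axis(context):
            # Keep a candidate only if it passes the node test and every predicate.
            if node_test(candidate) and all(p(candidate) for p in predicates):
                selected.append(candidate)
    return selected

# ElementTree has no node above the top element, so a hypothetical
# stand-in element plays the role of the XPath root node here.
root = ET.Element("root-stand-in")
root.append(ET.fromstring(DOC))

# Step 1: child::article[attribute::id='hw']
articles = location_step(
    [root], child_axis,
    node_test=lambda n: n.tag == "article",
    predicates=[lambda n: n.get("id") == "hw"],
)
# Step 2: child::para
paras = location_step(articles, child_axis, node_test=lambda n: n.tag == "para")

print([p.tag for p in paras])
```

The output of one step becomes the set of context nodes for the next, which is what the slash between steps expresses.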
There are also a number of syntactic abbreviations that allow common
cases to be expressed concisely. For example, the previous XPath
expression can also be written as:
/article[@id='hw']/para
A remarkable feature of XPath is that bloated tree-walking
code in conventional languages can be replaced by short,
expressive XPath expressions.
Lisp approaches
Lisp is a family of computer languages. The most popular
dialects are Common Lisp (ref) and Scheme (ref).
In Lisp, data is represented as so-called S-expressions (ref),
or symbolic expressions, or "sexps", which are defined recursively.
Just like XML, S-expressions can represent any data structure,
including hierarchical ones, and Lisp is good at processing them.
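To make the parallel concrete, the example XML document can be rendered as an S-expression. The Python sketch below uses one possible encoding, chosen here purely for illustration: the element name comes first, attributes go into a list headed by @, and text becomes string literals. The function name to_sexp is hypothetical, and quoting of special characters inside strings is not handled.

```python
import xml.etree.ElementTree as ET

DOC = ('<article id="hw"><title>Hello</title>'
       '<para>Hello <object>World</object>!</para></article>')

def to_sexp(elem):
    """Render an element as an S-expression string (illustrative encoding)."""
    parts = [elem.tag]
    if elem.attrib:
        attrs = " ".join(f'({k} "{v}")' for k, v in elem.attrib.items())
        parts.append(f"(@ {attrs})")
    if elem.text:
        # Text before the first child element.
        parts.append(f'"{elem.text}"')
    for child in elem:
        parts.append(to_sexp(child))
        if child.tail:
            # Text that follows the child inside the same parent.
            parts.append(f'"{child.tail}"')
    return "(" + " ".join(parts) + ")"

sexp = to_sexp(ET.fromstring(DOC))
print(sexp)
# prints: (article (@ (id "hw")) (title "Hello") (para "Hello " (object "World") "!"))
```

The nesting of parentheses mirrors the nesting of elements, so the same tree is recoverable from either notation.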
There are many discussions comparing S-expressions,
Lisp and XML (for example, see ref and ref). In our work
we combine useful features of the XML and Lisp worlds.