string value of entities

The XPath specification doesn't define the string value of an entity. What's more, the XPath data model has no entities at all. But in XSieve I do have entities.

Consider the XML fragment:

<data>aaa&mdash;bbb</data>

What is the string value of the fragment above? I see three options:

1) Entity has no value at all. Result: "aaabbb".
2) Entity is written as is. Result: "aaa&mdash;bbb".
3) The value is the value of the expansion. Result: something like "aaa---bbb".
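
To make the three options concrete, here is a minimal Python sketch. It is not how XSieve works: the function string_value, the mode names, and the EXPANSIONS table are all hypothetical, and a regular expression stands in for a real parser that keeps entity reference nodes.

```python
import re

# Hypothetical expansion table; a real processor would read this from the DTD,
# and often the expansion is simply not known.
EXPANSIONS = {"mdash": "\u2014"}

# Matches a general entity reference such as &mdash; (simplified name syntax).
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")

MODES = ("drop", "verbatim", "expand")


def string_value(content, mode):
    """Return the string value of element content under one of three policies."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")

    def replace(match):
        if mode == "drop":
            return ""                      # option 1: entity has no value
        if mode == "verbatim":
            return match.group(0)          # option 2: entity written as is
        # option 3: use the expansion, falling back to the raw reference
        # when the expansion is unknown
        return EXPANSIONS.get(match.group(1), match.group(0))

    return ENTITY_REF.sub(replace, content)


for mode in MODES:
    print(mode, repr(string_value("aaa&mdash;bbb", mode)))
```

The fallback in the third branch shows the weak spot of option 3: when the expansion is unknown, the function has to pick one of the other two behaviours anyway.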

In my opinion, the third would be the best if it were possible. But the phrase "something like" reveals the main problem with this approach. Sometimes (I'd even say often) we don't know how to expand an entity.

At the moment, "x:string" of XSieve is implemented by calling a libxml function, and that function doesn't support entity nodes. Therefore, "x:string" currently returns "aaabbb". I dislike it.

I am thinking about implementing the second approach, writing entities as is. However, two issues bother me.

1) Consistency. If the string value of an entity is something like "&mdash;", why isn't the string value of an attribute "aname='aval'", the string value of a processing instruction "<?pi target?>", and so on?

2) Escaping for HTML. Imagine a PHP developer writing the code:

echo htmlspecialchars(string_value(...XML fragment above...))

The result is:

aaa&amp;mdash;bbb

And browsers show it as:

aaa&mdash;bbb

Is that the right thing?
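
The double escaping described above can be reproduced in a few lines of Python. This is only an illustration: html.escape stands in for PHP's htmlspecialchars, and html.unescape is a rough approximation of how a browser decodes character references when it renders text.

```python
import html

# String value under the "write entities as is" policy.
raw = "aaa&mdash;bbb"

# What ends up in the page source after HTML escaping.
escaped = html.escape(raw)

# Roughly what the reader sees once the browser decodes the source.
displayed = html.unescape(escaped)

print(escaped)
print(displayed)
```

The reader ends up seeing the literal text of the entity reference instead of a dash, because the ampersand was escaped once and decoded once.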

Categories: XSieve
