Posted to xindice-dev@xml.apache.org by Kevin O'Neill <ke...@rocketred.com.au> on 2003/07/29 13:59:57 UTC

Queries and namespaces.

So I'm back doing some work with xindice and I've come across a couple of
issues with regards to query results.

Namespace searches work: the appropriate nodes are selected. The problem is
that the namespace declarations are missing from the returned elements.
For example, if you store the document

<a xmlns="example.com">
  <b>foo</b>
</a>

then query /x:a/x:b (with x mapped to "example.com")  

<b>foo</b> will be returned and the namespace of b is lost.

Store the following:

<n:a xmlns:n="example.com">
  <n:b>foo</n:b>
</n:a>

then query /x:a/x:b (with x mapped to "example.com")  

<n:b>foo</n:b> will be returned. Calling getContentAsDOM() on the
XMLResource returned throws an exception indicating that the prefix n has
not been declared.
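
The failure can be reproduced outside Xindice with a plain JAXP parser. This is
only a sketch: it assumes getContentAsDOM() ends up re-parsing the serialized
result text with a namespace-aware parser, and the class and method names
(UndeclaredPrefixDemo, parses) are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of Xindice.
final class UndeclaredPrefixDemo {

    /** Returns true if the text parses as namespace-aware XML. */
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // e.g. a prefix that is not bound to a namespace
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The fragment as returned by the query: prefix n is not declared.
        System.out.println(parses("<n:b>foo</n:b>"));
        // The same fragment with the declaration carried along.
        System.out.println(parses("<n:b xmlns:n=\"example.com\">foo</n:b>"));
    }
}
```

The first fragment should be rejected and the second accepted, which is the
difference a result node with its declarations intact would make.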

There's also an issue if the result is not an element, as CollectionImpl in
the embedded database attempts to cast the resulting node to an Element.
This is easy to avoid by checking the type of the node before casting it.
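
A minimal sketch of that check, using only the standard org.w3c.dom API. The
class and helper names (SafeElementCast, asElement, newDocument) are invented
for the example and are not taken from CollectionImpl.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, not the actual CollectionImpl code.
final class SafeElementCast {

    /** Returns the node as an Element, or null for any other node type. */
    static Element asElement(Node node) {
        if (node != null && node.getNodeType() == Node.ELEMENT_NODE) {
            return (Element) node;
        }
        return null;
    }

    /** Creates an empty DOM document to build test nodes in. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element b = doc.createElement("b");
        b.appendChild(doc.createTextNode("foo"));
        System.out.println(asElement(b) != null);                 // element: cast is safe
        System.out.println(asElement(b.getFirstChild()) != null); // text node: no cast
    }
}
```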

I'm a little unsure of how to handle the issue of the namespaces for the
element. Any ideas?

-k.


Re: Queries and namespaces.

Posted by Kevin O'Neill <ke...@rocketred.com.au>.
> I believe there is a better solution though. At the moment a query result
> is converted to a string when the result XMLResource is created. If the
> node was stored as a node, the namespace information might not be
> lost. I'm looking at the code today to see if this is a workable solution.

The xmlrpc driver solves this issue nicely. I'm going to build a patch
around the techniques it uses.

-k.


Re: Queries and namespaces.

Posted by Kevin O'Neill <ke...@rocketred.com.au>.
> This is obviously a bigger issue than simply querying within Xindice, as
> it has more to do with XML Namespaces and default attributes. Generally
> when you do a query absent any care over XML Namespace, you already know
> the namespace. Because 'xmlns' is merely from the XML 1.0 POV a default
> attribute, you wouldn't necessarily want all the defaulted attributes to
> be present on a query result (e.g., some HTML elements have quite a
> few). 'xmlns' is a *bit* different than other attributes and I wouldn't
> be surprised to see implementations vary in their handling of it.

I think for the majority of cases it would be a matter of the engine being
able to record the namespace declarations "in play" at the target element.
These namespace declarations would need to be added to the result node.
Additions and changes in namespace declarations on child elements would
take care of themselves.
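
Roughly what I have in mind, as a sketch against the standard DOM API rather
than the Xindice internals. The names (InScopeNamespaces, inScope, declareOn,
parse) are placeholders, and it assumes the engine still has the target
element attached to its source document when the result is built.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch, not Xindice code.
final class InScopeNamespaces {
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    /** Collects the xmlns declarations visible at target; the nearest one wins. */
    static Map<String, String> inScope(Element target) {
        Map<String, String> decls = new LinkedHashMap<>();
        for (Node n = target; n != null && n.getNodeType() == Node.ELEMENT_NODE;
                n = n.getParentNode()) {
            NamedNodeMap attrs = n.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                String name = attrs.item(i).getNodeName();
                // "xmlns" is the default namespace, "xmlns:p" declares prefix p
                if (name.equals("xmlns") || name.startsWith("xmlns:")) {
                    decls.putIfAbsent(name, attrs.item(i).getNodeValue());
                }
            }
        }
        return decls;
    }

    /** Adds the declarations to the result element unless it already has them. */
    static void declareOn(Element result, Map<String, String> decls) {
        for (Map.Entry<String, String> e : decls.entrySet()) {
            if (!result.hasAttribute(e.getKey())) {
                result.setAttributeNS(XMLNS_NS, e.getKey(), e.getValue());
            }
        }
    }

    /** Namespace-aware parse of a string, with checked exceptions wrapped. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<n:a xmlns:n=\"example.com\"><n:b>foo</n:b></n:a>");
        Element b = (Element) doc.getDocumentElement().getFirstChild();
        Element result = (Element) b.cloneNode(true); // detached copy, like a query result
        declareOn(result, inScope(b));
        System.out.println(result.getAttribute("xmlns:n"));
    }
}
```

Only the target element gets the extra declarations; its children inherit
them, so nothing has to be repeated further down the tree.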

This won't always work for documents validated with a DTD, as the namespace
declaration could well be within an external entity declared in a DTD.

I believe there is a better solution though. At the moment a query result
is converted to a string when the result XMLResource is created. If the
node was stored as a node, the namespace information might not
be lost. I'm looking at the code today to see if this is a workable solution.

> I'm not sure I'd consider that the 'xmlns' was "lost", as it's not (as
> above) really a normal attribute. By extension, if you queried and
> returned a large XML document, you'd not want 'xmlns' attributes on
> *all* the returned elements (I wouldn't think).

I agree that you don't want all the element nodes to have all the
possible namespace declarations.

> The real answer to this problem is probably fairly elusive as handling
> of 'xmlns' has historically been pretty murky.
> 
> You might try this with explicit prefixes and a short DTD to see if you
> get the results you want.

My workaround is to standardize the prefixes used in documents. I have a
small patch that adds the namespaces used in the query to the result node
(if it's an element). This works in my case, but is not a real solution,
which is why I haven't submitted it as a patch. I think something along
the lines I set out above may be possible, though.
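
To show why the workaround depends on standardized prefixes, here is a sketch
of the idea (not the patch itself; QueryNamespacePatch, addQueryNamespaces,
ownPrefixDeclared and newElement are names made up for this example). The
query's prefix-to-URI map is declared on the result element, which only helps
when the query uses the same prefix as the stored document.

```java
import java.util.Collections;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;

// Hypothetical sketch of the workaround, not the actual patch.
final class QueryNamespacePatch {
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    /** Declares each query namespace (prefix -> URI) on the result element. */
    static void addQueryNamespaces(Element result, Map<String, String> queryNamespaces) {
        for (Map.Entry<String, String> e : queryNamespaces.entrySet()) {
            String attr = e.getKey().isEmpty() ? "xmlns" : "xmlns:" + e.getKey();
            if (!result.hasAttribute(attr)) {
                result.setAttributeNS(XMLNS_NS, attr, e.getValue());
            }
        }
    }

    /** True when a namespaced element declares its own prefix on itself. */
    static boolean ownPrefixDeclared(Element e) {
        String prefix = e.getPrefix();
        return e.hasAttribute(prefix == null ? "xmlns" : "xmlns:" + prefix);
    }

    /** Builds a detached namespaced element, standing in for a query result. */
    static Element newElement(String uri, String qualifiedName) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .newDocument().createElementNS(uri, qualifiedName);
        } catch (ParserConfigurationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // Query prefix x differs from the document prefix n: n stays undeclared.
        Element mismatched = newElement("example.com", "n:b");
        addQueryNamespaces(mismatched, Collections.singletonMap("x", "example.com"));
        System.out.println(ownPrefixDeclared(mismatched));

        // Same prefix in query and document: the declaration lands where needed.
        Element matched = newElement("example.com", "n:b");
        addQueryNamespaces(matched, Collections.singletonMap("n", "example.com"));
        System.out.println(ownPrefixDeclared(matched));
    }
}
```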

> Again, I'd try this with a short DTD and explicit attributes to see if
> your results improve, say, something like:
> 
>   <!DOCTYPE n:a [
>     <!ELEMENT n:a ( n:b )* >
>     <!ATTLIST n:a
>        xmlns:n   CDATA   #FIXED  "http://example.com/"
>     >
>     <!ELEMENT n:b ( #PCDATA )* >
>     <!ATTLIST n:b
>        xmlns:n   CDATA   #FIXED  "http://example.com/"
>     >
>   ]>
>   <n:a xmlns:n="http://example.com/">
>     <n:b xmlns:n="http://example.com/">foo</n:b>
>   </n:a>
> 
> 

Unfortunately, having a DTD declared doesn't help, as the result node has no
idea that the DTD even exists :(. Thanks for your thoughts though :). I'm
sure this is solvable with a little thought and a bit of coding. The
xindice code is so easy to work with in the majority of cases.

-k.


Re: Queries and namespaces.

Posted by Murray Altheim <m....@open.ac.uk>.
Kevin O'Neill wrote:
> So I'm back doing some work with xindice and I've come across a couple of
> issues with regards to query results.
> 
> Namespace searches work: the appropriate nodes are selected. The problem is
> that the namespace declarations are missing from the returned elements.
> For example if you store a document
> 
> <a xmlns="example.com">
>   <b>foo</b>
> </a>
> 
> then query /x:a/x:b (with x mapped to "example.com")  
> 
> <b>foo</b> will be returned and the namespace of b is lost.

This is obviously a bigger issue than simply querying within Xindice,
as it has more to do with XML Namespaces and default attributes.
Generally when you do a query absent any care over XML Namespace,
you already know the namespace. Because 'xmlns' is merely from the
XML 1.0 POV a default attribute, you wouldn't necessarily want all the
defaulted attributes to be present on a query result (e.g., some HTML
elements have quite a few). 'xmlns' is a *bit* different than other
attributes and I wouldn't be surprised to see implementations vary
in their handling of it.

I'm not sure I'd consider that the 'xmlns' was "lost", as it's not
(as above) really a normal attribute. By extension, if you queried
and returned a large XML document, you'd not want 'xmlns' attributes
on *all* the returned elements (I wouldn't think).

The real answer to this problem is probably fairly elusive as handling
of 'xmlns' has historically been pretty murky.

You might try this with explicit prefixes and a short DTD to see if
you get the results you want.

 > Store the following:
 >
 > <n:a xmlns:n="example.com">
 >   <n:b>foo</n:b>
 > </n:a>
 >
 > then query /x:a/x:b (with x mapped to "example.com")
 >
 > <n:b>foo</n:b> will be returned. Calling getContentAsDOM() on the
 > XMLResource returned throws an exception indicating that the prefix n has
 > not been declared.
 >
 > There's also an issue if the result is not an element as CollectionImpl in
 > the embedded database attempts to cast the resulting node to an Element.
 > This is easy to avoid by checking the type of the node before casting it.
 >
 > I'm a little unsure of how to handle the issue of the namespaces for the
 > element. Any ideas?

Again, I'd try this with a short DTD and explicit attributes to see if
your results improve, say, something like:

  <!DOCTYPE n:a [
    <!ELEMENT n:a ( n:b )* >
    <!ATTLIST n:a
       xmlns:n   CDATA   #FIXED  "http://example.com/"
    >
    <!ELEMENT n:b ( #PCDATA )* >
    <!ATTLIST n:b
       xmlns:n   CDATA   #FIXED  "http://example.com/"
    >
  ]>
  <n:a xmlns:n="http://example.com/">
    <n:b xmlns:n="http://example.com/">foo</n:b>
  </n:a>


Murray

...........................................................................
Murray Altheim                         http://kmi.open.ac.uk/people/murray/
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK                    .

           "The current and future international political
            environment severely constrains this country's
            ability to conduct long-range strike missions." -- DARPA
            http://news.bbc.co.uk/1/hi/world/americas/3035332.stm