Posted to xindice-users@xml.apache.org by Richard Dallaway <ri...@dallaway.com> on 2002/02/15 20:08:29 UTC

entities in getMembersAsResource()

I've stumbled across a behaviour I don't quite understand.  I've been 
executing XPath queries against rc1 (using JDK 1.3.1, Red Hat Linux 7.1) 
and looking at the results via DOM after using ResourceSet's 
getMembersAsResource(). One of the nodes I expect to get back should 
have the body "Harry Potter and the Philosopher&apos;s Stone", but it 
comes back as "Harry Potter and the Philosopher" (i.e., truncated at 
the &apos;).  Has anyone experienced anything like this?

In detail...

My XML documents contain elements like this:

<title>Harry Potter and the Philosopher's Stone</title>

When I use the command line tools, I see results like:

xindice xpath -c /db/books -q "/book[body/p[contains(.,'experience')]]/title"

<?xml version="1.0"?>
<title xmlns:src="http://xml.apache.org/xindice/Query" 
src:col="/db/books" src:key="potter.xml">Harry Potter and the 
Philosopher&apos;s Stone</title>
<?xml version="1.0"?>
<title xmlns:src="http://xml.apache.org/xindice/Query" 
src:col="/db/books" src:key="whywebuy.xml">Why We Buy: The Science of 
Shopping</title>

When I code up that query in Java I end up with code like this:

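import org.w3c.dom.Document;
import org.xmldb.api.base.ErrorCodes;
import org.xmldb.api.base.Resource;
import org.xmldb.api.base.ResourceSet;
import org.xmldb.api.base.XMLDBException;
import org.xmldb.api.modules.XMLResource;
import org.xmldb.api.modules.XPathQueryService;

// A method to run the query and return a DOM
// (collection and xpath are assumed to be fields of the enclosing class):
public Document getResultsAsDOM() throws XMLDBException
{
    XPathQueryService service =
        (XPathQueryService) collection.getService("XPathQueryService", "1.0");
    ResourceSet resultSet = service.query(xpath);

    // Any results?
    if (resultSet == null || resultSet.getSize() == 0)
        return null;

    // We want all the results as an XML document:
    Resource xml = resultSet.getMembersAsResource();

    // Sanity check (getResourceType() returns a String, so compare with equals()):
    if (!XMLResource.RESOURCE_TYPE.equals(xml.getResourceType()))
        throw new XMLDBException(ErrorCodes.VENDOR_ERROR, "Unexpected result type");

    return ((XMLResource) xml).getContentAsDOM().getOwnerDocument();
}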

And then I call the above method to get a DOM and test the results:

// We know what the titles are:
String title1 = "Harry Potter and the Philosopher's Stone";
String title2 = "Why We Buy: The Science of Shopping";

assertEquals("Wrong second title", title2,
    titles.item(1).getFirstChild().getNodeValue());

assertEquals("Wrong first title", title1,
    titles.item(0).getFirstChild().getNodeValue());
	
.... and I get a failure on this last assert (the one for the  Harry 
Potter title):

Wrong first title expected:<Harry Potter and the Philosopher's Stone> 
but was:<Harry Potter and the Philosopher>

The DOM seems fine (two results in it, as expected... the "Why we Buy" 
test passes).  So I'm wondering if there's something I don't understand 
about the handling of '/&apos;.

NB. If I remove the ' from the title in my original XML and reimport the 
file, there are no problems.

Any clues much appreciated
Richard

Re: Tomcat4.02 and Addressbook example

Posted by Jane Riese <jr...@lanl.gov>.

Mark -

To solve the problem I moved the xalan jar from the
addressbook/WEB-INF/lib directory to the common/lib
directory.  It seems that then Tomcat could find the
XObject class used in the Addressbook example.
Perhaps the README file in the Xindice java/examples/Addressbook
should reflect this change.


Thanks for your JWhich program - will try it out next time I run
into this situation.

jane

Re: entities in getMembersAsResource()

Posted by Richard Dallaway <ri...@dallaway.com>.

Ah, yes, of course.  Thanks for the quick reply.  In this case, to get 
the behaviour I want, I can use the W3C DOM Node's normalize() method, 
which merges adjacent text nodes into one.
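
A minimal sketch of that fix, assuming the title's text arrives as several adjacent text nodes. The class and method names here are illustrative and not part of Xindice; the DOM is built by hand to mimic the split rather than fetched from the database:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NormalizeTitle {

    /** Builds a <title> element whose text is split across three adjacent text nodes. */
    static Element buildSplitTitle() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element title = doc.createElement("title");
            doc.appendChild(title);
            title.appendChild(doc.createTextNode("Harry Potter and the Philosopher"));
            title.appendChild(doc.createTextNode("'"));
            title.appendChild(doc.createTextNode("s Stone"));
            return title;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Merges adjacent text nodes, then reads the (now single) first child. */
    static String firstChildValueAfterNormalize(Element title) {
        title.normalize();
        return title.getFirstChild().getNodeValue();
    }

    public static void main(String[] args) {
        Element title = buildSplitTitle();
        // Before normalize(): only the first fragment is visible.
        System.out.println(title.getFirstChild().getNodeValue());
        // After normalize(): the whole title is in one text node.
        System.out.println(firstChildValueAfterNormalize(title));
    }
}
```

Note that normalize() only merges adjacent text nodes. If a parser keeps a real entity reference node between them, the fragments would stay separate and the children would need to be walked instead.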

Thanks
Richard

Krzysztof Kowalczykiewicz wrote:
> Hi!
> 
> Isn't it the case that the parser creates distinct nodes for XML entities
> (like &apos;), so the Harry Potter title is probably represented in the
> DOM as 3 nodes:
> text node: Harry Potter and the Philosopher
> entity node: &apos;
> text node: s Stone
> 
> Iterate through all the children and concatenate their values.
> 
> I'm not sure, but I think it's like that.
> 
> regards
> 
> Krzysztof Kowalczykiewicz

Re: Re[2]: entities in getMembersAsResource()

Posted by Krzysztof Kowalczykiewicz <kr...@cs.put.poznan.pl>.

Not yet, I'll try it and see how time-consuming it is.
----- Original Message ----- 
From: "Dawid Weiss" <Da...@cs.put.poznan.pl>
To: "Krzysztof Kowalczykiewicz" <xi...@xml.apache.org>
Sent: Friday, February 15, 2002 11:28 PM
Subject: Re[2]: entities in getMembersAsResource()


> 
> I see you're active :)) And how did you rip that video CD to AVI? Dawid
> 
>

Re[2]: entities in getMembersAsResource()

Posted by Dawid Weiss <Da...@cs.put.poznan.pl>.

I see you're active :)) And how did you rip that video CD to AVI? Dawid

Re: entities in getMembersAsResource()

Posted by Krzysztof Kowalczykiewicz <kr...@cs.put.poznan.pl>.

Hi!

Isn't it the case that the parser creates distinct nodes for XML entities
(like &apos;), so the Harry Potter title is probably represented in the
DOM as 3 nodes:
text node: Harry Potter and the Philosopher
entity node: &apos;
text node: s Stone

Iterate through all the children and concatenate their values.

I'm not sure, but I think it's like that.
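
A rough sketch of that iterate-and-concatenate idea, using only standard org.w3c.dom calls. The TextCollector class and its method names are made up for illustration, and the sample element is built by hand as three text nodes, which is an assumption about how the result looks rather than something checked against Xindice:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class TextCollector {

    /** Concatenates the text found in all children of the given node. */
    static String collectText(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node child = parent.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            switch (child.getNodeType()) {
                case Node.TEXT_NODE:
                case Node.CDATA_SECTION_NODE:
                    sb.append(child.getNodeValue());
                    break;
                case Node.ENTITY_REFERENCE_NODE:
                    // An entity reference keeps its expansion (if any) as children.
                    sb.append(collectText(child));
                    break;
                default:
                    // Elements, comments etc. are skipped in this sketch.
                    break;
            }
        }
        return sb.toString();
    }

    /** Hand-built stand-in for the <title> element with its text split in three. */
    static Element sampleTitle() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element title = doc.createElement("title");
            doc.appendChild(title);
            title.appendChild(doc.createTextNode("Harry Potter and the Philosopher"));
            title.appendChild(doc.createTextNode("'"));
            title.appendChild(doc.createTextNode("s Stone"));
            return title;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(collectText(sampleTitle()));
    }
}
```

In the failing assert, collectText(titles.item(0)) would take the place of titles.item(0).getFirstChild().getNodeValue().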

regards

Krzysztof Kowalczykiewicz

>
> assertEquals("Wrong second title", title2,
> titles.item(1).getFirstChild().getNodeValue());
>
> assertEquals("Wrong first title", title1,
> titles.item(0).getFirstChild().getNodeValue());
>
> .... and I get a failure on this last assert (the one for the  Harry
> Potter title):
>
> Wrong first title expected:<Harry Potter and the Philosopher's Stone>
> but was:<Harry Potter and the Philosopher>
>
> The DOM seems fine (two results in it, as expected... the "Why we Buy"
> test passes).  So I'm wondering if there's something I don't understand
> about the handling of '/&apos;.
>
> NB. If I remove the ' from the title in my original XML and reimport the
> file, there are no problems.
>
> Any clues much appreciated
> Richard