Posted to xindice-users@xml.apache.org by Jayaram Narayana <na...@india.hp.com> on 2002/04/25 18:30:20 UTC

XPath details [Repeat Posting]

=============================================================
this question has been posted to the xindice dev. forum also.
=============================================================

hi there,

using an xpath query returns a lot more information than what's actually
required. for example, given an element that looks like this:

<aaa>
	<bbb/>
	<ccc value="Hello"/>
</aaa>

and an xpath query like this:
"//ccc [@value]"

will return a big result like this:

<aaa xmlns:src="http://xml.apache.org/xindice/Query" src:col="/db/zzz/yyy"
src:key="eee">
 <ccc value="Hello" />
</aaa>

all i am interested in is the value of the attribute "value" (Hello). how
can i get JUST that directly, and nothing else? i do not want to parse the
DOM for this.

TIA,
-nani

Re: XPath details [Repeat Posting]

Posted by Dawid Weiss <da...@go2.pl>.

JN> all i am interested in is the value of the attribute "value" (Hello). how

If you REALLY don't want to parse the result, consider applying a regular
expression. In some cases (when the pattern has been precompiled) it will
work faster than parsing the XML into JDOM, for instance.

Bad programming style? Of course it is. If you really crave high
performance though...
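
A minimal sketch of the precompiled-pattern idea, assuming the result string looks like the one in the original question. The class name, method name and the exact pattern are made up for illustration; the pattern only handles a quoted value attribute on a <ccc> tag and does not decode XML entities.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ValueAttributeRegex {
    // Compiled once and reused. Matches value="..." or value='...' inside a <ccc ...> tag.
    // Hypothetical pattern: good enough for the simple result format, not a general XML parser.
    private static final Pattern CCC_VALUE =
        Pattern.compile("<ccc\\b[^>]*?\\bvalue\\s*=\\s*([\"'])(.*?)\\1");

    /** Returns the first ccc/@value found in the string, or null if there is none. */
    static String extractValue(String xml) {
        Matcher m = CCC_VALUE.matcher(xml);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String result = "<aaa xmlns:src=\"http://xml.apache.org/xindice/Query\" "
            + "src:col=\"/db/zzz/yyy\" src:key=\"eee\"><ccc value=\"Hello\" /></aaa>";
        System.out.println(extractValue(result)); // prints Hello
    }
}
```

The pattern lives in a static final field so the compilation cost is paid once, which is the case where a regular expression is most likely to compete with a full parse.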

Dawid

Re: XPath details [Repeat Posting]

Posted by "Mark J. Stang" <ma...@earthlink.net>.

I thought the regexp suggestion was great! In an earlier question, it
didn't even occur to me to use a regexp; I couldn't think of an alternative.
The response is a simple XML document, so a regexp would be a simple
and fast alternative. I have always done it the hard way. I wrote a simple
XML parser and decided it was easier to let someone else do it ;-).

I intended to use a DOM/JDOM implementation to parse a really complex
document.   The more I looked at SAX, the "simpler" it became.   Then
I tried it on some simpler documents and figured out it is the easiest way
to go.   I did some timings and am getting anywhere from 8-10 ms to 90 ms to
parse a document, depending on the complexity.   All it really does is
walk down an array, notifying you when the tags begin and end, so it has
some bookkeeping overhead.
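
To make the "notifying you when the tags begin and end" part concrete, here is a small sketch using the SAX parser that ships with the JDK. The class name and the event strings are invented for the example; it only records the order of the callbacks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventTrace extends DefaultHandler {
    // Callbacks in the order the parser delivered them.
    final List<String> events = new ArrayList<String>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    /** Parses the string and returns the recorded start/end notifications. */
    static List<String> trace(String xml) {
        SaxEventTrace handler = new SaxEventTrace();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        // prints [start:aaa, start:bbb, end:bbb, start:ccc, end:ccc, end:aaa]
        System.out.println(trace("<aaa><bbb/><ccc value=\"Hello\"/></aaa>"));
    }
}
```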

I have found that DOM/JDOM has a certain amount of mental overhead.
I was walking the tree looking for certain values, almost like a state
transition. And along the way I had to check every child to make sure
it wasn't null. Now I only use DOM-type trees for data that has to be
modified.

Mark

Re[2]: XPath details [Repeat Posting]

Posted by Dawid Weiss <da...@go2.pl>.

MJS> If you look at how the SAX parser works, it takes the "document"
MJS> and steps through it like an array.

Oh, I know how SAX works, but I'd be curious how the speed of even the
fastest SAX engines compares to a precompiled regular expression. It's a
vain discussion though; of course I admit SAX is both nicer and less
error-prone than the regular expression way. I just mentioned the regexp to
satisfy people's natural curiosity about HOW THINGS CAN BE DONE. You know
what I mean? :)

Dawid

Re: XPath details [Repeat Posting]

Posted by "Mark J. Stang" <ma...@earthlink.net>.

If you look at how the SAX parser works, it takes the "document"
and steps through it like an array.   You can write a "handler", which
is a single class in about 20 lines of code.   I tried JDOM for some
of my documents, but I had to keep working down the tree, checking for
null, etc.   With the search results, you are only interested in a couple of
tags, so all it takes is a couple of ifs.  The nice part is you don't have to
put in any tree logic.
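
A handler of roughly that size might look like the sketch below. It is an illustration only, using the JDK's SAX parser; the class and method names are made up, and the element and attribute names are taken from the example in the original question.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class CccValueHandler extends DefaultHandler {
    private String value; // stays null if no <ccc value="..."> is seen

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The "couple of ifs": only the first <ccc> element is of interest.
        if (value == null && "ccc".equals(qName)) {
            value = atts.getValue("value");
        }
    }

    /** Returns the value attribute of the first <ccc> element, or null if there is none. */
    static String extract(String xml) {
        CccValueHandler handler = new CccValueHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse query result", e);
        }
        return handler.value;
    }

    public static void main(String[] args) {
        System.out.println(extract("<aaa><ccc value=\"Hello\" /></aaa>")); // prints Hello
    }
}
```

There is no tree to walk and no child to null-check; the only state is the one field being filled in.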

I have some timing checks and all of these are fast, so unless you are doing
it constantly, speed is not an issue.

HTH,

Mark

Re[2]: XPath details [Repeat Posting]

Posted by Dawid Weiss <da...@go2.pl>.

JN> i am StringTokenizing and parsing the xpathquery result. finally! only now i
JN> am positive that what i knew is correct :-)

Nani, StringTokenizer may in some cases be slower than a regular
expression, which is mostly automaton-based transitions between states
(i.e. pretty fast ;). Moreover, you can easily move a regular expression
pattern out into properties or some other external configuration, while
string tokenizing embeds the logic for extracting the information you need
in the code.
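
As a rough sketch of the "pattern in external configuration" idea: the property key extract.pattern, the class name and the pattern itself are all hypothetical. The pattern is expected to expose the wanted text as capture group 1.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConfigurableExtractor {
    private final Pattern pattern;

    ConfigurableExtractor(Properties config) {
        // "extract.pattern" is a made-up key; the pattern is compiled once per extractor.
        String regex = config.getProperty("extract.pattern");
        if (regex == null) {
            throw new IllegalArgumentException("missing property extract.pattern");
        }
        this.pattern = Pattern.compile(regex);
    }

    /** Returns capture group 1 of the first match, or null if the pattern does not match. */
    String extract(String xml) {
        Matcher m = pattern.matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws IOException {
        Properties config = new Properties();
        // Stands in for a .properties file. Backslashes are doubled once for the
        // properties format and once more for the Java string literal.
        config.load(new StringReader("extract.pattern=value\\\\s*=\\\\s*\"([^\"]*)\""));
        ConfigurableExtractor extractor = new ConfigurableExtractor(config);
        System.out.println(extractor.extract("<aaa><ccc value=\"Hello\" /></aaa>")); // prints Hello
    }
}
```

Changing what gets extracted then means editing the configuration, not recompiling the class.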

You know what fits your needs best of course.

Cheers,
Dawid

RE: XPath details [Repeat Posting]

Posted by Jayaram Narayana <na...@india.hp.com>.

mark, carsten and dawid,

thanks a ton guys, for your answers.

i am StringTokenizing and parsing the xpathquery result. finally! only now i
am positive that what i knew is correct :-)
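
The thread does not show the tokenizing code itself, so the following is only a guess at what such an approach could look like. It assumes the result is serialized with double quotes and no spaces around the equals sign (as in the result shown in the question), and that no attribute value is empty, because StringTokenizer silently skips empty tokens. Names are made up.

```java
import java.util.StringTokenizer;

class TokenizerExtractor {
    /**
     * Splits the result on double quotes, so tokens alternate between markup
     * and attribute values, and returns the value that follows " value=".
     * Returns null if no such attribute is found.
     */
    static String extractValue(String xml) {
        StringTokenizer st = new StringTokenizer(xml, "\"");
        while (st.hasMoreTokens()) {
            String markup = st.nextToken();         // text up to the next opening quote
            if (!st.hasMoreTokens()) {
                break;                              // trailing markup after the last attribute
            }
            String attributeValue = st.nextToken(); // text between the quotes
            if (markup.endsWith(" value=")) {
                return attributeValue;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String result = "<aaa xmlns:src=\"http://xml.apache.org/xindice/Query\" "
            + "src:col=\"/db/zzz/yyy\" src:key=\"eee\"><ccc value=\"Hello\" /></aaa>";
        System.out.println(extractValue(result)); // prints Hello
    }
}
```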

-nani

Re: XPath details [Repeat Posting]

Posted by Carsten Ziegert <ca...@ik.fh-hannover.de>.

> I know of no other way, nature of the beast...

A more elegant way is transforming the result into JDOM
- http://www.jdom.org (or another DOM) and using the
associated API.
Example:

import java.io.IOException;
import java.io.StringReader;

import org.jdom.Document;
import org.jdom.Element;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;

import org.xmldb.api.base.*;
import org.xmldb.api.modules.*;
import org.xmldb.api.*;

...

ResourceSet resultSet = this.service.query(this.xpathQuery);
ResourceIterator results = resultSet.getIterator();
while (results.hasMoreResources()) {
   Resource res = results.nextResource();
   String resu = (String) res.getContent();

   // create JDOM Element
   Element resultElement = jdomEl(resu);
   ...
}


// Parses one query result into a detached JDOM Element.
// Parse errors are passed on to the caller instead of being swallowed.
private Element jdomEl(String xmlInput) throws JDOMException, IOException {
   SAXBuilder builder = new SAXBuilder(false);   // false = no validation
   Document doc = builder.build(new StringReader(xmlInput));
   // clone the root so the returned element is not tied to the document
   return (Element) doc.getRootElement().clone();
}
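
For the "or another DOM" case, here is a self-contained sketch that uses the W3C DOM bundled with the JDK rather than JDOM, so it runs without extra jars. Class and method names are made up. With a JDOM Element in hand, the equivalent lookup would presumably be resultElement.getChild("ccc").getAttributeValue("value").

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomValueLookup {
    /** Returns the value attribute of the first <ccc> element, or null if absent. */
    static String valueOf(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList matches = doc.getElementsByTagName("ccc");
            if (matches.getLength() == 0) {
                return null;
            }
            Element ccc = (Element) matches.item(0);
            // getAttribute returns "" for a missing attribute, so check presence first.
            return ccc.hasAttribute("value") ? ccc.getAttribute("value") : null;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse query result", e);
        }
    }

    public static void main(String[] args) {
        String result = "<aaa xmlns:src=\"http://xml.apache.org/xindice/Query\" "
            + "src:col=\"/db/zzz/yyy\" src:key=\"eee\"><ccc value=\"Hello\" /></aaa>";
        System.out.println(valueOf(result)); // prints Hello
    }
}
```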



Carsten




--

Medizinische Hochschule Hannover                    Fachhochschule Hannover
Abt. Hämatologie und Onkologie     FB Informations- und Kommunikationswesen
Carl-Neuberg-Straße 1                               Ricklinger Stadtweg 120
30625 Hannover                                               30459 Hannover

                           ++49-511-9296-1650
                    http://summit-bmt.fh-hannover.de

Re: XPath details [Repeat Posting]

Posted by "Mark J. Stang" <ma...@earthlink.net>.

nani,
The result of an XPathQueryResult is a document.   You have to parse that
resulting document to get your result.   The result will contain the key that
matches your XPath query.   If you search on an attribute, you will get back
the element that contains that attribute.  Therefore, a search on
//*/ccc[@value]
will return an XML document that contains the <ccc value="Hello"></ccc>
element, the collection it came from, and the key of the document that
contains the element.

You will have to parse the document to get the key and the attributes.
Since the format is always the same, this can be done fairly easily with a
SAX parser.
I know of no other way; it's the nature of the beast...
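
A sketch of pulling the key, the collection and the attribute out of one result with a namespace-aware SAX parser. The namespace URI and the src:col / src:key attribute names are taken from the result shown in the question; the class, field and method names are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ResultKeyHandler extends DefaultHandler {
    // Namespace bound to the src: prefix in the query result.
    static final String QUERY_NS = "http://xml.apache.org/xindice/Query";

    String collection; // from src:col
    String key;        // from src:key
    String value;      // from <ccc value="...">

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The first element carrying src:key identifies the source document.
        String k = atts.getValue(QUERY_NS, "key");
        if (key == null && k != null) {
            key = k;
            collection = atts.getValue(QUERY_NS, "col");
        }
        if (value == null && "ccc".equals(localName)) {
            value = atts.getValue("", "value"); // un-prefixed attribute: empty namespace
        }
    }

    /** Parses one query result and returns the filled-in handler. */
    static ResultKeyHandler parse(String xml) {
        ResultKeyHandler handler = new ResultKeyHandler();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed for lookups by namespace URI
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse query result", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        ResultKeyHandler r = parse("<aaa xmlns:src=\"http://xml.apache.org/xindice/Query\" "
            + "src:col=\"/db/zzz/yyy\" src:key=\"eee\"><ccc value=\"Hello\" /></aaa>");
        System.out.println(r.collection + " " + r.key + " " + r.value);
    }
}
```

Looking the attributes up by namespace URI instead of by the literal src: prefix keeps the handler working even if the prefix in the result ever differs.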

HTH,

Mark
