Posted to xindice-users@xml.apache.org by Dawid Weiss <da...@go2.pl> on 2002/04/26 11:37:16 UTC

Re[2]: XPath details [Repeat Posting]

JN> i am StringTokenizing and parsing the xpathquery result. finally! only now i
JN> am positive that what i knew is correct :-)

Nanni, StringTokenizer may in some cases be slower than a regular
expression, which mostly comes down to automaton-based transitions between
states (i.e. pretty fast ;). Moreover, you can easily move a regular
expression pattern out to a properties file or some other external
configuration, whereas string tokenizing embeds the logic for extracting
the information you need in the code.
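
As a rough illustration of the point above, here is a minimal sketch of
keeping the pattern in external configuration. It assumes Java 1.4 or later
(java.util.regex). The property key "title.pattern", the <title> element
name and the class name are all made up for the example; the real query
result format may look different.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexExtractSketch {
    // Hypothetical configuration; normally this would be read from a .properties file.
    static final String CONFIG = "title.pattern=<title>([^<]*)</title>\n";

    // Load the pattern text from the configuration and compile it once.
    static Pattern loadPattern(String config, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(config));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Pattern.compile(props.getProperty(key));
    }

    // Collect capture group 1 of every match found in the input.
    static List<String> extract(Pattern pattern, String xml) {
        List<String> values = new ArrayList<>();
        Matcher m = pattern.matcher(xml);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        Pattern p = loadPattern(CONFIG, "title.pattern");
        String xml = "<result><title>Alpha</title><title>Beta</title></result>";
        System.out.println(extract(p, xml)); // prints [Alpha, Beta]
    }
}
```

Changing what gets extracted then means editing the properties entry, not
the Java code. A pattern like this only copes with simple, flat markup.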

You know what fits your needs best of course.

Cheers,
Dawid

Re: XPath details [Repeat Posting]

Posted by "Mark J. Stang" <ma...@earthlink.net>.

I thought the regexp suggestion was great! In an earlier question, it
didn't even occur to me to use regexp; I couldn't think of an alternative.
The response is a simple XML document, so regexp would be a simple
and fast alternative. I have always done it the hard way. I once wrote a
simple XML parser and decided it was easier to let someone else do it ;-).

I intended to use a DOM/JDOM implementation to parse a really complex
document.   The more I looked at SAX, the "simpler" it became.   Then
I tried it on some simpler documents and figured out it is the easiest way
to go.   I did some timings and am getting anywhere from 8/10-90 ms to
parse a document, depending on the complexity.   All it really does is
walk down an array, notifying you when the tags begin and end, so it has
some bookkeeping overhead.

I have found that DOM/JDOM has a certain amount of mental overhead.
I was walking the tree looking for certain values, almost like a state
transition. And along the way I had to check every child to make sure
it wasn't null. Now I only use DOM-type trees for data that has to be
modified.
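
To show the kind of null checking meant here, a minimal sketch using the
standard org.w3c.dom API (not JDOM). The result/item/title document shape
and the helper names are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomWalkSketch {
    // Return the first child element with the given name, or null if there is none.
    static Node findChild(Node parent, String name) {
        if (parent == null) {
            return null;
        }
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return n;
            }
        }
        return null;
    }

    // Walk result -> item -> title, checking for null at every level.
    static String firstTitle(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node item = findChild(doc.getDocumentElement(), "item");
            if (item == null) {
                return null;
            }
            Node title = findChild(item, "title");
            if (title == null) {
                return null;
            }
            return title.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Every extra level of nesting adds another lookup and another null check,
which is the overhead described above.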

Mark

Re[2]: XPath details [Repeat Posting]

Posted by Dawid Weiss <da...@go2.pl>.

MJS> If you look at how the SAX parser works, it takes the "document"
MJS> and steps through it like an array.

Oh, I know how SAX works, but I'd be curious how the speed of even the
fastest SAX engines compares to a precompiled regular expression. It's a
vain discussion though; of course I admit SAX is both nicer and less
error-prone than the regular expression way. I just mentioned the regexp
to satisfy people's natural curiosity on HOW THINGS CAN BE DONE. You know
what I mean? :)

Dawid

Re: XPath details [Repeat Posting]

Posted by "Mark J. Stang" <ma...@earthlink.net>.

If you look at how the SAX parser works, it takes the "document"
and steps through it like an array. You can write a "handler", which
is a single class of about 20 lines of code. I tried JDOM for some
of my documents, but I had to keep working down the tree, checking for
null, etc. With the search results, you are only interested in a couple of
tags, so all it takes is a couple of ifs. The nice part is you don't have
to put in any tree logic.
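
A handler along those lines might look roughly like the sketch below. It
uses the standard JAXP/SAX classes; the <title> tag and the class name are
placeholders, since the actual tags of interest depend on your result
document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TitleHandlerSketch extends DefaultHandler {
    final List<String> titles = new ArrayList<>();
    private StringBuilder text; // non-null only while inside a <title> element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("title".equals(qName)) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver text in several chunks, so append each one.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            titles.add(text.toString());
            text = null;
        }
    }

    // Convenience wrapper: parse a string and return the collected titles.
    static List<String> parse(String xml) {
        try {
            TitleHandlerSketch handler = new TitleHandlerSketch();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.titles;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The only state kept is "am I inside a tag I care about", so no tree logic
is needed.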

I have some timing checks and all of these are fast, so unless you are doing
it constantly, speed is not an issue.

HTH,

Mark
