Posted to xindice-users@xml.apache.org by Dawid Weiss <da...@go2.pl> on 2002/04/26 11:37:16 UTC
Re[2]: XPath details [Repeat Posting]
JN> i am StringTokenizing and parsing the xpathquery result. finally! only now i
JN> am positive that what i knew is correct :-)
Nanni, StringTokenizer may in some cases be slower than a regular
expression, which mostly comes down to automaton-based transitions between
states (i.e. pretty fast ;). What's more, you can easily move a regular
expression pattern out to a properties file or some other external
configuration, while string tokenizing embeds the logic for extracting the
information you need in the code.
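As a rough sketch of the "pattern in external configuration" idea: the snippet below loads a pattern from properties text and applies it as a precompiled regular expression. It assumes java.util.regex (JDK 1.4 and later); the property key item.pattern, the class name and the sample XML are made up for illustration and are not the actual Xindice result format.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExtract {
    // Hypothetical properties content; normally this would live in a .properties file.
    static final String CONFIG = "item.pattern=<item key=\"([^\"]*)\">([^<]*)</item>\n";

    // Read the pattern from the configuration and compile it once.
    static Pattern loadPattern(String config) {
        try {
            Properties props = new Properties();
            props.load(new StringReader(config));
            return Pattern.compile(props.getProperty("item.pattern"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Collect "key=text" for every match of the pattern in the input.
    static List<String> extract(Pattern pattern, String xml) {
        List<String> out = new ArrayList<>();
        Matcher m = pattern.matcher(xml);
        while (m.find()) {
            out.add(m.group(1) + "=" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample document, not the real query result layout.
        String xml = "<result><item key=\"a1\">first</item><item key=\"b2\">second</item></result>";
        System.out.println(extract(loadPattern(CONFIG), xml));
    }
}
```

Changing what gets extracted then means editing the properties entry rather than the code, at the price of a pattern that breaks as soon as the markup varies (attribute order, nested tags, entities).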
You know what fits your needs best of course.
Cheers,
Dawid
Re: XPath details [Repeat Posting]
Posted by "Mark J. Stang" <ma...@earthlink.net>.
I thought the regexp suggestion was great! When the question came up earlier,
it didn't even occur to me to use regexp; I couldn't think of an alternative.
The response is a simple XML document, so regexp would be a simple
and fast alternative. I have always done it the hard way. I wrote a simple
XML parser and decided it was easier to let someone else do it ;-).
I intended to use a DOM/JDOM implementation to parse a really complex
document. The more I looked at SAX, the "simpler" it became. Then
I tried it on some simpler documents and figured out it is the easiest way
to go. I did some timings and am getting anywhere from 8-10 ms up to 90 ms
to parse a document, depending on the complexity. All it really does is
walk down an array, notifying you when the tags begin and end, so it has
some bookkeeping overhead.
I have found that DOM/JDOM has a certain amount of mental overhead.
I was walking the tree looking for certain values, almost like a state
transition. And along the way I had to check every child to make sure
it wasn't null. Now I only use DOM-type trees for data that has to be
modified.
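To give an idea of the null checking involved, here is a minimal sketch of walking a tree for one value. It uses the W3C DOM that ships with the JDK rather than JDOM, purely to stay self-contained; the class name and the element name item are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomWalk {
    // Return the text of the first non-empty <item> child of the root, or null.
    static String firstItemText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node root = doc.getDocumentElement();
            // getFirstChild() and getNextSibling() return null when there is nothing left.
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() != Node.ELEMENT_NODE || !"item".equals(n.getNodeName())) {
                    continue; // skip whitespace text nodes and other elements
                }
                Node text = n.getFirstChild(); // null for an empty <item/>
                if (text != null && text.getNodeType() == Node.TEXT_NODE) {
                    return text.getNodeValue();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstItemText("<result><item/><item>first</item></result>"));
    }
}
```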
Mark
Dawid Weiss wrote:
> MJS> If you look at how the SAX parser works, it takes the "document"
> MJS> and steps through it like an array.
>
> Oh, I know how SAX works, but I'd be curious how the speed
> of even the fastest SAX engines compares to a precompiled regular
> expression. It's a vain discussion though; of course I admit SAX is both
> nicer and less error-prone than the regular expression way. I just mentioned
> the regexp to satisfy people's natural curiosity about HOW THINGS CAN BE
> DONE. You know what I mean? :)
>
> Dawid
Re[2]: XPath details [Repeat Posting]
Posted by Dawid Weiss <da...@go2.pl>.
MJS> If you look at how the SAX parser works, it takes the "document"
MJS> and steps through it like an array.
Oh, I know how SAX works, but I'd be curious how the speed
of even the fastest SAX engines compares to a precompiled regular
expression. It's a vain discussion though; of course I admit SAX is both
nicer and less error-prone than the regular expression way. I just mentioned
the regexp to satisfy people's natural curiosity about HOW THINGS CAN BE
DONE. You know what I mean? :)
Dawid
Re: XPath details [Repeat Posting]
Posted by "Mark J. Stang" <ma...@earthlink.net>.
If you look at how the SAX parser works, it takes the "document"
and steps through it like an array. You can write a "handler", which
is a single class in about 20 lines of code. I tried JDOM for some
of my documents, but I had to keep working down the tree, checking for
null, etc. With the search results, you are only interested in a couple of
tags, so all it takes is a couple of ifs. The nice part is you don't have to
put in any tree logic.
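A handler along those lines might look roughly like the sketch below. It uses the standard JAXP/SAX classes (SAXParserFactory, DefaultHandler); the class name, the element name item and the sample document are made up, so adjust the ifs to whatever tags the search result really contains.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ItemHandler extends DefaultHandler {
    private final List<String> items = new ArrayList<>();
    private StringBuilder text; // non-null only while inside an <item>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("item".equals(qName)) {
            text = new StringBuilder(); // start collecting character data
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length); // may be called several times per element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("item".equals(qName)) {
            items.add(text.toString());
            text = null;
        }
    }

    public List<String> getItems() {
        return items;
    }

    // Convenience wrapper: parse a string and return the collected <item> texts.
    public static List<String> parse(String xml) {
        try {
            ItemHandler handler = new ItemHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.getItems();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("<result><item>first</item><other>x</other><item>second</item></result>"));
    }
}
```

Everything outside the tags of interest is simply ignored, which is why no tree logic or null checking of children is needed.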
I have some timing checks and all of these are fast, so unless you are doing
it constantly, speed is not an issue.
HTH,
Mark
Unknown wrote:
> JN> i am StringTokenizing and parsing the xpathquery result. finally! only now i
> JN> am positive that what i knew is correct :-)
>
> Nanni, StringTokenizer may in some cases be slower than a regular
> expression, which mostly comes down to automaton-based transitions between
> states (i.e. pretty fast ;). What's more, you can easily move a regular
> expression pattern out to a properties file or some other external
> configuration, while string tokenizing embeds the logic for extracting the
> information you need in the code.
>
> You know what fits your needs best of course.
>
> Cheers,
> Dawid