Posted to xindice-dev@xml.apache.org by Gary Hallmark <Ga...@oracle.com> on 2002/10/15 01:04:35 UTC

slow xpath evaluation in Xindice 1.1

I pulled Xindice 1.1 from cvs on 10/05 and tried the following xpath query 
on a non-indexed collection: /a/b[. < 'foo']

A bit of profiling reveals the following:

89% of the time is spent in org.apache.xml.dtm.DTMManager.findFactory(), 
which is called from
DTMManager.newInstance, which is called from
org.apache.xpath.XPathContext <init>, which is called from
org.apache.xindice.core.query.XPathQueryResolver$ResultSet.prepareNextNode

Does anyone know off the top of their head what is going on and whether 
this expensive-looking method can be done once per query rather than once 
per result node?


Cheers, Gary


Re: slow xpath evaluation in Xindice 1.1

Posted by John Merrells <me...@sleepycat.com>.
Gary Hallmark wrote:

>"illegal" may be a bit strong. The Xpath 1.0 spec states that the operands
>are first converted to numbers and then compared. It doesn't say what
>happens if the conversion fails. 
>

Ah, good, my misunderstanding. I must go fix my parser to attempt a
conversion before it spits out an error. Thanks.

John


Re: slow xpath evaluation in Xindice 1.1

Posted by Gary Hallmark <ga...@oracle.com>.
"illegal" may be a bit strong. The Xpath 1.0 spec states that the operands
are first converted to numbers and then compared. It doesn't say what
happens if the conversion fails. It seems that Xalan's xpath (used
internally by Xindice) simply treats "<" on strings as always "false". It
looks like Xindice's indexes (B-trees) may be allowing the comparison of
strings (but I could be mistaken). I was looking for a case where Xindice
gives different results depending on whether or not indexes are used to
evaluate Xpath (which would be a very bad thing) when I stumbled on the
performance issue.
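
For readers who want to check the comparison behaviour themselves, here is a minimal sketch using the JDK's built-in javax.xml.xpath API rather than Xindice. The sample document and the class name are assumptions made up for illustration. As far as the XPath 1.0 rules go, a string that cannot be converted becomes NaN, and any relational comparison involving NaN is false, which would explain the "always false" result described above.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class StringComparisonDemo {
    // Hypothetical sample document, not taken from the thread.
    static final String XML = "<a><b>bar</b><b>zap</b><b>7</b></a>";

    // Evaluates an XPath 1.0 expression against XML and returns it as a number.
    static double eval(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            InputSource src = new InputSource(new StringReader(XML));
            return (Double) xpath.evaluate(expr, src, XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("could not evaluate: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // A non-numeric string converts to NaN.
        System.out.println(eval("number('foo')"));
        // Every comparison against NaN is false, so no b element matches.
        System.out.println(eval("count(/a/b[. < 'foo'])"));
        // Only the b whose string value is numeric and below 10 matches.
        System.out.println(eval("count(/a/b[. < 10])"));
    }
}
```

This only shows what a standard XPath 1.0 engine does; it says nothing about how Xindice's indexes compare values.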

Note that Xpath 1.0 further states that, when an expression appears inside
an XML document, "&lt;" should be used in place of "<". Thankfully, Xindice
doesn't seem to enforce this.

----- Original Message -----
From: "John Merrells" <me...@sleepycat.com>
To: <xi...@xml.apache.org>
Sent: Monday, October 14, 2002 4:22 PM
Subject: Re: slow xpath evaluation in Xindice 1.1


>
> Gary Hallmark wrote:
>
> > /a/b[. < 'foo']
>
> This has nothing to do with Xindice...
> But isn't that an illegal XPath 1.0 expression?
> Inequality comparisons are only allowed for Numbers,
> not for Strings...?
>
> John
>
>


Re: slow xpath evaluation in Xindice 1.1

Posted by John Merrells <me...@sleepycat.com>.
Gary Hallmark wrote:

> /a/b[. < 'foo']

This has nothing to do with Xindice...
But isn't that an illegal XPath 1.0 expression?
Inequality comparisons are only allowed for Numbers,
not for Strings...?

John


Re: Solved! (Re: slow xpath evaluation in Xindice 1.1)

Posted by Steven Noels <st...@outerthought.org>.
James Bates wrote:

> Are Xindice committers also automatically Xalan committers?
> If so, I may be able to do this...

Nope, just send in a patch to the Xalan developers list.

You have to 'earn' your commit rights for each subproject individually. 
scott_boag@us.ibm.com is a friendly Xalan committer who often is the 
liaison between Cocoon and Xalan, so maybe putting him on CC might help.

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
stevenn@outerthought.org                      stevenn@apache.org


Re: Solved! (Re: slow xpath evaluation in Xindice 1.1)

Posted by James Bates <ja...@amplexor.com>.
On Tuesday 15 October 2002 20:14, Gary Hallmark wrote:
> There's a bug in findFactory().  It looks high and low (properties files,
> class paths, etc.) for a factory, and if it finds one, it caches it so it
> doesn't look again.  If it doesn't find it, it uses a default, but doesn't
> cache that.  If you pass
> -Dorg.apache.xml.dtm.DTMManager=org.apache.xml.dtm.ref.DTMManagerDefault to
> your JVM running Xindice, you will speed up collection scans by 500% to
> 1000%.
>
> If there's a Xalan committer out there, you might want to look into fixing
> this bug in DTMManager.findFactory().
>

Are Xindice committers also automatically Xalan committers?
If so, I may be able to do this...

James


Solved! (Re: slow xpath evaluation in Xindice 1.1)

Posted by Gary Hallmark <Ga...@oracle.com>.
There's a bug in findFactory().  It looks high and low (properties files,
class paths, etc.) for a factory, and if it finds one, it caches it so it
doesn't look again.  If it doesn't find it, it uses a default, but doesn't
cache that.  If you pass
-Dorg.apache.xml.dtm.DTMManager=org.apache.xml.dtm.ref.DTMManagerDefault to
your JVM running Xindice, you will speed up collection scans by 500% to 1000%.

If there's a Xalan committer out there, you might want to look into fixing
this bug in DTMManager.findFactory().
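
The kind of fix being asked for can be sketched as follows. This is not Xalan's code: the class and method names are hypothetical, and the "slow search" is reduced to a single system property read. It only illustrates the pattern of caching the outcome of the lookup whether a factory was found or the default was used, so the slow path runs once.

```java
class CachingFactoryFinder {
    private final String propertyName;      // key to search for
    private final String defaultClassName;  // fallback when nothing is configured
    private String cached;                  // first result, found or defaulted
    private int lookups;                    // how many times the slow path ran

    CachingFactoryFinder(String propertyName, String defaultClassName) {
        this.propertyName = propertyName;
        this.defaultClassName = defaultClassName;
    }

    // Stand-in for the expensive search (properties files, class path, ...).
    // Here it only reads a system property; that is an assumption of the sketch.
    private String slowLookup() {
        lookups++;
        return System.getProperty(propertyName);
    }

    // Caches the result even when the search fails and the default is used.
    synchronized String find() {
        if (cached == null) {
            String found = slowLookup();
            cached = (found != null) ? found : defaultClassName;
        }
        return cached;
    }

    synchronized int lookupCount() {
        return lookups;
    }

    public static void main(String[] args) {
        CachingFactoryFinder finder = new CachingFactoryFinder(
                "org.apache.xml.dtm.DTMManager",
                "org.apache.xml.dtm.ref.DTMManagerDefault");
        finder.find();
        finder.find();
        System.out.println(finder.find() + " after " + finder.lookupCount() + " lookup(s)");
    }
}
```

Until something like this lands in Xalan, the -D setting above has the same effect from the outside: the search succeeds immediately, so the existing cache is used.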

At 12:50 AM 10/15/02 -0700, you wrote:
>Just to follow up, I repeated the profiling using the more "legal":
>/a/b[starts-with(., 'foo')] and still observed the large amount of time in
>DTMManager.findFactory()
>
>I tried the obvious "fix", moving
>XPathContext xpc = new XPathContext();
>to the instance variables of
>org.apache.xindice.core.query.XPathQueryResolver$ResultSet
>and calling xpc.reset() once per invocation of prepareNextNode().
>
>This helped a little (time spent in DTMManager.findFactory() dropped to
>60%).
>I gotta think that someone familiar with Xalan's xpath can get this down to
>low single digits...any ideas? Anybody know what XPathContext.release() is
>all about?  Javadoc says "experimental"...
>
>My superficial understanding of the code suggests that 60% is way too much
>to pay for merely setting up to do some xpath evaluation.
>

Cheers, Gary


Re: slow xpath evaluation in Xindice 1.1

Posted by Gary Hallmark <ga...@oracle.com>.
Just to follow up, I repeated the profiling using the more "legal":
/a/b[starts-with(., 'foo')] and still observed the large amount of time in
DTMManager.findFactory()

I tried the obvious "fix", moving
XPathContext xpc = new XPathContext();
to the instance variables of
org.apache.xindice.core.query.XPathQueryResolver$ResultSet
and calling xpc.reset() once per invocation of prepareNextNode().

This helped a little (time spent in DTMManager.findFactory() dropped to
60%).
I gotta think that someone familiar with Xalan's xpath can get this down to
low single digits...any ideas? Anybody know what XPathContext.release() is
all about?  Javadoc says "experimental"...

My superficial understanding of the code suggests that 60% is way too much
to pay for merely setting up to do some xpath evaluation.
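
The change described above (build the context once per result set, reset it per node) looks roughly like the sketch below. Context and ResultSet here are hypothetical stand-ins, not Xalan's XPathContext or Xindice's XPathQueryResolver$ResultSet, and the counter exists only to show how many constructions happen.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ReusedContextSketch {
    // Counts constructions of the stand-in context, for demonstration only.
    static final AtomicInteger CONSTRUCTIONS = new AtomicInteger();

    // Stand-in for an evaluation context that is expensive to construct.
    static final class Context {
        Context() {
            CONSTRUCTIONS.incrementAndGet();
        }

        void reset() {
            // Would clear per-node state; nothing to clear in this sketch.
        }
    }

    // Stand-in for a result set that walks candidate nodes one at a time.
    static final class ResultSet {
        private final Context ctx = new Context(); // built once per query
        private final List<String> nodes;
        private int pos;

        ResultSet(List<String> nodes) {
            this.nodes = nodes;
        }

        // Returns the next node, or null when exhausted; reuses the context.
        String prepareNextNode() {
            if (pos >= nodes.size()) {
                return null;
            }
            ctx.reset();
            return nodes.get(pos++);
        }
    }

    public static void main(String[] args) {
        ResultSet rs = new ResultSet(List.of("b1", "b2", "b3"));
        while (rs.prepareNextNode() != null) {
            // evaluate the predicate against the node here
        }
        System.out.println("contexts built: " + CONSTRUCTIONS.get());
    }
}
```

As the 60% figure above shows, hoisting the context was not enough on its own in the real code, so treat this as the shape of the attempt rather than the cure.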
