Posted to dev@jackrabbit.apache.org by Christoph Kiehl <ki...@subshell.com> on 2007/03/01 14:00:36 UTC

Re: Query Performance and Optimization

David Johnson wrote:

> Digging into the internals of Jackrabbit, we have noticed that there is an
> implementation of RangeQuery that essentially walks the results if the # of
> query terms is greater than what Lucene can handle.  Reading the Lucene
> documentation, it looks like Filters are the recommended method of
> implementing "large" range queries, and also seem like a natural for
> matching node types - i.e., select * from Column

As we are expecting to reach 1,000,000+ nodes in one of our repositories,
I'm always interested in any performance improvements. Is anyone
investigating this proposal? Or could anyone at least tell me whether it's
worth investigating? ;)

Cheers,
Christoph
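The Filter-based approach for large range queries quoted above can be
sketched as follows. This is a minimal, self-contained Java model, not
Lucene code: one indexed field is represented as a hypothetical sorted term
dictionary, and the names `RangeFilterSketch`, `expandToClauses` and
`rangeFilter` are made up for illustration. It assumes a clause limit of
1024, matching Lucene's default `BooleanQuery` maximum clause count.
Expanding a range into one clause per term fails once the range covers more
terms than that limit, whereas a filter walks the same term range once and
marks the matching documents in a `BitSet`, so it has no clause limit. In
Lucene itself, `RangeFilter` (optionally wrapped in `ConstantScoreQuery`)
plays the filter role.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical in-memory model of one field of an inverted index.
 * Not Lucene's API; it only illustrates why a filter avoids the clause
 * limit that a term-expanding range query runs into.
 */
class RangeFilterSketch {
    // Assumed limit, mirroring Lucene's default BooleanQuery max clause count.
    static final int MAX_CLAUSES = 1024;

    // Sorted term dictionary: term -> ids of the documents containing it.
    private final NavigableMap<String, List<Integer>> postings = new TreeMap<>();

    void add(String term, int docId) {
        postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
    }

    /** Query-style expansion: one clause per matching term, so it can overflow. */
    List<String> expandToClauses(String lower, String upper) {
        List<String> clauses =
                new ArrayList<>(postings.subMap(lower, true, upper, true).keySet());
        if (clauses.size() > MAX_CLAUSES) {
            throw new IllegalStateException("too many clauses: " + clauses.size());
        }
        return clauses;
    }

    /** Filter-style evaluation: walk the term range once and mark matching docs. */
    BitSet rangeFilter(String lower, String upper) {
        BitSet bits = new BitSet();
        for (Map.Entry<String, List<Integer>> e
                : postings.subMap(lower, true, upper, true).entrySet()) {
            for (int doc : e.getValue()) {
                bits.set(doc);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        RangeFilterSketch field = new RangeFilterSketch();
        // 2,000 distinct zero-padded terms, one per document.
        for (int doc = 0; doc < 2000; doc++) {
            field.add(String.format("%05d", doc), doc);
        }
        BitSet hits = field.rangeFilter("00100", "01599");
        System.out.println(hits.cardinality()); // 1500
        try {
            field.expandToClauses("00100", "01599");
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage()); // too many clauses: 1500
        }
    }
}
```

The filter's cost grows with the number of matching postings rather than
with a clause count, and the resulting bit set could be cached and reused
across queries, which is also what would make a filter attractive for node
type constraints such as "select * from Column".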


Re: Query Performance and Optimization

Posted by David Johnson <db...@gmail.com>.
Any pointers and thoughts from the developers who have worked on the
LuceneQueryBuilder would be much appreciated. As an idea, I was thinking of
running the query AST through an optimization pass before it is passed to
the query builder, perhaps in
org.apache.jackrabbit.core.query.lucene.QueryImpl.execute() right before the
LuceneQueryBuilder.createQuery call.
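An optimization pass over the query AST of the kind suggested above might
look roughly like this. It is a minimal sketch on a simplified, hypothetical
AST (`Node`, `Term`, `And`, `Or` and `QueryOptimizerSketch` are illustrative
names, not Jackrabbit's query node classes), and it does not show how such a
pass would be wired into QueryImpl.execute(). It applies two example
rewrites: flattening nested AND/OR nodes and dropping duplicate clauses.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical query AST plus a rewrite pass that would run before a query
 * builder sees the tree. Illustrative only; not Jackrabbit's QueryNode API.
 */
class QueryOptimizerSketch {
    interface Node {}
    record Term(String field, String value) implements Node {}
    record And(List<Node> children) implements Node {}
    record Or(List<Node> children) implements Node {}

    /** Flattens nested AND/OR nodes, drops duplicate clauses, unwraps single-child nodes. */
    static Node optimize(Node node) {
        if (node instanceof And andNode) {
            return rebuild(andNode.children(), true);
        }
        if (node instanceof Or orNode) {
            return rebuild(orNode.children(), false);
        }
        return node; // leaf nodes are returned unchanged
    }

    private static Node rebuild(List<Node> children, boolean isAnd) {
        // LinkedHashSet keeps clause order and removes equal duplicates
        // (records compare by value).
        Set<Node> flat = new LinkedHashSet<>();
        for (Node child : children) {
            Node c = optimize(child);
            if (isAnd && c instanceof And nested) {
                flat.addAll(nested.children()); // AND(a, AND(b, c)) -> AND(a, b, c)
            } else if (!isAnd && c instanceof Or nested) {
                flat.addAll(nested.children()); // OR(a, OR(b, c)) -> OR(a, b, c)
            } else {
                flat.add(c);
            }
        }
        List<Node> list = new ArrayList<>(flat);
        if (list.size() == 1) {
            return list.get(0); // AND(x) or OR(x) is just x
        }
        return isAnd ? new And(list) : new Or(list);
    }

    public static void main(String[] args) {
        Node type = new Term("jcr:primaryType", "Column");
        Node title = new Term("title", "foo");
        Node query = new And(List.of(type, new And(List.of(title, type))));
        // expected: one flat And holding the two distinct Term nodes
        System.out.println(optimize(query));
    }
}
```

Flattening and de-duplication are only examples; other rewrites (for
instance, reordering clauses or merging range constraints on the same
property) would follow the same recursive shape.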

Has anyone done any profiling on queries? I have some data that I gathered
with the NetBeans profiler that I could share if anyone is interested. Some
highlights:

- org.apache.lucene.search.Searcher.search(...) and its children take 96% of
  the time.
- Of those children, the first "hit" in Jackrabbit code is
  org.apache.jackrabbit.core.query.lucene.SharedFieldSortComparator.newComparator(...)
  at 58% of the time, and its child
  org.apache.jackrabbit.core.query.lucene.SharedFieldCache.getStringIndex(...)
  accounts for all of that time.
- Below that point, the biggest child is
  org.apache.lucene.index.MultiTermDocs.next(), which takes the majority of
  the time from there on.
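One plausible reading of this profile is that most of the time goes into
building the cache needed to sort results by a string property. The sketch
below models such a cache under that assumption. `SortCacheSketch` is a
hypothetical name, and the structure only loosely follows the shape of
Lucene's FieldCache.StringIndex (a term lookup table plus a
document-to-ordinal array); it is not Jackrabbit's SharedFieldCache code.
Building it means visiting every term of the field and every document that
contains it, which is the kind of work that would show up under
MultiTermDocs.next().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical model of a per-reader string sort cache. Not Jackrabbit's
 * SharedFieldCache; it only shows why building it touches every posting.
 */
class SortCacheSketch {
    final String[] lookup; // lookup[ord] = term; ord 0 is reserved for "no value"
    final int[] order;     // order[doc] = ordinal of the doc's term, 0 if none

    SortCacheSketch(TreeMap<String, List<Integer>> postings, int maxDoc) {
        lookup = new String[postings.size() + 1];
        order = new int[maxDoc];
        int ord = 1;
        // One pass over every term and every posting of the field: this is
        // the step whose cost grows with the size of the index.
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            lookup[ord] = e.getKey();
            for (int doc : e.getValue()) {
                order[doc] = ord;
            }
            ord++;
        }
    }

    /** Once the cache exists, sorting doc ids by value is an int comparison. */
    Integer[] sortDocs(int[] docs) {
        Integer[] boxed = Arrays.stream(docs).boxed().toArray(Integer[]::new);
        Arrays.sort(boxed, (a, b) -> Integer.compare(order[a], order[b]));
        return boxed;
    }

    public static void main(String[] args) {
        TreeMap<String, List<Integer>> title = new TreeMap<>();
        title.computeIfAbsent("banana", k -> new ArrayList<>()).add(0);
        title.computeIfAbsent("cherry", k -> new ArrayList<>()).add(1);
        title.computeIfAbsent("apple", k -> new ArrayList<>()).add(2);
        SortCacheSketch cache = new SortCacheSketch(title, 3);
        System.out.println(Arrays.toString(cache.sortDocs(new int[] {0, 1, 2}))); // [2, 0, 1]
    }
}
```

If the real cache is kept per index reader, the expensive pass would be paid
mainly on the first sorted query after the index changes; comparing the
timings of a first and a repeated sorted query would be one way to check
that assumption against the profile.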

Any pointers or thoughts on writing a query optimizer for Lucene, on
alternative indexing engines, or even on how to optimize queries would be
appreciated.

-Dave
