Posted to java-user@lucene.apache.org by Trejkaz <tr...@trypticon.org> on 2012/11/21 00:18:44 UTC

Performance of IndexSearcher.explain(Query)

I have a feature I wanted to implement which required a quick way to
check whether an individual document matched a query or not.

IndexSearcher.explain seemed to be a good fit for this.

The query I tested was just a BooleanQuery with two TermQuery clauses
inside it, both with MUST. I ran an empty query to match all documents
and then ran the new code against each document. Of 40,743 documents,
1,072 matched the query.
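
Here is a rough sketch of what a per-document check has to do for a two-clause MUST query. It is a toy model, not Lucene code. It assumes a hypothetical in-memory index that maps each term to a sorted array of doc IDs, and `SingleDocCheck`, `advance` and `matches` are illustrative names only. In Lucene itself, the per-document call is `IndexSearcher.explain(Query, int)`, and `Explanation.isMatch()` reports whether that document matched.

```java
import java.util.Arrays;
import java.util.Map;

/** Hypothetical toy index: term -> sorted doc IDs. Not Lucene API. */
final class SingleDocCheck {

    /**
     * Moves to the first doc >= target in a sorted postings array, roughly
     * like an iterator's advance() call. Returns -1 when exhausted.
     */
    static int advance(int[] postings, int target) {
        int i = Arrays.binarySearch(postings, target);
        if (i < 0) {
            i = -i - 1; // not found: insertion point is the first doc > target
        }
        return i < postings.length ? postings[i] : -1;
    }

    /** True only if every required (MUST) term's postings contain the doc. */
    static boolean matches(Map<String, int[]> index, String[] mustTerms, int doc) {
        for (String term : mustTerms) {
            int[] postings = index.getOrDefault(term, new int[0]);
            if (advance(postings, doc) != doc) {
                return false; // a required clause is missing, so no match
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, int[]> index = Map.of(
            "body:lucene", new int[] {1, 4, 7, 9},
            "body:explain", new int[] {2, 4, 9, 12});
        String[] query = {"body:lucene", "body:explain"};
        System.out.println(matches(index, query, 4)); // true
        System.out.println(matches(index, query, 7)); // false
    }
}
```

Each required clause only jumps once to the target doc. It never walks the rest of its postings.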

This took around 15.5s. After noticing that ConstantScoreQuery now
accepts a Query in addition to a Filter, I started wrapping the query
in it as well, which reduced the time further to 13.6s.

There is a comment like this on the explain method, though:

    "Computing an explanation is as expensive as executing
     the query over the entire index."

So I wanted to test this. To do so, I wrote a collector that did
nothing but look for the single document being checked.

Searching the whole index with this collector took around 30.9s,
which is more than twice as slow as using explain. (Times didn't vary
at all if I used ConstantScoreQuery here, which I assume has something
to do with the custom collector ignoring the scorer.)

So I was wondering, is this comment just out of date? It seems that by
using explain(), I get the same information I get by querying the
whole index, *plus* information about the score which the custom
collector wasn't recording, all in less than half the time it took to
query the whole index.

TX

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Performance of IndexSearcher.explain(Query)

Posted by Trejkaz <tr...@trypticon.org>.
On Wed, Nov 21, 2012 at 10:40 AM, Robert Muir <rc...@gmail.com> wrote:
> Explain is not performant... but the comment is fair, I think? It's more
> of a worst case; it depends on the query.
> Explain is going to rewrite the query, create the weight, and so on, just
> to advance() the scorer to that single doc.
> So if this is, e.g., a wildcard query, then it could definitely be almost
> as slow as searching the whole index, since the rewrite involves scanning
> through the term dictionary or whatever.

Hmm, yep. That does seem to be it. For complicated queries (or at
least queries which are slow to create a weight for) it's about the
same speed no matter which way I do it. For the more normal queries I
was trying, explain() seems to speed things up a fair bit. For simple
one-term queries it might be a bit quicker still.

It's at least never slower than doing the full query though, so I can
still use it. I'll just be putting a similar (though perhaps more
specific) warning about performance on the method.

TX



Re: Performance of IndexSearcher.explain(Query)

Posted by Robert Muir <rc...@gmail.com>.
On Tue, Nov 20, 2012 at 6:18 PM, Trejkaz <tr...@trypticon.org> wrote:

Explain is not performant... but the comment is fair, I think? It's more of a
worst case; it depends on the query.
Explain is going to rewrite the query, create the weight, and so on, just to
advance() the scorer to that single doc.
So if this is, e.g., a wildcard query, then it could definitely be almost as
slow as searching the whole index, since the rewrite involves scanning
through the term dictionary or whatever.
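
To illustrate the rewrite cost described above, here is a toy model that "rewrites" a wildcard pattern by testing every entry of a term list. The class and method names and the in-memory term list are hypothetical, not Lucene API. Real Lucene compiles the pattern into an automaton and walks the terms dictionary with it, which can skip ahead when the pattern has a fixed prefix. A pattern with a leading `*` gets little help from that, so the linear scan here roughly models that worst case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical toy: wildcard rewrite by scanning a term list. Not Lucene API. */
final class WildcardRewrite {

    /** Converts a wildcard ('*' = any run of chars, '?' = one char) to a regex. */
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c))); // literal char
            }
        }
        return Pattern.compile(sb.toString());
    }

    /**
     * Expands the wildcard into the concrete terms it matches. Every entry
     * is examined, so this cost is paid in full even when the caller only
     * wants to check a single document.
     */
    static List<String> rewrite(List<String> termDictionary, String wildcard) {
        Pattern p = toRegex(wildcard);
        List<String> expanded = new ArrayList<>();
        for (String term : termDictionary) {
            if (p.matcher(term).matches()) {
                expanded.add(term);
            }
        }
        return expanded;
    }

    public static void main(String[] args) {
        List<String> dict = List.of("explain", "explanation", "lucene", "query", "rewrite");
        System.out.println(rewrite(dict, "expla*")); // [explain, explanation]
    }
}
```

This expansion runs first no matter which single document explain() is asked about. That fits the point that explain() on such queries can cost about as much as a full search.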