You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by Deepak Konidena <de...@gmail.com> on 2013/08/30 17:08:05 UTC

Order of fields in a search query.

Does the order of fields matter in a lucene query?

For instance,

q = A && B && C

Lets say A appears in a million documents, B in 10000, C in 1000.

while the results would be identical irrespective of the order in which you
AND
A, B and C, will the response times of the following queries differ in any
way?

C && B && A
A && B && C

Does Lucene/Solr pick the best query execution plan in terms of both space
and time for a given query?

-Deepak

Re: Order of fields in a search query.

Posted by Aloke Ghoshal <al...@gmail.com>.
Hi Deepak,

As Hoss explains it, there wouldn't be any effect of changing the order of
individual search terms.

In addition, you could look at the Scoring algo:
http://lucene.apache.org/core/2_9_4/scoring.html#Algorithm,
http://lucene.apache.org/core/2_9_4/api/core/org/apache/lucene/search/package-summary.html#scoring

Also the advancing logic is in the Scorer's (inherited) advance method:
http://lucene.apache.org/core/2_9_4/api/core/org/apache/lucene/search/DocIdSetIterator.html#advance%28int%29

- For the AND query a leap frog/ skip ahead pattern is implemented at the
BooleanScorer2 (ConjunctionScorer) level.

For the query, q=A AND B

where A & B match doc. id's
A -> 1,3,5,7,11,15,17
B -> 2, 6

- Scorers start with the min. of each, i.e. A -> 1  & B -> 2, & current
highest doc id set to 2

In the next few iterations:
A is advanced past the current highest value to 3 & current highest updated
to 3.
B advanced past current highest 3 to 6 & current highest updated to 6.
A advanced past 6 to 7 & current highest updated to 7.
B has no more docs & this breaks out, without any match.

- On the other hand if the two had converged/ agreed on a particular doc
id, that doc would be scored & collected (added to a min-heap).

Please add if there's something amiss in the explanation.

Regards,
Aloke


On Sat, Aug 31, 2013 at 5:43 AM, Chris Hostetter
<ho...@fucit.org>wrote:

>
> : while the results would be identical irrespective of the order in which
> you
> : AND
> : A, B and C, will the response times of the following queries differ in
> any
> : way?
> :
> : C && B && A
> : A && B && C
>
> The queries won't be *cached* the same at the solr level, because the
> BooleanQuery generated by the parsers won't be 100% identical, but the
> execution of those (uncached) queries should be virtualy indential.
>
> : Does Lucene/Solr pick the best query execution plan in terms of both
> space
> : and time for a given query?
>
> It's not a "query execution plan" so much as it is a "skip ahead" pattern.
> the Sub-Scorers for each of the sub-queries are looped over and each one
> is asked to identify the "first" doc id (X) that it matches, after or
> equal to the "first" doc id (Y) returned by the last sub-query consulted
> -- starting with Y=0.  And each time a new "X" is found, the looping
> starts again on the remaining subscorers until a match is found (or we run
> out of documents)
>
> So regardless of what the order of clauses are in the original
> BooleanQuery, the Scorers for each clause are constantly reordered based
> on what the "first" document they match after the currently considered
> document is.
>
> -Hoss
>

Re: Order of fields in a search query.

Posted by Chris Hostetter <ho...@fucit.org>.
: while the results would be identical irrespective of the order in which you
: AND
: A, B and C, will the response times of the following queries differ in any
: way?
: 
: C && B && A
: A && B && C

The queries won't be *cached* the same at the solr level, because the 
BooleanQuery generated by the parsers won't be 100% identical, but the 
execution of those (uncached) queries should be virtualy indential.

: Does Lucene/Solr pick the best query execution plan in terms of both space
: and time for a given query?

It's not a "query execution plan" so much as it is a "skip ahead" pattern.  
the Sub-Scorers for each of the sub-queries are looped over and each one 
is asked to identify the "first" doc id (X) that it matches, after or 
equal to the "first" doc id (Y) returned by the last sub-query consulted 
-- starting with Y=0.  And each time a new "X" is found, the looping 
starts again on the remaining subscorers until a match is found (or we run 
out of documents)

So regardless of what the order of clauses are in the original 
BooleanQuery, the Scorers for each clause are constantly reordered based 
on what the "first" document they match after the currently considered 
document is.  

-Hoss