Posted to dev@lucene.apache.org by Daniel Penning <dp...@net-com.de> on 2003/10/23 12:15:48 UTC

Re[2]: improve performance of "AND-queries"

Hello Erik,

EH> Do you have some performance numbers to go with this?

+_key:23/*              -> 801 hits  80-90 ms
+24:8397                -> 10 hits   <10 ms
+24:8397 +_key:23/*     -> 5 hits    80-90 ms

Info: The index size is 118kb.

If Lucene first searched for +24:8397 and then checked whether the rest
of the query matches the documents found, it might be able to return
the result much faster.
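
The idea can be sketched outside Lucene roughly as follows. This is only
an illustration, not Lucene's actual query execution: the class
ConjunctionSketch and its method are hypothetical names, and each clause
is assumed to have already produced a sorted array of matching document
ids. The shorter list drives the loop and the longer one is only probed.

```java
import java.util.Arrays;

// Hypothetical illustration; none of these names exist in Lucene.
final class ConjunctionSketch {

    // Intersects two sorted arrays of document ids.
    // The shorter array drives the loop; the longer one is probed
    // with a binary search for each candidate document.
    static int[] intersect(int[] a, int[] b) {
        int[] driver = a.length <= b.length ? a : b;
        int[] other = (driver == a) ? b : a;

        int[] hits = new int[driver.length];
        int count = 0;
        for (int doc : driver) {
            if (Arrays.binarySearch(other, doc) >= 0) {
                hits[count++] = doc;
            }
        }
        return Arrays.copyOf(hits, count);
    }
}
```

With 10 candidates on one side and 801 on the other, this does 10 probes
instead of walking all 801 entries. It says nothing about the cost of
producing each list in the first place.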

The execution time when using a QueryFilter is nearly the same (~10ms
faster).

EH> One thing I'd love to see is some JUnitPerf tests added to Lucene's 
EH> test suite - maybe you could come up with a test case that shows the 
EH> performance issue explicitly here?

EH> Also, if you are doing this type of query repeatedly, look into using a 
EH> QueryFilter that uses the most restrictive query as the filter, which 
EH> will limit the search space of the second query.

EH>         Erik


EH> On Thursday, October 23, 2003, at 04:43  AM, Daniel Penning wrote:

>> Hi
>>
>> The performance of queries using AND (and +) could be greatly improved.
>>
>> Example:
>> title:"The Right Way"                 -> 10 hits
>> text:go                               -> 100 hits
>> title:"The Right Way" AND text:go     -> 5 hits
>>
>> It looks like both parts of the query are executed separately and then
>> merged. If Lucene were able to execute the part with fewer results
>> (title:"The Right Way") first and then only check whether the second
>> part (text:go) matches, those queries would be much faster.
>>
>>
>> Daniel Penning
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
>> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org




Daniel Penning




Re: improve performance of "AND-queries"

Posted by Dmitry Serebrennikov <dm...@earthlink.net>.
I am not familiar enough with the query parser syntax, but is the * a 
wildcard? If so, that's what is causing the extra delay. If you want to 
speed this type of query up, the QueryFilter should probably be created 
on the wildcard query, not on the most restrictive one.

Lucene's internals are somewhat different from those of a typical 
database system. For instance, to execute a wildcard query, it needs to 
retrieve all tokens found in this field, in all documents, that match 
the wildcard. Then it turns this into an "OR" query and goes from there. 
Things get a bit better with a "starts with" type of wildcard, but it 
still ends up with some prep work and an OR-type query, I believe.
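
To make that concrete, here is a rough sketch of a "starts with" 
expansion. It is not Lucene code: PrefixExpansionSketch is a hypothetical 
name, and the term dictionary is modelled as a sorted map from term text 
to a BitSet of document ids, which is an assumption for illustration 
only. Every matching term has to be visited and its documents OR-ed in, 
so the work grows with the number of matching terms.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; not part of Lucene.
final class PrefixExpansionSketch {

    // Expands a prefix into every term that starts with it and returns
    // the union (logical OR) of the documents containing those terms.
    static BitSet matchPrefix(TreeMap<String, BitSet> termDict, String prefix) {
        BitSet result = new BitSet();
        // In a sorted dictionary, terms sharing a prefix are contiguous,
        // so start at the first term >= prefix and stop at the first miss.
        for (Map.Entry<String, BitSet> e : termDict.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            result.or(e.getValue());
        }
        return result;
    }
}
```

A single-term clause such as +24:8397 needs one dictionary lookup, while 
a clause such as _key:23/* pays for the enumeration above before any 
documents can be compared.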

Dmitry.


Erik Hatcher wrote:

>
> On Thursday, October 23, 2003, at 06:15  AM, Daniel Penning wrote:
>
>> EH> Do you have some performance numbers to go with this?
>>
>> +_key:23/*              -> 801 hits  80-90 ms
>> +24:8397                -> 10 hits   <10 ms
>> +24:8397 +_key:23/*     -> 5 hits    80-90 ms
>>
>> Info: The index size is 118kb.
>
>
> Thanks for the numbers.
>
>> The execution time when using a QueryFilter is nearly the same (~10ms
>> faster).
>
>
> Keep in mind that QueryFilter will only be faster on successive uses 
> of it (the same instance, that is) as it caches the bitset of matching 
> documents, so successive calls only search the matched ones from the 
> first time around.
>
>     Erik





Re: Re[2]: improve performance of "AND-queries"

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Thursday, October 23, 2003, at 06:15  AM, Daniel Penning wrote:
> EH> Do you have some performance numbers to go with this?
>
> +_key:23/*              -> 801 hits  80-90 ms
> +24:8397                -> 10 hits   <10 ms
> +24:8397 +_key:23/*     -> 5 hits    80-90 ms
>
> Info: The index size is 118kb.

Thanks for the numbers.

> The execution time when using a QueryFilter is nearly the same (~10ms
> faster).

Keep in mind that QueryFilter will only be faster on successive uses of 
it (the same instance, that is) as it caches the bitset of matching 
documents, so successive calls only search the matched ones from the 
first time around.
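
A simplified sketch of that caching behaviour, for illustration only. 
This is not the QueryFilter source: CachingFilterSketch and its members 
are hypothetical names, and the wrapped query is modelled as a plain 
predicate over document ids, which is an assumption.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

// Hypothetical illustration of a filter that caches its matching documents.
final class CachingFilterSketch {
    private final IntPredicate query; // stands in for the wrapped query
    private final int maxDoc;         // number of documents in the index
    private BitSet cached;            // filled on first use, reused afterwards
    int computeCount = 0;             // how many times the bitset was built

    CachingFilterSketch(IntPredicate query, int maxDoc) {
        this.query = query;
        this.maxDoc = maxDoc;
    }

    // Returns the bitset of matching documents, building it only once.
    BitSet bits() {
        if (cached == null) {
            BitSet b = new BitSet(maxDoc);
            for (int doc = 0; doc < maxDoc; doc++) {
                if (query.test(doc)) {
                    b.set(doc);
                }
            }
            cached = b;
            computeCount++;
        }
        return cached;
    }
}
```

The first call to bits() costs as much as running the query itself, which 
would be consistent with a single timed run showing little difference. 
Only later calls on the same instance skip that work.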

	Erik

