Posted to dev@lucene.apache.org by Daniel Penning <dp...@net-com.de> on 2003/10/23 12:15:48 UTC
Re[2]: improve performance of "AND-queries"
Hello Erik,
EH> Do you have some performance numbers to go with this?
+_key:23/* -> 801 hits 80-90 ms
+24:8397 -> 10 hits <10 ms
+24:8397 +_key:23/* -> 5 hits 80-90 ms
Info: The index size is 118kb.
If Lucene first searched for +24:8397 and then checked whether the
remaining clauses match the documents found, it might be able to return
the result much faster.
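
The idea above can be sketched in a few lines. This is only an illustration of "drive the AND from the most restrictive clause", not Lucene's actual implementation: the class name ConjunctionSketch, the intersect method, and the representation of posting lists as sorted int arrays are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch, not Lucene code. A posting list is a sorted array of doc ids. */
final class ConjunctionSketch {

    /** Intersects sorted posting lists by iterating only over the shortest one. */
    static int[] intersect(int[]... postings) {
        if (postings.length == 0) {
            return new int[0];
        }
        int[][] byLength = postings.clone();
        // Put the most restrictive clause (fewest matching documents) first.
        Arrays.sort(byLength, (a, b) -> Integer.compare(a.length, b.length));

        List<Integer> hits = new ArrayList<>();
        for (int doc : byLength[0]) {
            boolean inAll = true;
            for (int i = 1; i < byLength.length && inAll; i++) {
                // Binary search checks membership without scanning the longer list.
                inAll = Arrays.binarySearch(byLength[i], doc) >= 0;
            }
            if (inAll) {
                hits.add(doc);
            }
        }
        int[] result = new int[hits.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = hits.get(i);
        }
        return result;
    }
}
```

With a 10-document list and an 801-document list, the loop runs 10 times and does 10 binary searches. Note that this only helps if the longer list is already available; it does not reduce the cost of building that list in the first place.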
The execution time when using a QueryFilter is nearly the same (~10ms
faster).
EH> One thing I'd love to see is some JUnitPerf tests added to Lucene's
EH> test suite - maybe you could come up with a test case that shows the
EH> performance issue explicitly here?
EH> Also, if you are doing this type of query repeatedly, look into using a
EH> QueryFilter that uses the most restrictive query as the filter, which
EH> will limit the search space of the second query.
EH> Erik
EH> On Thursday, October 23, 2003, at 04:43 AM, Daniel Penning wrote:
>> Hi
>>
>> The performance of queries using AND (and +) could be greatly improved.
>>
>> Example:
>> title:"The Right Way" -> 10 hits
>> text:go -> 100 hits
>> title:"The Right Way" AND text:go -> 5 hits
>>
>> It looks like both parts of the query are executed separately and then
>> merged. If Lucene were able to execute the part with fewer results
>> (title:"The Right Way") first and then only check whether the second
>> part (text:go) matches, those queries would be much faster.
>>
>>
>> Daniel Penning
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
>> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
Daniel Penning
Re: improve performance of "AND-queries"
Posted by Dmitry Serebrennikov <dm...@earthlink.net>.
I am not familiar enough with the query parser syntax, but is the * a
wildcard? If so, that is what causes the extra delay. If you want to
speed up this type of query, the QueryFilter should probably be created
on the wildcard query, not on the most restrictive one.
Lucene's internals are somewhat different from those of a typical
database system. For instance, to execute a wildcard query, it needs to
retrieve every token in this field, across all documents, that matches
the wildcard. Then it turns these into an "OR" query and goes from there.
Things get a bit better with a "starts with" type of wildcard, but it
still ends up with some prep work and an OR-type query, I believe.
Dmitry.
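
The expansion Dmitry describes can be sketched as follows. This is a toy model under stated assumptions, not Lucene's code: the class PrefixExpansionSketch, its methods, and the use of a TreeMap as a stand-in for the sorted term dictionary are all hypothetical. It covers only the "starts with" case.

```java
import java.util.BitSet;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical sketch of prefix expansion; not Lucene code. */
final class PrefixExpansionSketch {

    /** term -> doc ids. The TreeMap keeps terms sorted, like a term dictionary. */
    private final TreeMap<String, int[]> terms = new TreeMap<>();

    void add(String term, int... docs) {
        terms.put(term, docs);
    }

    /** Number of terms the prefix expands to, i.e. the number of OR clauses. */
    int expansionSize(String prefix) {
        return matching(prefix).size();
    }

    /** Union (OR) of the postings of every term that starts with the prefix. */
    BitSet searchPrefix(String prefix) {
        BitSet docs = new BitSet();
        for (int[] postings : matching(prefix).values()) {
            for (int doc : postings) {
                docs.set(doc);
            }
        }
        return docs;
    }

    private SortedMap<String, int[]> matching(String prefix) {
        // Every key in [prefix, prefix + '\uffff') starts with the prefix.
        // Simplification: terms containing '\uffff' right after the prefix are missed.
        return terms.subMap(prefix, prefix + Character.MAX_VALUE);
    }
}
```

The work done by searchPrefix grows with the number of matching terms and their postings, regardless of how few documents the other clause of an AND query matches. That is consistent with the combined query above taking about as long as the wildcard clause alone.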
Erik Hatcher wrote:
>
> On Thursday, October 23, 2003, at 06:15 AM, Daniel Penning wrote:
>
>> EH> Do you have some performance numbers to go with this?
>>
>> +_key:23/* -> 801 hits 80-90 ms
>> +24:8397 -> 10 hits <10 ms
>> +24:8397 +_key:23/* -> 5 hits 80-90 ms
>>
>> Info: The index size is 118kb.
>
>
> Thanks for the numbers.
>
>> The execution time when using a QueryFilter is nearly the same (~10ms
>> faster).
>
>
> Keep in mind that QueryFilter will only be faster on successive uses
> of it (the same instance, that is) as it caches the bitset of matching
> documents, so successive calls only search the matched ones from the
> first time around.
>
> Erik
Re: Re[2]: improve performance of "AND-queries"
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Thursday, October 23, 2003, at 06:15 AM, Daniel Penning wrote:
> EH> Do you have some performance numbers to go with this?
>
> +_key:23/* -> 801 hits 80-90 ms
> +24:8397 -> 10 hits <10 ms
> +24:8397 +_key:23/* -> 5 hits 80-90 ms
>
> Info: The index size is 118kb.
Thanks for the numbers.
> The execution time when using a QueryFilter is nearly the same (~10ms
> faster).
Keep in mind that QueryFilter will only be faster on successive uses of
it (the same instance, that is) as it caches the bitset of matching
documents, so successive calls only search the matched ones from the
first time around.
Erik
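
A minimal sketch of the caching behaviour Erik describes, assuming a single index and a single filter instance. CachingFilterSketch and its members are hypothetical names for this illustration and are not the QueryFilter source; the expensive query is modelled as a Supplier that returns the set of matching doc ids.

```java
import java.util.BitSet;
import java.util.function.Supplier;

/** Hypothetical stand-in for a caching filter; not Lucene's QueryFilter. */
final class CachingFilterSketch {

    private final Supplier<BitSet> expensiveQuery;
    private BitSet cached;   // null until the filter is first used
    int evaluations = 0;     // how often the expensive query actually ran

    CachingFilterSketch(Supplier<BitSet> expensiveQuery) {
        this.expensiveQuery = expensiveQuery;
    }

    /** Runs the expensive query on first use only, then reuses the bitset. */
    synchronized BitSet bits() {
        if (cached == null) {
            cached = expensiveQuery.get();
            evaluations++;
        }
        return cached;
    }

    /** Restricts the hits of another query to the documents the filter allows. */
    BitSet apply(BitSet queryHits) {
        BitSet result = (BitSet) queryHits.clone();
        result.and(bits());
        return result;
    }
}
```

The first call to apply pays the full cost of the filter query; later calls on the same instance only pay for the bitwise AND. A new instance per search, as in a one-off timing test, never benefits from the cache.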