Posted to java-user@lucene.apache.org by Zsolt Koppany <zk...@intland.com> on 2009/06/15 16:19:16 UTC

Fuzzy vs Prefix query Performance

Hi,

on 99,470 documents (Lucene documents, I mean), a FuzzyQuery takes approx.
30 seconds but a PrefixQuery takes less than one.

All the Lucene index files together take up 65 MB.

I'm a bit surprised by that. Is that possible?

Zsolt

Zsolt Koppany
Phone: +49-711-67400-679

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Fuzzy vs Prefix query Performance

Posted by Zsolt Koppany <zk...@intland.com>.
Erick,

this is a web application running 24 hours a day, so caching cannot be the
reason. I get the same result when I run the same search again.

Zsolt



Re: Fuzzy vs Prefix query Performance

Posted by mark harwood <ma...@yahoo.co.uk>.
FuzzyQuery performance is related to the number of unique terms in the index, not the number of documents; e.g. a single "telephone directory" document could contain millions of terms.
Each term considered is compared using an "edit distance" algorithm, which is CPU-intensive.
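As a rough illustration of why this is CPU-bound, here is a minimal Java sketch (not Lucene's actual FuzzyTermEnum code) that computes the classic dynamic-programming Levenshtein edit distance and runs it once per unique term. The class and method names are hypothetical, and the raw `maxEdits` threshold is a simplifying assumption: the Lucene 2.x FuzzyQuery expresses its threshold as a minimum similarity (a float) rather than an edit count.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch only: a brute-force fuzzy scan over a list of unique terms.
 * It mimics the cost profile described above (one edit-distance computation per
 * term), not Lucene's real implementation. Names are hypothetical.
 */
public class FuzzyScanSketch {

    /** Levenshtein distance using two rolling rows; O(a.length() * b.length()) time. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,  // insertion
                                            prev[j] + 1),     // deletion
                                   prev[j - 1] + cost);       // substitution or match
            }
            int[] tmp = prev; // the finished row becomes "previous" for the next i
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Compares the input against every unique term, so work grows with the term count. */
    static List<String> fuzzyMatches(List<String> uniqueTerms, String input, int maxEdits) {
        List<String> matches = new ArrayList<>();
        for (String term : uniqueTerms) {
            if (editDistance(input, term) <= maxEdits) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("lucene", "lucerne", "lucent", "solr", "search");
        System.out.println(fuzzyMatches(terms, "lucene", 1)); // prints [lucene, lucerne, lucent]
    }
}
```

Each `editDistance` call costs time proportional to the product of the two term lengths, and `fuzzyMatches` makes one such call per term, so the total work tracks the size of the term dictionary rather than the number of documents.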

The FuzzyQuery prefix length setting dictates whether the fuzzy edit distance comparisons are done against every term from A to Z (prefix length = 0) or just against those terms sharing the first n characters of the input term. Obviously this can make a huge difference in the number of terms compared: a prefix length of 1 would reduce the search space to 1/26th of that for prefix length = 0, assuming words are evenly distributed across the alphabet.

Your prefix query does a simpler operation, the equivalent of String.startsWith(..), and will typically operate on fewer terms.
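Both the prefix-length restriction and the startsWith-style prefix query rely on the term dictionary being kept in sorted order, so all terms sharing a prefix sit in one contiguous run. The sketch below uses a `java.util.TreeSet` as a stand-in for the term dictionary (an assumption for illustration; class and method names are hypothetical) to show how that run can be walked without visiting the rest. For reference, in the Lucene 2.x API the prefix length is the third argument of the `FuzzyQuery(Term, float, int)` constructor and defaults to 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/**
 * Illustrative sketch: in a sorted term set, terms sharing a prefix are contiguous,
 * so they can be enumerated by seeking to the prefix and stopping at the first
 * non-matching term. TreeSet stands in for a real term dictionary.
 */
public class PrefixRangeSketch {

    /** Returns every term starting with `prefix`, walking only the matching run. */
    static List<String> termsWithPrefix(TreeSet<String> sortedTerms, String prefix) {
        List<String> result = new ArrayList<>();
        // tailSet(prefix, true) begins at the first term >= prefix.
        for (String term : sortedTerms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can share the prefix
            }
            result.add(term);
        }
        return result;
    }

    /** Terms a fuzzy scan with this prefix length would still need to edit-distance check. */
    static List<String> fuzzyCandidates(TreeSet<String> sortedTerms, String input, int prefixLength) {
        String prefix = input.substring(0, Math.min(prefixLength, input.length()));
        return termsWithPrefix(sortedTerms, prefix); // prefix length 0 -> "" -> every term
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(List.of("apple", "apply", "banana", "bandana", "cherry"));
        System.out.println(fuzzyCandidates(terms, "banana", 0).size()); // prints 5
        System.out.println(fuzzyCandidates(terms, "banana", 1).size()); // prints 2
        System.out.println(termsWithPrefix(terms, "app"));              // prints [apple, apply]
    }
}
```

With prefix length 0 every term remains a candidate for the expensive edit-distance check; with prefix length 1 only the two "b" terms do. A prefix query walks the same kind of run but needs only the cheap startsWith test per term.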

Cheers
Mark



----- Original Message ----
From: Erick Erickson <er...@gmail.com>
To: java-user@lucene.apache.org
Sent: Monday, 15 June, 2009 15:34:18
Subject: Re: Fuzzy vs Prefix query Performance


Re: Fuzzy vs Prefix query Performance

Posted by Erick Erickson <er...@gmail.com>.
Well, if you're seeing it, it's possible <G>....

But the first question is always "what were you measuring?" Be aware
that when you open a searcher, the first few queries can fill caches, etc.,
and may take an anomalously long time, especially if you're sorting. So could
you give more details of your test setup?
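To separate first-query effects from steady-state cost, one option is to time the same query several times against an already-open searcher. The following is a generic Java timing harness, assuming the search can be wrapped in a `java.util.function.Supplier`; the class name and the workload in `main` are hypothetical stand-ins, not Lucene calls.

```java
import java.util.function.Supplier;

/**
 * Minimal timing sketch: run the same operation several times so that
 * first-run effects (cache filling, JIT warm-up) appear as a slow first
 * measurement instead of skewing a single number.
 */
public class WarmupTimer {

    /** Runs `query` `runs` times and returns the elapsed milliseconds of each run. */
    static long[] timeRuns(Supplier<?> query, int runs) {
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            query.get(); // in a real test this would execute the search being measured
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return millis;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in workload; replace with the actual search call.
        Supplier<Integer> work = () -> {
            int sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i % 7;
            }
            return sum;
        };
        long[] times = timeRuns(work, 5);
        for (int i = 0; i < times.length; i++) {
            System.out.println("run " + (i + 1) + ": " + times[i] + " ms");
        }
    }
}
```

If only the first run is slow, warm-up is the likely cause; if every run is equally slow, the cost is more likely in the query itself.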

Best
Erick