Posted to solr-user@lucene.apache.org by Kyle Lee <ra...@gmail.com> on 2011/07/20 17:09:04 UTC

Manipulating a Fuzzy Query's Prefix Length

We're performing fuzzy searches on a field possessing a large number of
unique terms. Specifying a required minimum similarity of 0.7 results in a
query execution time of 13-15 seconds, which stands in stark contrast to our
average query time of 40ms.

We suspect that the performance problem most likely emanates from the
enumeration over all the unique terms in the index. The Lucene documentation
for FuzzyQuery supports this theory with the following warning:

*"Warning:* this query is not very scalable with its default prefix length
of 0 - in this case, *every* term will be enumerated and cause an edit score
calculation."

We would therefore like to set the prefix length to one or two, mandating
that the first one or two characters match exactly and thereby substantially
reducing the number of terms enumerated. Is this possible with Solr? If so, I
haven't yet found a way to do it. Any help would be greatly appreciated.
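
To illustrate the effect we are after, here is a minimal, self-contained
sketch. It is not Lucene's FuzzyQuery code: the class name, the method names,
the toy term list and the similarity formula (1 - distance / length of the
longer string) are all assumptions made for the example. It only shows that a
non-zero prefix length lets most terms be skipped before any edit distance is
calculated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: NOT Lucene's FuzzyQuery implementation.
final class PrefixFuzzySketch {

    // Holds the matching terms plus how many terms needed an edit distance calculation.
    static final class Result {
        final List<String> matches = new ArrayList<>();
        int scored = 0;
    }

    // Classic Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed similarity measure; Lucene's own formula differs in detail.
    static double similarity(String a, String b) {
        int longer = Math.max(a.length(), b.length());
        return longer == 0 ? 1.0 : 1.0 - (double) editDistance(a, b) / longer;
    }

    // Scores only the terms that share the first prefixLength characters of the query.
    // This sketch scans a plain list; it does not model how an index stores its terms.
    static Result search(List<String> terms, String query, double minSimilarity, int prefixLength) {
        Result result = new Result();
        String prefix = query.substring(0, Math.min(prefixLength, query.length()));
        for (String term : terms) {
            if (!term.startsWith(prefix)) {
                continue; // skipped without an edit distance calculation
            }
            result.scored++;
            if (similarity(query, term) >= minSimilarity) {
                result.matches.add(term);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("solar", "solder", "solr", "polar", "molar", "lucene");
        Result noPrefix = search(terms, "solar", 0.7, 0);
        Result prefixTwo = search(terms, "solar", 0.7, 2);
        System.out.println(noPrefix.scored + " terms scored, matches: " + noPrefix.matches);
        System.out.println(prefixTwo.scored + " terms scored, matches: " + prefixTwo.matches);
    }
}
```

With a prefix length of 0 every term in the list is scored; with a prefix
length of 2 only the terms starting with "so" are. The cost is recall: terms
such as "polar" that differ from the query in their first characters can no
longer match, which is a trade-off we can accept for this field.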

Re: Manipulating a Fuzzy Query's Prefix Length

Posted by Kyle Lee <ra...@gmail.com>.
Update:

Solr/Lucene 4.0 will incorporate a new fuzzy search algorithm with
substantial performance improvements.

To tide us over until that release, we've simply rebuilt from source with a
default prefix length of 2.
