Posted to solr-user@lucene.apache.org by Guilherme Aiolfi <gr...@gmail.com> on 2011/05/19 02:03:08 UTC

Fuzzy search and solr 4.0

Hi,

I want to do a fuzzy search that compares a phrase to a field in Solr. For
example:

"abc company ltda" will be compared to "abc comp", "abc corporation", "def
company ltda", "nothing to match here".

The catch is that it always has to return documents sorted by their score.

I've found some good algorithms to do that, like StrikeAMatch[1] and
JaroWinkler.

Using JaroWinkler with strdist() I can do exactly that. But I would rather
use StrikeAMatch, which had a patch in the Lucene JIRA that was never
committed.

So I contacted the author of that patch, and he told me that I should use
Solr 4.0, which now has some pretty good new fuzzy search enhancements
that make StrikeAMatch seem like a toy for kids.

Does anyone know how I can achieve that using Solr 4.0?

[1] http://www.catalysoft.com/articles/StrikeAMatch.html
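
For readers unfamiliar with StrikeAMatch, here is a minimal sketch of the
idea described in [1]: split each word into adjacent letter pairs and score
two strings by how many pairs they share (a Dice coefficient over letter
pairs). This is a plain Python illustration, not the uncommitted Lucene
patch and not a Solr API; the function names are made up for the example.

```python
def letter_pairs(word):
    """Adjacent two-character pairs of a single word, e.g. 'abc' -> ['ab', 'bc']."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def word_letter_pairs(text):
    """Letter pairs of every whitespace-separated word, case-folded."""
    pairs = []
    for word in text.upper().split():
        pairs.extend(letter_pairs(word))
    return pairs


def strike_a_match(a, b):
    """Similarity in [0, 1]: 2 * shared pairs / total pairs in both strings."""
    pairs_a = word_letter_pairs(a)
    pairs_b = word_letter_pairs(b)
    total = len(pairs_a) + len(pairs_b)
    if total == 0:
        return 0.0
    shared = 0
    for pair in pairs_a:
        if pair in pairs_b:
            shared += 1
            # Remove the matched pair so duplicates are not counted twice.
            pairs_b.remove(pair)
    return 2.0 * shared / total


def rank(query, candidates):
    """Candidates sorted by descending similarity to the query."""
    return sorted(candidates, key=lambda c: strike_a_match(query, c), reverse=True)


if __name__ == "__main__":
    query = "abc company ltda"
    candidates = ["abc comp", "abc corporation", "def company ltda",
                  "nothing to match here"]
    for candidate in rank(query, candidates):
        print(f"{strike_a_match(query, candidate):.3f}  {candidate}")
```

With the example phrases from this message, the sketch ranks "def company
ltda" first, then "abc comp", then "abc corporation", and gives "nothing to
match here" a score of zero, so every candidate gets a score and can be
sorted by it.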

Re: Fuzzy search and solr 4.0

Posted by Guilherme Aiolfi <gr...@gmail.com>.
Do you, or any other Solr member, know of a good fuzzy string matching
library to recommend?

On Thu, May 19, 2011 at 9:39 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> Well.... the good news is FuzzyQuery is indeed much faster in Lucene/Solr
> 4.0.
>
> But the bad news is... FuzzyQuery won't do what you need here.  You
> need some sort of FuzzyPhraseQuery, which is able to replace terms
> similar to one another (comp/company/corporation) by some metric.  I
> don't know of such a query in Lucene/Solr... but it'd be a nice
> addition.  Others have asked about this before.
>
> FuzzyQuery finds terms "close" to other terms, when measured by edit
> distance, eg fuzzy/wuzzy/muzzy are all edit distance one from each
> other.
>
> Mike
>
> http://blog.mikemccandless.com

Re: Fuzzy search and solr 4.0

Posted by Michael McCandless <lu...@mikemccandless.com>.
Well.... the good news is FuzzyQuery is indeed much faster in Lucene/Solr 4.0.

But the bad news is... FuzzyQuery won't do what you need here.  You
need some sort of FuzzyPhraseQuery, which is able to replace terms
similar to one another (comp/company/corporation) by some metric.  I
don't know of such a query in Lucene/Solr... but it'd be a nice
addition.  Others have asked about this before.

FuzzyQuery finds terms "close" to other terms, when measured by edit
distance, e.g. fuzzy/wuzzy/muzzy are all edit distance one from each
other.
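
To make the edit-distance point concrete, here is a small sketch of plain
Levenshtein distance in Python. It is only an illustration of the metric,
not Lucene's implementation (which works over automata rather than a
pairwise table); the function name is an assumption for the example. It
shows that fuzzy/wuzzy/muzzy are one edit apart, while comp/company and
comp/corporation are several edits apart, which is why a term-level fuzzy
match does not treat them as interchangeable.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete ch_a
                            curr[j - 1] + 1,      # insert ch_b
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    for left, right in [("fuzzy", "wuzzy"), ("wuzzy", "muzzy"),
                        ("comp", "company"), ("comp", "corporation")]:
        print(left, right, edit_distance(left, right))
```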

Mike

http://blog.mikemccandless.com
