Posted to solr-user@lucene.apache.org by Eoghan Ó Carragáin <eo...@gmail.com> on 2013/03/18 01:19:30 UTC

Fuzzy Suggester and exactMatchFirst

Hi,
I've got the Fuzzy Suggester returning results thanks to tips from Robert
Muir, but some of the suggestions aren't great.

For example, passing spellcheck.q=colla to the standard Suggester gives:

<arr name="suggestion">
<str>collaboration</str>
<str>collaborate</str>
<str>collaborating</str>
<str>collaborations</str>
</arr>

Whereas passing spellcheck.q=colla to the Fuzzy Suggester gives:

<arr name="suggestion">
<str>college</str>
<str>colleges</str>
<str>collisions</str>
<str>collaboration</str>
</arr>

I can see why the Fuzzy Suggester sees "college" as a match for "colla", but
I expected the exactMatchFirst parameter to ensure that suggestions beginning
with "colla" are weighted higher than "fuzzier" matches. I have
spellcheck.onlyMorePopular set to true, in case this makes a difference.

Am I misunderstanding what exactMatchFirst is supposed to do? Is there a
way to ensure suggestions matching exactly what the user has entered rank
higher than fuzzy matches?

Thanks!
Eoghan

Re: Fuzzy Suggester and exactMatchFirst

Posted by Robert Muir <rc...@gmail.com>.
On Sun, Mar 17, 2013 at 8:19 PM, Eoghan Ó Carragáin
<eo...@gmail.com> wrote:
>
> I can see why the Fuzzy Suggester sees "college" as a match for "colla", but
> I expected the exactMatchFirst parameter to ensure that suggestions beginning
> with "colla" are weighted higher than "fuzzier" matches. I have
> spellcheck.onlyMorePopular set to true, in case this makes a difference.
>
> Am I misunderstanding what exactMatchFirst is supposed to do? Is there a
> way to ensure suggestions matching exactly what the user has entered rank
> higher than fuzzy matches?
>

I think exactMatchFirst is unrelated to typo-correction: it only
ensures that if you type a whole suggestion exactly, that suggestion
is returned first and its weight is ignored.
This means if you type 'college' and there is an actual suggestion of
'college' it will be weighted above 'colleges' even if colleges has a
much higher weight.
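
To make that concrete, here is a toy model in Python. It is only a sketch
of the ranking behaviour described above, not Lucene's implementation: the
rank function, its exact_match_first flag, and the weights are all made up
for illustration.

```python
def rank(query, weighted_suggestions, exact_match_first=True):
    """Toy prefix suggester (hypothetical, not Lucene code)."""
    # Keep only suggestions that start with what the user typed.
    matches = [(text, weight)
               for text, weight in weighted_suggestions.items()
               if text.startswith(query)]
    # Order by descending weight, breaking ties alphabetically.
    matches.sort(key=lambda tw: (-tw[1], tw[0]))
    if exact_match_first:
        # Stable sort: a suggestion equal to the query moves to the
        # front regardless of weight; everything else keeps its order.
        matches.sort(key=lambda tw: tw[0] != query)
    return [text for text, _ in matches]


# Made-up weights: "colleges" is far more popular than "college".
weights = {"college": 10, "colleges": 50, "collaboration": 30}

print(rank("college", weights))                           # ['college', 'colleges']
print(rank("college", weights, exact_match_first=False))  # ['colleges', 'college']
```

Note that the flag only matters when the query equals an entire suggestion.
For a partial query such as "colla" it changes nothing.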

On the other hand, what you want (I think) is to punish the weights of
suggestions that required some corrections. Currently I don't think
there is any way to do that:

 * NOTE: This suggester does not boost suggestions that
 * required no edits over suggestions that did require
 * edits.  This is a known limitation.

I think the trickiest part about this is how the "punishment" formula
should work. Because today this thing makes no assumptions as to how
you came up with your suggestion weights...
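
Purely as an illustration of one possible shape for such a formula, the
Python sketch below re-ranks candidates after the fact by multiplying each
weight by a penalty per edit. Everything in it is an assumption: the
function names, the 0.1 penalty, the made-up weights, and above all the
idea that weights can be scaled multiplicatively. It uses plain Levenshtein
distance against the best-matching prefix of each candidate, which is not
necessarily how the suggester counts edits internally.

```python
def prefix_edit_distance(query, candidate):
    """Fewest edits turning query into some prefix of candidate."""
    # prev[j] = edits needed to turn query[:i] into candidate[:j]
    prev = list(range(len(candidate) + 1))
    for i, qc in enumerate(query, start=1):
        cur = [i]
        for j, cc in enumerate(candidate, start=1):
            cur.append(min(prev[j] + 1,                # delete from query
                           cur[j - 1] + 1,             # insert into query
                           prev[j - 1] + (qc != cc)))  # substitute or match
        prev = cur
    # The whole query is consumed; any candidate prefix length is allowed.
    return min(prev)


def rerank(query, weighted_suggestions, penalty=0.1):
    """Hypothetical punishment: score = weight * penalty ** edits."""
    def score(item):
        text, weight = item
        return weight * penalty ** prefix_edit_distance(query, text)
    ordered = sorted(weighted_suggestions.items(),
                     key=lambda item: -score(item))
    return [text for text, _ in ordered]


# Made-up weights chosen so that raw weight order puts "college" first.
weights = {"college": 900, "colleges": 700,
           "collisions": 400, "collaboration": 300}

print(rerank("colla", weights))
# ['collaboration', 'college', 'colleges', 'collisions']
```

With these numbers the zero-edit match "collaboration" overtakes the three
one-edit matches, but a different weighting scheme (say, log-scaled counts)
would need a different penalty, which is exactly the difficulty.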

But feel free to open a JIRA issue if you have ideas!