Posted to solr-user@lucene.apache.org by elisabeth benoit <el...@gmail.com> on 2011/11/29 14:07:38 UTC

Solr 4.0 Levenshtein distance algorithm for DirectSpellChecker

Hello,

I'd like to know if the Levenshtein distance algorithm used by Solr 4.0
DirectSpellChecker (working quite well I must say) is considering an
inversion as distance = 1 or distance = 2?

For instance, if I write Monteruil and I meant Montreuil, is the distance 1
or 2?

Thanks,
Elisabeth

Re: Solr 4.0 Levenshtein distance algorithm for DirectSpellChecker

Posted by Robert Muir <rc...@gmail.com>.
On Tue, Nov 29, 2011 at 9:21 AM, elisabeth benoit
<el...@gmail.com> wrote:
> ok, thanks.
>
> I think it would be a nice improvement to consider inversion as distance =
> 1, since it's such a common mistake. The distance = 2 makes it difficult to
> correct transpositions on small words (for instance, the DirectSpellChecker
> couldn't make the right suggestion for "joile" typed instead of "jolie").
>

I agree with you, it would be a great improvement. The first step is
to get support for 'transpositions as a primitive edit operation' added to
https://bitbucket.org/jpbarrette/moman/ . This is the library we use
to generate the tables.

-- 
lucidimagination.com

Re: Solr 4.0 Levenshtein distance algorithm for DirectSpellChecker

Posted by elisabeth benoit <el...@gmail.com>.
ok, thanks.

I think it would be a nice improvement to consider inversion as distance =
1, since it's such a common mistake. The distance = 2 makes it difficult to
correct transpositions on small words (for instance, the DirectSpellChecker
couldn't make the right suggestion for "joile" typed instead of "jolie").

Best,
Elisabeth

one error the DirectSpellChecker couldn't make the right suggestion for is
"joile" for "jolie", I guess because transposition is 2, and because the
word is just five letters long so the inversion
2011/11/29 Robert Muir <rc...@gmail.com>

> On Tue, Nov 29, 2011 at 8:07 AM, elisabeth benoit
> <el...@gmail.com> wrote:
> > Hello,
> >
> > I'd like to know if the Levenshtein distance algorithm used by Solr 4.0
> > DirectSpellChecker (working quite well I must say) is considering an
> > inversion as distance = 1 or distance = 2?
> >
> > For instance, if I write Monteruil and I meant Montreuil, is the distance 1
> > or 2?
> >
>
> the algorithm is just Levenshtein, so 2. It's possible to also support
> a modified form where transpositions count as 1, but it's not
> implemented.
>
> --
> lucidimagination.com
>

Re: Solr 4.0 Levenshtein distance algorithm for DirectSpellChecker

Posted by Robert Muir <rc...@gmail.com>.
On Tue, Nov 29, 2011 at 8:07 AM, elisabeth benoit
<el...@gmail.com> wrote:
> Hello,
>
> I'd like to know if the Levenshtein distance algorithm used by Solr 4.0
> DirectSpellChecker (working quite well I must say) is considering an
> inversion as distance = 1 or distance = 2?
>
> For instance, if I write Monteruil and I meant Montreuil, is the distance 1
> or 2?
>

the algorithm is just Levenshtein, so 2. It's possible to also support
a modified form where transpositions count as 1, but it's not
implemented.

-- 
lucidimagination.com
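
To make the difference discussed in this thread concrete, here is a minimal
Python sketch comparing plain Levenshtein distance with a variant that counts
an adjacent transposition as a single edit (often called optimal string
alignment). This is not Lucene's implementation, which works from generated
tables as mentioned above; the function names `levenshtein` and
`osa_distance` are illustrative only.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein: insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def osa_distance(a: str, b: str) -> int:
    """Levenshtein plus adjacent transposition as one edit (hypothetical helper)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            # Two adjacent characters swapped: count the swap as one edit.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


for typed, meant in [("Monteruil", "Montreuil"), ("joile", "jolie")]:
    print(typed, meant, levenshtein(typed, meant), osa_distance(typed, meant))
```

For both word pairs from the thread, the plain distance is 2 and the
transposition-aware distance is 1, which is why a short word such as "joile"
is hard to correct when the allowed distance is small relative to its length.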