Posted to user@mahout.apache.org by Ahmed Abdeen Hamed <ah...@gmail.com> on 2012/03/19 16:44:17 UTC

Edit Distance

Hello,

Does Mahout have support for Edit Distance between two Strings? I looked on
the web but can't find anything. Please let me know if it does.

Thanks very much,

-Ahmed

Re: Edit Distance

Posted by Ahmed Abdeen Hamed <ah...@gmail.com>.
Thank you all for your responses!

Dawid: I actually ended up using TFIDF distance, which sounds similar to
your friend's work. Edit Distance would have compared the strings character
by character rather than token by token. I needed the latter, which is why I
ended up using TFIDF distance instead. I intended to share what I did, but
you beat me to it. I also ended up using the LingPipe APIs (LingPipe is a
licensed NLP framework), which support TFIDF distance among others.

Thanks very much for the lovely discussion!

-Ahmed

On Mon, Mar 19, 2012 at 4:06 PM, Dawid Weiss
<da...@cs.put.poznan.pl> wrote:

> This isn't of immediate relevance to you perhaps, but my friend once
> did a comparison of string edit distance metrics for name matching
> correction.
>
>
> http://www.mendeley.com/research/comparison-string-distance-metrics-namematching-tasks-3/
>
> Dawid
>
> On Mon, Mar 19, 2012 at 4:44 PM, Ahmed Abdeen Hamed
> <ah...@gmail.com> wrote:
> > Hello,
> >
> > Does Mahout have support for Edit Distance between two Strings? I looked on
> > the web but can't find anything. Please let me know if it does.
> >
> > Thanks very much,
> >
> > -Ahmed
>
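
For readers comparing the two approaches mentioned above, here is a minimal
sketch of a token-level TFIDF similarity. It is an illustration under stated
assumptions, not the code Ahmed used and not LingPipe's API: the class name
TfIdfSketch, its methods, the whitespace tokenizer, and the smoothed IDF
formula are all hypothetical choices made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: token-level TF-IDF cosine similarity.
// All names here are hypothetical; this is not LingPipe's or Mahout's API.
final class TfIdfSketch {
    private final Map<String, Integer> docFreq = new HashMap<>();
    private int numDocs = 0;

    // Assumption: lower-cased, whitespace-separated tokens are good enough.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Records, for each distinct token, that it occurred in one more document.
    void train(String doc) {
        numDocs++;
        for (String t : new HashSet<>(tokenize(doc))) {
            docFreq.merge(t, 1, Integer::sum);
        }
    }

    // One common smoothed IDF variant (an assumption, others exist).
    private double idf(String token) {
        int df = docFreq.getOrDefault(token, 0);
        return Math.log((1.0 + numDocs) / (1.0 + df)) + 1.0;
    }

    // Maps each token of the text to (term count * IDF).
    Map<String, Double> vector(String text) {
        Map<String, Double> v = new HashMap<>();
        for (String t : tokenize(text)) v.merge(t, 1.0, Double::sum);
        for (Map.Entry<String, Double> e : v.entrySet()) {
            e.setValue(e.getValue() * idf(e.getKey()));
        }
        return v;
    }

    // Cosine similarity of the two TF-IDF vectors, in [0, 1].
    double similarity(String a, String b) {
        Map<String, Double> va = vector(a);
        Map<String, Double> vb = vector(b);
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : va.entrySet()) {
            normA += e.getValue() * e.getValue();
            dot += e.getValue() * vb.getOrDefault(e.getKey(), 0.0);
        }
        for (double w : vb.values()) normB += w * w;
        if (normA == 0.0 || normB == 0.0) return 0.0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // A distance can be derived as one minus the similarity.
    double distance(String a, String b) {
        return 1.0 - similarity(a, b);
    }
}
```

After calling train() on a set of strings, similarity("john smith",
"smith john") is close to 1 because token order is ignored, whereas a
character-level edit distance would count several edits for the same pair.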

Re: Edit Distance

Posted by Dawid Weiss <da...@cs.put.poznan.pl>.
Hmm... I just realized I've sent an incorrect link. That is: the link
is fine (and the paper as well), but none of these folks are among
my friends :)

The one I meant to send is this one:
http://www.pubzone.org/dblp/conf/ltconf/PiskorskiSW07

Dawid

On Mon, Mar 19, 2012 at 9:30 PM, Ted Dunning <te...@gmail.com> wrote:
> While I didn't do as nice a job as your friend, TFIDF of n-grams has
> consistently done very well for me.  The soft TFIDF that they examine is
> something that I haven't previously looked at, but everything else seems
> just in order based on what I have seen.
>
> On Mon, Mar 19, 2012 at 1:06 PM, Dawid Weiss
> <da...@cs.put.poznan.pl> wrote:
>
>> This isn't of immediate relevance to you perhaps, but my friend once
>> did a comparison of string edit distance metrics for name matching
>> correction.
>>
>>
>> http://www.mendeley.com/research/comparison-string-distance-metrics-namematching-tasks-3/
>>
>> Dawid
>>
>> On Mon, Mar 19, 2012 at 4:44 PM, Ahmed Abdeen Hamed
>> <ah...@gmail.com> wrote:
>> > Hello,
>> >
>> > Does Mahout have support for Edit Distance between two Strings? I looked on
>> > the web but can't find anything. Please let me know if it does.
>> >
>> > Thanks very much,
>> >
>> > -Ahmed
>>

Re: Edit Distance

Posted by Ted Dunning <te...@gmail.com>.
While I didn't do as nice a job as your friend, TFIDF of n-grams has
consistently done very well for me.  The soft TFIDF that they examine is
something that I haven't previously looked at, but everything else seems
just in order based on what I have seen.

On Mon, Mar 19, 2012 at 1:06 PM, Dawid Weiss
<da...@cs.put.poznan.pl> wrote:

> This isn't of immediate relevance to you perhaps, but my friend once
> did a comparison of string edit distance metrics for name matching
> correction.
>
>
> http://www.mendeley.com/research/comparison-string-distance-metrics-namematching-tasks-3/
>
> Dawid
>
> On Mon, Mar 19, 2012 at 4:44 PM, Ahmed Abdeen Hamed
> <ah...@gmail.com> wrote:
> > Hello,
> >
> > Does Mahout have support for Edit Distance between two Strings? I looked on
> > the web but can't find anything. Please let me know if it does.
> >
> > Thanks very much,
> >
> > -Ahmed
>
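
As a small illustration of the n-gram idea Ted mentions, the sketch below
extracts overlapping character n-grams from a string. It assumes character
n-grams; the message does not say whether character or word n-grams were
meant. The class and method names are hypothetical. The resulting n-grams
could then be treated as the "tokens" of a TFIDF weighting.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; NGramSketch and charNGrams are hypothetical names.
final class NGramSketch {

    // Returns every run of n consecutive characters, in order of occurrence.
    // Strings shorter than n yield an empty list.
    static List<String> charNGrams(String text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }
}
```

For example, charNGrams("smith", 3) yields "smi", "mit", and "ith", so
"smith" and "smyth" share no trigrams but share two bigrams ("sm" and "th").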

Re: Edit Distance

Posted by Dawid Weiss <da...@cs.put.poznan.pl>.
This isn't of immediate relevance to you perhaps, but my friend once
did a comparison of string edit distance metrics for name matching
correction.

http://www.mendeley.com/research/comparison-string-distance-metrics-namematching-tasks-3/

Dawid

On Mon, Mar 19, 2012 at 4:44 PM, Ahmed Abdeen Hamed
<ah...@gmail.com> wrote:
> Hello,
>
> Does Mahout have support for Edit Distance between two Strings? I looked on
> the web but can't find anything. Please let me know if it does.
>
> Thanks very much,
>
> -Ahmed

Re: Edit Distance

Posted by reinhard schwab <re...@aon.at>.
Lucene has some classes to calculate the edit distance.

http://lucene.apache.org/core/old_versioned_docs/versions/3_0_1/api/contrib-spellchecker/org/apache/lucene/search/spell/package-summary.html

Package org.apache.lucene.search.spell

regards
reinhard

On 19.03.2012 16:44, Ahmed Abdeen Hamed wrote:
> Hello,
>
> Does Mahout have support for Edit Distance between two Strings? I looked on
> the web but can't find anything. Please let me know if it does.
>
> Thanks very much,
>
> -Ahmed
>
>   


Re: Edit Distance

Posted by David Kincaid <ki...@gmail.com>.
Mahout doesn't, but Lucene does, and the Lucene libraries are shipped with
Mahout.

http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/spell/LevensteinDistance.html

On Mon, Mar 19, 2012 at 10:49, Ahmed Abdeen Hamed
<ah...@gmail.com> wrote:

> Thanks very much!
>
> -Ahmed
>
> On Mon, Mar 19, 2012 at 11:46 AM, Sean Owen <sr...@gmail.com> wrote:
>
> > No I don't think that really comes into play in any of the ML algorithms
> > here. At least I do not recall seeing it.
> >
> >
>
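
For reference, a character-level edit distance of the kind asked about can
be sketched in a few lines of plain Java. This is a generic
dynamic-programming Levenshtein distance, not Lucene's implementation; the
class and method names are hypothetical. Note that Lucene's
LevensteinDistance appears to return a normalized similarity score rather
than a raw edit count, so check the Javadoc linked above before comparing
values.

```java
// Illustrative sketch only: a plain dynamic-programming Levenshtein distance.
// EditDistanceSketch is a hypothetical name, not part of Mahout or Lucene.
final class EditDistanceSketch {

    // Returns the minimum number of single-character insertions, deletions,
    // and substitutions needed to turn a into b.
    static int levenshtein(String a, String b) {
        // prev[j]: distance between the first i-1 chars of a and the first j chars of b.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                        prev[j - 1] + cost);                    // match or substitution
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // prints 3
    }
}
```

Keeping only two rows brings memory use down to O(length of b) while the
running time stays proportional to the product of the two string lengths.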

Re: Edit Distance

Posted by Ahmed Abdeen Hamed <ah...@gmail.com>.
Thanks very much!

-Ahmed

On Mon, Mar 19, 2012 at 11:46 AM, Sean Owen <sr...@gmail.com> wrote:

> No I don't think that really comes into play in any of the ML algorithms
> here. At least I do not recall seeing it.
>
>

Re: Edit Distance

Posted by Sean Owen <sr...@gmail.com>.
No, I don't think that really comes into play in any of the ML algorithms
here. At least, I do not recall seeing it.

On Mon, Mar 19, 2012 at 3:44 PM, Ahmed Abdeen Hamed
<ahmed.elmasri@gmail.com> wrote:

> Hello,
>
> Does Mahout have support for Edit Distance between two Strings? I looked on
> the web but can't find anything. Please let me know if it does.
>
> Thanks very much,
>
> -Ahmed
>