Posted to java-user@lucene.apache.org by Mohamed Yahya <ya...@gmail.com> on 2011/06/08 11:17:53 UTC

Lemmatization

Hi,

Is there something in Lucene that supports lemmatization of the following form:

Mexican --> Mexico (from adjective to noun)

Thanks
Mohamed  Yahya

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lemmatization

Posted by Karl Wettin <ka...@gmail.com>.
Perhaps "least frequent substring" or even "suffix truncation" might be enough for your needs.

Here is a related paper: http://web.jhu.edu/bin/q/b/p75-mcnamee.pdf
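
To make "suffix truncation" concrete, here is a minimal plain-Java sketch, not Lucene code: it lowercases a term and keeps only its first few characters. The class name TruncationStemmer and the cutoff of 5 are assumptions chosen for illustration, and a useful cutoff would need tuning per language and collection.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Lucene.
class TruncationStemmer {

    // Lowercases the term and keeps at most maxLength leading characters.
    static String truncate(String term, int maxLength) {
        if (maxLength < 1) {
            throw new IllegalArgumentException("maxLength must be at least 1");
        }
        String lower = term.toLowerCase(Locale.ROOT);
        return lower.length() <= maxLength ? lower : lower.substring(0, maxLength);
    }

    public static void main(String[] args) {
        // With a cutoff of 5, both forms reduce to the same index term.
        System.out.println(truncate("Mexican", 5)); // prints "mexic"
        System.out.println(truncate("Mexico", 5));  // prints "mexic"
    }
}
```

With this cutoff "Mexican" and "Mexico" collapse to the same term, but so would any unrelated word sharing those five letters, which is the usual trade-off of truncation.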


	karl



On Jun 8, 2011, at 1:52 PM, Mohamed Yahya wrote:

> You're right. Still, I am not sure if there is a library that would
> take care of examples such as the one I gave.
> 
> On Wed, Jun 8, 2011 at 11:25, Lahiru Samarakoon <la...@gmail.com> wrote:
>> Hi,
>> 
>>> 
>>> Is there something in Lucene that supports lemmatization of the following
>>> form:
>>> 
>>> Mexican --> Mexico (from adjective to noun)
>>> 
>> Lemmatization does not change part of speech. I think you are looking for a
>> stemming algorithm.
>> 
>> http://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html
>> 
>>> 
>>> 
>>> Thanks,
>> Lahiru
>> 
> 
> 
> 
> -- 
> Mohamed  Yahya
> 
> 




Re: Lemmatization

Posted by Robert Muir <rc...@gmail.com>.
On Wed, Jun 8, 2011 at 7:52 AM, Mohamed Yahya <ya...@gmail.com> wrote:
> You're right. Still, I am not sure if there is a library that would
> take care of examples such as the one I gave.
>

That is why you might want to pick a stemmer that is close to what you
want, and then customize or tune it with anything particular to your use
case.

http://lucene.apache.org/java/3_2_0/api/contrib-analyzers/org/apache/lucene/analysis/miscellaneous/StemmerOverrideFilter.html
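
As a rough illustration of the "pick one and customize it" idea, here is a plain-Java sketch that consults a hand-made override dictionary first and falls back to a generic stemmer otherwise. It does not use the Lucene API: the class OverrideStemmer, the dictionary entry, and the toy fallback that strips a trailing "s" are all assumptions made up for this example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical class for illustration only; not the Lucene StemmerOverrideFilter.
class OverrideStemmer {
    private final Map<String, String> overrides;
    private final UnaryOperator<String> fallback;

    OverrideStemmer(Map<String, String> overrides, UnaryOperator<String> fallback) {
        this.overrides = new HashMap<>(overrides); // defensive copy
        this.fallback = fallback;
    }

    // Returns the dictionary entry if one exists, else the fallback stemmer's result.
    String stem(String term) {
        String key = term.toLowerCase(Locale.ROOT);
        String override = overrides.get(key);
        return override != null ? override : fallback.apply(key);
    }

    public static void main(String[] args) {
        Map<String, String> overrides = new HashMap<>();
        overrides.put("mexican", "mexico"); // the case from this thread

        // Toy fallback (an assumption): strip one trailing "s" from longer terms.
        UnaryOperator<String> stripPlural =
                t -> t.length() > 3 && t.endsWith("s") ? t.substring(0, t.length() - 1) : t;

        OverrideStemmer stemmer = new OverrideStemmer(overrides, stripPlural);
        System.out.println(stemmer.stem("Mexican")); // prints "mexico"
        System.out.println(stemmer.stem("stems"));   // prints "stem"
    }
}
```

The dictionary only covers the words you list, so this suits a small set of known exceptions layered on top of an ordinary stemmer rather than a general adjective-to-noun mapping.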



Re: Lemmatization

Posted by Mohamed Yahya <ya...@gmail.com>.
You're right. Still, I am not sure if there is a library that would
take care of examples such as the one I gave.

On Wed, Jun 8, 2011 at 11:25, Lahiru Samarakoon <la...@gmail.com> wrote:
> Hi,
>
>>
>> Is there something in Lucene that supports lemmatization of the following
>> form:
>>
>> Mexican --> Mexico (from adjective to noun)
>>
> Lemmatization does not change part of speech. I think you are looking for a
> stemming algorithm.
>
> http://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html
>
>>
>>
>> Thanks,
> Lahiru
>



-- 
Mohamed  Yahya



Re: Lemmatization

Posted by Lahiru Samarakoon <la...@gmail.com>.
Hi,

>
> Is there something in Lucene that supports lemmatization of the following
> form:
>
> Mexican --> Mexico (from adjective to noun)
>
Lemmatization does not change part of speech. I think you are looking for a
stemming algorithm.

http://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html

Thanks,
Lahiru