Posted to user@nutch.apache.org by Gonçalo Gaiolas <go...@outsystems.com> on 2006/09/19 10:40:30 UTC
Stemming and Synonyms
Hi everyone!
I'm using version 7.2 of Nutch and I'm very happy with it. Want to send a
big thumbs up for you guys behind it!
Having said that, I'd like to make my users' search experience as good as
possible. To do that, I need to solve two little "problems":
- Stemming: in my index I have lots of plurals and verbal forms
that sometimes prevent my users from finding the right results. I've been
looking around and it seems that the only stemming implementation available
for Nutch is described in the wiki and requires extensive changes to the Nutch
code, something I'd like to avoid. Can somebody help me?
- Synonyms: OK, I don't really need synonyms. What I need is a way
to specify that "Image Converter" should be equal to "ImageConverter", or
that "WebBlock" should be the same as "web block". How can I do this? This one is
really impacting the search quality :-)
Anyway, thanks a lot for the great piece of software you delivered!
Cheers,
Gonçalo Gaiolas
OutSystems Engineering
Office: +351 21 415 37 30
Fax: +351 21 415 37 31
goncalo.gaiolas@outsystems.com
www.outsystems.com
Re: Stemming and Synonyms
Posted by Richard Braman <rb...@taxcodesoftware.org>.
I don't think it should take until 7.2 before we get some natural language
processing, especially if there is public collaboration between the Nutch
community and the folks at
http://opennlp.sourceforge.net/
:-0
Tomi NA wrote:
> On 9/19/06, Gonçalo Gaiolas <go...@outsystems.com> wrote:
>> Hi everyone!
>>
>> I'm using version 7.2 of Nutch and I'm very happy with it. Want to send a
>> big thumbs up for you guys behind it!
>
> Welcome, our honoured guest from the future! :) 7.2 probably includes
> natural language processing and spawns a great deal of controversy as
> to whether it can be considered "intelligent" or just very good at
> smalltalk. :)
>
>> Having said that, I'd like to make my users search experience as good as
>> possible. To do that, I need to solve two little "problems" :
>>
>> - Stemming – in my index I have lots of plurals and verbal forms
>> that prevent my users from sometimes finding the right results. I've been
>> looking around and it seems that the only stemming implementation available
>> for nutch is described in the wiki and requires extensive changes in Nutch
>> code, something I'd like to avoid. Can somebody help me ?
>>
>> - Synonyms – Ok, I don't really need synonyms. What I need is a way
>> to specify that Image Converter should be equal to ImageConverter, or
>> WebBlock should be the same as web block. How can I do this? This one is
>> really impacting the search quality :-)
>
> I guess you need a different Analyzer. There's a list at
> http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/Analyzer.html
>
> You could also write your own to best represent the data you have.
>
> Cheers,
> t.n.a.
Re: Stemming and Synonyms
Posted by Tomi NA <he...@gmail.com>.
On 9/19/06, Gonçalo Gaiolas <go...@outsystems.com> wrote:
> Hi everyone!
>
> I'm using version 7.2 of Nutch and I'm very happy with it. Want to send a
> big thumbs up for you guys behind it!
Welcome, our honoured guest from the future! :) 7.2 probably includes
natural language processing and spawns a great deal of controversy as
to whether it can be considered "intelligent" or just very good at
smalltalk. :)
> Having said that, I'd like to make my users search experience as good as
> possible. To do that, I need to solve two little "problems" :
>
> - Stemming – in my index I have lots of plurals and verbal forms
> that prevent my users from sometimes finding the right results. I've been
> looking around and it seems that the only stemming implementation available
> for nutch is described in the wiki and requires extensive changes in Nutch
> code, something I'd like to avoid. Can somebody help me ?
>
> - Synonyms – Ok, I don't really need synonyms. What I need is a way
> to specify that Image Converter should be equal to ImageConverter, or
> WebBlock should be the same as web block. How can I do this? This one is
> really impacting the search quality :-)
I guess you need a different Analyzer. There's a list at
http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/Analyzer.html
You could also write your own to best represent the data you have.
Cheers,
t.n.a.
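
To make the "write your own Analyzer" suggestion a bit more concrete, here is a minimal sketch of the token normalization such an analyzer would need to perform. It is plain Python rather than Lucene or Nutch code, and every name in it (analyze, naive_stem, the regular expressions) is hypothetical and chosen only for illustration. The assumption is that the same normalization runs at both index time and query time: CamelCase words are split at the lower-to-upper boundary, everything is lowercased, and a deliberately crude plural stripper stands in for a real stemmer.

```python
import re

# Hypothetical helpers for illustration only; none of this is Nutch or Lucene API.

# Zero-width position between a lowercase letter or digit and an uppercase letter,
# i.e. the boundary inside "ImageConverter" or "WebBlock".
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")
_WORD = re.compile(r"[A-Za-z0-9]+")


def naive_stem(token):
    """Strip a few common English plural endings.

    This is a crude stand-in for a real stemmer (such as a Porter stemmer)
    and will mangle some words, e.g. "status" becomes "statu".
    """
    if len(token) > 4 and token.endswith("ies"):
        return token[:-3] + "y"
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def analyze(text):
    """Split CamelCase, lowercase, and stem; returns the list of index terms."""
    spaced = _CAMEL_BOUNDARY.sub(" ", text)  # "WebBlock" -> "Web Block"
    return [naive_stem(word.lower()) for word in _WORD.findall(spaced)]


if __name__ == "__main__":
    for sample in ("ImageConverter", "Image Converters", "WebBlock", "web block"):
        print(sample, "->", analyze(sample))
```

With this normalization, "ImageConverter" and "Image Converter" produce the same terms, as do "WebBlock" and "web block". An all-lowercase query such as "webblock" would still not match, since there is no case boundary to split on; handling that would need a dictionary or an explicit synonym list.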