Posted to user@nutch.apache.org by Gonçalo Gaiolas <go...@outsystems.com> on 2006/09/19 10:40:30 UTC

Stemming and Synonyms

Hi everyone!

 

I’m using version 7.2 of Nutch and I’m very happy with it. Want to send a
big thumbs up for you guys behind it!

 

Having said that, I’d like to make my users’ search experience as good as
possible. To do that, I need to solve two little “problems”:

- Stemming: in my index I have lots of plurals and verb forms that
sometimes prevent my users from finding the right results. I’ve been
looking around and it seems that the only stemming implementation available
for Nutch is described in the wiki and requires extensive changes to the
Nutch code, something I’d like to avoid. Can somebody help me?

- Synonyms: OK, I don’t really need synonyms. What I need is a way
to specify that Image Converter should match ImageConverter, and that
WebBlock should match web block. How can I do this? This one is
really impacting the search quality :-)

 

Anyway, thanks a lot for the great piece of software you delivered!

 

Cheers,

 

Gonçalo Gaiolas

OutSystems Engineering 

Office: +351 21 415 37 30

Fax:    +351 21 415 37 31

goncalo.gaiolas@outsystems.com

www.outsystems.com

 


Re: Stemming and Synonyms

Posted by Richard Braman <rb...@taxcodesoftware.org>.
I don't think it should take until 7.2 before we get some natural language
processing, especially if there is public collaboration between the Nutch
community and the folks at
http://opennlp.sourceforge.net/
:-0

Tomi NA wrote:
> On 9/19/06, Gonçalo Gaiolas <go...@outsystems.com> wrote:
>> Hi everyone!
>>
>>
>>
>> I'm using version 7.2 of Nutch and I'm very happy with it. Want to
>> send a
>> big thumbs up for you guys behind it!
>
> Welcome, our honoured guest from the future! :) 7.2 probably includes
> natural language processing and spawns a great deal of controversy as
> to whether it can be considered "intelligent" or just very good at
> small talk. :)
>
>> Having said that, I'd like to make my users search experience as good as
>> possible. To do that, I need to solve two little "problems" :
>>
>> -          Stemming – in my index I have lots of plurals and verbal
>> forms
>> that prevent my users from sometimes finding the right results. I've
>> been
>> looking around and it seems that the only stemming implementation
>> available
>> for nutch is described in the wiki and requires extensive changes in
>> Nutch
>> code, something I'd like to avoid. Can somebody help me ?
>>
>> -          Synonyms – Ok, I don't really need synonyms. What I need
>> is a way
>> to specify that Image Converter should be equal to ImageConverter, or
>> WebBlock should be the same as web block. How can I do this? This one is
>> really impacting the search quality :-)
>
> I guess you need a different Analyzer. There's a list at
> http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/Analyzer.html
>
> You could also write your own to best represent the data you have.
>
> Cheers,
> t.n.a.
>
>
>



Re: Stemming and Synonyms

Posted by Tomi NA <he...@gmail.com>.
On 9/19/06, Gonçalo Gaiolas <go...@outsystems.com> wrote:
> Hi everyone!
>
>
>
> I'm using version 7.2 of Nutch and I'm very happy with it. Want to send a
> big thumbs up for you guys behind it!

Welcome, our honoured guest from the future! :) 7.2 probably includes
natural language processing and spawns a great deal of controversy as
to whether it can be considered "intelligent" or just very good at
small talk. :)

> Having said that, I'd like to make my users search experience as good as
> possible. To do that, I need to solve two little "problems" :
>
> -          Stemming – in my index I have lots of plurals and verbal forms
> that prevent my users from sometimes finding the right results. I've been
> looking around and it seems that the only stemming implementation available
> for nutch is described in the wiki and requires extensive changes in Nutch
> code, something I'd like to avoid. Can somebody help me ?
>
> -          Synonyms – Ok, I don't really need synonyms. What I need is a way
> to specify that Image Converter should be equal to ImageConverter, or
> WebBlock should be the same as web block. How can I do this? This one is
> really impacting the search quality :-)

I guess you need a different Analyzer. There's a list at
http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/Analyzer.html
You could also write your own to best represent the data you have.

Cheers,
t.n.a.
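
To make the "write your own" suggestion a bit more concrete, here is a rough sketch of the kind of token normalization a custom analyzer could perform so that ImageConverter and Image Converter (or WebBlock and web block) end up as the same terms. It is plain Java and deliberately does not use the Lucene Analyzer API; the class name CompoundTokenSketch and the methods tokenize and stripPlural are made up for illustration. The plural stripping is only a crude stand-in for a real stemmer (Lucene ships a Porter stemmer, for example), and the same normalization would have to be applied both at index time and at query time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: not a Lucene Analyzer, just the
// normalization logic such an analyzer might apply to each token.
class CompoundTokenSketch {

    // Crude plural handling (assumption: English text). Strips a single
    // trailing "s" from tokens longer than three characters, unless the
    // token ends in "ss". A real stemmer does far more than this.
    static String stripPlural(String token) {
        if (token.length() > 3 && token.endsWith("s") && !token.endsWith("ss")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // Splits on whitespace, then splits each word where a lowercase letter
    // is followed by an uppercase one ("WebBlock" -> "Web", "Block"),
    // lowercases every part, and applies the crude plural stripping.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            for (String part : word.split("(?<=[a-z])(?=[A-Z])")) {
                tokens.add(stripPlural(part.toLowerCase(Locale.ROOT)));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("ImageConverter"));   // [image, converter]
        System.out.println(tokenize("Image Converters")); // [image, converter]
        System.out.println(tokenize("WebBlock"));         // [web, block]
        System.out.println(tokenize("web block"));        // [web, block]
    }
}
```

One limitation of this sketch: a query typed as "imageconverter" (all lowercase, no space) has no case boundary to split on, so it would still not match. Handling that would need something extra, such as also indexing the joined form.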