Posted to user@nutch.apache.org by Marco Pereira <ma...@gmail.com> on 2006/07/13 17:36:40 UTC

Common words

Hi,

 Is there a way to make Nutch ignore common words while searching?
 For example, when searching for "the boy and the girl", it would only look
for "boy girl".

Thanks,
Marco

RE: Common words

Posted by Bogdan Kecman <bo...@alteray.com>.
>  Is there a way to make Nutch ignore common words while searching?
>  For example, when searching for "the boy and the girl", it
> would only look for "boy girl".

Yes.
In the Nutch conf directory there is a file named common-terms.utf8.
Copy that file into your Java container as well.

Hope this helps
Bogdan


RE: Common words

Posted by Bogdan Kecman <bo...@alteray.com>.
 
>  Is there a way to make Nutch ignore common words while searching?
>  For example, when searching for "the boy and the girl", it
> would only look for "boy girl".

Small addition from wiki:
http://wiki.apache.org/nutch/FAQ#head-12f4fd64f03fc3cd0a3063b9283ed829963ed488
You can tweak your conf/common-terms.utf8 file after creating an index
through the following command:

bin/nutch org.apache.nutch.indexer.HighFreqTerms -count 10 -nofreqs index 

Regards
Bogdan
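
For readers who want to see the effect asked about in the original question, here is a minimal sketch of dropping common words from a query string before searching. It is an illustration only and not Nutch's own code: the class name CommonTermFilter, the method stripCommonTerms, and the hard-coded word list are all hypothetical, and the sketch makes no assumption about how common-terms.utf8 is formatted or used internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; this class is not part of Nutch.
final class CommonTermFilter {

    // Assumed word list for the example; a real list would come from configuration.
    private static final Set<String> COMMON_TERMS =
            new HashSet<>(Arrays.asList("the", "and", "a", "of", "to", "in"));

    // Returns the query with common terms removed, keeping the other terms in order.
    static String stripCommonTerms(String query) {
        List<String> kept = new ArrayList<>();
        for (String term : query.trim().split("\\s+")) {
            // Compare case-insensitively, but keep the term as the user typed it.
            if (!term.isEmpty() && !COMMON_TERMS.contains(term.toLowerCase(Locale.ROOT))) {
                kept.add(term);
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        // The example query from the original question.
        System.out.println(stripCommonTerms("the boy and the girl"));
    }
}
```

With the word list assumed above, the query from the question is reduced to "boy girl". Filtering on the query side like this is only one possible design; whether Nutch applies its common-terms list in the same way is not something this sketch claims.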