Posted to user@nutch.apache.org by "Naess, Ronny" <Ro...@avinor.no> on 2007/05/10 08:32:15 UTC

Stop words

Hi.

I am living in Norway and I would like to add a stop word list.

I found https://issues.apache.org/jira/browse/NUTCH-453 in JIRA,
which says something about "moving stop words from code to config file",
but nothing seems to have happened in this area.

How can I add stop words with the current version (0.9)?

Thanks,

Ronny

Re: Stop words

Posted by carmmello <ca...@globo.com>.
Dear Andrzej Bialecki

I added some words to the list in NutchAnalysis.java and tried to crawl some
sites.
When I searched for the original stop words, I got zero results. When I
searched for the words I had added, they appeared in lots of results.
What is going wrong?
Thank you


----- Original Message ----- 
From: "Andrzej Bialecki" <ab...@getopt.org>
To: <nu...@lucene.apache.org>
Sent: Thursday, May 10, 2007 7:14 AM
Subject: Re: Stop words


> Naess, Ronny wrote:
>> Hi.
>>
>> I am living in Norway and I would like to add a stop word list.
>>
>> I found this https://issues.apache.org/jira/browse/NUTCH-453 in JIRA
>> saying something about "moving stop words from code to config file",
>> but nothing has happened in this area it seems.
>>
>
> Correct. Patches are welcome ;)
>
>> How can I add stop words with the current version (0.9)?
>
> For now, you can simply replace the list that you can find in 
> NutchAnalysis.java.
>
>
>
> -- 
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com


Re: Stop words

Posted by Andrzej Bialecki <ab...@getopt.org>.
Naess, Ronny wrote:
> Hi.
> 
> I am living in Norway and I would like to add a stop word list.
> 
> I found this https://issues.apache.org/jira/browse/NUTCH-453 in JIRA
> saying something about "moving stop words from code to config file",
> but nothing has happened in this area it seems.
> 

Correct. Patches are welcome ;)

> How can I add stop words with the current version (0.9)?

For now, you can simply replace the list that you can find in 
NutchAnalysis.java.
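
As a rough illustration of what "replacing the list" amounts to, here is a
minimal, self-contained Java sketch of a hard-coded stop word list and the
lookup built from it. It is not the Nutch source: the class name
StopWordSketch, the method names, and the sample Norwegian words are all
assumptions made for this example, and the real declaration in
NutchAnalysis.java may be laid out differently. Since the list is compiled
into the class, Nutch presumably has to be rebuilt after the edit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a hard-coded stop word list; not Nutch code.
public class StopWordSketch {

    // Sample Norwegian function words, chosen only for illustration.
    private static final String[] STOP_WORDS = {
        "og", "i", "det", "som", "en", "er", "til", "av", "for", "ikke"
    };

    // A set gives constant-time membership checks on the list above.
    private static final Set<String> STOP_SET =
        new HashSet<>(Arrays.asList(STOP_WORDS));

    // True if the term, compared case-insensitively, is in the list.
    public static boolean isStopWord(String term) {
        return STOP_SET.contains(term.toLowerCase(Locale.ROOT));
    }

    // Returns the terms that are not stop words, in their original order.
    public static List<String> removeStopWords(List<String> terms) {
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            if (!isStopWord(term)) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> terms =
            Arrays.asList("flyplassen", "i", "Oslo", "er", "stor");
        System.out.println(removeStopWords(terms)); // [flyplassen, Oslo, stor]
    }
}
```

Swapping the language then means changing only the contents of the array;
the set and the lookup stay the same.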



-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com