Posted to user@nutch.apache.org by "Naess, Ronny" <Ro...@avinor.no> on 2007/05/10 08:32:15 UTC
Stop words
Hi.
I live in Norway and I would like to add a stop word list.
I found this https://issues.apache.org/jira/browse/NUTCH-453 in JIRA
saying something about "moving stop words from code to config file",
but nothing seems to have happened in this area.
How can I add stop words with the current version (0.9)?
Thanks,
Ronny
Re: Stop words
Posted by carmmello <ca...@globo.com>.
Dear Andrzej Bialecki
I added some words to the list in NutchAnalysis.java and tried to crawl some
sites.
When I searched for the original stop words, I got zero results. When I
searched for the words I had added, they returned lots of results.
What is going wrong?
Thank you
----- Original Message -----
From: "Andrzej Bialecki" <ab...@getopt.org>
To: <nu...@lucene.apache.org>
Sent: Thursday, May 10, 2007 7:14 AM
Subject: Re: Stop words
> Naess, Ronny wrote:
>> Hi.
>>
>> I live in Norway and I would like to add a stop word list.
>>
>> I found this https://issues.apache.org/jira/browse/NUTCH-453 in JIRA
>> saying something about "moving stop words from code to config file",
>> but nothing seems to have happened in this area.
>>
>
> Correct. Patches are welcome ;)
>
>> How can I add stop words with the current version (0.9)?
>
> For now, you can simply replace the list that you can find in
> NutchAnalysis.java.
>
>
>
> --
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> ___|||__|| \| || | Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
Re: Stop words
Posted by Andrzej Bialecki <ab...@getopt.org>.
Naess, Ronny wrote:
> Hi.
>
> I live in Norway and I would like to add a stop word list.
>
> I found this https://issues.apache.org/jira/browse/NUTCH-453 in JIRA
> saying something about "moving stop words from code to config file",
> but nothing seems to have happened in this area.
>
Correct. Patches are welcome ;)
> How can I add stop words with the current version (0.9)?
For now, you can simply replace the list that you can find in
NutchAnalysis.java.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
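
As a rough illustration of the advice above (replacing the list in
NutchAnalysis.java), a hard-coded stop word list can be sketched as follows.
This is a stand-alone example, not the actual Nutch source: the class name
StopWordSketch, the method isStopWord and the sample Norwegian words are all
assumptions made for illustration, and the real list in 0.9 may be declared
differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-alone sketch; NOT the contents of NutchAnalysis.java.
public class StopWordSketch {

    // Assumed sample of common Norwegian words; not an official stop word list.
    private static final String[] STOP_WORDS = {
        "og", "i", "det", "som", "en", "er", "til", "av", "for", "ikke"
    };

    // A set gives constant-time membership checks on the hard-coded list.
    private static final Set<String> STOP_SET =
        new HashSet<>(Arrays.asList(STOP_WORDS));

    // Returns true if the word, lowercased, is in the stop word set.
    public static boolean isStopWord(String word) {
        return STOP_SET.contains(word.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        for (String w : new String[] {"og", "Ikke", "fjord"}) {
            System.out.println(w + " -> " + isStopWord(w));
        }
    }
}
```

Because a list like this is compiled into the class, any change to it only
takes effect after the code has been rebuilt.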