Posted to java-user@lucene.apache.org by manjula wijewickrema <ma...@gmail.com> on 2010/12/21 04:50:58 UTC

Editing StopWordList

Hi,

1) In my application, I need to add more words to the stop word list.
Is it possible to add more words to the default Lucene stop word list?

2) If it is possible, how can I do this?

Appreciate any comment from you.

Thanks,
Manjula.

Re: Editing StopWordList

Posted by Anshum <an...@gmail.com>.
The default stop word set is final, so you cannot modify it directly; you
would have to copy it into another object and add your terms to that
instead. What I meant when I said you could use it was that you could read
that object and construct your own set (almost like case 2 below).
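
A minimal sketch of the copy-and-extend approach described above. To keep it
self-contained it uses only java.util; DEFAULT_STOP_WORDS is a hypothetical
stand-in for the library's default stop set, assumed here to be read-only,
and the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class CustomStopWords {

    // Hypothetical stand-in for the default stop set (assumption: read-only,
    // so direct add() calls are rejected). Only a few sample words are listed.
    static final Set<String> DEFAULT_STOP_WORDS =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList("a", "and", "the")));

    // Copy the defaults into a fresh, mutable set and add the extra words.
    static Set<String> withExtraWords(Set<String> defaults, String... extraWords) {
        Set<String> combined = new HashSet<>(defaults);
        combined.addAll(Arrays.asList(extraWords));
        return combined;
    }

    public static void main(String[] args) {
        try {
            DEFAULT_STOP_WORDS.add("inc"); // rejected: the stand-in set is read-only
        } catch (UnsupportedOperationException e) {
            System.out.println("cannot modify the default set directly");
        }
        Set<String> myStopWords = withExtraWords(DEFAULT_STOP_WORDS, "inc", "ltd", "co.");
        System.out.println(myStopWords.contains("ltd")); // true
        System.out.println(DEFAULT_STOP_WORDS.size());   // 3, the defaults are untouched
    }
}
```

The combined set would then be handed to the analyzer in place of the
default one; which constructor accepts it depends on the library version in
use.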


--
Anshum Gupta
http://ai-cafe.blogspot.com



Re: Editing StopWordList

Posted by manjula wijewickrema <ma...@gmail.com>.
Hi Gupta,

Thanks a lot for your reply. But I could not understand whether I can modify
the default stop word list (by adding more words) or whether I have to make
a new list as an array, as follows.
public String[] NEW_STOP_WORDS = { "a", "and", "are", "as", "at", "be",
"but", "by", "for", "if", "in", "into", "is", "no", "not", "of", "on", "or",
"s", "such", "t", "that", "the", "their", "then", "there", "these", "they",
"this", "to", "was", "will", "with",
"inc", "incorporated", "co.", "ltd", "ltd.", "we", "you", "your", "us",
etc...};

then call it as follows,

SnowballAnalyzer analyzer = new SnowballAnalyzer("English",
StopAnalyzer.NEW_STOP_WORDS);
Am I correct?
If not, could you explain how I can do this?

Thanks in advance.
Manjula.
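
If the analyzer constructor takes a String[] of stop words, as in the call
above, one option is to build the array by merging an existing list with the
extra words instead of retyping every entry. The sketch below is only an
illustration under that assumption: the class name, the merge helper, and
the sample words are hypothetical, and it uses java.util only.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class StopWordArrays {

    // Merge two word lists into one array, dropping duplicates and
    // keeping the order in which words were first seen.
    static String[] merge(String[] defaults, String[] extraWords) {
        Set<String> merged = new LinkedHashSet<>(Arrays.asList(defaults));
        merged.addAll(Arrays.asList(extraWords));
        return merged.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Sample values only; a real default list would be longer.
        String[] defaults = {"a", "and", "the"};
        String[] extras = {"inc", "ltd", "the"};
        String[] newStopWords = merge(defaults, extras);
        System.out.println(Arrays.toString(newStopWords)); // [a, and, the, inc, ltd]
    }
}
```

Going through a set first means a word that appears in both lists ends up in
the array only once.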


Re: Editing StopWordList

Posted by Anshum <an...@gmail.com>.
Hi Manjula,
You could initialize the Analyzer using a modified stop word set. Use
StopAnalyzer.ENGLISH_STOP_WORDS_SET to get the default stop set and then add
your own words to it. You could then initialize the analyzer using this new
stop set instead of the default stop set.
Hope that helps.

--
Anshum Gupta
http://ai-cafe.blogspot.com

