Posted to java-user@lucene.apache.org by Donna L Gresh <gr...@us.ibm.com> on 2007/10/10 19:52:55 UTC

MoreLikeThis and stopword stemming

What is the appropriate way to get both stopword removal and stemming 
(including stemming of the stopwords themselves) when the MoreLikeThis 
class is used? My analyzer (set via MoreLikeThis.setAnalyzer) applies the 
Snowball filter and is initialized with a stopwords set:

analyzer = new StandardAnalyzer(stopwords) {
    public TokenStream tokenStream(String fieldName, java.io.Reader reader) {
        return new SnowballFilter(super.tokenStream(fieldName, reader), "English");
    }
};



If I do NOT supply a separate stopwords list to the MoreLikeThis object 
(that is, if I do not call MoreLikeThis.setStopWords), will "the right 
thing" happen? That is, will my input text to the MoreLikeThis object be 
stemmed and the (stemmed) stopwords removed before a query is formed? It 
seems that MoreLikeThis.setStopWords does a simple lookup of words in the 
stop words list (no stemming), which is not what I want.
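
To make the concern concrete, here is a minimal sketch in plain Java that uses no Lucene classes at all. It compares removing stopwords before stemming with stemming first and then doing a plain lookup in an unstemmed stop list. Everything in it is an assumption for illustration only: the class StopStemOrderDemo, the helper toyStem (a crude stand-in, NOT the Snowball algorithm), and the sample words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopStemOrderDemo {

    // Hypothetical toy stemmer for illustration only; NOT the Snowball algorithm.
    static String toyStem(String word) {
        if (word.length() > 5 && word.endsWith("ing")) {
            return word.substring(0, word.length() - 3);
        }
        if (word.length() > 3 && word.endsWith("s")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Drop stopwords first, then stem whatever survives.
    static List<String> stopThenStem(List<String> tokens, Set<String> stopwords) {
        List<String> out = new ArrayList<String>();
        for (String token : tokens) {
            if (!stopwords.contains(token)) {
                out.add(toyStem(token));
            }
        }
        return out;
    }

    // Stem first, then do a plain lookup of the stem in the UNSTEMMED stop list.
    static List<String> stemThenLookup(List<String> tokens, Set<String> stopwords) {
        List<String> out = new ArrayList<String>();
        for (String token : tokens) {
            String stem = toyStem(token);
            if (!stopwords.contains(stem)) {
                out.add(stem);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "having", "searching", "documents");
        Set<String> stopwords = new HashSet<String>(Arrays.asList("the", "having"));

        System.out.println(stopThenStem(tokens, stopwords));   // [search, document]
        System.out.println(stemThenLookup(tokens, stopwords)); // [hav, search, document]
    }
}
```

In the second ordering the stem "hav" is not in the stop list, so it slips through. That is the mismatch I am worried about if stemmed terms are compared against an unstemmed stopwords list.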

Thanks in advance
Donna


Donna L. Gresh
Services Research, Mathematical Sciences Department
IBM T.J. Watson Research Center
(914) 945-2472
http://www.research.ibm.com/people/g/donnagresh
gresh@us.ibm.com