You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Donna L Gresh <gr...@us.ibm.com> on 2007/10/10 19:52:55 UTC
MoreLikeThis and stopword stemming
What is the appropriate way of achieving both stopwords and stemming of
stopwords when the MoreLikeThis class is used? My analyzer
(MoreLikeThis.setAnalyzer) uses the Snowball filter, and is initialized
with a stopwords set:
analyzer = new StandardAnalyzer(stopwords) {
public TokenStream tokenStream(String fieldName,
java.io.Reader reader) {
return new SnowballFilter(super
.tokenStream(fieldName,reader),
"English");
}
};
If I do NOTsupply a separate stopwords list to the MoreLikeThis object
(that is, using MoreLikeThis.setStopWords), will "the right thing" happen;
that is, will my input text to the MoreLikeThis object be stemmed and
(stemmed) stopwords removed before a query is formed? It seems that
MoreLikeThis.setStopWords uses a simple lookup of words in the stop words
list (no stemming) which is not what I want.
Thanks in advance
Donna
Donna L. Gresh
Services Research, Mathematical Sciences Department
IBM T.J. Watson Research Center
(914) 945-2472
http://www.research.ibm.com/people/g/donnagresh
gresh@us.ibm.com