Posted to dev@lucene.apache.org by Cameron M VandenBerg <cm...@cs.cmu.edu> on 2019/10/07 20:39:12 UTC
Adding stemmer option to EnglishAnalyzer
Hello!
My name is Cameron VandenBerg, and I am a research programmer at Carnegie Mellon University. We use Lucene for projects and classwork here, and one feature we have always added to our own code, which extends the EnglishAnalyzer, is a setStemmer method; the createComponents method then applies whichever stemmer was selected.
Is it possible to add this feature to the EnglishAnalyzer? If so, what steps can we take?
Code Snippets:
/**
 * Control whether and how stemming is done. See StemmerType.
 */
public void setStemmer(StemmerType s) {
  this.stemmer = s;
}

@Override
protected TokenStreamComponents createComponents(String fieldName) {
  final Tokenizer source = new StandardTokenizer();
  TokenStream result = new EnglishPossessiveFilter(source);
  result = new LowerCaseFilter(result);
  result = new StopFilter(result, stopwords);
  if (!stemExclusionSet.isEmpty())
    result = new SetKeywordMarkerFilter(result, stemExclusionSet);
  if (this.stemmer == StemmerType.KSTEM)
    result = new KStemFilter(result);
  else
    result = new PorterStemFilter(result);
  return new TokenStreamComponents(source, result);
}
Thank you,
Cameron VandenBerg