Posted to dev@lucene.apache.org by Cameron M VandenBerg <cm...@cs.cmu.edu> on 2019/10/07 20:39:12 UTC

Adding stemmer option to EnglishAnalyzer

Hello!

My name is Cameron VandenBerg, and I am a research programmer at Carnegie Mellon University.  We use Lucene for projects and classwork here. One feature we have always added to our own code, which extends the EnglishAnalyzer, is a setStemmer method; the createComponents method then uses whichever stemmer was set.

Is it possible to add this feature to the EnglishAnalyzer?  If so, what steps can we take?

Code Snippets:
  /**
   * Control whether and how stemming is done. See StemmerType.
   */
  public void setStemmer(StemmerType s) {
    this.stemmer = s;
  }

  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    final Tokenizer source = new StandardTokenizer();
    TokenStream result = new EnglishPossessiveFilter(source);
    result = new LowerCaseFilter(result);
    result = new StopFilter(result, stopwords);
    if (!stemExclusionSet.isEmpty()) {
      result = new SetKeywordMarkerFilter(result, stemExclusionSet);
    }
    if (this.stemmer == StemmerType.KSTEM) {
      result = new KStemFilter(result);
    } else {
      result = new PorterStemFilter(result);
    }
    return new TokenStreamComponents(source, result);
  }
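
To make the branching easier to follow without Lucene on the classpath, here is a rough, Lucene-free model of the createComponents method above. It is only a sketch: it records the names of the filters that would be chained, in order, rather than constructing them. The class name StemmerChainSketch and the plan method are hypothetical, and the two StemmerType values are an assumption, since the enum itself is not shown in the snippets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Lucene-free model of the createComponents snippet: it lists the filter
// names in the order they would wrap the tokenizer, instead of building them.
final class StemmerChainSketch {

  // Assumption: the post does not show StemmerType; these two values are guessed.
  enum StemmerType { PORTER, KSTEM }

  // Hypothetical helper: returns the filter chain as an ordered list of names.
  static List<String> plan(StemmerType stemmer, Set<String> stemExclusionSet) {
    List<String> filters = new ArrayList<>();
    filters.add("EnglishPossessiveFilter");
    filters.add("LowerCaseFilter");
    filters.add("StopFilter");
    // Excluded terms are marked as keywords so the stemmer leaves them alone.
    if (!stemExclusionSet.isEmpty()) {
      filters.add("SetKeywordMarkerFilter");
    }
    // Same branch as the snippet: KStem when requested, Porter otherwise.
    if (stemmer == StemmerType.KSTEM) {
      filters.add("KStemFilter");
    } else {
      filters.add("PorterStemFilter");
    }
    return filters;
  }

  public static void main(String[] args) {
    System.out.println(plan(StemmerType.KSTEM, Set.of()));
    System.out.println(plan(StemmerType.PORTER, Set.of("lucene")));
  }
}
```

As in the snippet, any value other than KSTEM (including an unset stemmer) falls through to the Porter stemmer, so existing behavior would stay the default.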

Thank you,
Cameron VandenBerg