Posted to java-user@lucene.apache.org by pa...@hotmail.com on 2021/09/29 17:57:03 UTC

Chain or customize Lucene analyzers with custom TokenFilter

Hello,

I would like to enrich the Lucene FrenchAnalyzer with additional
TokenFilters, but I can't find out how to do it.

So far I have successfully tested a custom analyzer and custom
TokenFilters with the code below, but I don't understand how to
"inject" my filters into the FrenchAnalyzer. Can someone help me here?

@Override
protected TokenStreamComponents createComponents(final String fieldName) {
   final StandardTokenizer src = new StandardTokenizer();
   src.setMaxTokenLength(maxTokenLength);
   TokenStream tok = new LowerCaseFilter(src);
   tok = new NGramTokenFilter(tok, 1, 5, true);
   tok = new PluralTokenFilter(tok);
   tok = new GenderTokenFilter(tok);
   tok = new AccentTokenFilter(tok);
   tok = new StopFilter(tok, stopwords);
   return new TokenStreamComponents(r -> {
      src.setMaxTokenLength(BpsisAnalyzer.this.maxTokenLength);
      src.setReader(r);
   }, tok);
}

@Override
protected TokenStream normalize(String fieldName, TokenStream in) {
   TokenStream tok = new LowerCaseFilter(in);
   tok = new PluralTokenFilter(tok);
   tok = new GenderTokenFilter(tok);
   return new AccentTokenFilter(tok);
}

Thanks,
Stephane