Posted to java-user@lucene.apache.org by Benson Margulies <be...@basistech.com> on 2013/10/08 16:30:35 UTC

Analyzer classes versus the constituent components

Is there some advice around about when it's appropriate to create an
Analyzer class, as opposed to just Tokenizer and TokenFilter classes?

The advantage of the constituent elements is that they allow the
consuming application to add more filters. The only disadvantage I see
is that the following is a bit on the verbose side. Is there some
advantage or use of an Analyzer class that I'm missing?

private Analyzer newAnalyzer() {
    return new Analyzer() {
        @Override
        protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
            Tokenizer source = tokenizerFactory.create(reader, LanguageCode.JAPANESE);
            com.basistech.rosette.bl.Analyzer rblAnalyzer;
            try {
                rblAnalyzer = analyzerFactory.create(LanguageCode.JAPANESE);
            } catch (IOException e) {
                throw new RuntimeException("Error creating RBL analyzer", e);
            }
            BaseLinguisticsTokenFilter filter = new BaseLinguisticsTokenFilter(source, rblAnalyzer);
            return new TokenStreamComponents(source, filter);
        }
    };
}

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Analyzer classes versus the constituent components

Posted by Michael Sokolov <ms...@safaribooksonline.com>.
There are some Analyzer methods you might want to override (initReader
for inserting a CharFilter, and getPositionIncrementGap/getOffsetGap for
the gaps between multiple values of a field), but if you don't need
those, it seems to be mostly about packaging neatly, as you say.

-Mike
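
To make the trade-off discussed in this thread concrete, here is a rough plain-Java sketch. It deliberately does not use Lucene's classes: tokenize, lowerCase, dropShort, PackagedAnalyzer, initText and positionGap are all hypothetical stand-ins invented for illustration. initText only mimics the role of initReader, and positionGap the role of getPositionIncrementGap. It contrasts a caller-assembled chain (where the consuming application can append a filter) with a packaged analyzer (fixed chain, overridable hooks).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class AnalyzerSketch {

    // Stand-in for a Tokenizer (not Lucene API): splits raw text on whitespace.
    static List<String> tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Stand-in for a TokenFilter: lowercases every token.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Extra filter a consuming application might add: drops one-character tokens.
    static List<String> dropShort(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (t.length() > 1) {
                out.add(t);
            }
        }
        return out;
    }

    // Stand-in for an Analyzer: a fixed chain plus overridable hooks.
    static class PackagedAnalyzer {
        // Hypothetical hook in the spirit of initReader: pre-process the raw text.
        protected String initText(String text) {
            return text;
        }

        // Hypothetical hook in the spirit of getPositionIncrementGap.
        public int positionGap() {
            return 0;
        }

        // The chain itself is fixed; callers cannot append filters to it.
        public final List<String> analyze(String text) {
            return lowerCase(tokenize(initText(text)));
        }
    }

    public static void main(String[] args) {
        // Constituent components: the caller builds the chain and appends a filter.
        Function<String, List<String>> chain =
                ((Function<String, List<String>>) AnalyzerSketch::tokenize)
                        .andThen(AnalyzerSketch::lowerCase)
                        .andThen(AnalyzerSketch::dropShort);
        System.out.println(chain.apply("A Quick Test")); // prints [quick, test]

        // Packaged analyzer: same fixed chain, but the hooks are overridden.
        PackagedAnalyzer stripping = new PackagedAnalyzer() {
            @Override
            protected String initText(String text) {
                return text.replaceAll("<[^>]*>", " "); // crude tag stripping
            }

            @Override
            public int positionGap() {
                return 100;
            }
        };
        System.out.println(stripping.analyze("<b>Hello</b> World")); // prints [hello, world]
    }
}
```

In this sketch the packaged form buys the hooks and a single object to hand around, while the loose components leave the chain open for the caller to extend. That is roughly the distinction drawn above, though real Lucene code would go through createComponents and TokenStreamComponents instead.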
