Posted to solr-user@lucene.apache.org by Andy <an...@yahoo.com> on 2010/10/05 07:21:10 UTC
Differences between FilterFactory and TokenizerFactory?
There are EdgeNGramFilterFactory & EdgeNGramTokenizerFactory.
Likewise there are StandardFilterFactory & StandardTokenizerFactory.
LowerCaseFilterFactory & LowerCaseTokenizerFactory.
Seems like they always come in pairs.
What are the differences between FilterFactory and TokenizerFactory? When should I use one as opposed to the other?
Thanks
Re: Differences between FilterFactory and TokenizerFactory?
Posted by Ahmet Arslan <io...@yahoo.com>.
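A Tokenizer breaks the input text into words/tokens. Its input is a Reader (a stream of characters). An Analyzer contains exactly one tokenizer. For example, StandardTokenizer removes punctuation and recognizes e-mail addresses.
TokenFilters operate on the output of the tokenizer (or of the previous filter in the chain). Their input is words/tokens, not raw text, and an Analyzer can contain any number of them.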
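A rough way to picture the split. This is a Python model only, not the Lucene/Solr Java API. `io.StringIO` stands in for the Reader, and every function name below is hypothetical:

```python
import io
from typing import Callable, Iterable, Iterator, List

# Hypothetical types for illustration; these are NOT Lucene/Solr classes.
Tokenizer = Callable[[io.TextIOBase], Iterator[str]]
TokenFilter = Callable[[Iterable[str]], Iterator[str]]


def whitespace_tokenizer(reader: io.TextIOBase) -> Iterator[str]:
    """Tokenizer: consumes raw characters from a reader and emits tokens."""
    for line in reader:
        yield from line.split()


def lowercase_filter(tokens: Iterable[str]) -> Iterator[str]:
    """TokenFilter: consumes tokens and emits transformed tokens."""
    for tok in tokens:
        yield tok.lower()


def stop_filter(tokens: Iterable[str]) -> Iterator[str]:
    """TokenFilter: drops tokens found in a small, hypothetical stop list."""
    stop = {"the", "a", "an"}
    for tok in tokens:
        if tok not in stop:
            yield tok


def analyze(text: str, tokenizer: Tokenizer, filters: List[TokenFilter]) -> List[str]:
    # Exactly one tokenizer, fed by a Reader-like character stream...
    stream: Iterable[str] = tokenizer(io.StringIO(text))
    # ...then zero or more filters, each wrapping the previous stage.
    for f in filters:
        stream = f(stream)
    return list(stream)


print(analyze("The Quick Brown Fox", whitespace_tokenizer, [lowercase_filter, stop_filter]))
# ['quick', 'brown', 'fox']
```

Filter order matters in this model: the stop filter only catches "The" because the lowercase filter runs before it.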
LowerCaseTokenizerFactory can be expressed as the combination of LetterTokenizer + LowerCaseFilter.
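To make the equivalence concrete, here is a small Python model (again not the real Java classes; all function names are hypothetical). It assumes the LetterTokenizer rule is "a token is a maximal run of letters", and compares a two-stage tokenizer + filter chain against a single tokenizer that lowercases while it reads:

```python
import io
from typing import Callable, Iterable, Iterator, List


def _letter_runs(reader: io.TextIOBase, normalize: Callable[[str], str]) -> Iterator[str]:
    """Read one char at a time; emit each maximal run of letters as a token."""
    buf: List[str] = []
    while True:
        ch = reader.read(1)
        if ch and ch.isalpha():
            buf.append(normalize(ch))
            continue
        if buf:  # a non-letter or end of input closes the current token
            yield "".join(buf)
            buf = []
        if not ch:  # read() returns "" at end of stream
            return


def letter_tokenizer(reader: io.TextIOBase) -> Iterator[str]:
    # Models LetterTokenizer: splits on non-letters, keeps case as-is.
    return _letter_runs(reader, lambda c: c)


def lowercase_filter(tokens: Iterable[str]) -> Iterator[str]:
    # Models LowerCaseFilter: a second pass over every token, char by char.
    for tok in tokens:
        yield "".join(c.lower() for c in tok)


def lowercase_tokenizer(reader: io.TextIOBase) -> Iterator[str]:
    # Models LowerCaseTokenizer: same splitting rule, lowercasing while reading.
    return _letter_runs(reader, str.lower)


text = "Solr's LowerCase-Tokenizer, v1.4!"
two_stage = list(lowercase_filter(letter_tokenizer(io.StringIO(text))))
one_stage = list(lowercase_tokenizer(io.StringIO(text)))
print(two_stage)  # ['solr', 's', 'lowercase', 'tokenizer', 'v']
print(two_stage == one_stage)  # True
```

Both paths give the same tokens. In this model the single tokenizer simply skips the extra per-token stage, which is roughly the overhead a combined tokenizer avoids.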
Similarly, EdgeNGramTokenizerFactory can be thought of as KeywordTokenizer + EdgeNGramFilterFactory.
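A similar Python sketch for the edge n-gram pair. It is not the Lucene implementation. It assumes front-edge grams with hypothetical `min_gram`/`max_gram` parameters, and models KeywordTokenizer as "the whole input becomes one token":

```python
import io
from typing import Iterable, Iterator


def keyword_tokenizer(reader: io.TextIOBase) -> Iterator[str]:
    """Models KeywordTokenizer: the entire input becomes a single token."""
    text = reader.read()
    if text:
        yield text


def edge_ngram_filter(tokens: Iterable[str], min_gram: int, max_gram: int) -> Iterator[str]:
    """Models a front-edge n-gram filter: prefixes of length min_gram..max_gram."""
    for tok in tokens:
        for n in range(min_gram, min(max_gram, len(tok)) + 1):
            yield tok[:n]


def edge_ngram_tokenizer(reader: io.TextIOBase, min_gram: int, max_gram: int) -> Iterator[str]:
    """Models an edge n-gram tokenizer: grams from the front of the whole input."""
    text = reader.read()
    for n in range(min_gram, min(max_gram, len(text)) + 1):
        yield text[:n]


a = list(edge_ngram_filter(keyword_tokenizer(io.StringIO("solr")), 1, 3))
b = list(edge_ngram_tokenizer(io.StringIO("solr"), 1, 3))
print(a)  # ['s', 'so', 'sol']
print(a == b)  # True
```

In this model the distinction shows up with multi-word input. The tokenizer version turns "new york" into grams of the whole string ("n", "ne", "new", "new ", "new y" for max 5). Putting the filter after a whitespace-splitting tokenizer instead would produce grams per word.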
In practice, when your analyzer chain has the LetterTokenizer + LowerCaseFilter combination, you can replace both with LowerCaseTokenizerFactory for a performance gain, because the lowercasing is then done during tokenization instead of in a separate filter stage.