Posted to solr-user@lucene.apache.org by Ben Davies <be...@gmail.com> on 2011/04/13 14:14:17 UTC

Field Analyzers: which values are indexed?

Hi there,

Just a quick question that the wiki page (
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters) didn't seem to
answer very well.

Given an analyzer that has zero or more Char Filter Factories, one
Tokenizer Factory, and zero or more Token Filter Factories, which value(s)
are indexed?

Is every value produced by each char filter, tokenizer, and filter
indexed?
Or is only the final value, after completing the whole chain, indexed?

Cheers,
Ben

Re: Field Analyzers: which values are indexed?

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
> Or is only the final value, after completing the whole chain, indexed?

Yes.

Koji
-- 
http://www.rondhuit.com/en/

Re: Field Analyzers: which values are indexed?

Posted by Ben Davies <be...@gmail.com>.
Thanks both for your replies

Erick,
Yep, I use the Analysis page extensively, but what I was really asking
was whether all of the values shown by the analysis page, or only the
last line of them, are eventually indexed.
I think we've concluded it's only the last line.

Cheers,
Ben

On Wed, Apr 13, 2011 at 2:41 PM, Erick Erickson <er...@gmail.com> wrote:

> CharFilterFactories are applied to the raw input before tokenization.
> Each token output from the tokenization is then sent through
> the rest of the chain.
>
> The Analysis page available from the Solr admin page is
> invaluable in answering in great detail what each part of
> an analysis chain does.
>
> TokenFilterFactories are applied to each token emitted from
> the tokenizer, and this includes the similarly named
> PatternReplaceFilterFactory. The difference is that the
> PatternReplaceCharFilterFactory is applied before tokenization
> to the entire input stream and PatternReplaceFilterFactory
> is applied to each token emitted by the tokenizer.
>
> And to make it even more fun, you can do both!
>
> Best
> Erick

Re: Field Analyzers: which values are indexed?

Posted by Erick Erickson <er...@gmail.com>.
CharFilterFactories are applied to the raw input before tokenization.
Each token output from the tokenization is then sent through
the rest of the chain.

The Analysis page available from the Solr admin page is
invaluable in answering in great detail what each part of
an analysis chain does.

TokenFilterFactories are applied to each token emitted from
the tokenizer, and this includes the similarly named
PatternReplaceFilterFactory. The difference is that the
PatternReplaceCharFilterFactory is applied before tokenization
to the entire input stream and PatternReplaceFilterFactory
is applied to each token emitted by the tokenizer.

And to make it even more fun, you can do both!
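
For illustration only, here is a rough sketch of the order of operations
described above. It is a toy Python model, not Solr code: the function
names (char_filter, tokenize, token_filter, analyze) and the regular
expressions are made up for this example and do not correspond to any
Solr API. It assumes a whitespace tokenizer, one pattern replacement
before tokenization and one after, and shows that only the output of the
last stage is what would end up as indexed terms.

```python
import re

# Toy model of an analysis chain. This is NOT Solr code; every name here
# is hypothetical and exists only to illustrate the order of the stages.

def char_filter(raw):
    """Pre-tokenization step, applied once to the entire input string
    (the role a PatternReplaceCharFilterFactory plays)."""
    return re.sub(r"&", " and ", raw)

def tokenize(text):
    """Split the char-filtered text on whitespace."""
    return text.split()

def token_filter(token):
    """Post-tokenization step, applied to each token separately
    (the role a PatternReplaceFilterFactory plays)."""
    return re.sub(r"[^a-z0-9]", "", token.lower())

def analyze(raw):
    """Run the whole chain and return only its final output."""
    filtered = char_filter(raw)                 # whole input, before tokenization
    tokens = tokenize(filtered)                 # intermediate tokens, never indexed
    final = [token_filter(t) for t in tokens]   # one token at a time
    return [t for t in final if t]              # drop tokens that became empty

indexed_terms = analyze("Rock&Roll, Baby!")
print(indexed_terms)  # ['rock', 'and', 'roll', 'baby']
```

Note that intermediate values such as "Roll," (the tokenizer's output
before the token filter runs) never appear in the result, which matches
the last line shown on the Analysis page.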

Best
Erick
