Posted to solr-user@lucene.apache.org by Michael Ryan <mr...@moreover.com> on 2011/11/03 21:16:38 UTC

UnInvertedField vs FieldCache for facets for single-token text fields

I have some fields I facet on that are TextFields but have just a single token.
The fieldType looks like this:

<fieldType name="myStringFieldType" class="solr.TextField" indexed="true"
    stored="false" omitNorms="true" sortMissingLast="true"
    positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

SimpleFacets uses an UnInvertedField for these fields because
multiValuedFieldCache() returns true for TextField. I tried changing the type for
these fields to the plain "string" type (StrField). The facets *seem* to be
generated much faster. Is it expected that FieldCache would be faster than
UnInvertedField for single-token strings like this?

My goal is to make the facet re-generation after a commit as fast as possible. I
would like to continue using TextField for these fields since I have a need for
filters like LowerCaseFilterFactory, which still produces a single token. Is it
safe to extend TextField and have multiValuedFieldCache() return false for these
fields, so that UnInvertedField is not used? Or is there a better way to
accomplish what I'm trying to do?

-Michael

Re: UnInvertedField vs FieldCache for facets for single-token text fields

Posted by Martijn v Groningen <ma...@gmail.com>.

Hi Michael,

The FieldCache is a simpler data structure and easier to create, so I
would also expect it to be faster. Unfortunately, for TextField,
UnInvertedField is always used, even if you have only one token per
document. I think overriding the multiValuedFieldCache() method to
return false would work.
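
A rough sketch of how such a subclass might be wired into schema.xml,
assuming a hypothetical class com.example.SingleTokenTextField that
extends TextField and overrides multiValuedFieldCache() to return false.
The class name and package are made up for illustration. The attributes
and tokenizer mirror the fieldType from the question, with
LowerCaseFilterFactory added as the kind of filter mentioned there:

```xml
<!-- Hypothetical subclass of solr.TextField whose multiValuedFieldCache()
     returns false; the class name is an assumption, not part of Solr. -->
<fieldType name="myStringFieldType" class="com.example.SingleTokenTextField"
    indexed="true" stored="false" omitNorms="true" sortMissingLast="true"
    positionIncrementGap="100">
  <analyzer>
    <!-- KeywordTokenizerFactory emits the whole input as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- LowerCaseFilterFactory changes the token's case, not the token count -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

This should only be relied on while the analyzer chain yields at most one
token per document and the field is not multiValued.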

If you're using 4.0-dev (trunk), I'd use facet.method=fcs (this
parameter is only usable if the multiValuedFieldCache() method returns
false). This is per-segment faceting, and the cache only needs to be
extended for new segments, which makes this approach better suited to
indexes that change frequently. I think this would be even faster in
your case than the plain FieldCache method, which operates on a
top-level reader: after each commit the complete cache is invalid and
has to be recreated.
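
As an illustration, facet.method=fcs could be passed on each request or
set as a default in solrconfig.xml. The fragment below is only a sketch:
the handler name and the field name "myStringField" are placeholders,
and it assumes the field's type already reports
multiValuedFieldCache() == false:

```xml
<!-- solrconfig.xml fragment; handler name and field name are placeholders -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="facet">true</str>
    <str name="facet.field">myStringField</str>
    <!-- per-segment FieldCache faceting (4.0-dev / trunk only) -->
    <str name="facet.method">fcs</str>
  </lst>
</requestHandler>
```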

Otherwise I'd try facet.method=enum, which is fast if you have few
distinct facet values (the number of docs doesn't influence the
performance that much).
The facet.method=enum option is also valid for normal TextFields, so
there is no need for custom code.
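
For example, the method can be chosen for a single field with the
per-field parameter form f.FIELDNAME.facet.method, leaving other facet
fields on their defaults. A minimal sketch, again using the placeholder
field name "myStringField":

```xml
<!-- goes inside a request handler's configuration in solrconfig.xml -->
<lst name="defaults">
  <str name="facet">true</str>
  <str name="facet.field">myStringField</str>
  <!-- per-field override: only this field uses the enum method -->
  <str name="f.myStringField.facet.method">enum</str>
</lst>
```

The same parameter can be sent on the request itself instead of being
configured as a default.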

Martijn

-- 
Kind regards,

Martijn van Groningen