Posted to solr-user@lucene.apache.org by John Davis <jo...@gmail.com> on 2019/05/22 22:06:04 UTC

Facet count incorrect

Hi there -
Our facet counts are incorrect for a particular field and I suspect it is
because we changed the type of the field from StrField to TextField. Two
questions:

1. If we do re-index all the documents in the index, would these counts get
fixed?
2. Is there a "safe" way of changing field types that generally works?

*Old type:*
  <fieldType name="strings" class="solr.StrField" sortMissingLast="true"
             docValues="true" multiValued="true"/>

*New type:*
  <fieldType name="lowercase_notokenize" class="solr.TextField"
             omitNorms="true" omitTermFreqAndPositions="true" indexed="true"
             stored="true" positionIncrementGap="100" sortMissingLast="true"
             multiValued="true">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

Re: Facet count incorrect

Posted by Erick Erickson <er...@gmail.com>.
You’ll have subtle, or not-so-subtle, problems. String types are a single token, so a document with “my dog has fleas” will not be returned when searching for any of those 4 words. By definition there’s no position information stored with the string type, so no phrases will work against docs indexed before you made the change.
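
To make the single-token point concrete, here is a toy sketch in Python. It is not Solr code: the three functions only imitate what each analysis chain would emit as indexed terms, and the whitespace-splitting chain is a hypothetical contrast that does not appear in this thread.

```python
# Toy model of analysis chains. Assumption: a term query matches only
# when the exact term was emitted at index time.

def str_field_terms(value):
    """StrField-like: the whole value is indexed verbatim as one term."""
    return [value]

def keyword_lowercase_terms(value):
    """KeywordTokenizer + LowerCaseFilter-like: still one term, lowercased."""
    return [value.lower()]

def whitespace_lowercase_terms(value):
    """Hypothetical word-splitting chain, shown only for contrast."""
    return value.lower().split()

def matches(query_term, indexed_terms):
    """True when the query term equals one of the indexed terms."""
    return query_term in indexed_terms

doc = "my dog has fleas"
print(matches("dog", str_field_terms(doc)))             # False
print(matches("dog", keyword_lowercase_terms(doc)))     # False
print(matches("dog", whitespace_lowercase_terms(doc)))  # True
```

Note that in this model the lowercase_notokenize type from the original post also keeps the value as one token, so a single-word search behaves the same way there; only the casing of the stored term differs.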

I do not know what other things will creep out of the woodwork. Point is you’re playing with fire here….

Depending on how you’re configured and the like, and assuming you’re using SolrCloud, you could delete one replica from each shard of the existing collection, add a new collection with exactly one replica (leader-only) per shard, and index to that. From there, and assuming quiescent indexing, add one replica to each shard of the new collection for failover.

Once that’s complete, switch the alias and delete the old collection. Then add replicas to the new collection to build it out.
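
The sequence above could be sketched as a list of Collections API requests. This is only a sketch under stated assumptions: the base URL, collection names, alias, configset name and shard names are all hypothetical, the script builds the URLs without sending them, and it leaves out the initial step of deleting a replica from the old collection because that needs replica names taken from your own cluster state.

```python
from urllib.parse import urlencode

# Assumption: Solr is reachable at this base URL.
SOLR = "http://localhost:8983/solr"

def collections_api(action, params):
    """Build a Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{SOLR}/admin/collections?{query}"

def migration_plan(old, new, alias, shards, config):
    """Return the requests, in order, for reindexing into a new collection."""
    plan = [
        # 1. New collection with exactly one replica (the leader) per shard.
        collections_api("CREATE", {
            "name": new,
            "numShards": len(shards),
            "replicationFactor": 1,
            "collection.configName": config,
        }),
    ]
    # (Reindex all documents into `new` here; that happens outside this API.)
    # 2. Once indexing is quiescent, add one replica per shard for failover.
    for shard in shards:
        plan.append(collections_api("ADDREPLICA",
                                    {"collection": new, "shard": shard}))
    # 3. Point the alias at the new collection, then drop the old one.
    plan.append(collections_api("CREATEALIAS",
                                {"name": alias, "collections": new}))
    plan.append(collections_api("DELETE", {"name": old}))
    return plan

# Hypothetical names, for illustration only.
plan = migration_plan("products_v1", "products_v2", "products",
                      ["shard1", "shard2"], "products_conf")
for url in plan:
    print(url)
```

Clients keep querying the alias throughout, so the switch in step 3 is the only moment they notice. Check the parameters against the reference guide for your Solr version before running anything like this.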

Best,
Erick

> On May 23, 2019, at 2:11 PM, John Davis <jo...@gmail.com> wrote:
> 
> Reindexing to alias is not always easy if it requires 2x resources. Just to
> be clear the issues you mentioned are mostly around faceting because we
> haven't seen any other search/retrieval issues. Or is that not accurate?


Re: Facet count incorrect

Posted by John Davis <jo...@gmail.com>.
Reindexing to alias is not always easy if it requires 2x resources. Just to
be clear the issues you mentioned are mostly around faceting because we
haven't seen any other search/retrieval issues. Or is that not accurate?

On Wed, May 22, 2019 at 5:12 PM Erick Erickson <er...@gmail.com>
wrote:

> 1> I strongly recommend you re-index into a new collection and switch to
> it with a collection alias rather than try to re-index all the docs.
> Segment merging with the same field with dissimilar definitions is not
> guaranteed to do the right thing.
>
> 2> No. There are a few (very few) things that don’t require starting fresh.
> You can do some things like add a LowerCaseFilter, add or remove a field
> totally and the like. Even then you’ll go through a period of mixed-up
> results until the reindex is complete. But changing the type, changing from
> multiValued to singleValued or vice versa (particularly with docValues)
> etc. are all “fraught”.
>
> My usual reply is “if you’re going to reindex everything anyway, why not
> just do it to a new collection and alias when you’re done?” It’s much safer.
>
> Best,
> Erick

Re: Facet count incorrect

Posted by Erick Erickson <er...@gmail.com>.
1> I strongly recommend you re-index into a new collection and switch to it with a collection alias rather than try to re-index all the docs. Segment merging with the same field with dissimilar definitions is not guaranteed to do the right thing.

2> No. There are a few (very few) things that don’t require starting fresh. You can do some things like add a LowerCaseFilter, add or remove a field totally and the like. Even then you’ll go through a period of mixed-up results until the reindex is complete. But changing the type, changing from multiValued to singleValued or vice versa (particularly with docValues) etc. are all “fraught”.
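
As a rough illustration of the “mixed-up results” period, here is a toy Python model of facet counting over a half-reindexed field. It is not Solr code and makes simplifying assumptions: facets are counted per distinct indexed term, the old type keeps values verbatim, the new type lowercases them, and the sample values are made up. It also ignores the docValues difference between the two types, which may cause further discrepancies in a real index.

```python
from collections import Counter

def old_chain(value):
    """StrField-like: the value is kept verbatim."""
    return value

def new_chain(value):
    """KeywordTokenizer + LowerCaseFilter-like: the value is lowercased."""
    return value.lower()

def facet_counts(terms):
    """Count documents per distinct indexed term."""
    return Counter(terms)

# Made-up field values, one per document.
values = ["Red", "red", "RED", "Blue"]

# First two docs were indexed before the type change, the rest after it.
mixed = [old_chain(v) for v in values[:2]] + [new_chain(v) for v in values[2:]]
print(facet_counts(mixed))      # "Red" and "red" land in separate buckets

# After every document has been reindexed with the new type.
reindexed = [new_chain(v) for v in values]
print(facet_counts(reindexed))  # a single "red" bucket
```

In this model the counts only agree once every document has passed through the new chain.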

My usual reply is “if you’re going to reindex everything anyway, why not just do it to a new collection and alias when you’re done?” It’s much safer.

Best,
Erick
