Posted to solr-user@lucene.apache.org by Erick Erickson <er...@gmail.com> on 2020/10/20 12:51:22 UTC

Re: [EXT: NEWSLETTER] SolrDocument difference between String and text_general

Owen:

Collection reload is necessary but not sufficient. You’ll still get wonky results even if you re-index everything unless you delete _all_ the documents first or start with a whole new collection. Each Lucene segment is a “mini index” with its own picture of the structure of that index (i.e. the schema in force when it was created). If you have segments created with the old schema and other segments with the new schema, when they get merged the result is undefined. It may not blow up, but it also won't do what you want.
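
A minimal sketch of the "delete _all_ the documents first" step, assuming Solr's JSON update handler at /update. The base URL and collection name are placeholders, not values confirmed in this thread. It only builds the HTTP request with the JDK's java.net.http classes; nothing is sent until the request is handed to an HttpClient.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class DeleteAllSketch {

    // Update handler URI for the collection; commit=true makes the delete visible immediately.
    static URI updateUri(String solrBase, String collection) {
        return URI.create(solrBase + "/" + collection + "/update?commit=true");
    }

    // JSON delete-by-query body that matches every document.
    static String deleteAllBody() {
        return "{\"delete\":{\"query\":\"*:*\"}}";
    }

    // Builds (but does not send) the POST request.
    static HttpRequest deleteAllRequest(String solrBase, String collection) {
        return HttpRequest.newBuilder(updateUri(solrBase, collection))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(deleteAllBody()))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder base URL and collection name; substitute your own.
        HttpRequest req = deleteAllRequest("http://localhost:8983/solr", "mycollection");
        System.out.println(req.method() + " " + req.uri());
        System.out.println(deleteAllBody());
    }
}
```

To actually run the delete, pass the request to HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString()), and only then re-index.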

Take your change from text to string type and the title “my dog has fleas”. In the segment with the field defined as a Text type, you’ll be able to search for “dog” and get the doc. Similarly for “Dog” (assuming you have lowercasing in your analysis chain). “has fleas” would hit, as would “dog fleas”~2.

For the segment defined with String, you will only get a hit if you search for “my dog has fleas”. You wouldn’t find the doc if you searched for any of the following:
- my AND dog AND has AND fleas
- “My dog has fleas”
- fleas
- “dog has fleas my”
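
The difference can be illustrated with a toy model. This is not Lucene's or Solr's actual analysis code: the class and method names are made up for illustration, the "text" field is approximated by whitespace splitting plus lowercasing, and the "string" field by a single verbatim term.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class FieldTypeSketch {

    // Toy "text" analysis: lowercase, then split on whitespace into separate terms.
    static List<String> textTerms(String value) {
        return Arrays.asList(value.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Toy "string" analysis: the whole value is one unmodified term.
    static List<String> stringTerms(String value) {
        return List.of(value);
    }

    // Single-term search against the toy text field (query is lowercased too).
    static boolean textMatchesTerm(String indexed, String term) {
        return textTerms(indexed).contains(term.toLowerCase(Locale.ROOT));
    }

    // Exact phrase search (no slop): query terms must appear consecutively, in order.
    static boolean textMatchesPhrase(String indexed, String phrase) {
        return Collections.indexOfSubList(textTerms(indexed), textTerms(phrase)) >= 0;
    }

    // Search against the toy string field: only the exact, case-sensitive value matches.
    static boolean stringMatches(String indexed, String query) {
        return stringTerms(indexed).contains(query);
    }

    public static void main(String[] args) {
        String title = "my dog has fleas";
        System.out.println(textMatchesTerm(title, "dog"));            // true
        System.out.println(textMatchesTerm(title, "Dog"));            // true
        System.out.println(textMatchesPhrase(title, "has fleas"));    // true
        System.out.println(stringMatches(title, "my dog has fleas")); // true
        System.out.println(stringMatches(title, "My dog has fleas")); // false
        System.out.println(stringMatches(title, "fleas"));            // false
        System.out.println(stringMatches(title, "dog has fleas my")); // false
    }
}
```

The sloppy phrase query “dog fleas”~2 is left out of the sketch; it would need position-distance logic that this toy model does not implement.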

When those segments are merged, Lucene doesn’t have the information to “do the right thing”, and even if it did the cost would be prohibitive because it’d be like re-indexing all the docs in one segment or the other.

You cannot spoof this by simply reindexing the corpus on top of an existing index, since that’ll involve a bunch of segment merges that mix old-schema and new-schema segments.

You’re seeing consistent results here because you started with a _new_ collection that had no old segments lying around.
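
For reference, a rough sketch of the two Collections API calls in play here: RELOAD (necessary but not sufficient) and CREATE (a fresh collection with no old segments). The base URL, collection names and configset name are placeholders, and numShards=1 is an assumption for illustration. The code only builds the URLs.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class CollectionsApiSketch {

    // URL-encode a single query parameter value.
    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Reloads an existing collection so it picks up the changed schema.
    static URI reloadUri(String solrBase, String collection) {
        return URI.create(solrBase + "/admin/collections?action=RELOAD&name=" + enc(collection));
    }

    // Creates a brand-new collection from an existing configset.
    static URI createUri(String solrBase, String collection, String configSet, int numShards) {
        return URI.create(solrBase + "/admin/collections?action=CREATE&name=" + enc(collection)
                + "&numShards=" + numShards
                + "&collection.configName=" + enc(configSet));
    }

    public static void main(String[] args) {
        // Placeholder values; substitute your own.
        String base = "http://localhost:8983/solr";
        System.out.println(reloadUri(base, "mycollection"));
        System.out.println(createUri(base, "mycollection_v2", "myconfig", 1));
    }
}
```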

Best,
Erick

> On Oct 20, 2020, at 4:37 AM, Cox, Owen <oc...@deloitte.co.uk> wrote:
> 
> Hi Konstantinos, I think you're onto something there.  I don't think the collection was reloaded, I've just tried the same code against a different collection that uses the same configset; only difference being this collection was created after the schema changes.  That works, so it must've been the reload that was missing.
> 
> Thanks!
> 
> Owen Cox
> Senior Consultant | Deloitte MCS Limited
> D: +44 20 7007 1657
> ocox@deloitte.co.uk | www.deloitte.co.uk
> 
> 
> -----Original Message-----
> From: Konstantinos Koukouvis <ko...@mecenat.com>
> Sent: 20 October 2020 09:04
> To: solr-user@lucene.apache.org
> Subject: [EXT: NEWSLETTER] Re: SolrDocument difference between String and text_general
> 
> Hi Owen,
> 
> If I understand correctly you have changed the schema, then reloaded the core and reindexed all data right? Cause whenever I got this error I’ve usually forgotten to do one of those two things…
> 
> Regards,
> Konstantinos
> 
>> On 20 Oct 2020, at 09:53, Cox, Owen <oc...@deloitte.co.uk> wrote:
>> 
>> Hi folks,
>> 
>> I'm using Solr 8.5.2 and populating documents which include a string field called "title".  This field used to be text_general, but the data was reindexed and we've been inserting data happily with REST calls and it's been behaving as desired.
>> 
>> I've now written a Java Spring-Boot program to populate documents (snippet below) using SolrCrudRepository.  This works when I don't index the "title" field, but when I try to include title I get the following error: "cannot change field "title" from index options=DOCS_AND_FREQS_AND_POSITIONS to inconsistent index options=DOCS".
>> 
>> To me that looks like it's trying to index the title as text_general and store it in a string field.  But the Solr schema states that field is string, all of the data in it is string, and every other string field in the document is indexed correctly.
>> 
>> Could there be any hanging reference to the field's type anywhere?  Or some requirement that a field named "title" is always text_general or something odd like that?
>> 
>> Any help appreciated, thanks
>> Owen
>> 
>> 
>> 
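>> @Data
>> @SolrDocument(collection="mycollection")
>> public class Node {
>> 
>>   @Id
>>   @Field
>>   private String id;
>> 
>>   // The field whose schema type was changed from text_general to string
>>   @Field
>>   private String title;
>> 
>>   // ... (snippet ends here in the original message)
>> }
>> 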
>> IMPORTANT NOTICE
>> 
>> This communication is from Deloitte LLP, a limited liability partnership registered in England and Wales with registered number OC303675. Its registered office is 1 New Street Square, London EC4A 3HQ, United Kingdom. Deloitte LLP is the United Kingdom affiliate of Deloitte NSE LLP, a member firm of Deloitte Touche Tohmatsu Limited, a UK private company limited by guarantee ("DTTL"). DTTL and each of its member firms are legally separate and independent entities. DTTL and Deloitte NSE LLP do not provide services to clients. Please see www.deloitte.co.uk/about<https://www.deloitte.co.uk/about> to learn more about our global network of member firms. For details of our professional regulation please see Regulators<https://www2.deloitte.com/uk/en/footerlinks1/regulators-and-provision-service-regulations.html>.
>> 
>> This communication contains information which is confidential and may also be privileged. It is for the exclusive use of the intended recipient(s). If you are not the intended recipient(s), please notify it.security.uk@deloitte.co.uk<ma...@deloitte.co.uk> and destroy this message immediately. Email communications cannot be guaranteed to be secure or free from error or viruses. All emails sent to or from a @deloitte.co.uk email account are securely archived and stored by an external supplier within the European Union.
>> 
>> You can understand more about how we collect and use (process) your personal information in our Privacy Notice<https://www2.deloitte.com/uk/en/legal/privacy.html>.
>> 
>> Deloitte LLP does not accept any liability for use of or reliance on the contents of this email by any person save by the intended recipient(s) to the extent agreed in a Deloitte LLP engagement contract.
>> 
>> Opinions, conclusions and other information in this email which have not been delivered by way of the business of Deloitte LLP are neither given nor endorsed by it.
> 
> ==================================================
> Konstantinos Koukouvis
> konstantinos.koukouvis@mecenat.com
> 
> Using Golang and Solr? Try this: https://github.com/mecenat/solr


Re: [EXT: NEWSLETTER] SolrDocument difference between String and text_general

Posted by Konstantinos Koukouvis <ko...@mecenat.com>.
Reindexing has to be done either by starting from scratch or by deleting all documents and then re-inserting them. Right?
https://lucene.apache.org/solr/guide/8_0/reindexing.html

Regards,
Konstantinos

> On 20 Oct 2020, at 14:51, Erick Erickson <er...@gmail.com> wrote:
> 
> You’re seeing consistent results here because you started with a _new_ collection that had no old segments lying around.
