Posted to users@solr.apache.org by gnandre <ar...@gmail.com> on 2021/03/19 21:36:32 UTC

Solr complains about unknown field during atomic indexing

While performing atomic indexing, I run into an error which says 'unknown
field X', where X is not a field specified in the schema. It is a
discontinued field. After deleting that field from the schema, I
restarted Solr but did not re-index the content, so the deleted
field's data might still be in the Solr index.

As I understand atomic indexing, it tries to index all
stored values again, but why is it trying to index the stored value of a field
that does not exist in the schema?

Re: Solr complains about unknown field during atomic indexing

Posted by Andreas Hubold <an...@coremedia.com>.

Hi,

> You could then add the following to take care of any and all unknown 
> fields:
>
> <dynamicField name="*" type="ignored" multiValued="true" />
>
> Or you could name individual fields like that, which I think would be 
> a better option than the wildcard dynamic field. 

Just a small addition, in case you're also using nested documents: You 
should really prefer individual field names instead of the "*" wildcard 
then.
Otherwise you may run into the following bug, which causes nested child 
documents to disappear: https://issues.apache.org/jira/browse/SOLR-15018

Cheers,
Andreas

Shawn Heisey wrote on 20.03.21 11:52:
> On 3/19/2021 3:36 PM, gnandre wrote:
>> While performing atomic indexing, I run into an error which says 'unknown
>> field X' where X is not a field specified in the schema. It is a
>> discontinued field. After deleting that field from the schema, I have
>> restarted Solr but I have not re-indexed the content back, so the deleted
>> field data still might be there in Solr index.
>>
>> The way I understand how atomic indexing works, it tries to index all
>> stored values again, but why is it trying to index stored value of a field
>> that does not exist in the schema?
>
>
> Solr's Atomic Update feature works by grabbing the existing document, 
> all of it, performing the atomic update instructions on that document, 
> and then indexing the results as a new document. If the uniqueKey 
> feature is enabled (which would be required for Atomic Updates to work 
> properly), the old document is deleted as the new document is added.  
> I haven't looked at the code, but the existing fields are likely added 
> to the document that is being built all at once and without consulting 
> the schema.  So if field X is in the document that's already in the 
> index, it will be in the new document too.  If X is deleted from the 
> schema, you'll get the error you're getting.
>
> It would be a fair amount of work to have Solr take the schema into 
> account for atomic updates.  Not impossible, just slightly 
> time-consuming.  I think we (the Solr developers) would want it to 
> still fail indexing in this situation; the failure would just happen 
> at a different place in the code than it does now, during atomic 
> document assembly.  Fail earlier and faster.
>
> What you'll need to do in your circumstances is leave X in the schema, 
> but change it to a type that will be completely ignored on indexing.
>
> Something like this:
>
> <fieldType
>   name="ignored"
>   indexed="false"
>   stored="false"
>   docValues="false"
>   multiValued="true"
>   class="solr.StrField" />
>
> You could then add the following to take care of any and all unknown 
> fields:
>
> <dynamicField name="*" type="ignored" multiValued="true" />
>
> Or you could name individual fields like that, which I think would be 
> a better option than the wildcard dynamic field.
>
> My source for the config snippets: 
> https://stackoverflow.com/questions/46509259/solr-7-managed-schema-how-to-ignore-unnamed-fields
>
> Thanks,
> Shawn

Re: Solr complains about unknown field during atomic indexing

Posted by Shawn Heisey <ap...@elyograg.org>.

On 3/19/2021 3:36 PM, gnandre wrote:
> While performing atomic indexing, I run into an error which says 'unknown
> field X' where X is not a field specified in the schema. It is a
> discontinued field. After deleting that field from the schema, I have
> restarted Solr but I have not re-indexed the content back, so the deleted
> field data still might be there in Solr index.
> 
> The way I understand how atomic indexing works, it tries to index all
> stored values again, but why is it trying to index stored value of a field
> that does not exist in the schema?

Solr's Atomic Update feature works by grabbing the existing document, 
all of it, performing the atomic update instructions on that document, 
and then indexing the results as a new document.  If the uniqueKey 
feature is enabled (which would be required for Atomic Updates to work 
properly), the old document is deleted as the new document is added.  I 
haven't looked at the code, but the existing fields are likely added to 
the document that is being built all at once and without consulting the 
schema.  So if field X is in the document that's already in the index, 
it will be in the new document too.  If X is deleted from the schema, 
you'll get the error you're getting.
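
For illustration, an atomic update request in Solr's XML update format 
might look like the sketch below. The document id "doc1" and the field 
"price" are hypothetical, and the uniqueKey is assumed to be "id". Only 
the one modified field is sent; Solr rebuilds the rest of the document 
from what is already in the index, which is how a leftover field X ends 
up in the new document.

```xml
<!-- Hypothetical atomic update: only "price" is changed on document "doc1". -->
<!-- Assumes the schema's uniqueKey field is "id". -->
<add>
  <doc>
    <field name="id">doc1</field>
    <field name="price" update="set">99</field>
  </doc>
</add>
```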

It would be a fair amount of work to have Solr take the schema into 
account for atomic updates.  Not impossible, just slightly 
time-consuming.  I think we (the Solr developers) would want it to still 
fail indexing in this situation; the failure would just happen at a 
different place in the code than it does now, during atomic document 
assembly.  Fail earlier and faster.

What you'll need to do in your circumstances is leave X in the schema, but 
change it to a type that will be completely ignored on indexing.

Something like this:

<fieldType
   name="ignored"
   indexed="false"
   stored="false"
   docValues="false"
   multiValued="true"
   class="solr.StrField" />

You could then add the following to take care of any and all unknown fields:

<dynamicField name="*" type="ignored" multiValued="true" />

Or you could name individual fields like that, which I think would be a 
better option than the wildcard dynamic field.
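
Naming the field individually could look something like the following. 
This is only a sketch, not taken from the linked source: it assumes the 
"ignored" fieldType shown earlier is defined, and X stands for the actual 
name of the discontinued field.

```xml
<!-- X is a placeholder for the discontinued field's real name. -->
<!-- Assumes a fieldType named "ignored" is defined in the same schema. -->
<field name="X" type="ignored" multiValued="true" />
```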

My source for the config snippets: 
https://stackoverflow.com/questions/46509259/solr-7-managed-schema-how-to-ignore-unnamed-fields

Thanks,
Shawn