Posted to solr-user@lucene.apache.org by Pratik Thaker <Pr...@smartstreamrdu.com> on 2017/02/07 11:41:54 UTC

DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain

Hi All,

I am using SolrCloud 6.0.

I am seeing the exception below very frequently in the Solr logs:

o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: RunUpdateProcessor has received an AddUpdateCommand containing a document that appears to still contain Atomic document update operations, most likely because DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:63)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
        at org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:335)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
        at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
        at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
        at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
        at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
        at org.apache.solr.update.processor.FieldNameMutatingUpdateProcessorFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:74)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
        at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:936)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1091)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:714)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
        at org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory$DefaultValueUpdateProcessor.processAdd(AbstractDefaultValueUpdateProcessorFactory.java:93)
        at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)

Can you please help me find the root cause? Below is a snippet of my solrconfig.xml:

<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
    <!-- UUIDUpdateProcessorFactory will generate an id if none is present in the incoming document -->
    <processor class="solr.UUIDUpdateProcessorFactory" />

    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.DistributedUpdateProcessorFactory"/>
    <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
    <processor class="solr.FieldNameMutatingUpdateProcessorFactory">
      <str name="pattern">[^\w-\.]</str>
      <str name="replacement">_</str>
    </processor>
    <processor class="solr.ParseBooleanFieldUpdateProcessorFactory"/>
    <processor class="solr.ParseLongFieldUpdateProcessorFactory"/>
    <processor class="solr.ParseDoubleFieldUpdateProcessorFactory"/>
    <processor class="solr.ParseDateFieldUpdateProcessorFactory">
      <arr name="format">
        <str>yyyy-MM-dd'T'HH:mm:ss.SSSZ</str>
        <str>yyyy-MM-dd'T'HH:mm:ss,SSSZ</str>
        <str>yyyy-MM-dd'T'HH:mm:ss.SSS</str>
        <str>yyyy-MM-dd'T'HH:mm:ss,SSS</str>
        <str>yyyy-MM-dd'T'HH:mm:ssZ</str>
        <str>yyyy-MM-dd'T'HH:mm:ss</str>
        <str>yyyy-MM-dd'T'HH:mmZ</str>
        <str>yyyy-MM-dd'T'HH:mm</str>
        <str>yyyy-MM-dd HH:mm:ss.SSSZ</str>
        <str>yyyy-MM-dd HH:mm:ss,SSSZ</str>
        <str>yyyy-MM-dd HH:mm:ss.SSS</str>
        <str>yyyy-MM-dd HH:mm:ss,SSS</str>
        <str>yyyy-MM-dd HH:mm:ssZ</str>
        <str>yyyy-MM-dd HH:mm:ss</str>
        <str>yyyy-MM-dd HH:mmZ</str>
        <str>yyyy-MM-dd HH:mm</str>
        <str>yyyy-MM-dd</str>
      </arr>
    </processor>
    <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
      <str name="defaultFieldType">strings</str>
      <lst name="typeMapping">
        <str name="valueClass">java.lang.Boolean</str>
        <str name="fieldType">booleans</str>
      </lst>
      <lst name="typeMapping">
        <str name="valueClass">java.util.Date</str>
        <str name="fieldType">tdates</str>
      </lst>
      <lst name="typeMapping">
        <str name="valueClass">java.lang.Long</str>
        <str name="valueClass">java.lang.Integer</str>
        <str name="fieldType">tlongs</str>
      </lst>
      <lst name="typeMapping">
        <str name="valueClass">java.lang.Number</str>
        <str name="fieldType">tdoubles</str>
      </lst>
    </processor>
    <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

Regards,
Pratik Thaker

________________________________
The information in this email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this email by anyone else is unauthorised. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful.

RE: DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain

Posted by Pratik Thaker <Pr...@smartstreamrdu.com>.
Hi Alessandro,

Can you please suggest the correct order in which to add the processors?

I have 5 collections, 6 shards, a replication factor of 2, and 3 nodes on 3 separate VMs.

Regards,
Pratik Thaker


RE: DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain

Posted by "alessandro.benedetti" <a....@sease.io>.
Let's quickly distinguish between PRE and POST processors in a SolrCloud
architecture:

"In a single node, stand-alone Solr, each update is run through all the
update processors in a chain exactly once. But the behavior of update
request processors in SolrCloud deserves special consideration." cit. wiki

*PRE PROCESSORS*
All the processors defined BEFORE the DistributedUpdateProcessor run ONLY
on the first node that receives the update (regardless of whether it is a
leader or a replica).

*POST PROCESSORS*
The DistributedUpdateProcessor forwards the update request to the correct
leader (or to multiple leaders if the request involves several shards);
each leader then forwards it to its replicas.
The leaders and replicas then execute all the update request processors
defined AFTER the DistributedUpdateProcessor.
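
As a rough illustration of where that split falls (a minimal sketch, not
taken from the wiki; the chain name and the choice of processors are only
assumptions for the example):

```xml
<updateRequestProcessorChain name="pre-post-example">
  <!-- PRE: defined before DistributedUpdateProcessorFactory, so it runs only
       on the first node that receives the update -->
  <processor class="solr.UUIDUpdateProcessorFactory"/>

  <!-- forwards the update to the shard leader(s), which forward to replicas -->
  <processor class="solr.DistributedUpdateProcessorFactory"/>

  <!-- POST: defined after it, so it runs on the leader and on every replica -->
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```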

" Pre-processors and Atomic Updates
Because DistributedUpdateProcessor is responsible for processing Atomic
Updates into full documents on the leader node, this means that
pre-processors which are executed only on the forwarding nodes can only
operate on the partial document. If you have a processor which must process
a full document then the only choice is to specify it as a post-processor."
cit. wiki
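
For reference, an atomic update in Solr's XML update format looks roughly
like this (the id and the field names are made up for the example):

```xml
<add>
  <doc>
    <field name="id">doc1</field>
    <!-- "set" replaces the stored value of price -->
    <field name="price" update="set">99</field>
    <!-- "add" appends a value to the multi-valued field tags -->
    <field name="tags" update="add">sale</field>
  </doc>
</add>
```

A pre-processor sees only these operations (price arrives as a "set"
operation, not as a plain value); the full document exists only after the
DistributedUpdateProcessor has merged them with the stored document on the
leader.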

In your example the chain is definitely messed up: the order is important,
and you want your heavy processing to happen only on the first node, i.e.
in processors defined before the DistributedUpdateProcessor.
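
One possible reading of this advice, sketched against the chain from the
original post, is to move the field-mutating and parsing processors ahead
of DistributedUpdateProcessorFactory. This is an untested assumption, not a
configuration confirmed anywhere in this thread, and whether it removes the
exception is not established here:

```xml
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
  <!-- PRE: these now run once, on the node that receives the update -->
  <processor class="solr.UUIDUpdateProcessorFactory"/>
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
  <processor class="solr.FieldNameMutatingUpdateProcessorFactory">
    <str name="pattern">[^\w-\.]</str>
    <str name="replacement">_</str>
  </processor>
  <processor class="solr.ParseBooleanFieldUpdateProcessorFactory"/>
  <processor class="solr.ParseLongFieldUpdateProcessorFactory"/>
  <processor class="solr.ParseDoubleFieldUpdateProcessorFactory"/>
  <!-- ParseDateFieldUpdateProcessorFactory and
       AddSchemaFieldsUpdateProcessorFactory would go here, configured
       exactly as in the original chain -->

  <!-- POST: logging and the actual index write on leaders and replicas -->
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Keep the wiki caveat quoted above in mind: with atomic updates, anything
placed before the DistributedUpdateProcessor only ever sees the partial
document.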

For better info and clarification:
https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode (you can
find a working alternative to your chain here)
https://cwiki.apache.org/confluence/display/solr/Update+Request+Processors



-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io