Posted to users@solr.apache.org by Vincenzo D'Amore <v....@gmail.com> on 2022/10/12 11:54:12 UTC

disabling schemaless mode - SolrCloud 8.11.2

Hi all,

TL;DR

Does anyone know what's the responsibility of
DistributedUpdateProcessorFactory?
This is what I found in the source code:

// NOT mt-safe... create a new processor for each add thread
// TODO: we really should not wait for distrib after local? unless a
// certain replication factor is asked for

https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java

---
Long version

Today I realized that in SolrCloud, disabling schemaless mode also seems
to disable:

- LogUpdateProcessorFactory,
- DistributedUpdateProcessorFactory,
- RunUpdateProcessorFactory

Please look at this snippet taken from the default solrconfig.xml:

  <updateRequestProcessorChain name="add-unknown-fields-to-the-schema"
      default="${update.autoCreateFields:false}"
      processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.DistributedUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

Those processors don't seem to be related to the schemaless mode, in
particular DistributedUpdateProcessorFactory.

So I'm curious to understand what it does, and why it has to be activated
when schemaless mode is on.
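
For reference, the names in the processor attribute point at named
updateProcessor definitions elsewhere in the same solrconfig.xml. The
abbreviated sketch below is reconstructed from the stock _default configset
as best recalled, so treat the class names as an assumption and check them
against your own file. The nested settings (patterns, date formats, type
mappings) are left out on purpose.

```xml
<!-- Sketch: named processors referenced by processor="..." (nested settings omitted) -->
<updateProcessor class="solr.UUIDUpdateProcessorFactory" name="uuid"/>
<updateProcessor class="solr.RemoveBlankFieldUpdateProcessorFactory" name="remove-blank"/>
<updateProcessor class="solr.FieldNameMutatingUpdateProcessorFactory" name="field-name-mutating">
  <!-- pattern and replacement settings go here -->
</updateProcessor>
<updateProcessor class="solr.ParseBooleanFieldUpdateProcessorFactory" name="parse-boolean"/>
<updateProcessor class="solr.ParseLongFieldUpdateProcessorFactory" name="parse-long"/>
<updateProcessor class="solr.ParseDoubleFieldUpdateProcessorFactory" name="parse-double"/>
<updateProcessor class="solr.ParseDateFieldUpdateProcessorFactory" name="parse-date">
  <!-- list of accepted date formats goes here -->
</updateProcessor>
<!-- The step that actually modifies the schema when an unknown field arrives -->
<updateProcessor class="solr.AddSchemaFieldsUpdateProcessorFactory" name="add-schema-fields">
  <!-- typeMapping entries go here -->
</updateProcessor>
```

Read this way, the schemaless behaviour lives in the processor attribute,
while the three child processor elements are the usual tail of any update
chain.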

Best regards,
Vincenzo


-- 
Vincenzo D'Amore

Re: disabling schemaless mode - SolrCloud 8.11.2

Posted by Vincenzo D'Amore <v....@gmail.com>.
Hi Alex,

> Also, when you disable schemaless, make sure you don't have implicit
> conversions happening that the chain also provides, e.g. a date sent as
> text gets converted to a real date.


On second thought, thanks a lot: this is a very helpful suggestion. I
didn't immediately realize this side effect.
Disabling it can be dangerous.


-- 
Vincenzo D'Amore

Re: disabling schemaless mode - SolrCloud 8.11.2

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Those 3 are part of the default chain, so if you stop using the schemaless
one, you get them anyway.

https://solr.apache.org/guide/solr/latest/configuration-guide/update-request-processors.html#default-update-request-processor-chain

If you define a different custom chain, I think you need to add them
manually though.

Also, when you disable schemaless, make sure you don't have implicit
conversions happening that the chain also provides, e.g. a date sent as
text gets converted to a real date. If you do, you may want to remove just
the schema generation part or create a custom chain with only the bits you
need.
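
A minimal sketch of the "remove just the schema generation part" option,
assuming the named processors from the snippet in the original question are
still defined in solrconfig.xml. The chain name is hypothetical, and the
three standard processors are listed by hand, as suggested above for custom
chains. Whether the parsed values are then accepted depends on the fields
already existing in your schema, so test this against your own data.

```xml
<!-- Hypothetical chain: keeps the value parsers, drops add-schema-fields -->
<updateRequestProcessorChain name="parse-values-only" default="true"
    processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date">
  <!-- Standard tail of the chain, added manually -->
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Only one chain should end up as the default, so the stock
add-unknown-fields-to-the-schema chain would need default to resolve to
false (or be removed) when a chain like this is marked default="true".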

Regards,
    Alex


Re: disabling schemaless mode - SolrCloud 8.11.2

Posted by Vincenzo D'Amore <v....@gmail.com>.
Hi Shawn, Alex,

Thanks for the update.
As a first step, I have entirely disabled the
updateRequestProcessorChain "add-unknown-fields-to-the-schema".
I can confirm that since then Solr has continued to handle things correctly
under the hood.
I agree with you: configuration should be as explicit and transparent as
possible.
It seems that LogUpdateProcessorFactory and RunUpdateProcessorFactory are
still there, somehow.
I'm only afraid of what I'm losing by not having
DistributedUpdateProcessorFactory configured explicitly. Is this component
also present by default?
Where are the default processors configured?

-- 
Vincenzo D'Amore

Re: disabling schemaless mode - SolrCloud 8.11.2

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/12/22 05:54, Vincenzo D'Amore wrote:
> Does anyone know what's the responsibility of
> DistributedUpdateProcessorFactory?

That update processor takes care of atomic update functionality and
farming out update requests to other Solr nodes, in particular for
SolrCloud.  It probably does other things as well, but I haven't
looked at its code recently.

> Today I realized that in SolrCloud, disabling schemaless mode also seems
> to disable:
>
> - LogUpdateProcessorFactory,
> - DistributedUpdateProcessorFactory,
> - RunUpdateProcessorFactory

The LogUpdate processor logs the request; that usually goes to
solr.log.  The RunUpdate processor actually does the update.  I believe
that if any of those 3 processors are removed from a definition, Solr
still applies them automatically.  It is better to leave them
in the definition so that anyone looking at it knows exactly what is
being done and doesn't need to know about Solr's internal implicit
handling.  If you are entirely removing all processor chains, then don't
worry about it.  Solr will handle things correctly.
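
A minimal sketch of what "leave them in the definition" can look like when
no other processing is wanted. The chain name is hypothetical. One caveat
worth verifying for your Solr version: as far as the Solr Reference Guide
describes it, only DistributedUpdateProcessorFactory is inserted
automatically into a chain that omits it, and the guide warns that a chain
defined without RunUpdateProcessorFactory will not actually apply updates to
the index.

```xml
<!-- Hypothetical chain name; spells out the three standard processors in order -->
<updateRequestProcessorChain name="explicit-default" default="true">
  <!-- Logs the update request (usually to solr.log) -->
  <processor class="solr.LogUpdateProcessorFactory"/>
  <!-- Handles atomic updates and forwards updates to other Solr nodes -->
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <!-- Actually performs the update -->
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```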

> Those processors don't seem to be related to the schemaless mode, in
> particular DistributedUpdateProcessorFactory.

You are correct, they are not related to schemaless mode, but they are 
important.

Thanks,
Shawn