Posted to users@solr.apache.org by Vincenzo D'Amore <v....@gmail.com> on 2022/10/12 11:54:12 UTC
disabling schemaless mode - SolrCloud 8.11.2
Hi all,
TL;DR
Does anyone know what
DistributedUpdateProcessorFactory is responsible for?
This is what I found in the source code:
// NOT mt-safe... create a new processor for each add thread
// TODO: we really should not wait for distrib after local? unless a certain replication factor is
// asked for
https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java
---
Long version
Today I realized that in SolrCloud, when disabling schemaless mode, we
also disable:
- LogUpdateProcessorFactory,
- DistributedUpdateProcessorFactory,
- RunUpdateProcessorFactory
Please look at this snippet taken from the default solrconfig.xml configuration:
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema"
default="${update.autoCreateFields:false}"
processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">
<processor class="solr.LogUpdateProcessorFactory"/>
<processor class="solr.DistributedUpdateProcessorFactory"/>
<processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
Those processors don't seem to be related to the schemaless mode, in
particular DistributedUpdateProcessorFactory.
So I'm curious to understand what it does, and why it has to be activated
when schemaless mode is on.
Best regards,
Vincenzo
--
Vincenzo D'Amore
Re: disabling schemaless mode - SolrCloud 8.11.2
Posted by Vincenzo D'Amore <v....@gmail.com>.
Hi Alex,
> Also, when you disable schemaless, make sure you don't have implicit
> conversions happening that the chain also provides. E.g., a date sent as text gets
> converted to a real date.
On second thought, thanks a lot: this is a very helpful suggestion. I
didn't immediately realize this side effect.
Disabling it can be dangerous.
--
Vincenzo D'Amore
Re: disabling schemaless mode - SolrCloud 8.11.2
Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Those 3 are part of the default chain, so if you stop using the schemaless
one, you get them anyway.
https://solr.apache.org/guide/solr/latest/configuration-guide/update-request-processors.html#default-update-request-processor-chain
If you define a different custom chain, I think you need to add them
manually, though.
Also, when you disable schemaless, make sure you don't have implicit
conversions happening that the chain also provides. E.g., a date sent as text gets
converted to a real date. If you do, you may want to remove just the schema
generation part or create a custom chain with only the bits you need.
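A minimal sketch of the "custom chain with only the bits you need" idea, assuming the named updateProcessor definitions from the default solrconfig.xml (uuid, remove-blank, field-name-mutating, parse-boolean, parse-long, parse-double, parse-date) are still present. The chain name "parse-fields-no-schema-update" is hypothetical; the only change from the stock schemaless chain is that add-schema-fields is left out, so unknown fields are no longer added to the schema. It also assumes update.autoCreateFields is false, because only one chain can be the default; otherwise pick this chain per request with the update.chain parameter. Whether you still need the parse-* steps depends on what your clients actually send.

```xml
<!-- Hypothetical chain: the schemaless parsing steps without add-schema-fields.
     Assumes the named <updateProcessor> entries listed in the processor
     attribute are defined elsewhere in solrconfig.xml, as in the default configset. -->
<updateRequestProcessorChain name="parse-fields-no-schema-update"
                             default="true"
                             processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date">
  <!-- Standard processors kept explicit; RunUpdateProcessorFactory, which
       performs the actual update, stays last. -->
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```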
Regards,
Alex
On Wed., Oct. 12, 2022, 7:55 a.m. Vincenzo D'Amore, <v....@gmail.com>
wrote:
Re: disabling schemaless mode - SolrCloud 8.11.2
Posted by Vincenzo D'Amore <v....@gmail.com>.
Hi Shawn, Alex,
thanks for the update.
At first, I disabled the updateRequestProcessorChain
"add-unknown-fields-to-the-schema" entirely.
I can confirm that since then Solr has continued to handle things correctly
under the hood.
I agree with you: configuration should be as explicit and transparent as
possible.
It seems that LogUpdateProcessorFactory and RunUpdateProcessorFactory are
still there, somehow.
My only worry is what I'm losing by not having
DistributedUpdateProcessorFactory configured explicitly. Is this
component also present by default?
Where are the default processors configured?
On Wed, Oct 12, 2022 at 7:55 PM Shawn Heisey <ap...@elyograg.org> wrote:
--
Vincenzo D'Amore
Re: disabling schemaless mode - SolrCloud 8.11.2
Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/12/22 05:54, Vincenzo D'Amore wrote:
> Does anyone know what's the responsibility of
> DistributedUpdateProcessorFactory?
That update processor takes care of atomic update functionality and
farming out update requests to other Solr nodes, in particular for
SolrCloud. It probably has some other functions too, but I haven't
looked at its code recently.
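To illustrate the atomic-update part, here is a sketch of an XML update message of the kind described above: the update="set" and update="add" attributes ask Solr to modify an existing document rather than replace it, so the stored values of that document have to be looked up and merged before indexing. The document id and the field names (price_i, tags_ss) are hypothetical, and atomic updates also assume the update log is enabled and the fields are stored or have docValues.

```xml
<!-- Hypothetical atomic update, POSTed to /solr/<collection>/update
     with Content-Type: text/xml. Only the listed fields change; the other
     fields of the existing document are kept. -->
<add>
  <doc>
    <field name="id">doc-1</field>
    <!-- Replace the current value of price_i -->
    <field name="price_i" update="set">100</field>
    <!-- Append a value to the multi-valued tags_ss field -->
    <field name="tags_ss" update="add">sale</field>
  </doc>
</add>
```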
> Today I realized that in SolrCloud when disabling the schemaless mode, we
> are going to disable also:
>
> - LogUpdateProcessorFactory,
> - DistributedUpdateProcessorFactory,
> - RunUpdateProcessorFactory
The LogUpdate processor logs the request, usually that goes to
solr.log. The RunUpdate processor actually does the update. I believe
that if any of those 3 processors are removed from a definition, they
are still automatically performed by Solr. It is better to leave them
in the definition so that anyone looking at it knows exactly what is
being done and doesn't need to know about Solr's internal implicit
handling. If you are entirely removing all processor chains, then don't
worry about it. Solr will handle things correctly.
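Following the advice to keep these processors visible in the definition, a sketch of a chain that spells out the three processors the Solr Reference Guide lists as the default update chain (the guide page linked earlier in the thread). The chain name "explicit-default" is hypothetical; with no schemaless steps, it should behave like the implicit default while making the processing order obvious to anyone reading solrconfig.xml. It assumes no other chain is marked default="true", since only one chain can be the default.

```xml
<!-- Hypothetical chain name; lists the documented default processors explicitly. -->
<updateRequestProcessorChain name="explicit-default" default="true">
  <!-- Logs each update request, usually to solr.log -->
  <processor class="solr.LogUpdateProcessorFactory"/>
  <!-- Atomic updates and forwarding updates to other nodes in SolrCloud -->
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <!-- Actually performs the update; keep it last -->
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```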
> Those processors don't seem to be related to the schemaless mode, in
> particular DistributedUpdateProcessorFactory.
You are correct, they are not related to schemaless mode, but they are
important.
Thanks,
Shawn