Posted to solr-user@lucene.apache.org by yo tomi <yo...@gmail.com> on 2020/07/17 07:32:18 UTC

AtomicUpdate on SolrCloud is not working

Hi, All
When I perform an atomic update on SolrCloud with the following setting,
it does not work properly.

---
<updateRequestProcessorChain name="skip-empty">
 <processor class="solr.DistributedUpdateProcessorFactory"/>
 <processor class="TrimFieldUpdateProcessorFactory" />
 <processor class="RemoveBlankFieldUpdateProcessorFactory" />
 <processor class="solr.LogUpdateProcessorFactory" />
 <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
---
After changing it as follows, it worked as expected.
---
<updateRequestProcessorChain name="skip-empty">
 <processor class="TrimFieldUpdateProcessorFactory" />
 <processor class="RemoveBlankFieldUpdateProcessorFactory" />
 <processor class="solr.LogUpdateProcessorFactory" />
 <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
---
I thought the latter setting and the post-processor approach would give
the same result,
but the SOLR-8030 bug makes me reluctant to use post-processors.
Even with the latter setting, is there any possibility of running into
SOLR-8030? From reading the source code, the tlog entries that a replica
receives from the leader seem to be processed correctly by the
UpdateRequestProcessor chain, so I thought the latter setting would not
trigger the bug.
Does anyone know the most appropriate way to configure AtomicUpdate
on SolrCloud?

Thanks,
Yoshiaki

Re: AtomicUpdate on SolrCloud is not working

Posted by Shawn Heisey <ap...@elyograg.org>.
On 7/19/2020 1:37 AM, yo tomi wrote:
> I have no choice but to use post-processors.
> However, the SOLR-8030 bug makes me reluctant to use them.

Can you explain why you need the trim field and remove blank field 
processors to be post processors?  When I think about these 
functionalities, they should work fully as expected even when executed 
as "pre" processors.

Thanks,
Shawn

Re: AtomicUpdate on SolrCloud is not working

Posted by yo tomi <yo...@gmail.com>.
Hi Jörn & Shawn,
"Does not work properly" means that, as pre-processors,
TrimFieldUpdateProcessorFactory and
RemoveBlankFieldUpdateProcessorFactory do not trim values or remove
blank values for string fields.

Example:

Given the following schema:
---
  <field name="id" type="string" multiValued="false" indexed="true"
required="true" stored="true"/>
  <field name="title" type="string" uninvertible="false" indexed="true"
stored="true"/>
---

update the following documents from the "Documents" screen of the Solr admin UI:
---
{
    "id": "1",
    "title": {"set": " test "}
},
{
    "id": "2",
    "title": {"set": ""}
}
---

Then the following is indexed when they are pre-processors:
---
{
    "id": "1",
    "title": " test "
},
{
    "id": "2",
    "title": ""
}
---

When they are post-processors:
---
{
    "id": "1",
    "title": "test"
},
{
    "id": "2"
}
---

I have no choice but to use post-processors.
However, the SOLR-8030 bug makes me reluctant to use them.

By the way, the Solr version is 8.4.

Best,
Yoshiaki

Re: AtomicUpdate on SolrCloud is not working

Posted by Shawn Heisey <ap...@elyograg.org>.
On 7/17/2020 1:32 AM, yo tomi wrote:
> When I perform an atomic update on SolrCloud with the following setting,
> it does not work properly.

As Jörn Franke already mentioned, you haven't said exactly what "does 
not work properly" actually means in your situation.  Without that 
information, it will be very difficult to provide any real help.

Atomic update functionality is currently implemented in 
DistributedUpdateProcessorFactory.

> ---
> <updateRequestProcessorChain name="skip-empty">
>   <processor class="solr.DistributedUpdateProcessorFactory"/>
>   <processor class="TrimFieldUpdateProcessorFactory" />
>   <processor class="RemoveBlankFieldUpdateProcessorFactory" />
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
> ---
> After changing it as follows, it worked as expected.
> ---
> <updateRequestProcessorChain name="skip-empty">
>   <processor class="TrimFieldUpdateProcessorFactory" />
>   <processor class="RemoveBlankFieldUpdateProcessorFactory" />
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
> ---

The effective result difference between these configurations is that 
atomic updates will happen first with the first config, and in the 
second, atomic updates will happen second to last -- just before 
RunUpdateProcessorFactory.
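
As a sketch of what that means, the second config should behave roughly
as if DistributedUpdateProcessorFactory had been written explicitly just
before RunUpdateProcessorFactory. The explicit placement below only
illustrates the implicit behavior described above; it is not a config
taken from the original post:

---
<updateRequestProcessorChain name="skip-empty">
  <processor class="TrimFieldUpdateProcessorFactory" />
  <processor class="RemoveBlankFieldUpdateProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory" />
  <!-- normally added implicitly; written out here for illustration -->
  <processor class="solr.DistributedUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
---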

Also, with the first config, most of the update processors are going to 
be executed on the machine with the shard leader (after the update is 
distributed) and if there is more than one NRT replica, they will be 
executed multiple times.  With the second config, most of the processors 
will be executed on the machine that actually receives the update 
request.  For the purposes of that discussion, remember that when a TLOG 
replica is elected leader, it is effectively an NRT replica.
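
A commented copy of the first config may make this easier to follow. The
comments are only a reading aid based on the description above; the
processors and their order are unchanged:

---
<updateRequestProcessorChain name="skip-empty">
  <!-- distributes the update; atomic updates are applied at this step -->
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <!-- "post" processors: run after distribution, on the shard leader
       and again on each NRT replica -->
  <processor class="TrimFieldUpdateProcessorFactory" />
  <processor class="RemoveBlankFieldUpdateProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
---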

Does that information help you determine why it doesn't do what you expect?

> I thought the latter setting and the post-processor approach would give
> the same result,
> but the SOLR-8030 bug makes me reluctant to use post-processors.
> Even with the latter setting, is there any possibility of running into
> SOLR-8030?

See this part of the reference guide for a bunch of gory details about 
DistributedUpdateProcessorFactory:

https://cwiki.apache.org/confluence/display/SOLR/UpdateRequestProcessor#UpdateRequestProcessor-DistributedUpdates

In SOLR-8030, the general consensus among committers is that you should 
configure almost all update processors as "pre" processors -- placed 
before DistributedUpdateProcessorFactory in the config.  When done this 
way, updates are usually faster and less likely to yield inconsistent 
results.

There may be situations where having them as "post" processors is 
correct, but that won't happen very often.  The second config above does 
implicitly use "pre" for most of the processors.

Thanks,
Shawn

Re: AtomicUpdate on SolrCloud is not working

Posted by Issei Nishigata <du...@gmail.com>.
I have the same problem on my Solr 8.
I think it's because, with the first approach,
TrimFieldUpdateProcessorFactory and RemoveBlankFieldUpdateProcessorFactory
do not take effect.

On SolrCloud, TrimFieldUpdateProcessorFactory,
RemoveBlankFieldUpdateProcessorFactory and other processors that come
before DistributedUpdateProcessor in the chain
run only on the first node that receives an update request.
Consequently, TrimFieldUpdateProcessorFactory and
RemoveBlankFieldUpdateProcessorFactory have to be executed
after DistributedUpdateProcessor has handed the document to the replica
nodes,
so we need to use the second approach that he described; otherwise it
won't work properly.

But even with this approach, both he and I are worried about whether it
could trigger SOLR-8030.
I would also like to know about this. Does anyone have any comments?


Best,
Issei

Re: AtomicUpdate on SolrCloud is not working

Posted by Jörn Franke <jo...@gmail.com>.
What does "not work correctly" mean?

Have you checked that all fields are stored or have docValues?

Re: AtomicUpdate on SolrCloud is not working

Posted by yo tomi <yo...@gmail.com>.
Hi All

Sorry, the settings above were the wrong way around.
Actually, the following setting does not work properly.
---
<updateRequestProcessorChain name="skip-empty">
 <processor class="TrimFieldUpdateProcessorFactory" />
 <processor class="RemoveBlankFieldUpdateProcessorFactory" />
 <processor class="solr.LogUpdateProcessorFactory" />
 <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
---
And the following works as expected.
---
<updateRequestProcessorChain name="skip-empty">
 <processor class="solr.DistributedUpdateProcessorFactory"/>
 <processor class="TrimFieldUpdateProcessorFactory" />
 <processor class="RemoveBlankFieldUpdateProcessorFactory" />
 <processor class="solr.LogUpdateProcessorFactory" />
 <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
---

Thanks,
Yoshiaki

