Posted to users@solr.apache.org by lamine lamine <ya...@yahoo.fr.INVALID> on 2021/05/07 19:19:34 UTC

Solr: Get leaderness status of local node

Hi Solr people,
I am writing a custom UpdateProcessor as part of a custom plugin, and I need to run some code only on the shard leader. Because this is a plugin, I cannot call DistributedUpdateProcessor.isLeader(), which is not public.
For now I am copying and pasting the code below, but I think there must be a better way to do it.
private boolean getIsLeader() {
    final boolean isZkAware = req.getCore().getCoreContainer().isZooKeeperAware();
    if (!isZkAware) {
        return getNonZkLeaderAssumption(req);
    }
    String shardId = cloudDesc.getShardId();
    try {
        Replica leaderReplica = zkController.getZkStateReader().getLeaderRetry(collection, shardId);
        return leaderReplica.getName().equals(cloudDesc.getCoreNodeName());
    }
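    catch (InterruptedException e) {
        // Restore the interrupt flag so callers can still observe the interruption.
        Thread.currentThread().interrupt();
        throw new ZooKeeperException(SolrException.ErrorCode.SERVER_ERROR, "Error TODO", e);
    }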
}
I think there is another way to do it, using cloudDesc.isLeader(), but my understanding is that the code above gives the most accurate state. Am I right? Is this the only way to get the most accurate state about the current leader?

Also, should I run it every time I need to check the current status, since the leader can change at any time? What is the impact in terms of performance?

Thank you for your help in advance.
Lamine

Re: Solr: Get leaderness status of local node

Posted by David Smiley <ds...@apache.org>.

If the true requirement is merely to process the document once, then I
agree with Shawn's solution.  You needn't concern yourself with knowing who
the leader is.  If somehow it's important that it be guaranteed to execute
on the leader in particular, then take inspiration from some existing
URPs.  I'm thinking this:
https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/update/processor/SkipExistingDocumentsProcessorFactory.java#L217
 (method isLeader).  Also note that this particular URP implements
RunAlways, thus its order can be before or after DURP as you please.
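
To make the idea concrete, here is a minimal stand-alone sketch of the kind of guard such an isLeader-style check can apply. It is not Solr's code: the enum only mirrors the NONE/TOLEADER/FROMLEADER distribution phases, and the two boolean parameters are hypothetical stand-ins for what a real URP would derive from the update command's flags and from cluster state. The linked source remains the authority on the actual logic.

```java
/** Stand-alone model of a "run only on the leader" guard; not Solr's actual code. */
final class LeaderOnlyGuardSketch {

    /** Mirrors the three distribution phases an update request can be in. */
    enum DistribPhase { NONE, TOLEADER, FROMLEADER }

    /**
     * Decides whether leader-only code should run for this request on this node.
     *
     * @param phase                distribution phase of the incoming request
     * @param replayOrPeerSync     hypothetical flag: true for log replay or peer sync traffic
     * @param localReplicaIsLeader hypothetical flag: true if the local replica currently leads its shard
     */
    static boolean shouldRunOnThisNode(DistribPhase phase,
                                       boolean replayOrPeerSync,
                                       boolean localReplicaIsLeader) {
        // Replayed or peer-synced updates were already handled once; do not repeat the side effect.
        if (replayOrPeerSync) {
            return false;
        }
        // A request forwarded FROM the leader is, by definition, being handled by a follower.
        if (phase == DistribPhase.FROMLEADER) {
            return false;
        }
        // Otherwise defer to the current leadership state of the local replica.
        return localReplicaIsLeader;
    }
}
```

The phase check is cheap and rules out followers before any cluster-state lookup is needed, which may matter if the check runs on every update.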

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley

On Sun, May 9, 2021 at 12:23 PM Shawn Heisey <ap...@elyograg.org> wrote:

> On 5/8/2021 8:05 PM, lamine lamine wrote:
> >   I only want the code to be run once per shard. One way to guarantee that
> is to do it in the leader, as there is always one leader per shard. I don't
> want to run it in all the replicas. The code to run is "external": it
> doesn't touch any document.
>
> What I am saying is that when you define the processor chain, include
> DistributedUpdateProcessor in it.  It gets added by SolrCloud even if
> you don't include it, so it's better to have control over its placement.
>   And then place your update processor in the list *BEFORE*
> DistributedUpdateProcessor.
>
> This should accomplish your goals automatically, with no code required.
>
> Thanks,
> Shawn
>

Re: Solr: Get leaderness status of local node

Posted by Shawn Heisey <ap...@elyograg.org>.

On 5/8/2021 8:05 PM, lamine lamine wrote:
>   I only want the code to be run once per shard. One way to guarantee that is to do it in the leader, as there is always one leader per shard. I don't want to run it in all the replicas. The code to run is "external": it doesn't touch any document.

What I am saying is that when you define the processor chain, include 
DistributedUpdateProcessor in it.  It gets added by SolrCloud even if 
you don't include it, so it's better to have control over its placement. 
  And then place your update processor in the list *BEFORE* 
DistributedUpdateProcessor.

This should accomplish your goals automatically, with no code required.
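
As an illustration of the placement described above, a chain in solrconfig.xml could look roughly like the fragment below. The chain name and the com.example factory class are hypothetical placeholders for the custom plugin; the other two entries are the stock distributed and run factories. Note that a processor placed before the distributed step runs on whichever node first receives the update, which is not necessarily the shard leader.

```xml
<updateRequestProcessorChain name="external-action-once" default="true">
  <!-- Hypothetical custom factory: listed BEFORE the distributed step -->
  <processor class="com.example.ExternalActionProcessorFactory"/>
  <!-- Explicit placement of the distributed step -->
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <!-- Applies the update to the local index -->
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```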

Thanks,
Shawn

Re: Solr: Get leaderness status of local node

Posted by lamine lamine <ya...@yahoo.fr.INVALID>.

 I only want the code to be run once per shard. One way to guarantee that is to do it in the leader, as there is always one leader per shard. I don't want to run it in all the replicas. The code to run is "external": it doesn't touch any document.

    On Saturday, May 8, 2021 at 17:21:50 UTC−5, Shawn Heisey <ap...@elyograg.org> wrote:

 On 5/8/2021 12:21 PM, Walter Underwood wrote:
> Why do you only want to run it on the leader? This seems like a misunderstanding
> of how NRT replication works.

As I understand NRT, any update processor that is configured *before* 
DistributedUpdateProcessor in the chain will be run once by the node 
that first handles the update (probably the leader), and any processor 
that is configured *after* DistributedUpdateProcessor will be run by 
every indexing replica.

Can anyone confirm or deny my understanding?

Thanks,
Shawn

Re: Solr: Get leaderness status of local node

Posted by Shawn Heisey <ap...@elyograg.org>.

On 5/8/2021 12:21 PM, Walter Underwood wrote:
> Why do you only want to run it on the leader? This seems like a misunderstanding
> of how NRT replication works.

As I understand NRT, any update processor that is configured *before* 
DistributedUpdateProcessor in the chain will be run once by the node 
that first handles the update (probably the leader), and any processor 
that is configured *after* DistributedUpdateProcessor will be run by 
every indexing replica.
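
The behavior described above can be pictured with a small toy model. This is only a sketch of that description, not Solr code: the node names are hypothetical, "before" and "after" stand for processors configured before and after the distributed step, and the model assumes the leader applies the update locally and then forwards it to the other indexing replicas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of where update processors run relative to the distribution step. */
final class ChainPlacementSketch {

    /**
     * Returns "stage@node" entries, in execution order, for a single update.
     *
     * @param receivingNode node the client sent the update to
     * @param leader        node hosting the shard leader
     * @param replicas      all nodes hosting an indexing replica of the shard, leader included
     */
    static List<String> route(String receivingNode, String leader, List<String> replicas) {
        List<String> ran = new ArrayList<>();
        // Processors configured BEFORE the distribution step run once, on the receiving node.
        ran.add("before@" + receivingNode);
        // The distribution step hands the update to the leader, which applies it locally.
        ran.add("after@" + leader);
        // The leader then forwards the update to every other indexing replica.
        for (String replica : replicas) {
            if (!replica.equals(leader)) {
                ran.add("after@" + replica);
            }
        }
        return ran;
    }

    public static void main(String[] args) {
        // The update arrives at node2, while node1 is the shard leader.
        System.out.println(route("node2", "node1", Arrays.asList("node1", "node2", "node3")));
        // prints [before@node2, after@node1, after@node2, after@node3]
    }
}
```

In this model the "before" stage runs exactly once per update, but on node2 rather than on the leader, which is why placement alone gives "once" and not "on the leader".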

Can anyone confirm or deny my understanding?

Thanks,
Shawn

Re: Solr: Get leaderness status of local node

Posted by Walter Underwood <wu...@wunderwood.org>.

Why do you only want to run it on the leader? This seems like a misunderstanding
of how NRT replication works.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On May 8, 2021, at 6:26 AM, Ilan Ginzburg <il...@gmail.com> wrote:
> 
> I'm not familiar with the way update processors work but an update is
> processed by the leader after having been routed to it, right?
> Would it be possible to run the leader specific code there/then while
> taking advantage of knowing execution happens on the leader?
> 
> Ilan
> 

Re: Solr: Get leaderness status of local node

Posted by Ilan Ginzburg <il...@gmail.com>.

I'm not familiar with the way update processors work but an update is
processed by the leader after having been routed to it, right?
Would it be possible to run the leader specific code there/then while
taking advantage of knowing execution happens on the leader?

Ilan
