Posted to users@solr.apache.org by Radu Gheorghe <ra...@sematext.com> on 2022/08/01 07:36:43 UTC
Re: Trouble with REBALANCELEADERS api calls
Hi Stephen,
I would generally prefer a low value of "maxAtOnce" when calling
REBALANCELEADERS, so that I don't add too much pressure at once. Something
like 1 or 2 should be OK, unless there are other constraints that get in
the way.
I assume that if you change too many at once (and by default, it tries to do
them all at once), something might time out - maybe in ZooKeeper or the
Overseer? That's where I would expect to see some logs.
Best regards,
Radu
--
Elasticsearch/OpenSearch & Solr Consulting, Production Support & Training
Sematext Cloud - Full Stack Observability
https://sematext.com/
On Thu, Jul 28, 2022 at 10:06 PM Stephen Lewis Bianamara <
stephen.bianamara@gmail.com> wrote:
> Hey Solr Folks!
>
> I'm managing a Solr 8.3.1 cluster and have had trouble with the
> REBALANCELEADERS API calls.
>
> These calls seem to always fail on one or two shards (this happens on
> clusters ranging from 24 to 60 shards). These failures range from "soft"
> failures (e.g. the API returns that it could not change the leader) to
> "hard" failures (every node in the shard goes down).
>
> *Details*
> The cluster runs a dedicated overseer and an external 3-node ZooKeeper
> cluster, each on a dedicated VM. None of these, nor the instances themselves
> (leading up to the API call), appear to be particularly throttled. Nor are
> there any logs on the instance(s) that fail to give up/assume leadership.
>
> This is based on an automation script which does the following --
>
> 1. Generate the list of new preferred leaders
> 2. Iterate the shards and add the preferredLeader property to all nodes
> we wish to be leaders, or skip if already present, waiting 3 seconds
> between each call
> 3. Wait 30 seconds; then call REBALANCELEADERS
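>
> The steps above map onto the ADDREPLICAPROP and REBALANCELEADERS
> Collections API calls. A rough sketch of the script (host, collection,
> shard, and replica names are placeholders; it only builds the URLs that
> the real script GETs):

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/admin/collections"  # placeholder host

def api_url(**params):
    """Build a Collections API URL; the real script would HTTP GET it."""
    return BASE + "?" + urlencode(params)

# Step 2: set preferredLeader on the replica we want to lead each shard.
# shardUnique=true removes the property from any other replica in the shard.
plan = {"shard1": "core_node3", "shard2": "core_node7"}  # example plan
for shard, replica in plan.items():
    url = api_url(action="ADDREPLICAPROP", collection="myCollection",
                  shard=shard, replica=replica, property="preferredLeader",
                  shardUnique="true", **{"property.value": "true"})
    print(url)  # the script GETs this, then waits 3 seconds

# Step 3: after a 30-second wait, rebalance one leader at a time.
print(api_url(action="REBALANCELEADERS", collection="myCollection",
              maxAtOnce=1, maxWaitSeconds=60))
```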
>
> *My questions*
>
> 1. Is there something wrong or missing with my strategy above?
> 2. Given I can't find any logs and don't see any system limitations, do
> you have any recommendations for what to look at to trace down the
> source
> of the issue?
> 3. Are there any improvements to this API's stability in Solr 8.4-9.0,
> or planned for the future?
>
> Thanks in advance!
> Stephen
>
Re: Trouble with REBALANCELEADERS api calls
Posted by Stephen Lewis Bianamara <st...@gmail.com>.
Hi Radu,
Thanks for the advice. I'll try out setting maxAtOnce to 1 going forward.
Best,
Stephen