Posted to user@helix.apache.org by Michael Craig <mc...@box.com> on 2016/10/19 18:53:00 UTC

Correct way to redistribute work from disconnected instances?

I've noticed that partitions/replicas assigned to disconnected instances
are not automatically redistributed to live instances. What's the correct
way to do this?

For example, given this setup with Helix 0.6.5:
- 1 resource
- 2 replicas
- LeaderStandby state model
- FULL_AUTO rebalance mode
- 3 nodes (N1 is Leader, N2 is Standby, N3 is just sitting)

Then drop N1:
- N2 becomes LEADER
- Nothing happens to N3

Naively, I would have expected N3 to transition from Offline to Standby,
but that doesn't happen.

I can force redistribution from GenericHelixController#onLiveInstanceChange
by
- dropping non-live instances from the cluster
- calling rebalance

The instance dropping seems pretty unsafe! Is there a better way?
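
For reference, the workaround currently looks roughly like this (a sketch
only, not exactly my code: ZK_ADDRESS, CLUSTER_NAME, RESOURCE_NAME, and
NUM_REPLICAS are placeholders, and I've written it as a standalone listener
registered via HelixManager#addLiveInstanceChangeListener rather than inside
GenericHelixController):

import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.helix.HelixAdmin;
import org.apache.helix.LiveInstanceChangeListener;
import org.apache.helix.NotificationContext;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.LiveInstance;

// On every live-instance change: drop instances that are no longer live,
// then re-run rebalance so the remaining nodes pick up the work.
public class DropDeadInstancesListener implements LiveInstanceChangeListener {
  private final HelixAdmin admin = new ZKHelixAdmin(ZK_ADDRESS);

  @Override
  public void onLiveInstanceChange(List<LiveInstance> liveInstances,
                                   NotificationContext changeContext) {
    Set<String> live = new HashSet<String>();
    for (LiveInstance instance : liveInstances) {
      live.add(instance.getInstanceName());
    }
    for (String instanceName : admin.getInstancesInCluster(CLUSTER_NAME)) {
      if (!live.contains(instanceName)) {
        // This is the part that worries me: the instance is removed from the
        // cluster entirely, not just treated as temporarily down.
        admin.dropInstance(CLUSTER_NAME,
            admin.getInstanceConfig(CLUSTER_NAME, instanceName));
      }
    }
    admin.rebalance(CLUSTER_NAME, RESOURCE_NAME, NUM_REPLICAS);
  }
}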

Re: Correct way to redistribute work from disconnected instances?

Posted by Michael Craig <mc...@box.com>.
Moving this 2nd rebalancing question to another thread to clarify. Thanks
Kishore and Lei for your help!

On Thu, Oct 20, 2016 at 10:28 AM, Michael Craig <mc...@box.com> wrote:

> That works! The cluster is automatically rebalancing when nodes
> start/stop. This has raised other questions about rebalancing:
>
> Example output below, and I updated the gist: https://gist.github.com/
> mkscrg/bcb2ab1dd1b3e84ac93e7ca16e2824f8
>
>    - When NODE_0 restarts, why is the resource moved back? This seems
>    like unhelpful churn in the cluster.
>    - Why does the resource stay in the OFFLINE state on NODE_0?
>
>
> 2 node cluster with a single resource with 1 partition/replica, using
> OnlineOffline:
>
> Starting ZooKeeper at localhost:2199
> Setting up cluster THE_CLUSTER
> Starting CONTROLLER
> Starting NODE_0
> Starting NODE_1
> Adding resource THE_RESOURCE
> Rebalancing resource THE_RESOURCE
> Transition: NODE_0 OFFLINE to ONLINE for THE_RESOURCE
> Cluster state after setup:
> NODE_0: ONLINE
> NODE_1: null
> ------------------------------------------------------------
> Stopping NODE_0
> Transition: NODE_1 OFFLINE to ONLINE for THE_RESOURCE
> Cluster state after stopping first node:
> NODE_0: null
> NODE_1: ONLINE
> ------------------------------------------------------------
> Starting NODE_0
> Transition: NODE_1 ONLINE to OFFLINE for THE_RESOURCE
> Transition: NODE_1 OFFLINE to DROPPED for THE_RESOURCE
> Cluster state after restarting first node:
> NODE_0: OFFLINE
> NODE_1: null
> ------------------------------------------------------------
>
> On Thu, Oct 20, 2016 at 9:18 AM, Lei Xia <lx...@linkedin.com> wrote:
>
>> Hi, Michael
>>
>>   To answer your questions:
>>
>>    - Should you have to `rebalance` a resource when adding a new node to
>>    the cluster?
>> *--- No, if you are using full-auto rebalance mode,  yes if you are in
>>    semi-auto rebalance mode. *
>>    - Should you have to `rebalance` when a node is dropped? *-- Again,
>>    same answer, No, you do not need to in full-auto mode.  In full-auto mode,
>>    Helix is supposed to detect nodes add/delete/online/offline and rebalance
>>    the resource automatically. *
>>
>>
>>   The problem you saw was because your resource was created in SEMI-AUTO
>> mode instead of FULL-AUTO mode.  HelixAdmin.addResource() creates a
>> resource in semi-auto mode by default if you do not specify a rebalance
>> mode explicitly.  Please see my comments below on how to fix it.
>>
>>
>> static void addResource() throws Exception {
>>   echo("Adding resource " + RESOURCE_NAME);
>>   ADMIN.addResource(CLUSTER_NAME, RESOURCE_NAME, NUM_PARTITIONS,
>> STATE_MODEL_NAME);  *==> ADMIN.addResource(CLUSTER_NAME, RESOURCE_NAME,
>> NUM_PARTITIONS, STATE_MODEL_NAME, RebalanceMode.FULL_AUTO); *
>>   echo("Rebalancing resource " + RESOURCE_NAME);
>>   ADMIN.rebalance(CLUSTER_NAME, RESOURCE_NAME, NUM_REPLICAS);  * // This
>> just needs to be called once after the resource was created, no need to
>> call when there is node change. *
>> }
>>
>>
>> Please give it a try and let me know whether it works.  Thanks!
>>
>>
>> Lei
>>
>> On Wed, Oct 19, 2016 at 11:52 PM, Michael Craig <mc...@box.com> wrote:
>>
>>> Here is some repro code for "drop a node, resource is not redistributed"
>>> case I described: https://gist.github.com/mkscrg/bcb2ab1dd1b3e84ac9
>>> 3e7ca16e2824f8
>>>
>>> Can we answer these 2 questions? That would help clarify things:
>>>
>>>    - Should you have to `rebalance` a resource when adding a new node
>>>    to the cluster?
>>>    - If no, this is an easy bug to reproduce. The example code
>>>       <https://github.com/apache/helix/blob/helix-0.6.x/helix-core/src/main/java/org/apache/helix/examples/Quickstart.java#L198>
>>>       calls rebalance after adding a node, and it breaks if you comment out that
>>>       line.
>>>       - If yes, what is the correct way to manage many resources on a
>>>       cluster? Iterate through all resources and rebalance them for every new
>>>       node?
>>>    - Should you have to `rebalance` when a node is dropped?
>>>       - If no, there is a bug. See the repro code posted above.
>>>       - If yes, we are in the same rebalance-every-resource situation
>>>       as above.
>>>
>>> My use case is to manage a set of ad-hoc tasks across a cluster of
>>> machines. Each task would be a separate resource with a unique name, with 1
>>> partition and 1 replica. Each resource would reside on exactly 1 node, and
>>> there is no limit on the number of resources per node.
>>>
>>> On Wed, Oct 19, 2016 at 9:23 PM, Lei Xia <xi...@gmail.com> wrote:
>>>
>>>> Hi, Michael
>>>>
>>>>   Could you be more specific on the issue you see? Specifically:
>>>>   1) For 1 resource and 2 replicas, you mean the resource has only 1
>>>> partition, with replica number equals to 2, right?
>>>>   2) You see* REBALANCE_MODE="FULL_AUTO"*, not* IDEALSTATE_MODE="AUTO"
>>>> *in your idealState, right?
>>>>   3) by dropping N1, you mean disconnect N1 from helix/zookeeper, so N1
>>>> is not in liveInstances, right?
>>>>
>>>>   If your answers to all of above questions are yes, then there may be
>>>> some bug here.  If possible, please paste your idealstate, and your test
>>>> code (if there is any) here, I will try to reproduce and debug it.  Thanks
>>>>
>>>>
>>>> Lei
>>>>
>>>> On Wed, Oct 19, 2016 at 9:02 PM, kishore g <g....@gmail.com> wrote:
>>>>
>>>>> Can you describe your scenario in detail and the expected behavior?. I
>>>>> agree calling rebalance on every live instance change is ugly and
>>>>> definitely not as per the design. It was an oversight (we focussed a lot of
>>>>> large number of partitions and failed to handle this simple case).
>>>>>
>>>>> Please file and jira and we will work on that. Lei, do you think the
>>>>> recent bug we fixed with AutoRebalancer will handle this case?
>>>>>
>>>>> thanks,
>>>>> Kishore G
>>>>>
>>>>> On Wed, Oct 19, 2016 at 8:55 PM, Michael Craig <mc...@box.com> wrote:
>>>>>
>>>>>> Thanks for the quick response Kishore. This issue is definitely tied
>>>>>> to the condition that partitions * replicas < NODE_COUNT.
>>>>>> If all running nodes have a "piece" of the resource, then they behave
>>>>>> well when the LEADER node goes away.
>>>>>>
>>>>>> Is it possible to use Helix to manage a set of resources where that
>>>>>> condition is true? I.e. where the *total *number of
>>>>>> partitions/replicas in the cluster is greater than the node count, but each
>>>>>> individual resource has a small number of partitions/replicas.
>>>>>>
>>>>>> (Calling rebalance on every liveInstance change does not seem like a
>>>>>> good solution, because you would have to iterate through all resources in
>>>>>> the cluster and rebalance each individually.)
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 19, 2016 at 12:52 PM, kishore g <g....@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I think this might be a corner case when partitions * replicas <
>>>>>>> TOTAL_NUMBER_OF_NODES. Can you try with many partitions and replicas and
>>>>>>> check if the issue still exists.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 19, 2016 at 11:53 AM, Michael Craig <mc...@box.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I've noticed that partitions/replicas assigned to disconnected
>>>>>>>> instances are not automatically redistributed to live instances. What's the
>>>>>>>> correct way to do this?
>>>>>>>>
>>>>>>>> For example, given this setup with Helix 0.6.5:
>>>>>>>> - 1 resource
>>>>>>>> - 2 replicas
>>>>>>>> - LeaderStandby state model
>>>>>>>> - FULL_AUTO rebalance mode
>>>>>>>> - 3 nodes (N1 is Leader, N2 is Standby, N3 is just sitting)
>>>>>>>>
>>>>>>>> Then drop N1:
>>>>>>>> - N2 becomes LEADER
>>>>>>>> - Nothing happens to N3
>>>>>>>>
>>>>>>>> Naively, I would have expected N3 to transition from Offline to
>>>>>>>> Standby, but that doesn't happen.
>>>>>>>>
>>>>>>>> I can force redistribution from GenericHelixController#onLiveInstanceChange
>>>>>>>> by
>>>>>>>> - dropping non-live instances from the cluster
>>>>>>>> - calling rebalance
>>>>>>>>
>>>>>>>> The instance dropping seems pretty unsafe! Is there a better way?
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Lei Xia
>>>>
>>>
>>>
>>
>>
>> --
>>
>> *Lei Xia *Senior Software Engineer
>> Data Infra/Nuage & Helix
>> LinkedIn
>>
>> lxia@linkedin.com
>> www.linkedin.com/in/lxia1
>>
>
>

Re: Correct way to redistribute work from disconnected instances?

Posted by Michael Craig <mc...@box.com>.
That works! The cluster is automatically rebalancing when nodes start/stop.
This has raised other questions about rebalancing:

Example output below, and I updated the gist:
https://gist.github.com/mkscrg/bcb2ab1dd1b3e84ac93e7ca16e2824f8

   - When NODE_0 restarts, why is the resource moved back? This seems like
   unhelpful churn in the cluster.
   - Why does the resource stay in the OFFLINE state on NODE_0?


A 2-node cluster with a single resource (1 partition, 1 replica), using
OnlineOffline:

Starting ZooKeeper at localhost:2199
Setting up cluster THE_CLUSTER
Starting CONTROLLER
Starting NODE_0
Starting NODE_1
Adding resource THE_RESOURCE
Rebalancing resource THE_RESOURCE
Transition: NODE_0 OFFLINE to ONLINE for THE_RESOURCE
Cluster state after setup:
NODE_0: ONLINE
NODE_1: null
------------------------------------------------------------
Stopping NODE_0
Transition: NODE_1 OFFLINE to ONLINE for THE_RESOURCE
Cluster state after stopping first node:
NODE_0: null
NODE_1: ONLINE
------------------------------------------------------------
Starting NODE_0
Transition: NODE_1 ONLINE to OFFLINE for THE_RESOURCE
Transition: NODE_1 OFFLINE to DROPPED for THE_RESOURCE
Cluster state after restarting first node:
NODE_0: OFFLINE
NODE_1: null
------------------------------------------------------------

On Thu, Oct 20, 2016 at 9:18 AM, Lei Xia <lx...@linkedin.com> wrote:

> Hi, Michael
>
>   To answer your questions:
>
>    - Should you have to `rebalance` a resource when adding a new node to
>    the cluster?
> *--- No, if you are using full-auto rebalance mode,  yes if you are in
>    semi-auto rebalance mode. *
>    - Should you have to `rebalance` when a node is dropped? *-- Again,
>    same answer, No, you do not need to in full-auto mode.  In full-auto mode,
>    Helix is supposed to detect nodes add/delete/online/offline and rebalance
>    the resource automatically. *
>
>
>   The problem you saw was because your resource was created in SEMI-AUTO
> mode instead of FULL-AUTO mode.  HelixAdmin.addResource() creates a
> resource in semi-auto mode by default if you do not specify a rebalance
> mode explicitly.  Please see my comments below on how to fix it.
>
>
> static void addResource() throws Exception {
>   echo("Adding resource " + RESOURCE_NAME);
>   ADMIN.addResource(CLUSTER_NAME, RESOURCE_NAME, NUM_PARTITIONS,
> STATE_MODEL_NAME);  *==> ADMIN.addResource(CLUSTER_NAME, RESOURCE_NAME,
> NUM_PARTITIONS, STATE_MODEL_NAME, RebalanceMode.FULL_AUTO); *
>   echo("Rebalancing resource " + RESOURCE_NAME);
>   ADMIN.rebalance(CLUSTER_NAME, RESOURCE_NAME, NUM_REPLICAS);  * // This
> just needs to be called once after the resource was created, no need to
> call when there is node change. *
> }
>
>
> Please give it a try and let me know whether it works.  Thanks!
>
>
> Lei
>
> On Wed, Oct 19, 2016 at 11:52 PM, Michael Craig <mc...@box.com> wrote:
>
>> Here is some repro code for "drop a node, resource is not redistributed"
>> case I described: https://gist.github.com/mkscrg/bcb2ab1dd1b3e84ac9
>> 3e7ca16e2824f8
>>
>> Can we answer these 2 questions? That would help clarify things:
>>
>>    - Should you have to `rebalance` a resource when adding a new node to
>>    the cluster?
>>    - If no, this is an easy bug to reproduce. The example code
>>       <https://github.com/apache/helix/blob/helix-0.6.x/helix-core/src/main/java/org/apache/helix/examples/Quickstart.java#L198>
>>       calls rebalance after adding a node, and it breaks if you comment out that
>>       line.
>>       - If yes, what is the correct way to manage many resources on a
>>       cluster? Iterate through all resources and rebalance them for every new
>>       node?
>>    - Should you have to `rebalance` when a node is dropped?
>>       - If no, there is a bug. See the repro code posted above.
>>       - If yes, we are in the same rebalance-every-resource situation as
>>       above.
>>
>> My use case is to manage a set of ad-hoc tasks across a cluster of
>> machines. Each task would be a separate resource with a unique name, with 1
>> partition and 1 replica. Each resource would reside on exactly 1 node, and
>> there is no limit on the number of resources per node.
>>
>> On Wed, Oct 19, 2016 at 9:23 PM, Lei Xia <xi...@gmail.com> wrote:
>>
>>> Hi, Michael
>>>
>>>   Could you be more specific on the issue you see? Specifically:
>>>   1) For 1 resource and 2 replicas, you mean the resource has only 1
>>> partition, with replica number equals to 2, right?
>>>   2) You see* REBALANCE_MODE="FULL_AUTO"*, not* IDEALSTATE_MODE="AUTO" *in
>>> your idealState, right?
>>>   3) by dropping N1, you mean disconnect N1 from helix/zookeeper, so N1
>>> is not in liveInstances, right?
>>>
>>>   If your answers to all of above questions are yes, then there may be
>>> some bug here.  If possible, please paste your idealstate, and your test
>>> code (if there is any) here, I will try to reproduce and debug it.  Thanks
>>>
>>>
>>> Lei
>>>
>>> On Wed, Oct 19, 2016 at 9:02 PM, kishore g <g....@gmail.com> wrote:
>>>
>>>> Can you describe your scenario in detail and the expected behavior?. I
>>>> agree calling rebalance on every live instance change is ugly and
>>>> definitely not as per the design. It was an oversight (we focussed a lot of
>>>> large number of partitions and failed to handle this simple case).
>>>>
>>>> Please file and jira and we will work on that. Lei, do you think the
>>>> recent bug we fixed with AutoRebalancer will handle this case?
>>>>
>>>> thanks,
>>>> Kishore G
>>>>
>>>> On Wed, Oct 19, 2016 at 8:55 PM, Michael Craig <mc...@box.com> wrote:
>>>>
>>>>> Thanks for the quick response Kishore. This issue is definitely tied
>>>>> to the condition that partitions * replicas < NODE_COUNT.
>>>>> If all running nodes have a "piece" of the resource, then they behave
>>>>> well when the LEADER node goes away.
>>>>>
>>>>> Is it possible to use Helix to manage a set of resources where that
>>>>> condition is true? I.e. where the *total *number of
>>>>> partitions/replicas in the cluster is greater than the node count, but each
>>>>> individual resource has a small number of partitions/replicas.
>>>>>
>>>>> (Calling rebalance on every liveInstance change does not seem like a
>>>>> good solution, because you would have to iterate through all resources in
>>>>> the cluster and rebalance each individually.)
>>>>>
>>>>>
>>>>> On Wed, Oct 19, 2016 at 12:52 PM, kishore g <g....@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I think this might be a corner case when partitions * replicas <
>>>>>> TOTAL_NUMBER_OF_NODES. Can you try with many partitions and replicas and
>>>>>> check if the issue still exists.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 19, 2016 at 11:53 AM, Michael Craig <mc...@box.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I've noticed that partitions/replicas assigned to disconnected
>>>>>>> instances are not automatically redistributed to live instances. What's the
>>>>>>> correct way to do this?
>>>>>>>
>>>>>>> For example, given this setup with Helix 0.6.5:
>>>>>>> - 1 resource
>>>>>>> - 2 replicas
>>>>>>> - LeaderStandby state model
>>>>>>> - FULL_AUTO rebalance mode
>>>>>>> - 3 nodes (N1 is Leader, N2 is Standby, N3 is just sitting)
>>>>>>>
>>>>>>> Then drop N1:
>>>>>>> - N2 becomes LEADER
>>>>>>> - Nothing happens to N3
>>>>>>>
>>>>>>> Naively, I would have expected N3 to transition from Offline to
>>>>>>> Standby, but that doesn't happen.
>>>>>>>
>>>>>>> I can force redistribution from GenericHelixController#onLiveInstanceChange
>>>>>>> by
>>>>>>> - dropping non-live instances from the cluster
>>>>>>> - calling rebalance
>>>>>>>
>>>>>>> The instance dropping seems pretty unsafe! Is there a better way?
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Lei Xia
>>>
>>
>>
>
>
> --
>
> *Lei Xia *Senior Software Engineer
> Data Infra/Nuage & Helix
> LinkedIn
>
> lxia@linkedin.com
> www.linkedin.com/in/lxia1
>

Re: Correct way to redistribute work from disconnected instances?

Posted by Lei Xia <lx...@linkedin.com>.
Hi, Michael

  To answer your questions:

   - Should you have to `rebalance` a resource when adding a new node to
     the cluster?
     -- No if you are using full-auto rebalance mode; yes if you are in
     semi-auto rebalance mode.
   - Should you have to `rebalance` when a node is dropped?
     -- Again, same answer: no, you do not need to in full-auto mode. In
     full-auto mode, Helix is supposed to detect node add/delete/online/offline
     events and rebalance the resource automatically.


  The problem you saw is that your resource was created in SEMI-AUTO
mode instead of FULL-AUTO mode. HelixAdmin.addResource() creates a
resource in semi-auto mode by default if you do not specify a rebalance
mode explicitly. Please see my comments below on how to fix it.


static void addResource() throws Exception {
  echo("Adding resource " + RESOURCE_NAME);
  // ==> Change this call to pass the rebalance mode explicitly:
  //     ADMIN.addResource(CLUSTER_NAME, RESOURCE_NAME, NUM_PARTITIONS,
  //                       STATE_MODEL_NAME, RebalanceMode.FULL_AUTO);
  ADMIN.addResource(CLUSTER_NAME, RESOURCE_NAME, NUM_PARTITIONS,
      STATE_MODEL_NAME);
  echo("Rebalancing resource " + RESOURCE_NAME);
  // This only needs to be called once, right after the resource is created;
  // there is no need to call it again when nodes change.
  ADMIN.rebalance(CLUSTER_NAME, RESOURCE_NAME, NUM_REPLICAS);
}
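
If the resource was already created in semi-auto mode, you should also be
able to flip the mode on the existing idealstate instead of recreating the
resource, something like this (untested sketch):

IdealState idealState = ADMIN.getResourceIdealState(CLUSTER_NAME, RESOURCE_NAME);
idealState.setRebalanceMode(RebalanceMode.FULL_AUTO);
ADMIN.setResourceIdealState(CLUSTER_NAME, RESOURCE_NAME, idealState);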


Please give it a try and let me know whether it works.  Thanks!


Lei

On Wed, Oct 19, 2016 at 11:52 PM, Michael Craig <mc...@box.com> wrote:

> Here is some repro code for "drop a node, resource is not redistributed"
> case I described: https://gist.github.com/mkscrg/
> bcb2ab1dd1b3e84ac93e7ca16e2824f8
>
> Can we answer these 2 questions? That would help clarify things:
>
>    - Should you have to `rebalance` a resource when adding a new node to
>    the cluster?
>    - If no, this is an easy bug to reproduce. The example code
>       <https://github.com/apache/helix/blob/helix-0.6.x/helix-core/src/main/java/org/apache/helix/examples/Quickstart.java#L198>
>       calls rebalance after adding a node, and it breaks if you comment out that
>       line.
>       - If yes, what is the correct way to manage many resources on a
>       cluster? Iterate through all resources and rebalance them for every new
>       node?
>    - Should you have to `rebalance` when a node is dropped?
>       - If no, there is a bug. See the repro code posted above.
>       - If yes, we are in the same rebalance-every-resource situation as
>       above.
>
> My use case is to manage a set of ad-hoc tasks across a cluster of
> machines. Each task would be a separate resource with a unique name, with 1
> partition and 1 replica. Each resource would reside on exactly 1 node, and
> there is no limit on the number of resources per node.
>
> On Wed, Oct 19, 2016 at 9:23 PM, Lei Xia <xi...@gmail.com> wrote:
>
>> Hi, Michael
>>
>>   Could you be more specific on the issue you see? Specifically:
>>   1) For 1 resource and 2 replicas, you mean the resource has only 1
>> partition, with replica number equals to 2, right?
>>   2) You see* REBALANCE_MODE="FULL_AUTO"*, not* IDEALSTATE_MODE="AUTO" *in
>> your idealState, right?
>>   3) by dropping N1, you mean disconnect N1 from helix/zookeeper, so N1
>> is not in liveInstances, right?
>>
>>   If your answers to all of above questions are yes, then there may be
>> some bug here.  If possible, please paste your idealstate, and your test
>> code (if there is any) here, I will try to reproduce and debug it.  Thanks
>>
>>
>> Lei
>>
>> On Wed, Oct 19, 2016 at 9:02 PM, kishore g <g....@gmail.com> wrote:
>>
>>> Can you describe your scenario in detail and the expected behavior?. I
>>> agree calling rebalance on every live instance change is ugly and
>>> definitely not as per the design. It was an oversight (we focussed a lot of
>>> large number of partitions and failed to handle this simple case).
>>>
>>> Please file and jira and we will work on that. Lei, do you think the
>>> recent bug we fixed with AutoRebalancer will handle this case?
>>>
>>> thanks,
>>> Kishore G
>>>
>>> On Wed, Oct 19, 2016 at 8:55 PM, Michael Craig <mc...@box.com> wrote:
>>>
>>>> Thanks for the quick response Kishore. This issue is definitely tied to
>>>> the condition that partitions * replicas < NODE_COUNT.
>>>> If all running nodes have a "piece" of the resource, then they behave
>>>> well when the LEADER node goes away.
>>>>
>>>> Is it possible to use Helix to manage a set of resources where that
>>>> condition is true? I.e. where the *total *number of
>>>> partitions/replicas in the cluster is greater than the node count, but each
>>>> individual resource has a small number of partitions/replicas.
>>>>
>>>> (Calling rebalance on every liveInstance change does not seem like a
>>>> good solution, because you would have to iterate through all resources in
>>>> the cluster and rebalance each individually.)
>>>>
>>>>
>>>> On Wed, Oct 19, 2016 at 12:52 PM, kishore g <g....@gmail.com>
>>>> wrote:
>>>>
>>>>> I think this might be a corner case when partitions * replicas <
>>>>> TOTAL_NUMBER_OF_NODES. Can you try with many partitions and replicas and
>>>>> check if the issue still exists.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Oct 19, 2016 at 11:53 AM, Michael Craig <mc...@box.com>
>>>>> wrote:
>>>>>
>>>>>> I've noticed that partitions/replicas assigned to disconnected
>>>>>> instances are not automatically redistributed to live instances. What's the
>>>>>> correct way to do this?
>>>>>>
>>>>>> For example, given this setup with Helix 0.6.5:
>>>>>> - 1 resource
>>>>>> - 2 replicas
>>>>>> - LeaderStandby state model
>>>>>> - FULL_AUTO rebalance mode
>>>>>> - 3 nodes (N1 is Leader, N2 is Standby, N3 is just sitting)
>>>>>>
>>>>>> Then drop N1:
>>>>>> - N2 becomes LEADER
>>>>>> - Nothing happens to N3
>>>>>>
>>>>>> Naively, I would have expected N3 to transition from Offline to
>>>>>> Standby, but that doesn't happen.
>>>>>>
>>>>>> I can force redistribution from GenericHelixController#onLiveInstanceChange
>>>>>> by
>>>>>> - dropping non-live instances from the cluster
>>>>>> - calling rebalance
>>>>>>
>>>>>> The instance dropping seems pretty unsafe! Is there a better way?
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>> Lei Xia
>>
>
>


-- 

*Lei Xia *Senior Software Engineer
Data Infra/Nuage & Helix
LinkedIn

lxia@linkedin.com
www.linkedin.com/in/lxia1

Re: Correct way to redistribute work from disconnected instances?

Posted by Michael Craig <mc...@box.com>.
Here is some repro code for the "drop a node, resource is not redistributed"
case I described:
https://gist.github.com/mkscrg/bcb2ab1dd1b3e84ac93e7ca16e2824f8

Can we answer these 2 questions? That would help clarify things:

   - Should you have to `rebalance` a resource when adding a new node to
     the cluster?
      - If no, this is an easy bug to reproduce. The example code
        <https://github.com/apache/helix/blob/helix-0.6.x/helix-core/src/main/java/org/apache/helix/examples/Quickstart.java#L198>
        calls rebalance after adding a node, and it breaks if you comment
        out that line.
      - If yes, what is the correct way to manage many resources on a
        cluster? Iterate through all resources and rebalance them for every
        new node?
   - Should you have to `rebalance` when a node is dropped?
      - If no, there is a bug. See the repro code posted above.
      - If yes, we are in the same rebalance-every-resource situation as
        above.

My use case is to manage a set of ad-hoc tasks across a cluster of
machines. Each task would be a separate resource with a unique name, with 1
partition and 1 replica. Each resource would reside on exactly 1 node, and
there is no limit on the number of resources per node.
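
Concretely, per task the setup would be roughly this (sketch; the task name,
state model, and admin handle are illustrative, not my exact code):

admin.addResource(CLUSTER_NAME, taskName, 1 /* partition */, "OnlineOffline");
admin.rebalance(CLUSTER_NAME, taskName, 1 /* replica */);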

On Wed, Oct 19, 2016 at 9:23 PM, Lei Xia <xi...@gmail.com> wrote:

> Hi, Michael
>
>   Could you be more specific on the issue you see? Specifically:
>   1) For 1 resource and 2 replicas, you mean the resource has only 1
> partition, with replica number equals to 2, right?
>   2) You see* REBALANCE_MODE="FULL_AUTO"*, not* IDEALSTATE_MODE="AUTO" *in
> your idealState, right?
>   3) by dropping N1, you mean disconnect N1 from helix/zookeeper, so N1 is
> not in liveInstances, right?
>
>   If your answers to all of above questions are yes, then there may be
> some bug here.  If possible, please paste your idealstate, and your test
> code (if there is any) here, I will try to reproduce and debug it.  Thanks
>
>
> Lei
>
> On Wed, Oct 19, 2016 at 9:02 PM, kishore g <g....@gmail.com> wrote:
>
>> Can you describe your scenario in detail and the expected behavior?. I
>> agree calling rebalance on every live instance change is ugly and
>> definitely not as per the design. It was an oversight (we focussed a lot of
>> large number of partitions and failed to handle this simple case).
>>
>> Please file and jira and we will work on that. Lei, do you think the
>> recent bug we fixed with AutoRebalancer will handle this case?
>>
>> thanks,
>> Kishore G
>>
>> On Wed, Oct 19, 2016 at 8:55 PM, Michael Craig <mc...@box.com> wrote:
>>
>>> Thanks for the quick response Kishore. This issue is definitely tied to
>>> the condition that partitions * replicas < NODE_COUNT.
>>> If all running nodes have a "piece" of the resource, then they behave
>>> well when the LEADER node goes away.
>>>
>>> Is it possible to use Helix to manage a set of resources where that
>>> condition is true? I.e. where the *total *number of partitions/replicas
>>> in the cluster is greater than the node count, but each individual resource
>>> has a small number of partitions/replicas.
>>>
>>> (Calling rebalance on every liveInstance change does not seem like a
>>> good solution, because you would have to iterate through all resources in
>>> the cluster and rebalance each individually.)
>>>
>>>
>>> On Wed, Oct 19, 2016 at 12:52 PM, kishore g <g....@gmail.com> wrote:
>>>
>>>> I think this might be a corner case when partitions * replicas <
>>>> TOTAL_NUMBER_OF_NODES. Can you try with many partitions and replicas and
>>>> check if the issue still exists.
>>>>
>>>>
>>>>
>>>> On Wed, Oct 19, 2016 at 11:53 AM, Michael Craig <mc...@box.com> wrote:
>>>>
>>>>> I've noticed that partitions/replicas assigned to disconnected
>>>>> instances are not automatically redistributed to live instances. What's the
>>>>> correct way to do this?
>>>>>
>>>>> For example, given this setup with Helix 0.6.5:
>>>>> - 1 resource
>>>>> - 2 replicas
>>>>> - LeaderStandby state model
>>>>> - FULL_AUTO rebalance mode
>>>>> - 3 nodes (N1 is Leader, N2 is Standby, N3 is just sitting)
>>>>>
>>>>> Then drop N1:
>>>>> - N2 becomes LEADER
>>>>> - Nothing happens to N3
>>>>>
>>>>> Naively, I would have expected N3 to transition from Offline to
>>>>> Standby, but that doesn't happen.
>>>>>
>>>>> I can force redistribution from GenericHelixController#onLiveInstanceChange
>>>>> by
>>>>> - dropping non-live instances from the cluster
>>>>> - calling rebalance
>>>>>
>>>>> The instance dropping seems pretty unsafe! Is there a better way?
>>>>>
>>>>
>>>>
>>>
>>
>
>
> --
> Lei Xia
>

Re: Correct way to redistribute work from disconnected instances?

Posted by Lei Xia <xi...@gmail.com>.
Hi, Michael

  Could you be more specific about the issue you are seeing? Specifically:
  1) For 1 resource and 2 replicas, you mean the resource has only 1
partition, with the replica count equal to 2, right?
  2) You see REBALANCE_MODE="FULL_AUTO", not IDEALSTATE_MODE="AUTO", in
your idealState, right?
  3) By dropping N1, you mean disconnecting N1 from Helix/ZooKeeper, so N1
is not in liveInstances, right?

  If your answers to all of the above questions are yes, then there may be
a bug here. If possible, please paste your idealState and your test code
(if there is any) here, and I will try to reproduce and debug it. Thanks!
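
A quick way to check the mode programmatically is something like this
(sketch, using HelixAdmin):

IdealState idealState = admin.getResourceIdealState(CLUSTER_NAME, RESOURCE_NAME);
System.out.println(idealState.getRebalanceMode());  // expect FULL_AUTO

Or just dump the idealstate znode from ZooKeeper and look at the
REBALANCE_MODE field.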


Lei

On Wed, Oct 19, 2016 at 9:02 PM, kishore g <g....@gmail.com> wrote:

> Can you describe your scenario in detail and the expected behavior?. I
> agree calling rebalance on every live instance change is ugly and
> definitely not as per the design. It was an oversight (we focussed a lot of
> large number of partitions and failed to handle this simple case).
>
> Please file and jira and we will work on that. Lei, do you think the
> recent bug we fixed with AutoRebalancer will handle this case?
>
> thanks,
> Kishore G
>
> On Wed, Oct 19, 2016 at 8:55 PM, Michael Craig <mc...@box.com> wrote:
>
>> Thanks for the quick response Kishore. This issue is definitely tied to
>> the condition that partitions * replicas < NODE_COUNT.
>> If all running nodes have a "piece" of the resource, then they behave
>> well when the LEADER node goes away.
>>
>> Is it possible to use Helix to manage a set of resources where that
>> condition is true? I.e. where the *total *number of partitions/replicas
>> in the cluster is greater than the node count, but each individual resource
>> has a small number of partitions/replicas.
>>
>> (Calling rebalance on every liveInstance change does not seem like a good
>> solution, because you would have to iterate through all resources in the
>> cluster and rebalance each individually.)
>>
>>
>> On Wed, Oct 19, 2016 at 12:52 PM, kishore g <g....@gmail.com> wrote:
>>
>>> I think this might be a corner case when partitions * replicas <
>>> TOTAL_NUMBER_OF_NODES. Can you try with many partitions and replicas and
>>> check if the issue still exists.
>>>
>>>
>>>
>>> On Wed, Oct 19, 2016 at 11:53 AM, Michael Craig <mc...@box.com> wrote:
>>>
>>>> I've noticed that partitions/replicas assigned to disconnected
>>>> instances are not automatically redistributed to live instances. What's the
>>>> correct way to do this?
>>>>
>>>> For example, given this setup with Helix 0.6.5:
>>>> - 1 resource
>>>> - 2 replicas
>>>> - LeaderStandby state model
>>>> - FULL_AUTO rebalance mode
>>>> - 3 nodes (N1 is Leader, N2 is Standby, N3 is just sitting)
>>>>
>>>> Then drop N1:
>>>> - N2 becomes LEADER
>>>> - Nothing happens to N3
>>>>
>>>> Naively, I would have expected N3 to transition from Offline to
>>>> Standby, but that doesn't happen.
>>>>
>>>> I can force redistribution from GenericHelixController#onLiveInstanceChange
>>>> by
>>>> - dropping non-live instances from the cluster
>>>> - calling rebalance
>>>>
>>>> The instance dropping seems pretty unsafe! Is there a better way?
>>>>
>>>
>>>
>>
>


-- 
Lei Xia

Re: Correct way to redistribute work from disconnected instances?

Posted by kishore g <g....@gmail.com>.
Can you describe your scenario in detail and the expected behavior? I
agree that calling rebalance on every live instance change is ugly and
definitely not as per the design. It was an oversight (we focused a lot on
the large-number-of-partitions case and failed to handle this simple one).

Please file a jira and we will work on that. Lei, do you think the recent
bug we fixed in AutoRebalancer will handle this case?

thanks,
Kishore G

On Wed, Oct 19, 2016 at 8:55 PM, Michael Craig <mc...@box.com> wrote:

> Thanks for the quick response Kishore. This issue is definitely tied to
> the condition that partitions * replicas < NODE_COUNT.
> If all running nodes have a "piece" of the resource, then they behave well
> when the LEADER node goes away.
>
> Is it possible to use Helix to manage a set of resources where that
> condition is true? I.e. where the *total *number of partitions/replicas
> in the cluster is greater than the node count, but each individual resource
> has a small number of partitions/replicas.
>
> (Calling rebalance on every liveInstance change does not seem like a good
> solution, because you would have to iterate through all resources in the
> cluster and rebalance each individually.)
>
>
> On Wed, Oct 19, 2016 at 12:52 PM, kishore g <g....@gmail.com> wrote:
>
>> I think this might be a corner case when partitions * replicas <
>> TOTAL_NUMBER_OF_NODES. Can you try with many partitions and replicas and
>> check if the issue still exists.
>>
>>
>>
>> On Wed, Oct 19, 2016 at 11:53 AM, Michael Craig <mc...@box.com> wrote:
>>
>>> I've noticed that partitions/replicas assigned to disconnected instances
>>> are not automatically redistributed to live instances. What's the correct
>>> way to do this?
>>>
>>> For example, given this setup with Helix 0.6.5:
>>> - 1 resource
>>> - 2 replicas
>>> - LeaderStandby state model
>>> - FULL_AUTO rebalance mode
>>> - 3 nodes (N1 is Leader, N2 is Standby, N3 is just sitting)
>>>
>>> Then drop N1:
>>> - N2 becomes LEADER
>>> - Nothing happens to N3
>>>
>>> Naively, I would have expected N3 to transition from Offline to Standby,
>>> but that doesn't happen.
>>>
>>> I can force redistribution from GenericHelixController#onLiveInstanceChange
>>> by
>>> - dropping non-live instances from the cluster
>>> - calling rebalance
>>>
>>> The instance dropping seems pretty unsafe! Is there a better way?
>>>
>>
>>
>

Re: Correct way to redistribute work from disconnected instances?

Posted by Michael Craig <mc...@box.com>.
Thanks for the quick response, Kishore. This issue is definitely tied to the
condition that partitions * replicas < NODE_COUNT.
If all running nodes have a "piece" of the resource, then they behave well
when the LEADER node goes away.

Is it possible to use Helix to manage a set of resources where that
condition holds? That is, where the total number of partitions/replicas in
the cluster is greater than the node count, but each individual resource
has a small number of partitions/replicas.

(Calling rebalance on every liveInstance change does not seem like a good
solution, because you would have to iterate through all resources in the
cluster and rebalance each individually.)
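
Roughly, that would mean doing something like this on every liveInstance
change (sketch; assumes 1 replica per resource, as in my use case):

for (String resource : admin.getResourcesInCluster(CLUSTER_NAME)) {
  admin.rebalance(CLUSTER_NAME, resource, 1);
}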


On Wed, Oct 19, 2016 at 12:52 PM, kishore g <g....@gmail.com> wrote:

> I think this might be a corner case when partitions * replicas <
> TOTAL_NUMBER_OF_NODES. Can you try with many partitions and replicas and
> check if the issue still exists.
>
>
>
> On Wed, Oct 19, 2016 at 11:53 AM, Michael Craig <mc...@box.com> wrote:
>
>> I've noticed that partitions/replicas assigned to disconnected instances
>> are not automatically redistributed to live instances. What's the correct
>> way to do this?
>>
>> For example, given this setup with Helix 0.6.5:
>> - 1 resource
>> - 2 replicas
>> - LeaderStandby state model
>> - FULL_AUTO rebalance mode
>> - 3 nodes (N1 is Leader, N2 is Standby, N3 is just sitting)
>>
>> Then drop N1:
>> - N2 becomes LEADER
>> - Nothing happens to N3
>>
>> Naively, I would have expected N3 to transition from Offline to Standby,
>> but that doesn't happen.
>>
>> I can force redistribution from GenericHelixController#onLiveInstanceChange
>> by
>> - dropping non-live instances from the cluster
>> - calling rebalance
>>
>> The instance dropping seems pretty unsafe! Is there a better way?
>>
>
>

Re: Correct way to redistribute work from disconnected instances?

Posted by kishore g <g....@gmail.com>.
I think this might be a corner case when partitions * replicas <
TOTAL_NUMBER_OF_NODES. Can you try with many partitions and replicas and
check whether the issue still exists?
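
For example, something along these lines (sketch; pick numbers so that
partitions * replicas comfortably exceeds the node count):

ADMIN.addResource(CLUSTER_NAME, RESOURCE_NAME, 6 /* partitions */, "LeaderStandby");
ADMIN.rebalance(CLUSTER_NAME, RESOURCE_NAME, 3 /* replicas */);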



On Wed, Oct 19, 2016 at 11:53 AM, Michael Craig <mc...@box.com> wrote:

> I've noticed that partitions/replicas assigned to disconnected instances
> are not automatically redistributed to live instances. What's the correct
> way to do this?
>
> For example, given this setup with Helix 0.6.5:
> - 1 resource
> - 2 replicas
> - LeaderStandby state model
> - FULL_AUTO rebalance mode
> - 3 nodes (N1 is Leader, N2 is Standby, N3 is just sitting)
>
> Then drop N1:
> - N2 becomes LEADER
> - Nothing happens to N3
>
> Naively, I would have expected N3 to transition from Offline to Standby,
> but that doesn't happen.
>
> I can force redistribution from GenericHelixController#onLiveInstanceChange
> by
> - dropping non-live instances from the cluster
> - calling rebalance
>
> The instance dropping seems pretty unsafe! Is there a better way?
>