Posted to user@helix.apache.org by kishore g <g....@gmail.com> on 2014/01/19 18:20:20 UTC

TaskRebalancer

Hi,

I am trying to use TaskRebalancer but am not able to understand how it
works. Is there any example I can try?

thanks,
Kishore G

Re: TaskRebalancer

Posted by kishore g <g....@gmail.com>.

Actually that makes a lot of sense. Let me look at that.


On Sun, Jan 19, 2014 at 8:49 PM, Kanak Biscuitwala <ka...@hotmail.com> wrote:

>
> This sounds a lot like what we did in AutoRebalanceStrategy. There's an
> interface called ReplicaPlacementScheme that the algorithm calls into, and
> a DefaultPlacementScheme that just does evenly balanced assignment.
>
> The simplest thing we could do is have a task rebalancer config and set a
> switch for which placement scheme to use. The current task rebalancer
> already has to specify things like the DAG, so this could just be another
> field to add on.

RE: TaskRebalancer

Posted by Kanak Biscuitwala <ka...@hotmail.com>.

This sounds a lot like what we did in AutoRebalanceStrategy. There's an interface called ReplicaPlacementScheme that the algorithm calls into, and a DefaultPlacementScheme that just does evenly balanced assignment.

The simplest thing we could do is have a task rebalancer config and set a switch for which placement scheme to use. The current task rebalancer already has to specify things like the DAG, so this could just be another field to add on.
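
A minimal sketch of the "switch in the config" idea, written in Python for brevity. It is not Helix code: the `placementScheme` and `dag` field names, the `SCHEMES` registry, and both scheme functions are hypothetical and only illustrate how one extra config field could select the placement logic.

```python
import random

# Hypothetical placement schemes: each maps a list of task names and a list
# of node names to a {task: node} assignment.
def even_placement(tasks, nodes):
    # Round-robin over the nodes, so node loads differ by at most one task.
    return {task: nodes[i % len(nodes)] for i, task in enumerate(tasks)}

def random_placement(tasks, nodes, seed=0):
    # Seeded so that repeated runs produce the same assignment.
    rng = random.Random(seed)
    return {task: rng.choice(nodes) for task in tasks}

# Hypothetical registry keyed by the value of the config switch.
SCHEMES = {"EVEN": even_placement, "RANDOM": random_placement}

def compute_assignment(config, tasks, nodes):
    if not nodes:
        raise ValueError("no nodes available")
    name = config.get("placementScheme", "EVEN")
    if name not in SCHEMES:
        raise ValueError("unknown placement scheme: " + name)
    return SCHEMES[name](tasks, nodes)

# The config already carries things like the DAG; the scheme is one more field.
task_config = {"dag": {"backup": []}, "placementScheme": "EVEN"}
assignment = compute_assignment(task_config, ["t0", "t1", "t2", "t3"], ["N1", "N2", "N3"])
print(assignment)
```

Switching `placementScheme` to another registered name changes the placement without touching the rest of the config.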


Re: TaskRebalancer

Posted by kishore g <g....@gmail.com>.

Thanks Jason, I was looking at the rebalancer. It looks like the target
resource is mandatory. What do you suggest is the right way to make the
target resource optional?

This is my understanding of what the task rebalancer is doing today.

It assumes that the system is already hosting a resource, such as a
database or an index. One can then use the task framework to launch
arbitrary tasks on the nodes hosting that resource. For example, let's say
there is a database MyDB with 3 partitions and 2 replicas, using the
MasterSlave state model, on 3 nodes N1, N2, and N3. In a happy state the
cluster might look like this:

{
  "id":"MyDB",
  "mapFields":{
    "MyDB_0":{
      "N1":"MASTER",
      "N2":"SLAVE"
    },
    "MyDB_1":{
      "N2":"MASTER",
      "N3":"SLAVE"
    },
    "MyDB_2":{
      "N1":"SLAVE",
      "N3":"MASTER"
    }
  }
}

Let's say one wants to take a backup of these databases, but run it only on
the SLAVEs. One can define the backup task and launch 3 backup tasks (one
for each partition) only on SLAVEs.

What we have currently works perfectly for this scenario. One simply has to
define the target resource and state for the backup tasks, and they will be
launched in the appropriate place. So in this scenario, the backup tasks for
partitions 0, 1, and 2 will be launched on N2, N3, and N1 respectively.
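
The target-based placement described above can be sketched as follows. This is an illustrative Python sketch, not the TaskRebalancer implementation: the `place_on_state` helper is hypothetical, and it assumes one task per partition and at most one node worth picking per partition for the requested state.

```python
# State map from the example above.
ideal_state = {
    "id": "MyDB",
    "mapFields": {
        "MyDB_0": {"N1": "MASTER", "N2": "SLAVE"},
        "MyDB_1": {"N2": "MASTER", "N3": "SLAVE"},
        "MyDB_2": {"N1": "SLAVE", "N3": "MASTER"},
    },
}

def place_on_state(state_map, target_state):
    """Hypothetical helper: for each partition, pick a node whose replica is
    in target_state. Partitions with no such replica get no task."""
    placement = {}
    for partition, replicas in sorted(state_map["mapFields"].items()):
        # Sort so the choice is deterministic if several nodes qualify.
        nodes = sorted(n for n, state in replicas.items() if state == target_state)
        if nodes:
            placement[partition] = nodes[0]
    return placement

backup_placement = place_on_state(ideal_state, "SLAVE")
print(backup_placement)
# {'MyDB_0': 'N2', 'MyDB_1': 'N3', 'MyDB_2': 'N1'}
```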

But what if the tasks don't have any target resource and can run on any
node (N1, N2, or N3), and the only requirement is to distribute the tasks
evenly?

We should decouple the logic of where a task is placed from the logic of
distributing the tasks. For example, we can abstract out the placement
constraint from the rebalancer logic. So we can have a placement provider
that computes placement randomly and one that computes placement based on
another resource. Probably another one that computes placement based on
data locality.
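
One way to picture this decoupling, as a rough Python sketch under stated assumptions: a placement provider only answers "which nodes may run this task?", and a separate distribution step picks the least-loaded candidate. All names here (`PlacementProvider`, `AnyNodePlacement`, `TargetStatePlacement`, `distribute`) are hypothetical and not part of Helix, and the target-based provider assumes a task is named after the partition it works on.

```python
from typing import Dict, List, Protocol

class PlacementProvider(Protocol):
    """Placement constraint only: the nodes a task is allowed to run on."""
    def candidates(self, task: str) -> List[str]: ...

class AnyNodePlacement:
    """No target resource: every live node is a candidate."""
    def __init__(self, nodes: List[str]):
        self.nodes = list(nodes)

    def candidates(self, task: str) -> List[str]:
        return list(self.nodes)

class TargetStatePlacement:
    """Candidates are the nodes holding the task's partition in a given state."""
    def __init__(self, map_fields: Dict[str, Dict[str, str]], state: str):
        self.map_fields = map_fields
        self.state = state

    def candidates(self, task: str) -> List[str]:
        replicas = self.map_fields.get(task, {})
        return sorted(n for n, s in replicas.items() if s == self.state)

def distribute(tasks: List[str], provider: PlacementProvider) -> Dict[str, str]:
    """Distribution logic only: give each task to its least-loaded candidate."""
    load: Dict[str, int] = {}
    assignment: Dict[str, str] = {}
    for task in tasks:
        options = provider.candidates(task)
        if not options:
            continue  # nowhere to run this task right now
        # Ties are broken by node name to keep the result deterministic.
        chosen = min(options, key=lambda n: (load.get(n, 0), n))
        assignment[task] = chosen
        load[chosen] = load.get(chosen, 0) + 1
    return assignment

map_fields = {
    "MyDB_0": {"N1": "MASTER", "N2": "SLAVE"},
    "MyDB_1": {"N2": "MASTER", "N3": "SLAVE"},
    "MyDB_2": {"N1": "SLAVE", "N3": "MASTER"},
}
free_tasks = distribute(["t0", "t1", "t2", "t3", "t4", "t5"],
                        AnyNodePlacement(["N1", "N2", "N3"]))
backup_tasks = distribute(["MyDB_0", "MyDB_1", "MyDB_2"],
                          TargetStatePlacement(map_fields, "SLAVE"))
print(free_tasks)
print(backup_tasks)
```

The same `distribute` function serves both cases; only the provider changes. A locality-aware provider would be a third class with the same `candidates` method.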

What is the right way to approach this?

thanks,
Kishore G

On Sun, Jan 19, 2014 at 10:12 AM, Zhen Zhang <ne...@gmail.com> wrote:

> TestTaskRebalancer and TestTaskRebalancerStopResume are examples.
>
> Thanks,
> Jason
>
>
> On Sun, Jan 19, 2014 at 9:20 AM, kishore g <g....@gmail.com> wrote:
>
> > Hi,
> >
> > I am trying to use TaskRebalancer but not able to understand how it works,
> > is there any example I can try?
> >
> > thanks,
> > Kishore G
> >
>


Re: TaskRebalancer

Posted by Zhen Zhang <ne...@gmail.com>.

TestTaskRebalancer and TestTaskRebalancerStopResume are examples.

Thanks,
Jason


On Sun, Jan 19, 2014 at 9:20 AM, kishore g <g....@gmail.com> wrote:

> Hi,
>
> I am trying to use TaskRebalancer but not able to understand how it works,
> is there any example I can try?
>
> thanks,
> Kishore G
>
