Posted to user@helix.apache.org by Ming Fang <mi...@mac.com> on 2013/07/24 06:17:35 UTC

Custom Controller

Hi

I have a particular use case and wish to hear your expert opinion.
We currently have a MASTER and SLAVE cluster using Manual Placement, e.g. Node1 is always MASTER and Node2 is always SLAVE.

The problem is that during startup, if Node2 starts first, the Controller will transition it to MASTER.
I want to change this behavior so that the Controller waits for Node1 to come up, even if Node2 is already up.
Another way to look at it: the Controller should make Node2 the MASTER if and only if there was a known failure of Node1.
One example of a known failure is for Node1 to come up, become MASTER, and then later crash.

The question is: where is the correct place to implement something like this? Should I...
a) Extend GenericHelixController?
b) Implement a custom Rebalancer?
c) Or something else?

Thanks for your help.
--ming

Re: Custom Controller

Posted by kishore g <g....@gmail.com>.
I am guessing you are using AUTO mode execution, where you provide a
preference list for each partition. If that's the case, then every node,
when it starts, can simply check whether it is the preferred one (first in
the list); if it's not, it can disable itself.

The preferred node will be the only one that starts up in the enabled
state, and when it becomes master it can enable the remaining nodes.

You can get the preference list from the IdealState.
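A minimal sketch of that startup check. The decision logic below is pure Java; the real Helix calls (HelixAdmin, IdealState) are shown only in comments, and the class and variable names here are hypothetical glue code, not part of Helix itself:

```java
import java.util.List;

// Sketch of the "am I the preferred instance?" check described above.
public class PreferredInstanceCheck {

    // True only if this instance is first in the partition's preference list.
    public static boolean isPreferred(List<String> preferenceList, String instanceName) {
        return !preferenceList.isEmpty() && preferenceList.get(0).equals(instanceName);
    }

    // At startup, before connecting as a participant, something like:
    //
    //   HelixAdmin admin = new ZKHelixAdmin(zkAddress);
    //   IdealState idealState = admin.getResourceIdealState(clusterName, resourceName);
    //   List<String> prefs = idealState.getPreferenceList(partitionName);
    //   if (!isPreferred(prefs, myInstanceName)) {
    //       admin.enableInstance(clusterName, myInstanceName, false); // disable self
    //   }
}
```

Once the preferred node reaches MASTER, it can flip the others back with `admin.enableInstance(clusterName, peer, true)`.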


thanks,
Kishore G



Re: Custom Controller

Posted by Ming Fang <mi...@mac.com>.
On Jul 24, 2013, at 2:29 AM, kishore g <g....@gmail.com> wrote:

> You can write a custom rebalancer. But it's not clear to me how you would differentiate between a node coming up for the first time vs. the current master failing.

I was going to store something in ZooKeeper to record the fact that any node was started.  We will have an end-of-day scheduled job to clear those records.
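As a rough sketch of that bookkeeping (an in-memory set stands in for ZooKeeper here; in practice these methods would map to persistent znode create/exists/delete calls under some path, and all names are hypothetical):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the "has this node started today?" record. A set stands in
// for ZooKeeper; a master failure can then be distinguished from a first
// boot: a node with a startup record that is not currently alive must
// have crashed after starting.
public class StartupRecords {
    private final Set<String> started = ConcurrentHashMap.newKeySet();

    // Called once when a node boots, before it joins the cluster.
    public void recordStartup(String nodeName) {
        started.add(nodeName);
    }

    public boolean hasStartedToday(String nodeName) {
        return started.contains(nodeName);
    }

    // Run by the end-of-day scheduled job.
    public void clearAll() {
        started.clear();
    }
}
```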

> In general, it's a good idea to avoid logic that depends on the order of events in the cluster. This will make it difficult to scale the cluster or increase the number of partitions.

I agree with you about scaling.  But our goal is not to dynamically scale out; rather, it is to create a MASTER/SLAVE set where the nodes have deterministic roles.
Say we have 20 physical machines, with 10 newer than the other 10, maybe due to our upgrade cycle.
I want the MASTER to always run on the newer machines and the SLAVE to always run on the older machines.
Currently we have to schedule the MASTERs to come up first, but that's not ideal.

> 
> How about this: Node2 always starts in disabled mode (call admin.disableNode at startup, before it connects to the cluster). After Node1 becomes the master, as part of the Slave-->Master transition it enables Node2. This guarantees that Node2 always waits until it sees Node1 as MASTER.
> 
> Will this work for you ?

That might work.
Although our code is identical across all the nodes.
We use a JSON file to describe our cluster.
Is there a way to disable a node using the JSON file?
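Hypothetically, something like this (Helix itself would not read such a file; our own bootstrap code would have to read the flag and call admin.enableInstance accordingly):

```json
{
  "cluster": "myCluster",
  "nodes": [
    { "name": "Node1", "role": "MASTER", "enabled": true },
    { "name": "Node2", "role": "SLAVE",  "enabled": false }
  ]
}
```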

Thanks Kishore
--ming

Re: Custom Controller

Posted by kishore g <g....@gmail.com>.
You can write a custom rebalancer. But it's not clear to me how you would
differentiate between a node coming up for the first time vs. the current
master failing. In general, it's a good idea to avoid logic that depends on
the order of events in the cluster. This will make it difficult to scale
the cluster or increase the number of partitions.

How about this: Node2 always starts in disabled mode (call
admin.disableNode at startup, before it connects to the cluster). After
Node1 becomes the master, as part of the Slave-->Master transition it
enables Node2. This guarantees that Node2 always waits until it sees
Node1 as MASTER.
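A sketch of that transition hook. The callback interface stands in for HelixAdmin.enableInstance; in a real participant this logic would sit in the state model's SLAVE-to-MASTER transition method, and all names here are hypothetical:

```java
import java.util.List;
import java.util.function.Consumer;

// Sketch of the SLAVE -> MASTER transition described above. In a real
// participant this would be the state model's transition callback
// (e.g. an @Transition(to = "MASTER", from = "SLAVE") method), and
// enablePeer would call HelixAdmin.enableInstance(cluster, peer, true).
public class MasterTransitionHook {
    private final List<String> peers;          // instances disabled at startup, e.g. "Node2"
    private final Consumer<String> enablePeer; // stands in for admin.enableInstance(..., true)

    public MasterTransitionHook(List<String> peers, Consumer<String> enablePeer) {
        this.peers = peers;
        this.enablePeer = enablePeer;
    }

    // Invoked once this node has become MASTER: only now are the peers
    // allowed to join, so they can never race to MASTER at startup.
    public void onBecomeMasterFromSlave() {
        peers.forEach(enablePeer);
    }
}
```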

Will this work for you?

thanks,
Kishore G



