Posted to user@helix.apache.org by Puneet Zaroo <pu...@gmail.com> on 2013/02/22 02:22:56 UTC

A few questions about helix.

I am a Helix newbie. I have read the paper and the wiki pages and am
just starting to get familiar with the source code. I had a few
questions:

1) Is it possible to interpose on primary selection? I.e., instead of
relying completely on Helix to select a primary, is it possible to
implement a voting-based protocol, where the replicas have a say in
who becomes the next primary? One possible way would be to have a
state "LEADER_ELECTION", in which the replicas do the voting, and
finally just the winner sets the ideal state with itself as the
PRIMARY.

Are there any gotchas in what I outlined above, or is there a
completely alternative and better way of doing this?
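
The voting scheme sketched in 1) could look roughly like this. This is
an illustrative sketch only, not a Helix API; the function and node
names are made up, and it assumes each replica votes for the peer with
the highest replicated sequence number:

```python
# Illustrative sketch (not the Helix API) of the voting round described
# in question 1: each replica votes for the peer with the highest
# replicated sequence number, and the winner would then write the ideal
# state naming itself as PRIMARY. All names here are hypothetical.

def elect_primary(seq_nums):
    """Majority vote among replicas; each votes for the peer with the
    highest sequence number (ties broken by node name)."""
    votes = {}
    for voter in seq_nums:
        # Every replica casts its vote based on the advertised numbers.
        choice = max(seq_nums, key=lambda node: (seq_nums[node], node))
        votes[choice] = votes.get(choice, 0) + 1
    winner, count = max(votes.items(), key=lambda kv: kv[1])
    if count <= len(seq_nums) // 2:
        raise RuntimeError("no majority; retry the election")
    return winner

# node2 has replicated the most updates, so it wins the vote.
primary = elect_primary({"node1": 120, "node2": 135, "node3": 130})
```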

2) Is it possible to throttle state transitions? E.g., if a node goes
offline, the replicas hosted on it should not be transferred to a new
node immediately, but in a throttled manner.

3) When is a spectator informed of the new ExternalView? Is it when
currentState becomes equal to the idealState, or are they informed of
all state changes due to each state transition?

thanks,
- Puneet

Re: A few questions about helix.

Posted by kishore g <g....@gmail.com>.
Hi Puneet,

CustomCodeInvoker can run either in the controller or a participant,
but the more appealing use case is running it in a participant.

I had implemented a way for the controller to query the participants
before calculating the ideal state; I had to remove it because one of
the libraries we used was not Apache-licensed. We used it for exactly
the same requirement of choosing the master, but instead of asking the
participants to run the election we ask them to update their SCN in
ZK, and based on that we reorder the preference list in the idealstate
dynamically. Another reason I removed it was that even though the idea
was good, I didn't like the implementation.
Given that you have a similar requirement, it's probably a good idea
to brainstorm multiple solutions and come up with an elegant one.
I still like solving it via a state machine abstraction like
LEADER_ELECTION, MASTER_READY.
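
The SCN-based reordering described above could be sketched as follows.
This is a hypothetical illustration, not the actual (removed)
implementation; it assumes each participant has published its SCN to
ZK and the controller simply sorts the preference list by descending
SCN:

```python
# Hypothetical sketch of the SCN-based approach described above:
# participants publish their SCN (sequence/system change number) to ZK,
# and the controller reorders each partition's preference list so that
# the most up-to-date replica is first in line to become master. Not
# the actual Helix implementation.

def reorder_preference_list(preference_list, scns):
    """Sort by descending SCN; the original order breaks ties so the
    list stays stable for replicas with equal SCNs."""
    rank = {node: i for i, node in enumerate(preference_list)}
    return sorted(preference_list,
                  key=lambda node: (-scns.get(node, -1), rank[node]))

scns = {"node1": 500, "node2": 520, "node3": 520}
reordered = reorder_preference_list(["node1", "node2", "node3"], scns)
```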

Another design principle of Helix is that the controller does not talk
to participants directly. Direct communication will work and probably
be fast for small clusters, but as the cluster grows large the
controller will become the bottleneck. We want to use a push/pull
model where the controller pushes to ZK and participants pull from ZK.
This allows the solution to be fault tolerant and extensible. We have
put in a lot of enhancements to make Helix use ZK in an optimal way.

I see that you have started another thread about creating a slave
after some time. Let's continue the discussion in that thread.

thanks,
Kishore G


Re: A few questions about helix.

Posted by Puneet Zaroo <pu...@gmail.com>.
Kishore,
Thanks for the helpful and detailed answers once again.

On Sun, Feb 24, 2013 at 8:45 AM, kishore g <g....@gmail.com> wrote:
> 3) Regarding overhead in case of too many spectators.
> Do you mean overhead in terms of the controller informing the spectator?
> The controller does not communicate directly with the spectator. All
> communication is via ZooKeeper. It's more like a push/pull model where
> the controller pushes to ZK and spectators pull from ZK. This is an
> important difference from other systems, where controllers communicate
> directly with other components in the system. This allows us to scale
> the system and not be bottlenecked by the controller. Eventually ZK
> might become a bottleneck, but for spectators we can easily scale reads
> on ZK by adding more ZK observers. In fact, if the system has a lot of
> spectators it's better to connect only to ZK observers. Apart from
> this, Helix has a group commit feature where transitions are grouped
> together, which reduces the number of notifications to spectators.
>

Thanks for the clarification.

> 2) We don't have the feature to wait a configurable time before
> selecting another slave partition. We have been asked for this feature
> many times; we should probably add it :-). However, we do have another
> feature which might actually be useful and more elegant: you can
> pause/unpause the controller. When the controller is paused, no
> transitions will occur in the system. Is this something that would be
> useful? The pause/unpause is at the cluster level.
>

The global pause does not work for us, as we want other transitions,
e.g. slave -> master, to keep happening.
The requirement in more detail is this:

When a node hosting a partition in the SLAVE state becomes
unreachable, the partition should not be immediately assigned to a new
node in the OFFLINE state. Let's say there is an additional state into
which the partition can go, the DOWN state, in which the partition is
not reassigned. It stays on the same node for a configurable timeout.
If the node comes back up before the timeout expires, the partition
transitions back to the SLAVE state on the same node. If the node
remains down past the timeout, the partition is reassigned to a new
node and enters the OFFLINE state.

How much work would it be to add such support in Helix?
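
A sketch of the behaviour being requested, using the state names from
above (illustrative Python, not a Helix API; the function name and
explicit clock arguments are made up):

```python
# Illustrative sketch of the requested behaviour, using the state names
# from the mail (SLAVE/DOWN/OFFLINE). The function name and explicit
# clock arguments are made up; this is not a Helix API.

def next_state(node_reachable, down_since, now, timeout):
    """Where should a partition be after its host became unreachable
    at time `down_since`?"""
    if node_reachable:
        return "SLAVE"    # node came back: resume on the same node
    if now - down_since < timeout:
        return "DOWN"     # within the grace period: stay put
    return "OFFLINE"      # grace period expired: reassign elsewhere
```

The controller (or an extension of it) would evaluate this on every
liveness change and on a timer, so the DOWN -> OFFLINE edge fires even
if no other event arrives.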


> 1) CustomCodeInvoker example
> https://git-wip-us.apache.org/repos/asf?p=incubator-helix.git;a=blob;f=helix-core/src/test/java/org/apache/helix/integration/TestHelixCustomCodeRunner.java;h=9bf79b8b34c14b7ce1e3fc45a45ceb19fdac4874;hb=437eb42e
>
A very newbie question, perhaps: does the CustomCodeInvoker run on the
controller or the participant? I initially thought it runs on the
controller and allows one to extend the controller, but it seems it
runs on the participants. I may be wrong.

> Regarding the LEADER_ELECTION state model: I see what you mean; this
> is actually a very nice and cool idea. I got the part up to all
> participants getting into the LEADER_ELECTION state and one of them
> being selected as the master. What happens after that?
>
> a. What will be the outcome of the SLAVE-->LEADER_ELECTION transition
> on each participant?
> b. What will be the new idealstate which will allow one of the
> participants to become MASTER and the others SLAVE?
>

These are pretty preliminary ideas on my part, and perhaps there are
better ways of doing this in Helix. I was thinking the participant
elected as the "MASTER" sets the idealState, in which it specifies
itself as the MASTER for that partition and the other nodes as
"SLAVES". So this participant undergoes a "LEADER_ELECTION" ->
"MASTER" transition while the other participants do a
"LEADER_ELECTION" -> "SLAVE" transition, with the transitions being
sent out by the controller. Only the idealState is specified by the
winning participant. Again, I am not very sure if this is practical.
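
The outcome described above could be sketched as a simple mapping
(hypothetical structure, not the real Helix IdealState format):

```python
# Sketch of the outcome described above: once a winner is known, the
# ideal state maps the winner to MASTER and every other replica to
# SLAVE, and the controller then issues the matching
# LEADER_ELECTION -> MASTER / LEADER_ELECTION -> SLAVE transitions.
# Hypothetical dict structure, not the real Helix IdealState format.

def ideal_state_after_election(partition, participants, winner):
    return {partition: {node: ("MASTER" if node == winner else "SLAVE")
                        for node in participants}}

state = ideal_state_after_election("p0", ["n1", "n2", "n3"], "n2")
```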

> Another feature I thought about some time back is the conditional
> transition. Basically, have a transition that can have two outcomes;
> in this case we could have something like LEADER_ELECTION ->
> MASTER_READY, SLAVE and then do the election in that transition and
> either go to MASTER_READY or go back to the SLAVE state. Helix can
> then promote MASTER_READY to MASTER. We might need some changes in
> Helix, but it looks doable. We should file a JIRA for this feature
> and track this discussion.
>

The conditional transition idea also sounds promising. The only thing
to consider is how the controller would know which participant to pick
as MASTER_READY, as this selection depends on information available on
the participants themselves. Is there a way for the controller to
query the participants before calculating the IdealState? If it's
possible to do so, the conditional transition idea seems elegant.
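
For what it's worth, the two-outcome transition being discussed could
be sketched like this (purely illustrative; `won_election` stands in
for whatever participant-side vote decides the winner, and none of
these names are Helix APIs):

```python
# Purely illustrative sketch of a two-outcome transition: the
# LEADER_ELECTION handler ends in one of two states, and Helix would
# then promote the single MASTER_READY replica. `won_election` stands
# in for whatever participant-side vote decides the winner; none of
# these names are Helix APIs.

def leader_election_transition(won_election):
    """The conditional transition: one entry point, two end states."""
    return "MASTER_READY" if won_election else "SLAVE"

def controller_promote(state):
    """The follow-up step Helix would perform automatically."""
    return "MASTER" if state == "MASTER_READY" else state
```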

Thanks again for the engaging discussion.
- Puneet


Re: A few questions about helix.

Posted by kishore g <g....@gmail.com>.
3) Regarding overhead in case of too many spectators.
Do you mean overhead in terms of the controller informing the
spectator? The controller does not communicate directly with the
spectator. All communication is via ZooKeeper. It's more like a
push/pull model where the controller pushes to ZK and spectators pull
from ZK. This is an important difference from other systems, where
controllers communicate directly with other components in the system.
This allows us to scale the system and not be bottlenecked by the
controller. Eventually ZK might become a bottleneck, but for
spectators we can easily scale reads on ZK by adding more ZK
observers. In fact, if the system has a lot of spectators it's better
to connect only to ZK observers. Apart from this, Helix has a group
commit feature where transitions are grouped together, which reduces
the number of notifications to spectators.

2) We don't have the feature to wait a configurable time before
selecting another slave partition. We have been asked for this feature
many times; we should probably add it :-). However, we do have another
feature which might actually be useful and more elegant: you can
pause/unpause the controller. When the controller is paused, no
transitions will occur in the system. Is this something that would be
useful? The pause/unpause is at the cluster level.

1) CustomCodeInvoker example
https://git-wip-us.apache.org/repos/asf?p=incubator-helix.git;a=blob;f=helix-core/src/test/java/org/apache/helix/integration/TestHelixCustomCodeRunner.java;h=9bf79b8b34c14b7ce1e3fc45a45ceb19fdac4874;hb=437eb42e

Regarding the LEADER_ELECTION state model: I see what you mean; this
is actually a very nice and cool idea. I got the part up to all
participants getting into the LEADER_ELECTION state and one of them
being selected as the master. What happens after that?

a. What will be the outcome of the SLAVE-->LEADER_ELECTION transition
on each participant?
b. What will be the new idealstate which will allow one of the
participants to become MASTER and the others SLAVE?

Another feature I thought about some time back is the conditional
transition. Basically, have a transition that can have two outcomes;
in this case we could have something like LEADER_ELECTION ->
MASTER_READY, SLAVE and then do the election in that transition and
either go to MASTER_READY or go back to the SLAVE state. Helix can
then promote MASTER_READY to MASTER. We might need some changes in
Helix, but it looks doable. We should file a JIRA for this feature and
track this discussion.

Thanks for coming up with these ideas.

thanks,
Kishore G



Re: A few questions about helix.

Posted by Puneet Zaroo <pu...@gmail.com>.
Kishore,
Thanks for the detailed reply.
Please see further comments inline.

>
> 3) Spectator is informed of the changes due to each state transition.
>

OK. Will that not cause a lot of overhead if there are a lot of
spectators in the system? Or was the rationale that there will be just
a few spectators in the system?

> 2) Yes, it is possible to throttle the state transitions in a
> controlled manner. You can basically specify the max number of
> transitions that can occur at the resource, instance, instanceGroup,
> or cluster level. Helix will ensure that none of those constraints
> are violated.
>

What I had in mind was throttling based on time and not the number of
events. I.e., if a slave partition is lost, then the controller should
wait for some configurable time before selecting another slave
partition. This is to handle the case where a node is rebooting and we
do not want its partitions to be moved to a new node immediately.

> 1) Interposing on primary selection: yes, it is possible to
> implement a custom primary selection algorithm. Here is how we
> achieve that at LinkedIn:
>
> a) A separate entity watches the ExternalView, and as soon as it
> finds out there is no primary for a partition, it can do the leader
> election and set the idealstate. You can do this using the
> CustomCodeInvoker option, which ensures only one process watches the
> external view, computes the new primary, and sets the idealstate.
>
> Your suggestion of a LEADER_ELECTION state sounds interesting. Can
> you elaborate a bit more on the state machine (states, transitions,
> and constraints)? How will they get into this state?
>

Are there any examples of how to use the CustomCodeInvoker?

Regarding a separate entity watching the ExternalView: maybe I did not
follow this fully, but the external entity looks similar to the
controller, so I am not sure if this would solve the particular
problem.

We actually want the participants to take part in the decision of who
should become the next Primary or Master. I haven't thought this
through completely, but one way could be to add a state
"LEADER_ELECTION" between the states "SLAVE" and "MASTER". In the
"LEADER_ELECTION" state the participants communicate with each other
and decide who should be the next Master, and the participant elected
as the next "Master" sets the IdealState. This is fully auto mode,
except for one transition, "LEADER_ELECTION" -> "MASTER", which is
custom.
Perhaps there are simpler ways of doing this.

thanks,
- Puneet




Re: A few questions about helix.

Posted by kishore g <g....@gmail.com>.
Thanks, Puneet, for the interesting questions. I will answer them in a
slightly different order, since there are multiple answers to 1).


3) The spectator is informed of the changes due to each state
transition.

2) Yes, it is possible to throttle the state transitions in a
controlled manner. You can basically specify the max number of
transitions that can occur at the resource, instance, instanceGroup,
or cluster level. Helix will ensure that none of those constraints are
violated.
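
Such a constraint check could be sketched like this (an illustrative
simplification; Helix's real constraint model is richer, and all names
here are hypothetical):

```python
# Illustrative simplification of count-based throttling: before
# sending a transition, check per-scope caps on transitions already in
# flight. The names and the flat counter dict are hypothetical;
# Helix's real constraint model is richer than this.

def can_send_transition(in_flight, limits, instance, resource):
    """Allow a transition only if no applicable scope is at its cap."""
    scopes = [("instance", instance), ("resource", resource),
              ("cluster", "cluster")]
    return all(in_flight.get(key, 0) < limits.get(scope, float("inf"))
               for scope, key in scopes)

limits = {"instance": 2, "cluster": 10}   # no resource-level cap set
in_flight = {"nodeA": 2, "cluster": 5}    # counters per scope target
blocked = can_send_transition(in_flight, limits, "nodeA", "db")
allowed = can_send_transition(in_flight, limits, "nodeB", "db")
```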

1) Interposing on primary selection: yes, it is possible to implement
a custom primary selection algorithm. Here is how we achieve that at
LinkedIn:

a) A separate entity watches the ExternalView, and as soon as it finds
out there is no primary for a partition, it can do the leader election
and set the idealstate. You can do this using the CustomCodeInvoker
option, which ensures only one process watches the external view,
computes the new primary, and sets the idealstate.
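
The watcher described in a) could be sketched roughly as follows
(illustrative data shapes only; the actual CustomCodeInvoker callback
and ExternalView classes are not shown):

```python
# Illustrative sketch of the separate-entity approach: one process
# scans the ExternalView, finds partitions with no MASTER, elects a
# new one from the surviving SLAVEs, and would then write the updated
# ideal state. The dict shapes are hypothetical; the real
# CustomCodeInvoker callback and ExternalView classes are not shown.

def find_masterless(external_view):
    """Partitions where no replica is currently MASTER."""
    return [p for p, replicas in sorted(external_view.items())
            if "MASTER" not in replicas.values()]

def elect(external_view, partition):
    """Deterministically pick one surviving SLAVE as the new master."""
    slaves = sorted(n for n, s in external_view[partition].items()
                    if s == "SLAVE")
    return slaves[0] if slaves else None

view = {"p0": {"n1": "SLAVE", "n2": "MASTER"},
        "p1": {"n1": "SLAVE", "n3": "SLAVE"}}
```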

Your suggestion of a LEADER_ELECTION state sounds interesting. Can you
elaborate a bit more on the state machine (states, transitions, and
constraints)? How will they get into this state?


Thanks,
Kishore G

