Posted to user@helix.apache.org by Puneet Zaroo <pu...@gmail.com> on 2013/02/26 21:12:25 UTC

A state transition requirement.

Hi,

I wanted to know how to implement a specific state machine requirement in Helix.
Let's say a partition is in state S2.

1. When the instance hosting it goes down, the partition moves to state
S3 (but stays on the same instance).
2. If the instance comes back up before a timeout expires, the
partition moves to state S1 (stays on the same instance).
3. If the instance does not come back up before the timeout expires,
the partition moves to state S0 (the initial state, on a different
instance picked by the controller).

I have a few questions.

1. I believe in order to implement Requirement 1, I have to use the
CUSTOM rebalancing feature (as otherwise the partitions will get
assigned to a new node).
The wiki page says the following about the CUSTOM mode.

"Applications will have to implement an interface that Helix will
invoke when the cluster state changes. Within this callback, the
application can recompute the partition assignment mapping"

Which interface does one have to implement? I am assuming the
callbacks are triggered inside the controller. (My guess at what such a
callback might look like is sketched after question 3 below.)

2. The transition from S2 -> S3 should not issue a callback on the
participant (instance) holding that partition. This is because the
participant is unavailable and so cannot execute the callback. Is this
doable?

3. One way the timeout (requirement 3) could be implemented is to also
trigger IdealState calculation after a timer expires, and not only on
liveness changes. Does that sound doable?
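
My guess at the callback for question 1 is something like the sketch
below (CustomCodeCallbackHandler is my assumption for the hook; please
correct me if it is the wrong interface, and the actual placement
computation is omitted):

import org.apache.helix.NotificationContext;
import org.apache.helix.participant.CustomCodeCallbackHandler;

public class PlacementCallback implements CustomCodeCallbackHandler {
  @Override
  public void onCallback(NotificationContext context) {
    // Invoked by Helix when the cluster state changes (e.g. liveness).
    // Recompute the partition -> {instance: state} mapping here and
    // write it back to the resource's idealstate (CUSTOM mode).
  }
}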

thanks,
- Puneet

Re: A state transition requirement.

Posted by Puneet Zaroo <pu...@gmail.com>.
Kishore,
Thanks for the helpful pointers as usual. You are correct that the
delayed transition will also delay the normal bootstrap of a node,
which is unacceptable. Thanks for pointing this out.

The idea I had in mind was to extend the notion of "REBALANCE_TIMER"
associated with each resource inside Helix to also support multiple
timers. Each timer would be associated with a node, and would
rebalance partitions hosted on it to other nodes. Supporting this
inside Helix would be too intrusive a change.

So I could implement this outside of Helix. I would need something
similar to ZKHelixAdmin.rebalance(), but targeted, so that it only
rebalances the partitions hosted on a particular node.
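
Roughly along these lines (just a sketch, assuming the resource is in
CUSTOM mode so the map fields drive placement; pickReplacementInstance()
is a placeholder for whatever placement policy the DDS uses, not a
Helix API):

import java.util.Map;

import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.IdealState;

public class TargetedRebalance {
  // Move every replica currently mapped to deadInstance to some other
  // instance, leaving all other partitions untouched.
  public static void rebalanceNode(String zkAddr, String cluster,
                                   String resource, String deadInstance) {
    HelixAdmin admin = new ZKHelixAdmin(zkAddr);
    IdealState idealState = admin.getResourceIdealState(cluster, resource);
    for (Map.Entry<String, Map<String, String>> entry :
        idealState.getRecord().getMapFields().entrySet()) {
      Map<String, String> stateMap = entry.getValue();
      String state = stateMap.remove(deadInstance);
      if (state != null) {
        // Placeholder placement policy; a real DDS would pick based on
        // load, racks, etc.
        String replacement = pickReplacementInstance(entry.getKey(), stateMap);
        stateMap.put(replacement, state);
      }
    }
    // Writing the idealstate back makes the controller issue the
    // corresponding transitions.
    admin.setResourceIdealState(cluster, resource, idealState);
  }

  private static String pickReplacementInstance(String partition,
                                                Map<String, String> current) {
    return "localhost_12913"; // dummy choice for the sketch
  }
}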

thanks,
- Puneet


Re: A state transition requirement.

Posted by kishore g <g....@gmail.com>.
Hi Puneet,

Your explanation is correct.

Regarding the race condition: yes, it's possible that N1 finishes its
transition before receiving the cancellation. But then Helix will send
the opposite transition (SLAVE to OFFLINE) to N1. That's the best we
can do.

Yes, the support for canceling conflicting transitions needs to be
built. Currently we only have the ability to manually cancel a
transition. Let's file a JIRA and flesh out the design.

By the way, let me know about the other ideas you had. It's good to
have multiple options and discuss the pros and cons. For example, the
problem with the delayed transition is that it might add some delay
during cluster start-up.

thanks,
Kishore G







Re: A state transition requirement.

Posted by Puneet Zaroo <pu...@gmail.com>.
Kishore,

Over the weekend I had some other thoughts on how to implement this.
But thinking about it some more, the timed transition idea looks like
the one that requires the least intrusive changes to Helix. Please let
me step through it slowly to make sure I understand it.

Let's say node N0 goes down and the partitions on it are moved to N1.
N1 receives the callback for the OFFLINE to SLAVE transition, but this
transition has a configurable delay in it, and so does not complete
immediately.

In the meantime, node N0 comes back up, so the idealState is
recalculated in the CustomCodeInvoker to move the partitions of N0
back to it. This will make Helix cancel all conflicting transitions.
Does this cancellation get propagated to N1 (which is inside the
OFFLINE to SLAVE transition)? This seems a bit racy: what if N1 had
finished its transition just before receiving the cancellation?

And if I understand correctly, the support for cancelling conflicting
transitions needs to be built.

Thanks,
- Puneet




Re: A state transition requirement.

Posted by kishore g <g....@gmail.com>.
Hi Puneet,

Your understanding of AUTO mode is correct: no partitions will ever be
moved by the controller to a new node. And if a node comes back up, it
will still host the partitions it had before going down.

This is how it works:
in AUTO_REBALANCE, Helix has full control, so it will create new
replicas and assign states as needed.

In AUTO mode, it will not create new replicas unless the idealstate is
changed externally (this can happen when you add new boxes).

>>Or will the partition move only happen when some constraints are being
>>violated. E.g. if the minimum number of replicas specified is "2",
>>then a partition will be assigned to a new node if there are just 2
>>replicas in the system and one of the nodes goes down.

In AUTO mode, Helix will try to satisfy the constraints with the
existing replicas. So if you had asked for 2 replicas but 1 is down, it
will see what is the best it can do with that 1 replica. That's where
the priority of states comes into the picture: you specify that master
is more important than slave, so it will make that replica a master.

In AUTO_REBALANCE, it would create that replica on another node. This
mode is generally suited for stateless systems, where moving a
partition might simply mean moving processing and not data.
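
To make the distinction concrete, the mode is just a property picked
when the resource is added (a sketch; the cluster/resource names and ZK
address are made up):

import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;

public class AddResourceExample {
  public static void main(String[] args) {
    HelixAdmin admin = new ZKHelixAdmin("localhost:2181");
    // AUTO: placement stays under the DDS's control via the preference
    // lists in the idealstate; Helix only assigns states among those
    // instances, so partitions never move on their own when a node dies.
    admin.addResource("MYCLUSTER", "myDB", 8, "MasterSlave", "AUTO");
    // Passing "AUTO_REBALANCE" instead lets Helix recompute placement
    // itself, e.g. recreate a lost replica on another node.
  }
}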

Thanks,
Kishore G







Re: A state transition requirement.

Posted by Puneet Zaroo <pu...@gmail.com>.
Kishore,
Thanks for the prompt reply once again.

On Tue, Feb 26, 2013 at 3:39 PM, kishore g <g....@gmail.com> wrote:
> Hi Puneet,
>
> I was about to reply to your previous email but I think its better to have a
> separate thread for each requirement.
>

I agree.


The timed transition idea does look promising. I will have to think a
bit more about it.
I had a few more mundane questions.
In "AUTO" mode (as opposed to AUTO_REBALANCE mode), the DDS is
responsible for object placement. But how does the DDS actually
implement the object placement?

The StateModelDefinition.Builder class allows one to set the
"upperBound" and the "dynamicUpperBound". But how does one specify a
lower bound for a particular state?
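
For context, this is roughly how I am setting the bounds today (just an
example definition; the state names and numbers are illustrative):

import org.apache.helix.model.StateModelDefinition;

public class MasterSlaveDefinition {
  public static StateModelDefinition build() {
    StateModelDefinition.Builder builder =
        new StateModelDefinition.Builder("MasterSlave");
    builder.addState("MASTER", 1);  // lower number = higher priority
    builder.addState("SLAVE", 2);
    builder.addState("OFFLINE");
    builder.initialState("OFFLINE");
    builder.addTransition("OFFLINE", "SLAVE");
    builder.addTransition("SLAVE", "MASTER");
    builder.addTransition("MASTER", "SLAVE");
    builder.addTransition("SLAVE", "OFFLINE");
    builder.upperBound("MASTER", 1);         // at most one master
    builder.dynamicUpperBound("SLAVE", "R"); // up to the replica count
    return builder.build();
  }
}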

Can one safely say that in "AUTO" mode no partitions will ever be
moved by the controller to a new node, except when the DDS so desires?
If a node were to go down and come back up, would it still host the
partitions that it had before going down?
Or will a partition move only happen when some constraints are being
violated? E.g., if the minimum number of replicas specified is "2",
then a partition would be assigned to a new node if there are just 2
replicas in the system and one of the nodes goes down.

Thanks again for your replies and for open-sourcing a great tool.


Re: A state transition requirement.

Posted by kishore g <g....@gmail.com>.
Hi Puneet,

I was about to reply to your previous email, but I think it's better
to have a separate thread for each requirement.

We already have the ability to trigger a rebalance occasionally (your
point 3). Take a look at the timer tasks in the controller. But I don't
think that will be sufficient in your case.

There is another way to solve this which is probably easier to reason
about and more elegant. Basically, we can introduce the notion of a
timed transition (we can discuss how to implement this). What this
means is that when a node fails, Helix can request another node to
create the replica, but with additional configuration saying that it
should be scheduled after a timeout of X. We already have a notion of
cancellable transitions built in, so if the old node comes up within
that time, Helix can cancel the existing transition and put the old
node back into the SLAVE state.

This design does not require any additional work to handle failures of
controllers or participants, nor any modification to the state model.
It's basically adding the notion of a timed transition that can be
cancelled if needed.

What do you think about the solution? Does it make sense?

Regarding implementation, this solution can be built in the current
state of Helix by simply adding an additional sleep in the transition
(OFFLINE to SLAVE); in the custom code invoker you can first send a
cancel message to the existing transition and then set the ideal state.
But it is possible for Helix to cancel it automatically: we need
additional logic in Helix so that if there is a pending transition and
we compute another transition that is the opposite of it, we can
automatically detect that it is cancellable and cancel the existing
transition. That would make it more generic, and we could then simply
have the transition delay set as a configuration.

thanks,
Kishore G

