Posted to user@helix.apache.org by santosh gujar <sa...@gmail.com> on 2020/05/11 19:18:54 UTC

Long running jobs and node drain

Hello,

I am looking for some clues or inputs on how to achieve the following.

I am working on a service that involves running stateful, long-running
jobs on a node. These jobs cannot be preempted and resumed on other
nodes.

Problem Requirements:
1. In Helix nomenclature, let's say there is a Helix partition P with J
such jobs running on a node (N1).
2. When I put the node into drain, I want Helix to assign this partition
(P) to a new node (N2) and start it there as well.
3. N1 can be put out of service only when all running jobs (J) on it are
over; at that point only N2 will serve requests for P.

Questions:
1. Can the drain process be modeled using Helix?
2. If yes, is there any recipe / pointer for a Helix state model?
3. Is there any custom way to trigger state transitions? From the
documentation, I gather that the Helix controller in full-auto mode
triggers state transitions only when the number of partitions changes or
the cluster changes (node addition or deletion).
4. I guess a spectator will be needed for custom routing logic in such
cases; any pointers for the same?

Thank You
Santosh
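The lifecycle described in the requirements above can be sketched as a plain
state machine, outside Helix. This is an illustration only; the class and
method names here are hypothetical, not a Helix API.

```java
import java.util.Map;
import java.util.Set;

// Illustration only: the desired lifecycle of one replica of partition P.
// The state names mirror the thread; nothing here is a Helix API.
class PartitionReplica {
    enum State { OFFLINE, STARTUP, UP, DRAIN }

    // Legal transitions: OFFLINE -> STARTUP -> UP -> DRAIN -> OFFLINE
    private static final Map<State, Set<State>> LEGAL = Map.of(
            State.OFFLINE, Set.of(State.STARTUP),
            State.STARTUP, Set.of(State.UP),
            State.UP, Set.of(State.DRAIN),
            State.DRAIN, Set.of(State.OFFLINE));

    private State state = State.OFFLINE;

    State state() { return state; }

    void transition(State to) {
        if (!LEGAL.get(state).contains(to)) {
            throw new IllegalStateException(state + " -> " + to + " is not allowed");
        }
        state = to;
    }
}
```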

Re: Long running jobs and node drain

Posted by kishore g <g....@gmail.com>.
There is a way for the participant to invoke setRequestedState in Helix,
and the controller can trigger that transition if it does not violate the
constraints.
On Thu, May 21, 2020 at 4:53 AM santosh gujar <sa...@gmail.com>
wrote:

> Hello All,
>
> Any inputs on the below?
>
> Thank you, and I appreciate your help.
>
> Regards,
> Santosh
>
> On Thu, May 14, 2020 at 2:47 PM santosh gujar <sa...@gmail.com>
> wrote:
>
>> Thanks a lot Lei,
>>
>> One last question on this topic,
>>
>> I gather from the documentation that the Helix controller is the one that
>> directs state transitions in a greedy fashion. But this is a synchronous
>> call; e.g., in the example that we have been discussing, the moment the
>> call returns from UpToDrain(), the controller will call DrainToOffline()
>> immediately and also update the states in Zookeeper accordingly. Is my
>> understanding correct?
>>
>> If yes, is there any way the transition can be made asynchronous? I.e., I
>> get notified for the UP->DRAIN transition, but DRAIN->OFFLINE happens only
>> when I call some API on the Helix controller? E.g., in my case, I would
>> have to wait via some kind of thread.wait() / sleep() until all other jobs
>> are over. But that could introduce some brittleness, in that the process
>> that is handling the state transition cannot crash until all other jobs
>> (which could be running as separate processes) are finished. My preference
>> would be to call back an API on the Helix controller for the further state
>> transition (DRAIN->OFFLINE) for the partition.
>>
>> Thanks,
>> Santosh
>>
>>
>>
>> On Thu, May 14, 2020 at 1:28 AM Lei Xia <xi...@gmail.com> wrote:
>>
>>> Hi, Santosh
>>>
>>>   I meant the DRAIN->OFFLINE transition should be blocked. You cannot
>>> block at UP->DRAIN; otherwise, from Helix's perspective the partition will
>>> still be in the UP state, and Helix won't bring a new partition online.
>>> The code logic could be something like below.
>>>
>>> class MyModel extends StateModel {
>>>
>>>   @Transition(from = "UP", to = "DRAIN")
>>>   public void UpToDrain(Message message, NotificationContext context) {
>>>     // you may set some flags here to stop taking new jobs
>>>   }
>>>
>>>   @Transition(from = "DRAIN", to = "OFFLINE")
>>>   public void DrainToOffline(Message message, NotificationContext context) {
>>>     // wait until all jobs are completed
>>>     // additional cleanup work
>>>   }
>>>
>>>   @Transition(from = "OFFLINE", to = "UP")
>>>   public void OfflineToUP(Message message, NotificationContext context) {
>>>     // get ready to take new jobs
>>>   }
>>> }
>>>
>>> On Wed, May 13, 2020 at 11:24 AM santosh gujar <sa...@gmail.com>
>>> wrote:
>>>
>>>>
>>>> Thanks a lot Lei. I assume by blocking you mean blocking in the method
>>>> call that is invoked,
>>>>
>>>> e.g. the following pseudo code:
>>>>
>>>> class MyModel extends StateModel {
>>>>   @Transition(from = "UP", to = "DRAIN")
>>>>   public void UpToDrain(Message message, NotificationContext context) {
>>>>     // don't return while the long running job is still running
>>>>   }
>>>> }
>>>>
>>>> On Wed, May 13, 2020 at 10:40 PM Lei Xia <xi...@gmail.com> wrote:
>>>>
>>>>> Hi, Santosh
>>>>>
>>>>>   Thanks for explaining your case in detail. In this case, I would
>>>>> recommend using an "OFFLINE->UP->DRAIN->OFFLINE" model. You can set a
>>>>> constraint on your model to limit the number of replicas in the UP state
>>>>> to 1, i.e., Helix will make sure there is only one replica in UP at the
>>>>> same time. When you are ready to drain an instance, disable the instance
>>>>> first; Helix will then transition all partitions (jobs) on that instance
>>>>> to DRAIN and then OFFLINE, and you can block at the DRAIN->OFFLINE
>>>>> transition until all jobs are completed. On the other hand, once the old
>>>>> partition is in the DRAIN state, Helix should bring up a new partition
>>>>> to UP (OFFLINE->UP) on a new node.
>>>>>
>>>>>
>>>>>
>>>>> Lei
>>>>>
>>>>> On Tue, May 12, 2020 at 10:58 AM santosh gujar <
>>>>> santosh.techie@gmail.com> wrote:
>>>>>
>>>>>> Hi Hunter,
>>>>>>
>>>>>> Due to various limitations and constraints at this moment, I cannot go
>>>>>> down the path of the Task Framework.
>>>>>>
>>>>>> Thanks,
>>>>>> Santosh
>>>>>>
>>>>>> On Tue, May 12, 2020 at 7:23 PM Hunter Lee <na...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Alternative idea:
>>>>>>>
>>>>>>> Have you considered using the Task Framework's targeted jobs for this
>>>>>>> use case? You could make the jobs long-running, and this way you save
>>>>>>> yourself the trouble of having to implement the routing layer (simply
>>>>>>> specifying which partition to target in your JobConfig would do it).
>>>>>>>
>>>>>>> The Task Framework doesn't actively terminate running threads on the
>>>>>>> worker (Participant) nodes, so you could achieve the effect of
>>>>>>> "draining" the node by letting previously assigned tasks finish, i.e.,
>>>>>>> by not actively canceling them in your cancel() logic.
>>>>>>>
>>>>>>> Hunter
>>>>>>>
>>>>>>> On Tue, May 12, 2020 at 1:02 AM santosh gujar <
>>>>>>> santosh.techie@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Lei,
>>>>>>>>
>>>>>>>> Thanks a lot for your time and response.
>>>>>>>>
>>>>>>>> Some more context about the helix partition that I mentioned in my
>>>>>>>> email earlier.
>>>>>>>> My thinking is to map multiple long jobs to a helix partition by
>>>>>>>> running some hash function (the simplest is taking a mod of the job
>>>>>>>> id).
>>>>>>>>
>>>>>>>> "What exactly do you need to do to bring a job from OFFLINE to
>>>>>>>> STARTUP?"
>>>>>>>> I added STARTUP to track the fact that a partition could be hosted on
>>>>>>>> two nodes simultaneously; I doubt the OFFLINE->UP->OFFLINE model can
>>>>>>>> give me that information.
>>>>>>>>
>>>>>>>> "Once the job (partition) on node-1 goes OFFLINE, Helix will bring
>>>>>>>> up the job in node-2 (OFFLINE->UP)"
>>>>>>>> I think it may not work in my case. Here are the implications as I
>>>>>>>> see them.
>>>>>>>> 1. While node1 is in drain, old jobs continue to run, but I want new
>>>>>>>> jobs (for the same partition) to be hosted by the new node. Think of
>>>>>>>> it as a partition moving from one node to the other, but over a long
>>>>>>>> time (hours), as determined by when all existing jobs running on
>>>>>>>> node1 finish.
>>>>>>>> 2. As per your suggestion, node-2 serves the partition only when
>>>>>>>> node-1 is offline. But that cannot satisfy point 1 above.
>>>>>>>> One workaround I can think of is to handle the UP->OFFLINE transition
>>>>>>>> event in the application and save the information about node1
>>>>>>>> somewhere, then use this information later to distinguish old jobs
>>>>>>>> from new jobs. But this information is stored outside Helix, and I
>>>>>>>> wanted to avoid that. What attracted me to Helix is its
>>>>>>>> auto-rebalancing capability and its central storage of cluster state,
>>>>>>>> which I can use for my routing logic.
>>>>>>>> 3. A job could be running for hours, and thus a drain can last for a
>>>>>>>> long time.
>>>>>>>>
>>>>>>>> "How long would you expect OFFLINE->UP to take here? If it is fast,
>>>>>>>> the switch should be fast."
>>>>>>>> OFFLINE->UP is fast. As I described above, it's the drain on the
>>>>>>>> earlier running node that is slow; the existing jobs cannot be
>>>>>>>> pre-empted to move to the new node.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Santosh
>>>>>>>>
>>>>>>>> On Tue, May 12, 2020 at 10:40 AM Lei Xia <xi...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi, Santosh
>>>>>>>>>
>>>>>>>>>   One question: what exactly do you need to do to bring a job from
>>>>>>>>> OFFLINE to STARTUP? Can we simply use an OFFLINE->UP->OFFLINE model?
>>>>>>>>> With OFFLINE->UP you get the job started and ready to serve
>>>>>>>>> requests. With UP->OFFLINE you block there until the job gets
>>>>>>>>> drained.
>>>>>>>>>
>>>>>>>>>  With this state model, you can start to drain a node by disabling
>>>>>>>>> it. Once a node is disabled, Helix will send an UP->OFFLINE
>>>>>>>>> transition to all partitions on that node; in your implementation of
>>>>>>>>> the UP->OFFLINE transition, you block until the job completes. Once
>>>>>>>>> the job (partition) on node-1 goes OFFLINE, Helix will bring up the
>>>>>>>>> job on node-2 (OFFLINE->UP). Does this work for you? How long would
>>>>>>>>> you expect OFFLINE->UP to take here? If it is fast, the switch
>>>>>>>>> should be fast.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Lei
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, May 11, 2020 at 9:02 PM santosh gujar <
>>>>>>>>> santosh.techie@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Yes, there would be a database.
>>>>>>>>>> So far I have the following state model for a partition:
>>>>>>>>>> OFFLINE->STARTUP->UP->DRAIN->OFFLINE. But I don't know how to
>>>>>>>>>> express the following:
>>>>>>>>>> 1. How to trigger DRAIN (for example, when we decide to take a
>>>>>>>>>> node out for maintenance).
>>>>>>>>>> 2. Once a drain has started, I expect the helix rebalancer to kick
>>>>>>>>>> in and move the partition simultaneously onto another node in
>>>>>>>>>> STARTUP mode.
>>>>>>>>>> 3. Once all jobs on node1 are done, I need a manual way to trigger
>>>>>>>>>> its transition to OFFLINE and move the other partition replica to
>>>>>>>>>> the UP state.
>>>>>>>>>>
>>>>>>>>>> It might be possible that my thinking about how to fit this into
>>>>>>>>>> the helix model is entirely wrong, but essentially the above is the
>>>>>>>>>> sequence I want to achieve. Any pointers will be of great help. The
>>>>>>>>>> constraint is that these are long-running jobs that cannot be moved
>>>>>>>>>> immediately to another node.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Santosh
>>>>>>>>>>
>>>>>>>>>> On Tue, May 12, 2020 at 1:25 AM kishore g <g....@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I was thinking exactly in that direction - having two states is
>>>>>>>>>>> the right thing to do. Before we get there, one more question:
>>>>>>>>>>>
>>>>>>>>>>> - when you get a request for a job, how do you know if that job
>>>>>>>>>>> is old or new? Is there a database that provides the mapping
>>>>>>>>>>> between job and node?
>>>>>>>>>>>
>>>>>>>>>>> On Mon, May 11, 2020 at 12:44 PM santosh gujar <
>>>>>>>>>>> santosh.techie@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thank You Kishore,
>>>>>>>>>>>>
>>>>>>>>>>>> During the drain process N2 will start new jobs; the requests
>>>>>>>>>>>> related to old jobs need to go to N1, and requests for new jobs
>>>>>>>>>>>> need to go to N2. Thus, during the drain on N1, the partition
>>>>>>>>>>>> could be present on both nodes.
>>>>>>>>>>>>
>>>>>>>>>>>> My current thinking is that in helix I somehow need to model this
>>>>>>>>>>>> as partition P with two different states on these two nodes, e.g.
>>>>>>>>>>>> N1 could have partition P in the DRAIN state and N2 could have
>>>>>>>>>>>> partition P in the START_UP state.
>>>>>>>>>>>> I don't know if my thinking about states is correct, but I am
>>>>>>>>>>>> looking for any pointers.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards
>>>>>>>>>>>> Santosh
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, May 12, 2020 at 1:01 AM kishore g <g....@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> What happens to requests during the drain process? I.e., when
>>>>>>>>>>>>> you put N1 out of service and while N2 is waiting for N1 to
>>>>>>>>>>>>> finish the jobs, where will the requests for P go - N1 or N2?
>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Lei Xia
>>>>>>>>>
>>>>>>>>
>>>>>
>>>>> --
>>>>> Lei Xia
>>>>>
>>>>
>>>
>>> --
>>> Lei Xia
>>>
>>

Re: Long running jobs and node drain

Posted by santosh gujar <sa...@gmail.com>.
Hello All,

Any inputs on the below?

Thank you, and I appreciate your help.

Regards,
Santosh


Re: Long running jobs and node drain

Posted by santosh gujar <sa...@gmail.com>.
Thanks a lot Lei,

One last question on this topic,

I gather from the documentation that the Helix controller is the one that
directs state transitions in a greedy fashion. But this is a synchronous
call; e.g., in the example that we have been discussing, the moment the call
returns from UpToDrain(), the controller will call DrainToOffline()
immediately and also update the states in Zookeeper accordingly. Is my
understanding correct?

If yes, is there any way the transition can be made asynchronous? I.e., I
get notified for the UP->DRAIN transition, but DRAIN->OFFLINE happens only
when I call some API on the Helix controller? E.g., in my case, I would have
to wait via some kind of thread.wait() / sleep() until all other jobs are
over. But that could introduce some brittleness, in that the process that is
handling the state transition cannot crash until all other jobs (which could
be running as separate processes) are finished. My preference would be to
call back an API on the Helix controller for the further state transition
(DRAIN->OFFLINE) for the partition.

Thanks,
Santosh
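The blocking DRAIN->OFFLINE handler discussed in this thread can be sketched
outside Helix: instead of sleep/poll loops, the transition body can block on
a latch that job watchers count down. This is a sketch under the assumption
that job completion can be observed in-process; the names are illustrative,
not a Helix API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrative gate for a blocking DRAIN->OFFLINE transition.
class DrainToOfflineGate {
    private final CountDownLatch remainingJobs;

    DrainToOfflineGate(int jobCount) {
        remainingJobs = new CountDownLatch(jobCount);
    }

    // Called once per job when it completes (possibly from a watcher thread
    // that monitors jobs running as separate processes).
    void jobFinished() { remainingJobs.countDown(); }

    // Body of the DRAIN->OFFLINE transition: block until every job is done,
    // or give up after the timeout. Returns true if the drain completed.
    boolean awaitDrain(long timeout, TimeUnit unit) throws InterruptedException {
        return remainingJobs.await(timeout, unit);
    }
}
```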



On Thu, May 14, 2020 at 1:28 AM Lei Xia <xi...@gmail.com> wrote:

> Hi, Santosh
>
>   I meant the DRAIN-OFFLINE transition should be blocked. You can not
> block at up->drain, otherwise from Helix perspective the partition will be
> still in UP state, it won't bring new partition online.  The code logic
> could be something like below.
>
> class MyModel extends StateModel  {
> @Transition(from = "UP", to = "DRAIN")
>  public void UpToDrain(Message message, NotificationContext context) {
>   // you may disable some flags here to not take new jobs
>  }
>
> @Transition(from = "DRAIN", to = "OFFLINE")
>  public void DrainToOffline(Message message, NotificationContext context) {
>    wait (all job completed);
>   // additional cleanup works.
>  }
>
> @Transition(from = "OFFLINE", to = "UP")
>  public void OfflineToUP(Message message, NotificationContext context) {
>   // get ready to take new jobs.
>  }
>
> On Wed, May 13, 2020 at 11:24 AM santosh gujar <sa...@gmail.com>
> wrote:
>
>>
>> Thanks a lot Lei, I assume by blocking you mean , blocking on in a method
>> call that is called
>>
>> e.g. following pseudo code.
>>
>> class MyModel extends StateModel  {
>> @Transition(from = "UP", to = "DRAIN")
>> public void offlineToSlave(Message message, NotificationContext context) {
>> //don't return until long long running job is running
>> }
>>
>> On Wed, May 13, 2020 at 10:40 PM Lei Xia <xi...@gmail.com> wrote:
>>
>>> Hi, Santosh
>>>
>>>   Thanks for explaining your case in detail. In this case, I would
>>> recommend you to use "OFFLINE->UP->DRAIN->OFFLINE" model. And you can set
>>> the constraint of your model to limit # of replica in UP state to be 1,
>>> i.e, Helix will make sure there is only 1 replica in UP at same time. When
>>> you are ready to drain an instance, disable the instance first, then Helix
>>> will transit all partitions (jobs) on that instance to DRAIN and then
>>> OFFLINE, you can block at DRAIN->OFFLINE transition until all jobs are
>>> completed.  On the other hand, once the old partition is in DRAIN state,
>>> Helix should bring up a new partition to UP (OFFLINE->UP) on a new node.
>>>
>>>
>>>
>>> Lei
>>>
>>> On Tue, May 12, 2020 at 10:58 AM santosh gujar <sa...@gmail.com>
>>> wrote:
>>>
>>>> Hi Hunter,
>>>>
>>>> For various limitations and constraints at this moment, I cannot go
>>>> down the path of Task Framework.
>>>>
>>>> Thanks,
>>>> Santosh
>>>>
>>>> On Tue, May 12, 2020 at 7:23 PM Hunter Lee <na...@gmail.com> wrote:
>>>>
>>>>> Alternative idea:
>>>>>
>>>>> Have you considered using Task Framework's targeted jobs for this use
>>>>> case? You could make the jobs long-running, and this way, you save yourself
>>>>> the trouble of having to implement the routing layer (simply specifying
>>>>> which partition to target in your JobConfig would do it).
>>>>>
>>>>> Task Framework doesn't actively terminate running threads on the
>>>>> worker (Participant) nodes, so you could achieve the effect of "draining"
>>>>> the node by letting previously assigned tasks to finish by not actively
>>>>> canceling them in your cancel() logic.
>>>>>
>>>>> Hunter
>>>>>
>>>>> On Tue, May 12, 2020 at 1:02 AM santosh gujar <
>>>>> santosh.techie@gmail.com> wrote:
>>>>>
>>>>>> Hi Lei,
>>>>>>
>>>>>> Thanks a lot for your time and response.
>>>>>>
>>>>>> Some more context about helix partition that i mentioned in my email
>>>>>> earlier.
>>>>>> My thinking is to my map multiple long jobs to a helix partition by
>>>>>> running some hash function (simplest is taking a mod of an job)
>>>>>>
>>>>>> " what exactly you need to do to bring a job from OFFLINE to STARTUP?"
>>>>>> I added STARTUP to distinguish the track the fact that a partition
>>>>>> could be hosted on two nodes simultaneously, I doubt offline->UP->OFFLINE
>>>>>> model can give me such information.
>>>>>>
>>>>>> " Once the job (partition) on node-1 goes OFFLINE, Helix will bring
>>>>>> up the job in node-2 (OFFLINE->UP)"
>>>>>> I think it may not work in my case. Here is what I see the
>>>>>> implications.
>>>>>> 1. While node1 is in drain, old jobs continue to run, but I want new
>>>>>> jobs (for the same partition) to be hosted by the new node. Think of it
>>>>>> as a partition moving from one node to the other, but over a long time
>>>>>> (hours), as determined by when all existing jobs running on node1 finish.
>>>>>> 2. As per your suggestion,  node-2 serves the partition only when
>>>>>> node-1 is offline. But it cannot satisfy 1 above.
>>>>>> One workaround I can have is to handle the UP->OFFLINE transition event
>>>>>> in the application and save the information about node1 somewhere, then
>>>>>> use this information later to distinguish old jobs from new jobs. But this
>>>>>> information is stored outside Helix and I wanted to avoid that. What
>>>>>> attracted me towards Helix is its auto re-balancing capability and its
>>>>>> central storage for cluster state, which I can use for my routing logic.
>>>>>> 3. A job could be running for hours and thus drain can happen for a
>>>>>> long time.
>>>>>>
>>>>>>
>>>>>> " How long would you expect OFFLINE->UP to take here? If it is fast,
>>>>>> the switch should be fast. "
>>>>>> OFFLINE->UP is fast. As I describe above, it's the drain on the earlier
>>>>>> running node which is slow; the existing jobs cannot be pre-empted to
>>>>>> move to the new node.
>>>>>>
>>>>>> Regards,
>>>>>> Santosh
>>>>>>
>>>>>> On Tue, May 12, 2020 at 10:40 AM Lei Xia <xi...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi, Santosh
>>>>>>>
>>>>>>>   One question: what exactly do you need to do to bring a job from
>>>>>>> OFFLINE to STARTUP? Can we simply use an OFFLINE->UP->OFFLINE model? From
>>>>>>> OFFLINE->UP you will get the job started and ready to serve requests.  From
>>>>>>> UP->OFFLINE you will block there until the job gets drained.
>>>>>>>
>>>>>>>  With this state model, you can start to drain a node by disabling
>>>>>>> it. Once a node is disabled, Helix will send UP->OFFLINE transition to all
>>>>>>> partitions on that node, in your implementation of UP->OFFLINE transition,
>>>>>>> you block there until the job completes. Once the job (partition) on node-1
>>>>>>> goes OFFLINE, Helix will bring up the job in node-2 (OFFLINE->UP).  Does
>>>>>>> this work for you?  How long would you expect OFFLINE->UP to take here? If it
>>>>>>> is fast, the switch should be fast.
>>>>>>>
>>>>>>>
>>>>>>> Lei
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, May 11, 2020 at 9:02 PM santosh gujar <
>>>>>>> santosh.techie@gmail.com> wrote:
>>>>>>>
>>>>>>>> Yes, there would be a database.
>>>>>>>> So far I have the following state model for a partition:
>>>>>>>> OFFLINE->STARTUP->UP->DRAIN->OFFLINE. But I don't know how to express the
>>>>>>>> following:
>>>>>>>> 1. How to trigger DRAIN (for example, when we decide to take a node
>>>>>>>> out for maintenance).
>>>>>>>> 2. Once a drain has started, I expect the Helix rebalancer to kick in
>>>>>>>> and simultaneously bring up the partition on another node in START_UP state.
>>>>>>>> 3. Once all jobs on node1 are done, I need a manual way to transition
>>>>>>>> it to OFFLINE and move the other replica to the UP state.
>>>>>>>>
>>>>>>>> It might be that my thinking about how to fit this into the Helix model
>>>>>>>> is entirely wrong, but essentially the above is the sequence I want to
>>>>>>>> achieve. Any pointers will be of great help. The constraint is that these
>>>>>>>> are long-running jobs that cannot be moved immediately to another node.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Santosh
>>>>>>>>
>>>>>>>> On Tue, May 12, 2020 at 1:25 AM kishore g <g....@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I was thinking exactly in that direction - having two states is
>>>>>>>>> the right thing to do. Before we get there, one more question -
>>>>>>>>>
>>>>>>>>> - when you get a request for a job, how do you know if that job is
>>>>>>>>> old or new? Is there a database that provides the mapping between job
>>>>>>>>> and node?
>>>>>>>>>
>>>>>>>>> On Mon, May 11, 2020 at 12:44 PM santosh gujar <
>>>>>>>>> santosh.techie@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thank You Kishore,
>>>>>>>>>>
>>>>>>>>>> During the drain process N2 will start new jobs; the requests related
>>>>>>>>>> to old jobs need to go to N1 and requests for new jobs need to go to N2.
>>>>>>>>>> Thus during the drain on N1, the partition could be present on both nodes.
>>>>>>>>>>
>>>>>>>>>> My current thinking is that in Helix I somehow need to model this as
>>>>>>>>>> partition P with two different states on these two nodes, e.g. N1
>>>>>>>>>> could have partition P in DRAIN state and N2 can have partition P in
>>>>>>>>>> START_UP state.
>>>>>>>>>> I don't know if my thinking about states is correct, but looking
>>>>>>>>>> for any pointers.
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> Santosh
>>>>>>>>>>
>>>>>>>>>> On Tue, May 12, 2020 at 1:01 AM kishore g <g....@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> what happens to requests during the drain process, i.e. when you
>>>>>>>>>>> put N1 out of service and while N2 is waiting for N1 to finish the
>>>>>>>>>>> jobs, where will the requests for P go - to N1 or N2?
>>>>>>>>>>>
>>>>>>>>>>> On Mon, May 11, 2020 at 12:19 PM santosh gujar <
>>>>>>>>>>> santosh.techie@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> I am looking for some clues or inputs on how to achieve
>>>>>>>>>>>> following
>>>>>>>>>>>>
>>>>>>>>>>>> I am working on a service that involves running
>>>>>>>>>>>> stateful long-running jobs on a node. These long-running jobs cannot
>>>>>>>>>>>> be preempted and continued on other nodes.
>>>>>>>>>>>>
>>>>>>>>>>>> Problem requirements:
>>>>>>>>>>>> 1. In Helix nomenclature, let's say a helix partition P involves J
>>>>>>>>>>>> such jobs running on a node (N1).
>>>>>>>>>>>> 2. When I put the node in a drain, I want Helix to assign a new
>>>>>>>>>>>> node to this partition, so that (P) is also started on the new node (N2).
>>>>>>>>>>>>
>>>>>>>>>>>> 3. N1 can be put out of service only when all running jobs (J)
>>>>>>>>>>>> on it are over; at that point only N2 will serve P's requests.
>>>>>>>>>>>>
>>>>>>>>>>>> Questions:
>>>>>>>>>>>> 1. Can the drain process be modeled using Helix?
>>>>>>>>>>>> 2. If yes, is there any recipe / are there pointers for a Helix
>>>>>>>>>>>> state model?
>>>>>>>>>>>> 3. Is there any custom way to trigger state transitions? From the
>>>>>>>>>>>> documentation, I gather that the Helix controller in full-auto mode
>>>>>>>>>>>> triggers state transitions only when the number of partitions changes
>>>>>>>>>>>> or the cluster changes (node addition or deletion).
>>>>>>>>>>>> 4. I guess a spectator will be needed for custom routing logic in
>>>>>>>>>>>> such cases; any pointers for the same?
>>>>>>>>>>>>
>>>>>>>>>>>> Thank You
>>>>>>>>>>>> Santosh
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Lei Xia
>>>>>>>
>>>>>>
>>>
>>> --
>>> Lei Xia
>>>
>>
>
> --
> Lei Xia
>

Re: Long running jobs and node drain

Posted by Lei Xia <xi...@gmail.com>.
Hi, Santosh

  I meant the DRAIN->OFFLINE transition should be the one that blocks. You
cannot block at UP->DRAIN; otherwise, from Helix's perspective the partition
will still be in the UP state and it won't bring a new partition online. The
code logic could be something like below.

import org.apache.helix.NotificationContext;
import org.apache.helix.model.Message;
import org.apache.helix.participant.statemachine.StateModel;
import org.apache.helix.participant.statemachine.Transition;

class MyModel extends StateModel {
  @Transition(from = "UP", to = "DRAIN")
  public void upToDrain(Message message, NotificationContext context) {
    // e.g. flip a flag here so this node stops accepting new jobs
  }

  @Transition(from = "DRAIN", to = "OFFLINE")
  public void drainToOffline(Message message, NotificationContext context) {
    waitUntilAllJobsCompleted(); // block until every running job has finished
    // additional cleanup work
  }

  @Transition(from = "OFFLINE", to = "UP")
  public void offlineToUp(Message message, NotificationContext context) {
    // get ready to take new jobs
  }
}
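As a self-contained illustration of the blocking pattern above (the class, method, and latch names here are invented for this sketch and are not part of the Helix API), the DRAIN->OFFLINE wait can be expressed with a CountDownLatch:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Toy stand-in for the participant's state model: DRAIN -> OFFLINE blocks
// until every in-flight job has counted the latch down.
class DrainingModel {
    private final CountDownLatch runningJobs;
    private volatile boolean acceptingJobs = true;

    DrainingModel(int inFlightJobs) {
        this.runningJobs = new CountDownLatch(inFlightJobs);
    }

    void upToDrain() {
        acceptingJobs = false;   // stop taking new jobs
    }

    void jobCompleted() {
        runningJobs.countDown(); // called by each job as it finishes
    }

    void drainToOffline() throws InterruptedException {
        runningJobs.await();     // block until all in-flight jobs complete
    }

    boolean isAcceptingJobs() {
        return acceptingJobs;
    }
}

public class DrainDemo {
    public static void main(String[] args) throws Exception {
        DrainingModel model = new DrainingModel(2);
        model.upToDrain();
        // Two "long-running jobs" finish on worker threads.
        for (int i = 0; i < 2; i++) {
            new Thread(() -> {
                try {
                    TimeUnit.MILLISECONDS.sleep(50);
                } catch (InterruptedException ignored) {
                }
                model.jobCompleted();
            }).start();
        }
        model.drainToOffline();  // returns only after both jobs are done
        System.out.println("acceptingJobs=" + model.isAcceptingJobs());
        System.out.println("drained");
    }
}
```

Each finishing job counts the latch down; drainToOffline() returns only once the count reaches zero, which is exactly when Helix may take the replica fully OFFLINE.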

On Wed, May 13, 2020 at 11:24 AM santosh gujar <sa...@gmail.com>
wrote:

>
> Thanks a lot Lei. I assume by blocking you mean blocking in the method
> call that is invoked,
>
> e.g. the following pseudo code.
>
> class MyModel extends StateModel  {
> @Transition(from = "UP", to = "DRAIN")
> public void offlineToSlave(Message message, NotificationContext context) {
> // don't return until the long-running job completes
> }
>
> On Wed, May 13, 2020 at 10:40 PM Lei Xia <xi...@gmail.com> wrote:
>
>> Hi, Santosh
>>
>>   Thanks for explaining your case in detail. In this case, I would
>> recommend you to use "OFFLINE->UP->DRAIN->OFFLINE" model. And you can set
>> the constraint of your model to limit # of replica in UP state to be 1,
>> i.e, Helix will make sure there is only 1 replica in UP at same time. When
>> you are ready to drain an instance, disable the instance first, then Helix
>> will transit all partitions (jobs) on that instance to DRAIN and then
>> OFFLINE, you can block at DRAIN->OFFLINE transition until all jobs are
>> completed.  On the other hand, once the old partition is in DRAIN state,
>> Helix should bring up a new partition to UP (OFFLINE->UP) on a new node.
>>
>>
>>
>> Lei
>>

-- 
Lei Xia

Re: Long running jobs and node drain

Posted by santosh gujar <sa...@gmail.com>.
Correcting a typo:

 class MyModel extends StateModel  {
@Transition(from = "UP", to = "DRAIN")
 public void upToDrain(Message message, NotificationContext context) {
  // don't return until the long-running job completes
 }

On Wed, May 13, 2020 at 11:54 PM santosh gujar <sa...@gmail.com>
wrote:

>
> Thanks a lot Lei. I assume by blocking you mean blocking in the method
> call that is invoked,
>
> e.g. the following pseudo code.
>
> class MyModel extends StateModel  {
> @Transition(from = "UP", to = "DRAIN")
> public void offlineToSlave(Message message, NotificationContext context) {
> // don't return until the long-running job completes
> }
>
>

Re: Long running jobs and node drain

Posted by santosh gujar <sa...@gmail.com>.
Thanks a lot Lei. I assume by blocking you mean blocking in the method
call that is invoked,

e.g. the following pseudo code.

class MyModel extends StateModel  {
@Transition(from = "UP", to = "DRAIN")
public void offlineToSlave(Message message, NotificationContext context) {
// don't return until the long-running job completes
}
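To tie the thread together, the intended lifecycle of partition P (N1 goes UP -> DRAIN -> OFFLINE while N2 goes OFFLINE -> UP, with at most one replica in UP at a time) can be dry-run as a tiny stand-alone toy. The node names and the set() helper below are invented for illustration and have nothing to do with the Helix API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of partition P's replica states on two nodes.
// Enforces the constraint discussed in this thread: at most one replica in UP.
public class PartitionDrainWalkthrough {
    static Map<String, String> states = new LinkedHashMap<>();

    static void set(String node, String state) {
        long ups = states.values().stream().filter("UP"::equals).count();
        if (state.equals("UP") && ups >= 1) {
            throw new IllegalStateException("at most one UP replica allowed");
        }
        states.put(node, state);
    }

    public static void main(String[] args) {
        set("N1", "UP");      // P served by N1
        set("N1", "DRAIN");   // operator drains N1; old jobs keep running there
        set("N2", "UP");      // allowed now: no other replica is UP
        set("N1", "OFFLINE"); // all old jobs on N1 have finished
        System.out.println(states); // {N1=OFFLINE, N2=UP}
    }
}
```

Moving "N2" to UP before "N1" leaves DRAIN would throw, which is the single-UP constraint doing its job.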

On Wed, May 13, 2020 at 10:40 PM Lei Xia <xi...@gmail.com> wrote:

> Hi, Santosh
>
>   Thanks for explaining your case in detail. In this case, I would
> recommend you to use "OFFLINE->UP->DRAIN->OFFLINE" model. And you can set
> the constraint of your model to limit # of replica in UP state to be 1,
> i.e, Helix will make sure there is only 1 replica in UP at same time. When
> you are ready to drain an instance, disable the instance first, then Helix
> will transit all partitions (jobs) on that instance to DRAIN and then
> OFFLINE, you can block at DRAIN->OFFLINE transition until all jobs are
> completed.  On the other hand, once the old partition is in DRAIN state,
> Helix should bring up a new partition to UP (OFFLINE->UP) on a new node.
>
>
>
> Lei
>
>>>>>>>
>>>>>>>> Thank You Kishore,
>>>>>>>>
>>>>>>>> During drain process N2 will start new jobs, the requests related
>>>>>>>> to old jobs need to go to N1 and requests for new jobs need to go to N2.
>>>>>>>> Thus during drain on N1, the partition could be present on both nodes.
>>>>>>>>
>>>>>>>> My current thinking is that in helix somehow i need to model is
>>>>>>>> as Partition P with two different states on these two nodes. . e.g. N1
>>>>>>>> could have partition P in Drain State and N2 can have partition P in
>>>>>>>> START_UP state.
>>>>>>>> I don't know if my thinking about states is correct, but looking
>>>>>>>> for any pointers.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Santosh
>>>>>>>>
>>>>>>>> On Tue, May 12, 2020 at 1:01 AM kishore g <g....@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> what  happens to request during the drain process i.e when you put
>>>>>>>>> N1 out of service and while N2 is waiting for N1 to finish the jobs, where
>>>>>>>>> will the requests for P go to - N1 or N2
>>>>>>>>>
>>>>>>>>> On Mon, May 11, 2020 at 12:19 PM santosh gujar <
>>>>>>>>> santosh.techie@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I am looking for some clues or inputs on how to achieve following
>>>>>>>>>>
>>>>>>>>>> I am working on a service that involves running a statetful long
>>>>>>>>>> running jobs on a node. These long running jobs cannot be preempted and
>>>>>>>>>> continue on other nodes.
>>>>>>>>>>
>>>>>>>>>> Problem Requirements :
>>>>>>>>>> 1. In helix nomenclature, I let's say an helix partition P that
>>>>>>>>>> involves J number of such jobs running on a node. (N1)
>>>>>>>>>> 2. When I put the node in a drain, I want helix to assign a new
>>>>>>>>>> node to this partition (P) is also started on the new node (N2).
>>>>>>>>>>
>>>>>>>>>> 3. N1 can be put out of service only when all running jobs (J) on
>>>>>>>>>> it are over, at this point only N2 will serve P request.
>>>>>>>>>>
>>>>>>>>>> Questions :
>>>>>>>>>> 1. Can drain process be modeled using helix?
>>>>>>>>>> 2. If yes, Is there any recipe / pointers for a helix state model?
>>>>>>>>>> 3. Is there any custom way to trigger state transitions? From
>>>>>>>>>> documentation, I gather that Helix controller in full auto mode, triggers
>>>>>>>>>> state transitions only when number of partitions change or cluster changes
>>>>>>>>>> (node addition or deletion)
>>>>>>>>>> 3.I guess  spectator will be needed, to custom routing logic in
>>>>>>>>>> such cases, any pointers for the the same?
>>>>>>>>>>
>>>>>>>>>> Thank You
>>>>>>>>>> Santosh
>>>>>>>>>>
>>>>>>>>>
>>>>>
>>>>> --
>>>>> Lei Xia
>>>>>
>>>>
>
> --
> Lei Xia
>

Re: Long running jobs and node drain

Posted by Lei Xia <xi...@gmail.com>.
Hi, Santosh

  Thanks for explaining your case in detail. In this case, I would
recommend using an "OFFLINE->UP->DRAIN->OFFLINE" model. You can set a
constraint on the model to limit the number of replicas in the UP state to
1, i.e., Helix will make sure there is only one replica in UP at any time.
When you are ready to drain an instance, disable the instance first; Helix
will then transition all partitions (jobs) on that instance to DRAIN and
then OFFLINE, and you can block in the DRAIN->OFFLINE transition until all
jobs are completed. On the other hand, once the old replica is in the DRAIN
state, Helix should bring up a new replica to UP (OFFLINE->UP) on a new node.
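For illustration, the shape of that model (the three states, the legal
transitions, and the "at most one replica in UP" constraint) can be
sketched in plain Java. This is not the Helix StateModelDefinition API —
just a hypothetical mock of the semantics described above:

```java
import java.util.*;

public class DrainModelSketch {
    // States of the hypothetical OFFLINE->UP->DRAIN->OFFLINE model.
    enum State { OFFLINE, UP, DRAIN }

    // Legal transitions: OFFLINE->UP, UP->DRAIN, DRAIN->OFFLINE.
    static final Map<State, Set<State>> LEGAL = Map.of(
        State.OFFLINE, Set.of(State.UP),
        State.UP,      Set.of(State.DRAIN),
        State.DRAIN,   Set.of(State.OFFLINE));

    // Constraint: at most one replica of a partition may be UP at a time.
    static boolean canGoUp(Collection<State> replicaStates) {
        return replicaStates.stream().filter(s -> s == State.UP).count() < 1;
    }

    static boolean isLegal(State from, State to) {
        return LEGAL.getOrDefault(from, Set.of()).contains(to);
    }

    public static void main(String[] args) {
        // Node-1 holds the partition in DRAIN; node-2 is still OFFLINE.
        List<State> replicas = List.of(State.DRAIN, State.OFFLINE);
        System.out.println(isLegal(State.OFFLINE, State.UP)); // true
        System.out.println(isLegal(State.UP, State.OFFLINE)); // false: must DRAIN first
        System.out.println(canGoUp(replicas));                // true: no replica is UP
    }
}
```

Note that because UP->OFFLINE is not a legal transition here, a disabled
node is forced through DRAIN, which is where the blocking wait would live.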



Lei


-- 
Lei Xia

Re: Long running jobs and node drain

Posted by santosh gujar <sa...@gmail.com>.
Hi Hunter,

For various limitations and constraints at this moment, I cannot go down
the path of Task Framework.

Thanks,
Santosh


Re: Long running jobs and node drain

Posted by Hunter Lee <na...@gmail.com>.
Alternative idea:

Have you considered using Task Framework's targeted jobs for this use case?
You could make the jobs long-running, and this way, you save yourself the
trouble of having to implement the routing layer (simply specifying which
partition to target in your JobConfig would do it).

Task Framework doesn't actively terminate running threads on the worker
(Participant) nodes, so you could achieve the effect of "draining" a node
by letting previously assigned tasks finish, i.e., by not actively
canceling them in your cancel() logic.
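As a rough illustration of that "drain by not canceling" idea, here is a
plain-Java sketch — the run()/cancel() names only mimic the Task Framework
interfaces, they are not the real API:

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for a Task Framework task: cancel() is deliberately
// a no-op, so a previously assigned task always runs to completion.
public class NonPreemptibleTask {
    private final CountDownLatch done = new CountDownLatch(1);
    private volatile boolean finished = false;

    public void run() {              // the long-running work
        finished = true;             // ... real work elided ...
        done.countDown();
    }

    public void cancel() {
        // Intentionally empty: do not interrupt the running job.
    }

    public boolean awaitCompletion() {
        try {
            done.await();            // "drain": wait until the job is done
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return finished;
    }

    public static void main(String[] args) {
        NonPreemptibleTask task = new NonPreemptibleTask();
        new Thread(task::run).start();
        task.cancel();               // a drain request arrives: no-op
        System.out.println(task.awaitCompletion()); // true: job still completed
    }
}
```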

Hunter


Re: Long running jobs and node drain

Posted by santosh gujar <sa...@gmail.com>.
Hi Lei,

Thanks a lot for your time and response.

Some more context about the Helix partition I mentioned in my earlier
email: my thinking is to map multiple long jobs to a Helix partition by
running some hash function over each job (the simplest being to take a mod
of the job).
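A toy version of that job-to-partition mapping (the job IDs and partition
count are illustrative; any stable hash works):

```java
public class JobPartitioner {
    // Map a job to one of numPartitions Helix partitions via a stable hash.
    static int partitionFor(String jobId, int numPartitions) {
        // Math.floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(jobId.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int p = partitionFor("job-42", 8);
        System.out.println(p >= 0 && p < 8);                 // always true
        System.out.println(p == partitionFor("job-42", 8));  // stable: same job, same partition
    }
}
```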

" what exactly you need to do to bring a job from OFFLINE to STARTUP?"
I added STARTUP to distinguish the track the fact that a partition could be
hosted on two nodes simultaneously, I doubt offline->UP->OFFLINE model can
give me such information.

" Once the job (partition) on node-1 goes OFFLINE, Helix will bring up the
job in node-2 (OFFLINE->UP)"
I think it may not work in my case. Here is what I see the implications.
1. While node1 is in drain, old jobs continue to run, but i want new jobs
(for same partition) to be hosted by partition. Think of it as a partition
moves from one node to other but over a long time (hours) as determined by
when all existing jobs running on node1 finish.
2. As per your suggestion,  node-2 serves the partition only when node-1 is
offline. But it cannot satisfy 1 above.
One workaround I can have is to handle up->offline transition event in the
application and save the information about the node1 somewhere, then use
this information later to distinguish old jobs and new jobs. But this
information is stored outside helix and i wanted to avoid it.  What
attracted me towards helix is it's auto re-balancing capability and it's a
central strorage for state of cluster which I can use for my routing logic.
3. A job could be running for hours and thus drain can happen for a long
time.
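The routing rule implied by points 1-3 could be sketched like this (plain
Java; the node names and the drain-start timestamp are illustrative, and in
practice a spectator would derive this state from Helix rather than hold it
locally):

```java
import java.time.Instant;

public class DrainRouter {
    final String drainingNode;   // e.g. N1, whose replica is draining
    final String newNode;        // e.g. N2, the newly assigned replica
    final Instant drainStart;    // when the drain began

    DrainRouter(String drainingNode, String newNode, Instant drainStart) {
        this.drainingNode = drainingNode;
        this.newNode = newNode;
        this.drainStart = drainStart;
    }

    // Jobs started before the drain stay on the old node; new jobs go to the new node.
    String routeJob(Instant jobStart) {
        return jobStart.isBefore(drainStart) ? drainingNode : newNode;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2020-05-12T10:00:00Z");
        DrainRouter router = new DrainRouter("N1", "N2", t0);
        System.out.println(router.routeJob(t0.minusSeconds(3600))); // N1: old job
        System.out.println(router.routeJob(t0.plusSeconds(60)));    // N2: new job
    }
}
```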


"  How long you would expect OFFLINE->UP take here, if it is fast, the
switch should be fast. "
OFFLINE->UP is fast,  As I describe above, it's the drain on earlier
running node which is slow, the existing jobs cannot be pre-empted to move
to new node.

Regards,
Santosh

>>>>>> running jobs on a node. These long running jobs cannot be preempted and
>>>>>> continue on other nodes.
>>>>>>
>>>>>> Problem Requirements :
>>>>>> 1. In helix nomenclature, I let's say an helix partition P that
>>>>>> involves J number of such jobs running on a node. (N1)
>>>>>> 2. When I put the node in a drain, I want helix to assign a new node
>>>>>> to this partition (P) is also started on the new node (N2).
>>>>>>
>>>>>> 3. N1 can be put out of service only when all running jobs (J) on it
>>>>>> are over, at this point only N2 will serve P request.
>>>>>>
>>>>>> Questions :
>>>>>> 1. Can drain process be modeled using helix?
>>>>>> 2. If yes, Is there any recipe / pointers for a helix state model?
>>>>>> 3. Is there any custom way to trigger state transitions? From
>>>>>> documentation, I gather that Helix controller in full auto mode, triggers
>>>>>> state transitions only when number of partitions change or cluster changes
>>>>>> (node addition or deletion)
>>>>>> 3.I guess  spectator will be needed, to custom routing logic in such
>>>>>> cases, any pointers for the the same?
>>>>>>
>>>>>> Thank You
>>>>>> Santosh
>>>>>>
>>>>>
>
> --
> Lei Xia
>

Re: Long running jobs and node drain

Posted by Lei Xia <xi...@gmail.com>.
Hi, Santosh

  One question: what exactly do you need to do to bring a job from OFFLINE to
STARTUP? Can we simply use an OFFLINE->UP->OFFLINE model? On OFFLINE->UP you
get the job started and ready to serve requests. On UP->OFFLINE you block
until the job gets drained.

 With this state model, you can start to drain a node by disabling it. Once
a node is disabled, Helix will send an UP->OFFLINE transition to every
partition on that node; in your implementation of the UP->OFFLINE transition,
you block until the job completes. Once the job (partition) on node-1
goes OFFLINE, Helix will bring up the job on node-2 (OFFLINE->UP). Does
this work for you? How long would you expect OFFLINE->UP to take here? If it
is fast, the switch should be fast.


Lei
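The proposal above can be pictured with a toy simulation (plain Python, not
Helix API code; the `Node` class and method names here are made up for
illustration): the UP->OFFLINE callback blocks until the node's jobs finish,
and only afterwards is OFFLINE->UP issued for the same partition on the
replacement node.

```python
# Toy model of the OFFLINE -> UP -> OFFLINE proposal (not Helix API code).
# UP->OFFLINE blocks until all jobs on the node complete; only then does the
# "controller" issue OFFLINE->UP for partition P on the replacement node.
import threading
import time

class Node:
    def __init__(self, name):
        self.name = name
        self.state = "OFFLINE"          # state of partition P on this node
        self.jobs_done = threading.Event()

    def on_offline_to_up(self):
        self.state = "UP"               # fast: job starts, ready to serve

    def on_up_to_offline(self):
        self.jobs_done.wait()           # block until long-running jobs drain
        self.state = "OFFLINE"

def drain_and_move(n1, n2):
    n1.on_up_to_offline()               # triggered by disabling N1; blocks
    n2.on_offline_to_up()               # issued only after N1 is OFFLINE

n1, n2 = Node("N1"), Node("N2")
n1.on_offline_to_up()                   # P initially UP on N1

t = threading.Thread(target=drain_and_move, args=(n1, n2))
t.start()
time.sleep(0.1)                         # jobs still running on N1
assert (n1.state, n2.state) == ("UP", "OFFLINE")
n1.jobs_done.set()                      # last job finishes; drain completes
t.join()
assert (n1.state, n2.state) == ("OFFLINE", "UP")
```

Note the drawback visible in the simulation: N2 only comes UP after N1 has
fully drained, so new jobs cannot start anywhere while the drain is running.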



On Mon, May 11, 2020 at 9:02 PM santosh gujar <sa...@gmail.com>
wrote:

> Yes, there would be a database.
> So far i have following state model for partition.
> OFFLINE->STARTUP->UP->DRAIN->OFFLINE. But don't have / now to express
> following
> 1. How to Trigger Drain (This is for example we decide to get node out for
> maintenance)
> 2. Once a drain has started, I expect helix rebalancer to kick in and move
> the partition simultaneously on another node in start_up mode.
> 3. Once All jobs  on node1 are done, need a manual way to trigger it to
> offline and move the other partition to UP state.
>
> It might be possible that my thinking is entirely wrong and how to fit it
> in helix model,  but essentially above is the sequence of i want achieve.
> Any pointers will be of great help. The constraint is that it's a long
> running jobs that cannot be moved immediately to other node.
>
> Regards,
> Santosh
>
> On Tue, May 12, 2020 at 1:25 AM kishore g <g....@gmail.com> wrote:
>
>> I was thinking exactly in that direction - having two states is the right
>> thing to do. Before we get there, one more question -
>>
>> - when you get a request for a job, how do you know if that job is old or
>> new? Is there a database that provides the mapping between job and node
>>
>> On Mon, May 11, 2020 at 12:44 PM santosh gujar <sa...@gmail.com>
>> wrote:
>>
>>> Thank You Kishore,
>>>
>>> During drain process N2 will start new jobs, the requests related to old
>>> jobs need to go to N1 and requests for new jobs need to go to N2. Thus
>>> during drain on N1, the partition could be present on both nodes.
>>>
>>> My current thinking is that in helix somehow i need to model is
>>> as Partition P with two different states on these two nodes. . e.g. N1
>>> could have partition P in Drain State and N2 can have partition P in
>>> START_UP state.
>>> I don't know if my thinking about states is correct, but looking for any
>>> pointers.
>>>
>>> Regards
>>> Santosh
>>>
>>> On Tue, May 12, 2020 at 1:01 AM kishore g <g....@gmail.com> wrote:
>>>
>>>> what  happens to request during the drain process i.e when you put N1
>>>> out of service and while N2 is waiting for N1 to finish the jobs, where
>>>> will the requests for P go to - N1 or N2
>>>>
>>>> On Mon, May 11, 2020 at 12:19 PM santosh gujar <
>>>> santosh.techie@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I am looking for some clues or inputs on how to achieve following
>>>>>
>>>>> I am working on a service that involves running a statetful long
>>>>> running jobs on a node. These long running jobs cannot be preempted and
>>>>> continue on other nodes.
>>>>>
>>>>> Problem Requirements :
>>>>> 1. In helix nomenclature, I let's say an helix partition P that
>>>>> involves J number of such jobs running on a node. (N1)
>>>>> 2. When I put the node in a drain, I want helix to assign a new node
>>>>> to this partition (P) is also started on the new node (N2).
>>>>>
>>>>> 3. N1 can be put out of service only when all running jobs (J) on it
>>>>> are over, at this point only N2 will serve P request.
>>>>>
>>>>> Questions :
>>>>> 1. Can drain process be modeled using helix?
>>>>> 2. If yes, Is there any recipe / pointers for a helix state model?
>>>>> 3. Is there any custom way to trigger state transitions? From
>>>>> documentation, I gather that Helix controller in full auto mode, triggers
>>>>> state transitions only when number of partitions change or cluster changes
>>>>> (node addition or deletion)
>>>>> 3.I guess  spectator will be needed, to custom routing logic in such
>>>>> cases, any pointers for the the same?
>>>>>
>>>>> Thank You
>>>>> Santosh
>>>>>
>>>>

-- 
Lei Xia

Re: Long running jobs and node drain

Posted by santosh gujar <sa...@gmail.com>.
Yes, there would be a database.
So far I have the following state model for a partition:
OFFLINE->STARTUP->UP->DRAIN->OFFLINE. But I don't know how to express the
following:
1. How to trigger DRAIN (for example, when we decide to take a node out for
maintenance).
2. Once a drain has started, I expect the Helix rebalancer to kick in and
simultaneously start the partition on another node in STARTUP mode.
3. Once all jobs on node 1 are done, I need a manual way to move it to
OFFLINE and move the other replica to UP.

It may be that my thinking about how to fit this into the Helix model is
entirely wrong, but essentially the above is the sequence I want to achieve.
Any pointers will be of great help. The constraint is that these are
long-running jobs that cannot be moved immediately to another node.

Regards,
Santosh
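The transition graph described above can be written down as a small validity
check (a hypothetical sketch only; a real Helix state model would be defined
through Helix's Java StateModelDefinition API, not like this):

```python
# Sketch of the proposed per-partition state machine (hypothetical names):
# OFFLINE -> STARTUP -> UP -> DRAIN -> OFFLINE.
ALLOWED = {
    "OFFLINE": {"STARTUP"},
    "STARTUP": {"UP"},
    "UP":      {"DRAIN"},
    "DRAIN":   {"OFFLINE"},  # legal only once all jobs on the node are done
}

def transition(current, target):
    """Validate and perform one state transition."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target

# Walk a partition replica through a full lifecycle, including drain.
state = "OFFLINE"
for nxt in ("STARTUP", "UP", "DRAIN", "OFFLINE"):
    state = transition(state, nxt)
```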

On Tue, May 12, 2020 at 1:25 AM kishore g <g....@gmail.com> wrote:

> I was thinking exactly in that direction - having two states is the right
> thing to do. Before we get there, one more question -
>
> - when you get a request for a job, how do you know if that job is old or
> new? Is there a database that provides the mapping between job and node
>
> On Mon, May 11, 2020 at 12:44 PM santosh gujar <sa...@gmail.com>
> wrote:
>
>> Thank You Kishore,
>>
>> During drain process N2 will start new jobs, the requests related to old
>> jobs need to go to N1 and requests for new jobs need to go to N2. Thus
>> during drain on N1, the partition could be present on both nodes.
>>
>> My current thinking is that in helix somehow i need to model is
>> as Partition P with two different states on these two nodes. . e.g. N1
>> could have partition P in Drain State and N2 can have partition P in
>> START_UP state.
>> I don't know if my thinking about states is correct, but looking for any
>> pointers.
>>
>> Regards
>> Santosh
>>
>> On Tue, May 12, 2020 at 1:01 AM kishore g <g....@gmail.com> wrote:
>>
>>> what  happens to request during the drain process i.e when you put N1
>>> out of service and while N2 is waiting for N1 to finish the jobs, where
>>> will the requests for P go to - N1 or N2
>>>
>>> On Mon, May 11, 2020 at 12:19 PM santosh gujar <sa...@gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I am looking for some clues or inputs on how to achieve following
>>>>
>>>> I am working on a service that involves running a statetful long
>>>> running jobs on a node. These long running jobs cannot be preempted and
>>>> continue on other nodes.
>>>>
>>>> Problem Requirements :
>>>> 1. In helix nomenclature, I let's say an helix partition P that
>>>> involves J number of such jobs running on a node. (N1)
>>>> 2. When I put the node in a drain, I want helix to assign a new node to
>>>> this partition (P) is also started on the new node (N2).
>>>>
>>>> 3. N1 can be put out of service only when all running jobs (J) on it
>>>> are over, at this point only N2 will serve P request.
>>>>
>>>> Questions :
>>>> 1. Can drain process be modeled using helix?
>>>> 2. If yes, Is there any recipe / pointers for a helix state model?
>>>> 3. Is there any custom way to trigger state transitions? From
>>>> documentation, I gather that Helix controller in full auto mode, triggers
>>>> state transitions only when number of partitions change or cluster changes
>>>> (node addition or deletion)
>>>> 3.I guess  spectator will be needed, to custom routing logic in such
>>>> cases, any pointers for the the same?
>>>>
>>>> Thank You
>>>> Santosh
>>>>
>>>

Re: Long running jobs and node drain

Posted by kishore g <g....@gmail.com>.
I was thinking exactly in that direction - having two states is the right
thing to do. Before we get there, one more question -

- when you get a request for a job, how do you know if that job is old or
new? Is there a database that provides the mapping between jobs and nodes?

On Mon, May 11, 2020 at 12:44 PM santosh gujar <sa...@gmail.com>
wrote:

> Thank You Kishore,
>
> During drain process N2 will start new jobs, the requests related to old
> jobs need to go to N1 and requests for new jobs need to go to N2. Thus
> during drain on N1, the partition could be present on both nodes.
>
> My current thinking is that in helix somehow i need to model is
> as Partition P with two different states on these two nodes. . e.g. N1
> could have partition P in Drain State and N2 can have partition P in
> START_UP state.
> I don't know if my thinking about states is correct, but looking for any
> pointers.
>
> Regards
> Santosh
>
> On Tue, May 12, 2020 at 1:01 AM kishore g <g....@gmail.com> wrote:
>
>> what  happens to request during the drain process i.e when you put N1 out
>> of service and while N2 is waiting for N1 to finish the jobs, where will
>> the requests for P go to - N1 or N2
>>
>> On Mon, May 11, 2020 at 12:19 PM santosh gujar <sa...@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I am looking for some clues or inputs on how to achieve following
>>>
>>> I am working on a service that involves running a statetful long running
>>> jobs on a node. These long running jobs cannot be preempted and continue on
>>> other nodes.
>>>
>>> Problem Requirements :
>>> 1. In helix nomenclature, I let's say an helix partition P that involves
>>> J number of such jobs running on a node. (N1)
>>> 2. When I put the node in a drain, I want helix to assign a new node to
>>> this partition (P) is also started on the new node (N2).
>>>
>>> 3. N1 can be put out of service only when all running jobs (J) on it are
>>> over, at this point only N2 will serve P request.
>>>
>>> Questions :
>>> 1. Can drain process be modeled using helix?
>>> 2. If yes, Is there any recipe / pointers for a helix state model?
>>> 3. Is there any custom way to trigger state transitions? From
>>> documentation, I gather that Helix controller in full auto mode, triggers
>>> state transitions only when number of partitions change or cluster changes
>>> (node addition or deletion)
>>> 3.I guess  spectator will be needed, to custom routing logic in such
>>> cases, any pointers for the the same?
>>>
>>> Thank You
>>> Santosh
>>>
>>

Re: Long running jobs and node drain

Posted by santosh gujar <sa...@gmail.com>.
Thank You Kishore,

During the drain process N2 will start new jobs; requests related to old
jobs need to go to N1, and requests for new jobs need to go to N2. Thus,
during a drain on N1, the partition could be present on both nodes.

My current thinking is that in Helix I somehow need to model this as
partition P with two different states on the two nodes, e.g. N1 could have
partition P in DRAIN state while N2 has partition P in STARTUP state.
I don't know if my thinking about states is correct, but I am looking for
any pointers.

Regards
Santosh
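The routing rule described above (old jobs stay pinned to the draining node,
new jobs go to the replacement) can be sketched from the spectator's point of
view. Everything here is a hypothetical illustration, not Helix API: the
external-view dict, the `route()` helper, and the job database stand in for
whatever the real spectator and store would provide.

```python
# Spectator-side routing sketch (hypothetical, not Helix API).
# During a drain, partition P exists on both nodes with different states.
external_view = {"P": {"N1": "DRAIN", "N2": "STARTUP"}}  # P on both nodes
job_db = {"job-42": "N1"}                                # existing job -> node

def route(partition, job_id):
    if job_id in job_db:                  # old job: follow the sticky mapping
        return job_db[job_id]
    for node, state in external_view[partition].items():
        if state in ("STARTUP", "UP"):    # new jobs only to a non-draining node
            job_db[job_id] = node         # pin the new job to that node
            return node
    raise RuntimeError("no node accepting new jobs for " + partition)

assert route("P", "job-42") == "N1"   # in-flight job stays on draining node
assert route("P", "job-99") == "N2"   # new job lands on replacement node
```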

On Tue, May 12, 2020 at 1:01 AM kishore g <g....@gmail.com> wrote:

> what  happens to request during the drain process i.e when you put N1 out
> of service and while N2 is waiting for N1 to finish the jobs, where will
> the requests for P go to - N1 or N2
>
> On Mon, May 11, 2020 at 12:19 PM santosh gujar <sa...@gmail.com>
> wrote:
>
>> Hello,
>>
>> I am looking for some clues or inputs on how to achieve following
>>
>> I am working on a service that involves running a statetful long running
>> jobs on a node. These long running jobs cannot be preempted and continue on
>> other nodes.
>>
>> Problem Requirements :
>> 1. In helix nomenclature, I let's say an helix partition P that involves
>> J number of such jobs running on a node. (N1)
>> 2. When I put the node in a drain, I want helix to assign a new node to
>> this partition (P) is also started on the new node (N2).
>>
>> 3. N1 can be put out of service only when all running jobs (J) on it are
>> over, at this point only N2 will serve P request.
>>
>> Questions :
>> 1. Can drain process be modeled using helix?
>> 2. If yes, Is there any recipe / pointers for a helix state model?
>> 3. Is there any custom way to trigger state transitions? From
>> documentation, I gather that Helix controller in full auto mode, triggers
>> state transitions only when number of partitions change or cluster changes
>> (node addition or deletion)
>> 3.I guess  spectator will be needed, to custom routing logic in such
>> cases, any pointers for the the same?
>>
>> Thank You
>> Santosh
>>
>

Re: Long running jobs and node drain

Posted by kishore g <g....@gmail.com>.
What happens to requests during the drain process, i.e., when you put N1 out
of service and N2 is waiting for N1 to finish its jobs, where will
the requests for P go - N1 or N2?

On Mon, May 11, 2020 at 12:19 PM santosh gujar <sa...@gmail.com>
wrote:

> Hello,
>
> I am looking for some clues or inputs on how to achieve following
>
> I am working on a service that involves running a statetful long running
> jobs on a node. These long running jobs cannot be preempted and continue on
> other nodes.
>
> Problem Requirements :
> 1. In helix nomenclature, I let's say an helix partition P that involves J
> number of such jobs running on a node. (N1)
> 2. When I put the node in a drain, I want helix to assign a new node to
> this partition (P) is also started on the new node (N2).
>
> 3. N1 can be put out of service only when all running jobs (J) on it are
> over, at this point only N2 will serve P request.
>
> Questions :
> 1. Can drain process be modeled using helix?
> 2. If yes, Is there any recipe / pointers for a helix state model?
> 3. Is there any custom way to trigger state transitions? From
> documentation, I gather that Helix controller in full auto mode, triggers
> state transitions only when number of partitions change or cluster changes
> (node addition or deletion)
> 3.I guess  spectator will be needed, to custom routing logic in such
> cases, any pointers for the the same?
>
> Thank You
> Santosh
>