Posted to dev@ignite.apache.org by Alexey Kuznetsov <ak...@apache.org> on 2017/06/29 19:22:14 UTC

Distributed scheduling

Hi, All!

I would like to start a discussion about distributed scheduling.

Ignite already has an "ignite-schedule" module that provides an API for
LOCAL scheduling on a node.
If the node fails, its schedule is lost.

Distributed scheduling would be a very useful feature.

Let's discuss how it could be implemented.

I see the following options:
  1) Extend the "ignite-schedule" module with an API for distributed scheduling.
  2) Extend the compute API with methods that allow scheduling tasks on the
cluster.
  3) Implement both 1) and 2)?

Any ideas and thoughts are welcome!

-- 
Alexey Kuznetsov

Re: Distributed scheduling

Posted by Konstantin Boudnik <co...@apache.org>.
Seems like a simple API for an external cron-like service to invoke an
action in Ignite would suffice, no? There are about a million scheduling
services like that already, and some have very good integration with
orchestrators of all sorts. Perhaps a trivial REST endpoint would do
instead of durable services, etc.?
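A minimal sketch of the "external cron plus trivial REST" idea, assuming the
cron job simply issues an HTTP POST to a node. The JDK's built-in
com.sun.net.httpserver.HttpServer stands in for whatever endpoint Ignite
would actually expose; the CronTrigger class, the /trigger path and the
counting action are hypothetical placeholders for submitting a compute task.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical endpoint: an external cron POSTs to /trigger and the node runs an action.
final class CronTrigger {
    // Stand-in for "submit an Ignite compute task"; here it only counts invocations.
    static final AtomicInteger runs = new AtomicInteger();

    static HttpServer start(int port) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
            server.createContext("/trigger", exchange -> {
                boolean post = "POST".equals(exchange.getRequestMethod());
                if (post) {
                    runs.incrementAndGet(); // the scheduled action would run here
                }
                byte[] body = (post ? "ok" : "use POST").getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(post ? 200 : 405, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body); // closing the body stream also closes the exchange
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // What the cron job would do (e.g. via curl); returns the HTTP status code.
    static int fire(int port) {
        try {
            HttpURLConnection conn = (HttpURLConnection)
                    URI.create("http://127.0.0.1:" + port + "/trigger").toURL().openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.getOutputStream().close(); // empty request body
            int status = conn.getResponseCode();
            conn.disconnect();
            return status;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With something like this in place, a crontab entry such as
`0 0 * * * curl -X POST http://localhost:8080/trigger` (host and port
assumed) would fire the action daily at midnight, and durability of the
schedule becomes the external scheduler's problem rather than Ignite's.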

Cos
--
  Take care,
Konstantin (Cos) Boudnik
2CAC 8312 4870 D885 8616  6115 220F 6980 1F27 E622

Disclaimer: Opinions expressed in this email are those of the author,
and do not necessarily represent the views of any company the author
might be affiliated with at the moment of writing.


On Mon, Jul 3, 2017 at 3:35 PM, Valentin Kulichenko
<va...@gmail.com> wrote:
> Dmitry,
>
> Yes, this can be implemented using services in many cases, but:
>
> - It will require the user to implement the actual scheduling logic. It's
> quite a generic task, so I think it makes sense to have it directly in the API.
> - Most likely it will imply deploying a separate service for each scheduled
> task. I don't think that's a very good idea.
> - The current services implementation is not durable. If the cluster is
> restarted, all services are lost.
>
> -Val

Re: Distributed scheduling

Posted by Dmitriy Setrakyan <ds...@apache.org>.
Hm... I think we should definitely make our services durable. Everything in
Ignite should be durable now.

As far as scheduling goes, it makes sense as well. Let's make it durable too.

D.

On Mon, Jul 3, 2017 at 3:35 PM, Valentin Kulichenko <
valentin.kulichenko@gmail.com> wrote:

> Dmitry,
>
> Yes, this can be implemented using services in many cases, but:
>
> - It will require the user to implement the actual scheduling logic. It's
> quite a generic task, so I think it makes sense to have it directly in the API.
> - Most likely it will imply deploying a separate service for each scheduled
> task. I don't think that's a very good idea.
> - The current services implementation is not durable. If the cluster is
> restarted, all services are lost.
>
> -Val

Re: Distributed scheduling

Posted by Valentin Kulichenko <va...@gmail.com>.
Dmitry,

Yes, this can be implemented using services in many cases, but:

- It will require the user to implement the actual scheduling logic. It's
quite a generic task, so I think it makes sense to have it directly in the API.
- Most likely it will imply deploying a separate service for each scheduled
task. I don't think that's a very good idea.
- The current services implementation is not durable. If the cluster is
restarted, all services are lost.
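To illustrate the alternative to one service per task, here is a rough
sketch of a single dispatcher loop serving many schedules, ordered by due
time. It assumes in-memory state and a caller-supplied clock value in
milliseconds; ScheduleDispatcher and its methods are hypothetical. Like the
current services, everything here would be lost on restart.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.LongUnaryOperator;

// Hypothetical single dispatcher: one loop drives many schedules, ordered by due time.
final class ScheduleDispatcher {
    private static final class Entry {
        final String name;
        final LongUnaryOperator next; // maps the previous due time to the next one
        long due;

        Entry(String name, long due, LongUnaryOperator next) {
            this.name = name;
            this.due = due;
            this.next = next;
        }
    }

    // Earliest due first; ties broken by name so the firing order is deterministic.
    private final PriorityQueue<Entry> queue = new PriorityQueue<>(
            Comparator.comparingLong((Entry e) -> e.due).thenComparing(e -> e.name));

    void add(String name, long firstDue, LongUnaryOperator next) {
        queue.add(new Entry(name, firstDue, next));
    }

    // Fires every entry due at or before 'now' (catching up on missed runs) and
    // returns the names in firing order. A next time that does not advance past
    // the current due time means "one-shot": the entry is dropped.
    List<String> runDue(long now) {
        List<String> fired = new ArrayList<>();
        while (!queue.isEmpty() && queue.peek().due <= now) {
            Entry e = queue.poll();
            fired.add(e.name); // the actual task would be submitted here
            long nextDue = e.next.applyAsLong(e.due);
            if (nextDue > e.due) {
                e.due = nextDue; // mutated only after removal, so heap order stays valid
                queue.add(e);
            }
        }
        return fired;
    }
}
```

Whether missed runs should be caught up (as here) or skipped is a policy
decision the real API would have to make explicit.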

-Val

On Sat, Jul 1, 2017 at 12:34 AM, Dmitriy Setrakyan <ds...@apache.org>
wrote:

> Val,
>
> In this case, we should have a notion of a named scheduler and ensure that
> we don't schedule the same task more than once. This is beginning to look
> more like a durable cluster singleton service, no?
>
> D.

Re: Distributed scheduling

Posted by Dmitriy Setrakyan <ds...@apache.org>.
Val,

In this case, we should have a notion of a named scheduler and ensure that
we don't schedule the same task more than once. This is beginning to look
more like a durable cluster singleton service, no?
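A minimal sketch of the "named scheduler" idea: schedules are keyed by a
unique name, and an atomic put-if-absent makes registering the same name
twice a no-op. A local ConcurrentHashMap stands in for what would
presumably be a replicated, persistent cache in Ignite (for example via
IgniteCache#putIfAbsent); NamedSchedules and ScheduleSpec are hypothetical.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical descriptor of a scheduled job: what to run and when.
final class ScheduleSpec {
    final String taskClass;
    final String cron;

    ScheduleSpec(String taskClass, String cron) {
        this.taskClass = Objects.requireNonNull(taskClass);
        this.cron = Objects.requireNonNull(cron);
    }
}

// Hypothetical registry guaranteeing at most one schedule per name.
final class NamedSchedules {
    private final ConcurrentMap<String, ScheduleSpec> byName = new ConcurrentHashMap<>();

    // Returns true if this call created the schedule, false if the name was already taken.
    boolean schedule(String name, ScheduleSpec spec) {
        return byName.putIfAbsent(name, spec) == null;
    }

    // Returns true if a schedule with this name existed and was removed.
    boolean cancel(String name) {
        return byName.remove(name) != null;
    }

    int size() {
        return byName.size();
    }
}
```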

D.

On Fri, Jun 30, 2017 at 1:39 PM, Valentin Kulichenko <
valentin.kulichenko@gmail.com> wrote:

> I think this functionality should provide a durable way of executing
> scheduled tasks or closures on the cluster. Job descriptors should be
> persisted on the server side and executed there.
>
> As for the API, I believe this should be part of the Compute Grid. I suggest
> introducing an IgniteCompute#withSchedulingPolicy(SchedulingPolicy policy)
> method, where SchedulingPolicy is something like this:
>
> public interface SchedulingPolicy {
>     /**
>      * @return Timestamp of next execution.
>      */
>     public Date nextTime();
> }
>
> This will enable scheduling for all compute features (tasks, callables,
> closures, etc.) and is also very flexible. A policy implementation can
> provide simple periodic scheduling, Cron-based scheduling, or anything else.
>
> Thoughts?
>
> -Val

Re: Distributed scheduling

Posted by Valentin Kulichenko <va...@gmail.com>.
I think this functionality should provide a durable way of executing
scheduled tasks or closures on the cluster. Job descriptors should be
persisted on the server side and executed there.
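A rough sketch of what persisting job descriptors could mean, assuming a
descriptor is just a name, a task class name and the next fire time, and
that a local file stands in for Ignite's persistent storage. JobDescriptor
and DescriptorStore are hypothetical; the one-line-per-job, tab-separated
format assumes names contain no tabs or newlines. Reloading from the same
file with a fresh store simulates a node or cluster restart.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical descriptor: enough to re-create a scheduled job after a restart.
final class JobDescriptor {
    final String name;
    final String taskClass;
    final long nextRunMillis;

    JobDescriptor(String name, String taskClass, long nextRunMillis) {
        this.name = Objects.requireNonNull(name);
        this.taskClass = Objects.requireNonNull(taskClass);
        this.nextRunMillis = nextRunMillis;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof JobDescriptor)) return false;
        JobDescriptor other = (JobDescriptor) o;
        return name.equals(other.name) && taskClass.equals(other.taskClass)
                && nextRunMillis == other.nextRunMillis;
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, taskClass, nextRunMillis);
    }
}

// Hypothetical store: one tab-separated line per descriptor.
final class DescriptorStore {
    private final Path file;

    DescriptorStore(Path file) {
        this.file = file;
    }

    void save(List<JobDescriptor> jobs) {
        List<String> lines = new ArrayList<>();
        for (JobDescriptor j : jobs) {
            lines.add(j.name + "\t" + j.taskClass + "\t" + j.nextRunMillis);
        }
        try {
            Files.write(file, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns an empty list if nothing has been persisted yet.
    List<JobDescriptor> load() {
        List<JobDescriptor> jobs = new ArrayList<>();
        if (!Files.exists(file)) return jobs;
        try {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                if (line.isEmpty()) continue;
                String[] f = line.split("\t");
                jobs.add(new JobDescriptor(f[0], f[1], Long.parseLong(f[2])));
            }
            return jobs;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```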

As for the API, I believe this should be part of the Compute Grid. I suggest
introducing an IgniteCompute#withSchedulingPolicy(SchedulingPolicy policy)
method, where SchedulingPolicy is something like this:

import java.util.Date;

public interface SchedulingPolicy {
    /**
     * @return Timestamp of next execution.
     */
    Date nextTime();
}

This will enable scheduling for all compute features (tasks, callables,
closures, etc.) and is also very flexible. A policy implementation can provide
simple periodic scheduling, Cron-based scheduling, or anything else.
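Two possible policy implementations, sketched under assumptions that are not
part of the proposal: a null return from nextTime() means "no further
executions" (which makes one-time jobs expressible), and a java.time.Clock
is injected so that "now" is testable. PeriodicPolicy and OneTimePolicy are
hypothetical names; a Cron-based policy would plug in the same way.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.Date;

interface SchedulingPolicy {
    /**
     * @return Timestamp of next execution, or null if there is none (assumption).
     */
    Date nextTime();
}

// Fires at start, start + period, start + 2 * period, ...; returns the first slot after "now".
final class PeriodicPolicy implements SchedulingPolicy {
    private final Instant start;
    private final long periodMillis;
    private final Clock clock;

    PeriodicPolicy(Instant start, Duration period, Clock clock) {
        long millis = period.toMillis();
        if (millis <= 0) {
            throw new IllegalArgumentException("period must be at least 1 ms");
        }
        this.start = start;
        this.periodMillis = millis;
        this.clock = clock;
    }

    @Override
    public Date nextTime() {
        Instant now = clock.instant();
        if (now.isBefore(start)) {
            return Date.from(start);
        }
        long elapsed = Duration.between(start, now).toMillis();
        long slots = elapsed / periodMillis + 1; // index of the next slot strictly after now
        return Date.from(start.plusMillis(slots * periodMillis));
    }
}

// Fires once at the given instant; afterwards reports no further executions.
final class OneTimePolicy implements SchedulingPolicy {
    private final Instant at;
    private final Clock clock;

    OneTimePolicy(Instant at, Clock clock) {
        this.at = at;
        this.clock = clock;
    }

    @Override
    public Date nextTime() {
        return clock.instant().isBefore(at) ? Date.from(at) : null;
    }
}
```

Because nextTime() takes no arguments, each policy has to derive "now"
itself (here from the clock). Passing the previous execution time in would
be an alternative design.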

Thoughts?

-Val

On Fri, Jun 30, 2017 at 7:55 AM, Dmitriy Setrakyan <ds...@apache.org>
wrote:

> Yes, it does, but I am failing to see how this is *distributed*
> scheduling. Are we persisting the scheduler somewhere in the cluster, or is
> it only triggered on the client side?
>

Re: Distributed scheduling

Posted by Dmitriy Setrakyan <ds...@apache.org>.
On Fri, Jun 30, 2017 at 12:29 AM, Alexey Kuznetsov <ak...@apache.org>
wrote:

> Dmitriy,
>
> >> Can you provide a simple example of API calls that will make this
> >> possible?
> The API could look like this:
> 1) via scheduler:
> Ignite ignite = Ignition.start(....);
>
> // This will execute the job every day at 00:00
> ignite.scheduler().schedule(job, "0 0 * * *");
>
> 2) via compute:
>
> Ignite ignite = Ignition.start(....);
>
> // This will execute the compute task every day at 00:00
> ignite.compute().schedule(task, "0 0 * * *");
>
> Does this make sense?
>
>
Yes, it does, but I am failing to see how this is *distributed*
scheduling. Are we persisting the scheduler somewhere in the cluster, or is
it only triggered on the client side?

Re: Distributed scheduling

Posted by Alexey Kuznetsov <ak...@apache.org>.
Dmitriy,

>> Can you provide a simple example of API calls that will make this
>> possible?
The API could look like this:
1) via scheduler:
Ignite ignite = Ignition.start(....);

// This will execute the job every day at 00:00
ignite.scheduler().schedule(job, "0 0 * * *");

2) via compute:

Ignite ignite = Ignition.start(....);

// This will execute the compute task every day at 00:00
ignite.compute().schedule(task, "0 0 * * *");

Does this make sense?
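Both calls rely on the cron pattern "0 0 * * *" (minute 0, hour 0, any day
of month, any month, any weekday). As a sketch of what evaluating it
involves, here is a tiny helper that supports only the daily form "M H * * *";
a real implementation would need a full cron parser. DailyCron is a
hypothetical name, not part of ignite-schedule.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZonedDateTime;

// Hypothetical helper for cron patterns of the form "M H * * *" (daily at H:M).
final class DailyCron {
    private final LocalTime time;

    private DailyCron(LocalTime time) {
        this.time = time;
    }

    static DailyCron parse(String pattern) {
        String[] f = pattern.trim().split("\\s+");
        if (f.length != 5 || !f[2].equals("*") || !f[3].equals("*") || !f[4].equals("*")) {
            throw new IllegalArgumentException("only 'M H * * *' is supported: " + pattern);
        }
        int minute = Integer.parseInt(f[0]); // cron field order: minute first, then hour
        int hour = Integer.parseInt(f[1]);
        // LocalTime.of rejects out-of-range values with a DateTimeException.
        return new DailyCron(LocalTime.of(hour, minute));
    }

    // First fire time strictly after 't', in t's time zone.
    ZonedDateTime nextAfter(ZonedDateTime t) {
        ZonedDateTime today = t.toLocalDate().atTime(time).atZone(t.getZone());
        if (today.isAfter(t)) {
            return today;
        }
        LocalDate tomorrow = t.toLocalDate().plusDays(1);
        return tomorrow.atTime(time).atZone(t.getZone());
    }
}
```

Note that the time zone the pattern is evaluated in matters once the
schedule lives on the cluster rather than on a single node.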


On Fri, Jun 30, 2017 at 12:56 PM, Dmitriy Setrakyan <ds...@apache.org>
wrote:

> I am still not clear on how it can be used or why it is useful. Can you
> provide a simple example of API calls that will make this possible?



-- 
Alexey Kuznetsov

Re: Distributed scheduling

Posted by Dmitriy Setrakyan <ds...@apache.org>.
I am still not clear on how it can be used or why it is useful. Can you
provide a simple example of API calls that will make this possible?

On Thu, Jun 29, 2017 at 7:57 PM, Alexey Kuznetsov <ak...@apache.org>
wrote:

> Hi,
>
> >> Alexey, why do you think it will be useful?
>
> I need to execute some tasks periodically on the cluster. I think this is a
> common need: for example, I could aggregate data once a day, generate
> reports, and so on.
>
> Nodes can fail and the cluster can be restarted. With the new persistence
> feature, distributed scheduling that survives a cluster restart could be
> implemented.
>
> >> A similar topic was raised and discussed some time ago:
> >> http://apache-ignite-developers.2346864.n4.nabble.com/Tasks-Scheduling-and-Chaining-td14293.html
>
> I read that topic; it is a bit different from what I have in mind.
> I'm talking only about periodic or one-time planned jobs on the cluster that
> will be executed with certain guarantees.
>
> But we can also take that use case into account.

Re: Distributed scheduling

Posted by Alexey Kuznetsov <ak...@apache.org>.
Hi,

>> Alexey, why do you think it will be useful?

I need to execute some tasks periodically on the cluster. I think this is a
common need: for example, I could aggregate data once a day, generate
reports, and so on.

Nodes can fail and the cluster can be restarted. With the new persistence
feature, distributed scheduling that survives a cluster restart could be
implemented.

>> A similar topic was raised and discussed some time ago:
>> http://apache-ignite-developers.2346864.n4.nabble.com/Tasks-Scheduling-and-Chaining-td14293.html

I read that topic; it is a bit different from what I have in mind.
I'm talking only about periodic or one-time planned jobs on the cluster that
will be executed with certain guarantees.

But we can also take that use case into account.


On Fri, Jun 30, 2017 at 5:53 AM, Dmitriy Setrakyan <ds...@apache.org>
wrote:

> Alexey, why do you think it will be useful?



-- 
Alexey Kuznetsov

Re: Distributed scheduling

Posted by Dmitriy Setrakyan <ds...@apache.org>.
Alexey, why do you think it will be useful?

On Thu, Jun 29, 2017 at 12:22 PM, Alexey Kuznetsov <ak...@apache.org>
wrote:

> Hi, All!
>
> I would like to start a discussion about distributed scheduling.
>
> Ignite already has an "ignite-schedule" module that provides an API for
> LOCAL scheduling on a node.
> If the node fails, its schedule is lost.
>
> Distributed scheduling would be a very useful feature.
>
> Let's discuss how it could be implemented.
>
> I see the following options:
>   1) Extend the "ignite-schedule" module with an API for distributed
> scheduling.
>   2) Extend the compute API with methods that allow scheduling tasks on
> the cluster.
>   3) Implement both 1) and 2)?
>
> Any ideas and thoughts are welcome!
>
> --
> Alexey Kuznetsov
>

Re: Distributed scheduling

Posted by Denis Magda <dm...@apache.org>.
Hi Alex,

A similar topic was raised and discussed some time ago:
http://apache-ignite-developers.2346864.n4.nabble.com/Tasks-Scheduling-and-Chaining-td14293.html

We should probably revive that thread with your input.

—
Denis

> On Jun 29, 2017, at 12:22 PM, Alexey Kuznetsov <ak...@apache.org> wrote:
> 
> Hi, All!
> 
> I would like to start a discussion about distributed scheduling.
> 
> Ignite already has an "ignite-schedule" module that provides an API for
> LOCAL scheduling on a node.
> If the node fails, its schedule is lost.
> 
> Distributed scheduling would be a very useful feature.
> 
> Let's discuss how it could be implemented.
> 
> I see the following options:
>  1) Extend the "ignite-schedule" module with an API for distributed scheduling.
>  2) Extend the compute API with methods that allow scheduling tasks on the
> cluster.
>  3) Implement both 1) and 2)?
> 
> Any ideas and thoughts are welcome!
> 
> -- 
> Alexey Kuznetsov