
Posted to user@mesos.apache.org by Brian Candler <b....@pobox.com> on 2015/10/07 09:56:14 UTC

Batch/queue frameworks?

Are there any open-source job queue/batch systems which run under Mesos? 
I am thinking of things like HTCondor, Torque etc.

The requirement is to be able to:
- define an overall job as a set of sub-tasks (could be many thousands)
- put sub-tasks into a queue; execute tasks from the queue
- dependencies: don't add a sub-task into the queue until its precursors 
have completed successfully
- restart: after an error, be able to restart the job but skipping those 
sub-tasks which completed successfully
- preferably handle short-lived tasks efficiently (of order of 10 
seconds duration)
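
To make the list above concrete, here is a minimal sketch of the semantics I mean, assuming Python 3.9+ (for the standard-library graphlib module). The run_job function, the deps format, and the task names are hypothetical illustrations, not part of any existing Mesos framework:

```python
from graphlib import TopologicalSorter


def run_job(deps, run_task, done=None):
    """Run sub-tasks in dependency order, skipping ones already completed.

    deps:     dict mapping a task name to the set of its precursors
              (hypothetical format chosen for this sketch).
    run_task: callable taking a task name and returning True on success.
    done:     set of tasks that succeeded in an earlier attempt.
    Returns (done, failed) as two sets of task names.
    """
    done = set(done or ())
    failed = set()
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    while sorter.is_active():
        ready = sorter.get_ready()  # tasks whose precursors are all done
        if not ready:
            break  # only failed tasks remain; their successors stay blocked
        for task in ready:
            if task in done or run_task(task):
                done.add(task)
                sorter.done(task)  # unblocks the successors of this task
            else:
                failed.add(task)  # never marked done, so successors never run
    return done, failed


# Hypothetical job: b and c depend on a; d depends on b and c.
deps = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
first_done, first_failed = run_job(deps, lambda task: task != "c")  # c fails
# Restart: pass the earlier successes so that only c and d are executed.
final_done, final_failed = run_job(deps, lambda task: True, done=first_done)
```

In a real system the done set would have to be persisted between attempts (a file or a database, for example), and run_task would submit the work to the cluster instead of running it inline.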

Clearly it's possible to write a framework to do this, but I don't want 
to re-invent the wheel if it has been done already.

Thanks,

Brian.

P.S. I found Chronos, but it doesn't seem a good match. As far as I can 
see, it's intended for applications where you pre-define a bunch of 
tasks (via GUI? via REST?) and then trigger them periodically.

Re: Batch/queue frameworks?

Posted by James DeFelice <ja...@gmail.com>.

The OP might also be interested in Stolos:
https://github.com/sailthru/stolos

combined with Relay: https://github.com/sailthru/relay


On Wed, Oct 7, 2015 at 8:15 AM, Clarke, Trevor <tc...@ball.com> wrote:

> I'm currently working on this sort of framework. Unfortunately, source is
> not currently available but there is a plan to open source in the next
> couple of months. I'm not sure if your need is immediate or if it can wait
> for a bit. The framework handles jobs in docker containers with pre and
> post steps (copy data into the node, products out, etc.) Individual jobs
> can be strung together in a DAG for complex processing. Directories can be
> watched for new data and jobs can be started in response to this data.
>
> ________________________________________
> From: Brian Candler [b.candler@pobox.com]
> Sent: Wednesday, October 07, 2015 3:56 AM
> To: user@mesos.apache.org
> Subject: Batch/queue frameworks?
>
> Are there any open-source job queue/batch systems which run under Mesos?
> I am thinking of things like HTCondor, Torque etc.
>
> The requirement is to be able to:
> - define an overall job as a set of sub-tasks (could be many thousands)
> - put sub-tasks into a queue; execute tasks from the queue
> - dependencies: don't add a sub-task into the queue until its precursors
> have completed successfully
> - restart: after an error, be able to restart the job but skipping those
> sub-tasks which completed successfully
> - preferably handle short-lived tasks efficiently (of order of 10
> seconds duration)
>
> Clearly it's possible to write a framework to do this, but I don't want
> to re-invent the wheel if it has been done already.
>
> Thanks,
>
> Brian.
>
> P.S. I found Chronos, but it doesn't seem a good match. As far as I can
> see, it's intended for applications where you pre-define a bunch of
> tasks (via GUI? via REST?) and then trigger them periodically.
>



-- 
James DeFelice
585.241.9488 (voice)
650.649.6071 (fax)

RE: Batch/queue frameworks?

Posted by "Clarke, Trevor" <tc...@ball.com>.

I'm currently working on this sort of framework. Unfortunately, source is not currently available but there is a plan to open source in the next couple of months. I'm not sure if your need is immediate or if it can wait for a bit. The framework handles jobs in docker containers with pre and post steps (copy data into the node, products out, etc.) Individual jobs can be strung together in a DAG for complex processing. Directories can be watched for new data and jobs can be started in response to this data.

________________________________________
From: Brian Candler [b.candler@pobox.com]
Sent: Wednesday, October 07, 2015 3:56 AM
To: user@mesos.apache.org
Subject: Batch/queue frameworks?

Are there any open-source job queue/batch systems which run under Mesos?
I am thinking of things like HTCondor, Torque etc.

The requirement is to be able to:
- define an overall job as a set of sub-tasks (could be many thousands)
- put sub-tasks into a queue; execute tasks from the queue
- dependencies: don't add a sub-task into the queue until its precursors
have completed successfully
- restart: after an error, be able to restart the job but skipping those
sub-tasks which completed successfully
- preferably handle short-lived tasks efficiently (of order of 10
seconds duration)

Clearly it's possible to write a framework to do this, but I don't want
to re-invent the wheel if it has been done already.

Thanks,

Brian.

P.S. I found Chronos, but it doesn't seem a good match. As far as I can
see, it's intended for applications where you pre-define a bunch of
tasks (via GUI? via REST?) and then trigger them periodically.

This message and any enclosures are intended only for the addressee.  Please 
notify the sender by email if you are not the intended recipient.  If you are 
not the intended recipient, you may not use, copy, disclose, or distribute this 
message or its contents or enclosures to any other person and any such actions 
may be unlawful.  Ball reserves the right to monitor and review all messages 
and enclosures sent to or from this email address.

Re: Batch/queue frameworks?

Posted by Nikolaos Ballas neXus <ni...@nexusgroup.com>.

I think any pub/sub system (a typical JMS broker, RabbitMQ, Kafka, etc.) would do what you describe. All of them can be run as containers inside an Apache Mesos cluster. Kafka has really good integration with Mesos and YARN, and it is also more lightweight than a typical JMS implementation.

regards
\n\m
On 07 Oct 2015, at 12:05, F21 <f2...@gmail.com> wrote:

I am also interested in something like this, although my requirements are much simpler.

I am interested in a work queue like beanstalkd that will allow me to push to a queue from a web app and have workers to do things like send emails, generate pdfs and resize images.

I have thought about running a beanstalkd in a container, but it has some limitations. For example, if it crashes, it needs to be relaunched manually to recover the binlog (which is a no go).

Another option I can think of is to use kafka (which has a mesos framework) and have the web app and other parts push jobs into the kafka broker. Workers listening on the broker would pop each job off and execute whatever needs to be done.

However, there seems to be a lot of wheel-reinventing with that solution. For example, what if a job depends on another job? There's also a lot of work that needs to be done at a lower level when all I am interested in is to write domain-specific code to generate the pdf, resize the image etc.

If there's a work queue solution for mesos, I would love to know too.

On 7/10/2015 8:08 PM, Brian Candler wrote:
On 07/10/2015 09:44, Nikolaos Ballas neXus wrote:
Maybe you need to read a bit :)
I have read plenty, including those you list, and I didn't find anything which met my requirements. Again I apologise if I was not clear in my question.

Spark has a very specific data model (RDDs) and applications which write to its API. I want to run arbitrary compute jobs - think "shell scripts" or "docker containers" which run pre-existing applications which I can't change. And I want to fill a queue or pipeline with those jobs.

Hadoop also is for specific workloads, written to run under Hadoop and preferably using HDFS.

The nearest Hadoop gets to general-purpose computing, as far as I can see, is its YARN scheduler. YARN can in turn run under Mesos. Therefore a job queue which can run on YARN might be acceptable, although I'd rather not have an additional layer in the stack. (There was an old project for running Torque under YARN, but this has been abandoned)

Regards,

Brian.

Nikolaos Ballas | Software Development Manager

Technology Nexus S.a.r.l.
2-4 Rue Eugene Rupert
2453 Luxembourg
Delivery address: 2-3 Rue Eugene Rupert,Vertigo Polaris Building
Tel: + 3522619113580
contact@nexusgroup.com<ma...@nexusgroup.com> | nexusgroup.com<http://www.nexusgroup.com/>
LinkedIn.com<http://www.linkedin.com/company/nexus-technology> | Twitter<http://www.twitter.com/technologynexus> | Facebook.com<https://www.facebook.com/pages/Technology-Nexus/133756470003189>


Re: Batch/queue frameworks?

Posted by David Greenberg <ds...@gmail.com>.

Another great option is Cook: https://github.com/twosigma/Cook

Cook combines a simple REST API for batch jobs with sophisticated
fair-sharing and preemption features on Mesos. Tomorrow, at MesosCon
Europe, I'll be speaking about it in more detail. When we want to use
dependencies with Cook, we use a workflow tool that creates the dependent
jobs on-the-fly.

On Wed, Oct 7, 2015 at 11:08 AM Pablo Cingolani <pa...@gmail.com>
wrote:

>
> It looks like you are looking for something like BDS
>
>   http://pcingola.github.io/BigDataScript/
>
> It has the additional advantage that you can port your scripts seamlessly
> between Mesos and other cluster systems (SGE, PBS, Torque, etc.).
>
>
>
>
>
> On Wed, Oct 7, 2015 at 7:05 AM, F21 <f2...@gmail.com> wrote:
>
>> I am also interested in something like this, although my requirements are
>> much simpler.
>>
>> I am interested in a work queue like beanstalkd that will allow me to
>> push to a queue from a web app and have workers to do things like send
>> emails, generate pdfs and resize images.
>>
>> I have thought about running a beanstalkd in a container, but it has some
>> limitations. For example, if it crashes, it needs to be relaunched manually
>> to recover the binlog (which is a no go).
>>
>> Another option I can think of is to use kafka (which has a mesos
>> framework) and have the web app and other parts push jobs into the kafka
>> broker. Workers listening on the broker would pop each job off and execute
>> whatever needs to be done.
>>
>> However, there seems to be a lot of wheel-reinventing with that solution.
>> For example, what if a job depends on another job? There's also a lot of
>> work that needs to be done at a lower level when all I am interested in is
>> to write domain specific code to generate the pdf, resize the image etc.
>>
>> If there's a work queue solution for mesos, I would love to know too.
>>
>>
>>
>>
>> On 7/10/2015 8:08 PM, Brian Candler wrote:
>>
>> On 07/10/2015 09:44, Nikolaos Ballas neXus wrote:
>>
>> Maybe you need to read a bit  :)
>>
>> I have read plenty, including those you list, and I didn't find anything
>> which met my requirements. Again I apologise if I was not clear in my
>> question.
>>
>> Spark has a very specific data model (RDDs) and applications which write
>> to its API. I want to run arbitrary compute jobs - think "shell scripts" or
>> "docker containers" which run pre-existing applications which I can't
>> change.  And I want to fill a queue or pipeline with those jobs.
>>
>> Hadoop also is for specific workloads, written to run under Hadoop and
>> preferably using HDFS.
>>
>> The nearest Hadoop gets to general-purpose computing, as far as I can
>> see, is its YARN scheduler. YARN can in turn run under Mesos. Therefore a
>> job queue which can run on YARN might be acceptable, although I'd rather
>> not have an additional layer in the stack. (There was an old project for
>> running Torque under YARN, but this has been abandoned)
>>
>> Regards,
>>
>> Brian.
>>
>>
>>
>

Re: Batch/queue frameworks?

Posted by Pablo Cingolani <pa...@gmail.com>.

I answer below...


On Wed, Oct 7, 2015 at 8:17 AM, Brian Candler <b....@pobox.com> wrote:

> On 07/10/2015 11:08, Pablo Cingolani wrote:
>
> It looks like you are looking for something like BDS
>
>   http://pcingola.github.io/BigDataScript/
>
> It has the additional advantage that you can port your scripts seamlessly
> between Mesos and other cluster systems (SGE, PBS, Torque, etc.).
>
> Yes, that looks very interesting, thank you!  It seems to perform the same
> role as HTCondor Dagman, but with pluggable backends and a much more
> expressive language.
>
> At http://pcingola.github.io/BigDataScript/bigDataScript_manual.html
> under "Resource consumption and task options", I don't see any option for
> declaring the memory used by a task. Is that a wishlist feature?
>

You can use "mem=NNN" to specify memory requirements.


> In fact, mesos allows arbitrary resources, so it would be good to be able
> to specify resource requirements of
> any particular resource.
>

Arbitrary resources are not supported yet.



>
> I note that BDS allows a task to specify it runs on one particular cluster
> node. In my application it would also be helpful to be able to specify a
> particular class of node. (When submitting a job to HTCondor this could be
> expanded to a requirements expression)
>

Typically this is done using "queue" type in other clusters.
At the moment (for Mesos systems) this parameter is mostly
ignored, but I can add support for it if you need it.
Yours

    Pablo



>
> Regards,
>
> Brian.
>
>

Re: Batch/queue frameworks?

Posted by Brian Candler <b....@pobox.com>.

On 07/10/2015 11:08, Pablo Cingolani wrote:
> It looks like you are looking for something like BDS
>
> http://pcingola.github.io/BigDataScript/
>
> It has the additional advantage that you can port your scripts seamlessly
> between Mesos and other cluster systems (SGE, PBS, Torque, etc.).
>
Yes, that looks very interesting, thank you!  It seems to perform the 
same role as HTCondor Dagman, but with pluggable backends and a much 
more expressive language.

At http://pcingola.github.io/BigDataScript/bigDataScript_manual.html
under "Resource consumption and task options", I don't see any option 
for declaring the memory used by a task. Is that a wishlist feature? In 
fact, mesos allows arbitrary resources, so it would be good to be able 
to specify resource requirements of
any particular resource.

I note that BDS allows a task to specify it runs on one particular 
cluster node. In my application it would also be helpful to be able to 
specify a particular class of node. (When submitting a job to HTCondor 
this could be expanded to a requirements expression)

Regards,

Brian.

Re: Batch/queue frameworks?

Posted by Pablo Cingolani <pa...@gmail.com>.

It looks like you are looking for something like BDS

  http://pcingola.github.io/BigDataScript/

It has the additional advantage that you can port your scripts seamlessly
between Mesos and other cluster systems (SGE, PBS, Torque, etc.).





On Wed, Oct 7, 2015 at 7:05 AM, F21 <f2...@gmail.com> wrote:

> I am also interested in something like this, although my requirements are
> much simpler.
>
> I am interested in a work queue like beanstalkd that will allow me to push
> to a queue from a web app and have workers to do things like send emails,
> generate pdfs and resize images.
>
> I have thought about running a beanstalkd in a container, but it has some
> limitations. For example, if it crashes, it needs to be relaunched manually
> to recover the binlog (which is a no go).
>
> Another option I can think of is to use kafka (which has a mesos
> framework) and have the web app and other parts push jobs into the kafka
> broker. Workers listening on the broker would pop each job off and execute
> whatever needs to be done.
>
> However, there seems to be a lot of wheel-reinventing with that solution.
> For example, what if a job depends on another job? There's also a lot of
> work that needs to be done at a lower level when all I am interested in is
> to write domain specific code to generate the pdf, resize the image etc.
>
> If there's a work queue solution for mesos, I would love to know too.
>
>
>
>
> On 7/10/2015 8:08 PM, Brian Candler wrote:
>
> On 07/10/2015 09:44, Nikolaos Ballas neXus wrote:
>
> Maybe you need to read a bit  :)
>
> I have read plenty, including those you list, and I didn't find anything
> which met my requirements. Again I apologise if I was not clear in my
> question.
>
> Spark has a very specific data model (RDDs) and applications which write
> to its API. I want to run arbitrary compute jobs - think "shell scripts" or
> "docker containers" which run pre-existing applications which I can't
> change.  And I want to fill a queue or pipeline with those jobs.
>
> Hadoop also is for specific workloads, written to run under Hadoop and
> preferably using HDFS.
>
> The nearest Hadoop gets to general-purpose computing, as far as I can see,
> is its YARN scheduler. YARN can in turn run under Mesos. Therefore a job
> queue which can run on YARN might be acceptable, although I'd rather not
> have an additional layer in the stack. (There was an old project for
> running Torque under YARN, but this has been abandoned)
>
> Regards,
>
> Brian.
>
>
>

Re: Batch/queue frameworks?

Posted by F21 <f2...@gmail.com>.

I am also interested in something like this, although my requirements 
are much simpler.

I am interested in a work queue like beanstalkd that will allow me to 
push to a queue from a web app and have workers to do things like send 
emails, generate pdfs and resize images.

I have thought about running a beanstalkd in a container, but it has 
some limitations. For example, if it crashes, it needs to be relaunched 
manually to recover the binlog (which is a no go).

Another option I can think of is to use kafka (which has a mesos 
framework) and have the web app and other parts push jobs into the kafka 
broker. Workers listening on the broker would pop each job off and 
execute whatever needs to be done.
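
A rough sketch of that push/pop pattern, with Python's standard-library queue.Queue standing in for the broker. The run_workers function and the job names are hypothetical; a real deployment would replace the in-process queue with a Kafka or beanstalkd client:

```python
import queue
import threading


def run_workers(jobs, handler, num_workers=4):
    """Push jobs onto a queue and let worker threads pop and process them.

    Returns a list of (job, result) pairs; if the handler raises, the
    exception object is stored as the result so the worker keeps running.
    """
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = q.get()
            if job is None:  # sentinel: no more work for this worker
                q.task_done()
                return
            try:
                outcome = handler(job)
            except Exception as exc:  # keep the worker alive on bad jobs
                outcome = exc
            with lock:
                results.append((job, outcome))
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in jobs:  # the "web app" side: push jobs
        q.put(job)
    for _ in threads:  # one sentinel per worker
        q.put(None)
    q.join()  # wait until every job and sentinel has been processed
    for t in threads:
        t.join()
    return results


# Hypothetical jobs; str.upper stands in for "send email", "generate pdf", etc.
finished = run_workers(["email", "pdf", "image"], str.upper, num_workers=2)
```

This covers only the popping and executing part. It says nothing about dependencies between jobs or about recovering the queue after a crash, which are exactly the pieces I would rather not write myself.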

However, there seems to be a lot of wheel-reinventing with that 
solution. For example, what if a job depends on another job? There's 
also a lot of work that needs to be done at a lower level when all I am 
interested in is to write domain specific code to generate the pdf, 
resize the image etc.

If there's a work queue solution for mesos, I would love to know too.

On 7/10/2015 8:08 PM, Brian Candler wrote:
> On 07/10/2015 09:44, Nikolaos Ballas neXus wrote:
>> Maybe you need to read a bit  :)
> I have read plenty, including those you list, and I didn't find 
> anything which met my requirements. Again I apologise if I was not 
> clear in my question.
>
> Spark has a very specific data model (RDDs) and applications which 
> write to its API. I want to run arbitrary compute jobs - think "shell 
> scripts" or "docker containers" which run pre-existing applications 
> which I can't change.  And I want to fill a queue or pipeline with 
> those jobs.
>
> Hadoop also is for specific workloads, written to run under Hadoop and 
> preferably using HDFS.
>
> The nearest Hadoop gets to general-purpose computing, as far as I can 
> see, is its YARN scheduler. YARN can in turn run under Mesos. 
> Therefore a job queue which can run on YARN might be acceptable, 
> although I'd rather not have an additional layer in the stack. (There 
> was an old project for running Torque under YARN, but this has been 
> abandoned)
>
> Regards,
>
> Brian.
>

Re: Batch/queue frameworks?

Posted by Brian Candler <b....@pobox.com>.

On 07/10/2015 09:44, Nikolaos Ballas neXus wrote:
> Maybe you need to read a bit  :)
I have read plenty, including those you list, and I didn't find anything 
which met my requirements. Again I apologise if I was not clear in my 
question.

Spark has a very specific data model (RDDs) and applications which write 
to its API. I want to run arbitrary compute jobs - think "shell scripts" 
or "docker containers" which run pre-existing applications which I can't 
change.  And I want to fill a queue or pipeline with those jobs.

Hadoop also is for specific workloads, written to run under Hadoop and 
preferably using HDFS.

The nearest Hadoop gets to general-purpose computing, as far as I can 
see, is its YARN scheduler. YARN can in turn run under Mesos. Therefore 
a job queue which can run on YARN might be acceptable, although I'd 
rather not have an additional layer in the stack. (There was an old 
project for running Torque under YARN, but this has been abandoned)

Regards,

Brian.

Re: Batch/queue frameworks?

Posted by Nikolaos Ballas neXus <ni...@nexusgroup.com>.

Maybe you need to read a bit :) Hadoop and Spark are batch processing frameworks, and both can run on top of Mesos. If you want to do online processing, then you have Apache Storm. On the other hand, supercomputer != distributed computing. You referred to Chronos and I thought you were asking for a scheduler. You may need to read a bit to understand the technology stack, because the answers to your question are rather obvious for someone with even a basic ds background who follows the market. Jobs can either be executed with Hadoop executors or be delegated to processes configured in Docker containers that Mesos can bootstrap.

kind regards
\n\m

On 07 Oct 2015, at 10:37, Brian Candler <b....@pobox.com> wrote:

On 07/10/2015 09:01, Nikolaos Ballas neXus wrote:
Check for Marathon

I don't see how Marathon does what I want. Maybe I wasn't clear enough in explaining my requirements.

What I need is basically a supercomputer cluster where I can take a large computation job, break it into lots of sub-tasks, and run as many of those sub-tasks in parallel as possible given the CPU resources available, until all the sub-tasks are done.

The core of any sort of system like that is a "job queue" where all the sub-tasks are entered. The executor picks out another task whenever there is some free resource available, and when it finishes, it is removed from the queue.

I don't see how Marathon has such a job queue. As far as I can tell, Marathon is for starting long-lived applications; you define what things you want running, it starts them, and restarts them if they die for any reason.

Or have I misunderstood what Marathon is capable of? If so, can you point me at the relevant documentation?

The advantage of running such a supercomputer cluster under Mesos would be that I could run *other* applications (including those started by Marathon or Chronos) on the same hardware.

Thanks,

Brian.


Re: Batch/queue frameworks?

Posted by Brian Candler <b....@pobox.com>.

On 07/10/2015 09:01, Nikolaos Ballas neXus wrote:
> Check for Marathon

I don't see how Marathon does what I want. Maybe I wasn't clear enough 
in explaining my requirements.

What I need is basically a supercomputer cluster where I can take a 
large computation job, break it into lots of sub-tasks, and run as many 
of those sub-tasks in parallel as possible given the CPU resources 
available, until all the sub-tasks are done.

The core of any sort of system like that is a "job queue" where all the 
sub-tasks are entered. The executor picks out another task whenever 
there is some free resource available, and when it finishes, it is 
removed from the queue.
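
As an illustration of that behaviour only, here is a small simulation, assuming a strict first-in-first-out queue and a single pool of CPUs. The simulate function and the (name, cpus, duration) tuple format are hypothetical, and nothing here uses the Mesos API:

```python
import heapq
from collections import deque


def simulate(tasks, total_cpus):
    """Simulate a FIFO job queue on a cluster with a fixed number of CPUs.

    tasks: list of (name, cpus, duration) tuples (hypothetical format).
    Returns (makespan, starts) where starts maps task name -> start time.
    """
    pending = deque(tasks)
    running = []  # min-heap of (finish_time, name, cpus)
    free = total_cpus
    now = 0
    starts = {}
    while pending or running:
        # Start queued tasks for as long as the head of the queue fits.
        while pending and pending[0][1] <= free:
            name, cpus, duration = pending.popleft()
            free -= cpus
            starts[name] = now
            heapq.heappush(running, (now + duration, name, cpus))
        if not running:
            raise ValueError("head task needs more CPUs than the cluster has")
        # Advance time to the next completion and release its CPUs.
        now, _finished, cpus = heapq.heappop(running)
        free += cpus
    return now, starts


# Three hypothetical 2-CPU, 10-second sub-tasks on a 4-CPU cluster.
makespan, starts = simulate([("t1", 2, 10), ("t2", 2, 10), ("t3", 2, 10)], 4)
```

With these inputs two sub-tasks start immediately and the third starts when the first one finishes, so the whole job takes 20 seconds rather than 30.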

I don't see how Marathon has such a job queue. As far as I can tell, 
Marathon is for starting long-lived applications; you define what things 
you want running, it starts them, and restarts them if they die for any 
reason.

Or have I misunderstood what Marathon is capable of? If so, can you 
point me at the relevant documentation?

The advantage of running such a supercomputer cluster under Mesos would 
be that I could run *other* applications (including those started by 
Marathon or Chronos) on the same hardware.

Thanks,

Brian.

Re: Batch/queue frameworks?

Posted by Nikolaos Ballas neXus <ni...@nexusgroup.com>.

Check for Marathon
On 07 Oct 2015, at 09:56, Brian Candler <b....@pobox.com> wrote:

Are there any open-source job queue/batch systems which run under Mesos? I am thinking of things like HTCondor, Torque etc.

The requirement is to be able to:
- define an overall job as a set of sub-tasks (could be many thousands)
- put sub-tasks into a queue; execute tasks from the queue
- dependencies: don't add a sub-task into the queue until its precursors have completed successfully
- restart: after an error, be able to restart the job but skipping those sub-tasks which completed successfully
- preferably handle short-lived tasks efficiently (of order of 10 seconds duration)

Clearly it's possible to write a framework to do this, but I don't want to re-invent the wheel if it has been done already.

Thanks,

Brian.

P.S. I found Chronos, but it doesn't seem a good match. As far as I can see, it's intended for applications where you pre-define a bunch of tasks (via GUI? via REST?) and then trigger them periodically.


Re: Batch/queue frameworks?

Posted by Lars Albertsson <la...@gmail.com>.

What you are looking for is probably a workflow manager. It is more or
less independent from a cluster management system, such as Mesos.

Here is a suggestion for a tool shopping list:

https://github.com/spotify/luigi
https://azkaban.github.io/
https://github.com/airbnb/airflow
https://github.com/pinterest/pinball
https://github.com/sailthru/stolos

Luigi is probably least risk - easy to get started and battle-tested.
I am biased, though.

In batch processing environments, the workflow managers typically run
on a small cluster of "edge nodes", which in turn schedule jobs on
Hadoop or Spark. One could conceive scheduling jobs from edge nodes
both onto Hadoop/Spark and Mesos - the latter would be appropriate for
jobs that fit in a single machine. Hadoop or Spark are often used also
for simpler jobs, at a high cost in hardware and complexity. I have
not heard of any such hybrid integrations, however.

If you go down that path, you may want to look at Aurora for Mesos
scheduling and resource allocation. Unlike Marathon and Kubernetes, it
supports batch jobs. You can build a batch worker farm on Mesos with
e.g. Marathon + RabbitMQ, but you would likely reinvent what Aurora
does.

I answered a related question on the Spark mailing list, which may
provide some useful additional information:
https://www.mail-archive.com/user@spark.apache.org/msg34417.html

Regards,

Lars Albertsson

On Wed, Oct 7, 2015 at 9:56 AM, Brian Candler <b....@pobox.com> wrote:
> Are there any open-source job queue/batch systems which run under Mesos? I
> am thinking of things like HTCondor, Torque etc.
>
> The requirement is to be able to:
> - define an overall job as a set of sub-tasks (could be many thousands)
> - put sub-tasks into a queue; execute tasks from the queue
> - dependencies: don't add a sub-task into the queue until its precursors
> have completed successfully
> - restart: after an error, be able to restart the job but skipping those
> sub-tasks which completed successfully
> - preferably handle short-lived tasks efficiently (of order of 10 seconds
> duration)
>
> Clearly it's possible to write a framework to do this, but I don't want to
> re-invent the wheel if it has been done already.
>
> Thanks,
>
> Brian.
>
> P.S. I found Chronos, but it doesn't seem a good match. As far as I can see,
> it's intended for applications where you pre-define a bunch of tasks (via
> GUI? via REST?) and then trigger them periodically.