Posted to dev@mesos.apache.org by David Challoner <dc...@gmail.com> on 2013/04/20 21:00:37 UTC

Best way to go about writing a framework for daemon-like tasks

Hi, new to Mesos.   I set up a test cluster in EC2 (which required some
tweaks to the provided scripts - I'll try to send those back in), but I'm
not sure how I should write the framework for what I'd like to achieve.

I'd like to use Mesos to run a dynamically changing list of applications
like so:
[
{ job: app,
  cpu: 3,
  memory: 20g,
  instances: 4},
 { job: app2,
  cpu: 5,
  memory: 30g,
  instances: 2}
]

So I want the framework to pull the list out of a database/Redis/ZK and run
these apps across the cluster in round-robin fashion until either the
cluster's resources are exhausted or we've satisfied the number of running
instances.   What I'm having trouble grokking is how this would look at the
framework/executor level.
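To make the intent concrete, here's a toy sketch of that desired-state loop (illustrative names only, not the Mesos API): walk the nodes round-robin, placing one instance at a time until every requested instance is placed or no node can fit the next one.

```python
# Toy round-robin placement: `apps` is the desired-state list from the mail,
# `nodes` is a made-up view of free resources per node. Not the Mesos API.
from itertools import cycle

apps = [
    {"job": "app",  "cpu": 3, "memory": 20, "instances": 4},
    {"job": "app2", "cpu": 5, "memory": 30, "instances": 2},
]
nodes = {"node1": {"cpu": 10, "memory": 64},
         "node2": {"cpu": 10, "memory": 64}}

def place(apps, nodes):
    """Return (placements, still-pending instances)."""
    pending = [(a["job"], a["cpu"], a["memory"])
               for a in apps for _ in range(a["instances"])]
    placements = []
    node_ring = cycle(nodes)   # round-robin over node names
    misses = 0                 # consecutive nodes that couldn't fit the head
    while pending and misses < len(nodes):
        node = next(node_ring)
        job, cpu, mem = pending[0]
        free = nodes[node]
        if free["cpu"] >= cpu and free["memory"] >= mem:
            free["cpu"] -= cpu
            free["memory"] -= mem
            placements.append((job, node))
            pending.pop(0)
            misses = 0
        else:
            misses += 1        # this node can't fit it; try the next one
    return placements, pending
```

Note the head-of-line blocking: if the next instance fits nowhere, the loop stops even if a smaller instance later in the list would fit. A real scheduler would want to handle that.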

With a single framework it seems you're likely to fail to use all available
resources given long-running, daemon-like tasks:
* framework registers
* node offers 10 CPUs
* framework accepts the offer and at that time decides to give it 8 CPUs'
worth of tasks to run
* node has 2 CPUs left over
* at some point the list of apps changes and a new app allocation is
needed.  Maybe we have a new app that could use those 2 CPUs, or maybe we
just need to adjust how many of the old apps are running.  If I understand
the docs correctly, the nodes won't re-offer because they've already been
assigned tasks by the framework that will run forever.

Do I then submit a new framework for each app type?  Would that scale
to large numbers of apps?

Re: Best way to go about writing a framework for daemon-like tasks

Posted by Vinod Kone <vi...@gmail.com>.
Hi David,

First, glad to hear that you would like to contribute EC2-related patches.
Looking forward to it!

Regarding your question about frameworks: You could absolutely do what you
want to do with one framework. Below are some notes/suggestions.

--> Mesos always re-offers unused resources to frameworks. So, in your
case, you can definitely schedule a 2-CPU job/task at a later point in time.
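That is exactly the scenario from your example; a toy simulation of it (plain Python, not the Mesos API):

```python
# A framework takes 8 of 10 offered cpus; the 2 unused cpus are re-offered
# later, so a 2-cpu task can still be scheduled then.
def handle_offer(offer_cpus, task_cpus):
    """Launch the tasks that fit; return (launched, unused cpus)."""
    launched, remaining = [], offer_cpus
    for need in task_cpus:
        if need <= remaining:
            launched.append(need)
            remaining -= need
    return launched, remaining
```

The unused remainder isn't lost: Mesos hands it back to frameworks in a subsequent offer.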

--> If you want to adjust (by this I assume you mean kill some jobs/tasks?)
an old job/task, the framework typically needs to maintain a map of running
tasks. You can then issue killTask() calls, so that the tasks get killed
and the corresponding resources are re-offered.
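One way that bookkeeping might look (stub driver standing in for the real one, which exposes a killTask() call; everything else here is illustrative):

```python
# Keep a map of job -> running task ids; when the desired instance count
# shrinks, kill the surplus tasks so Mesos can re-offer their resources.
class StubDriver:
    def __init__(self):
        self.killed = []
    def killTask(self, task_id):       # real drivers expose killTask()
        self.killed.append(task_id)

def reconcile(running, desired, driver):
    """running: job -> list of task ids; desired: job -> instance count."""
    for job, task_ids in running.items():
        surplus = len(task_ids) - desired.get(job, 0)
        for task_id in task_ids[:max(surplus, 0)]:
            driver.killTask(task_id)   # terminal update will free resources
```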

--> Note that, for Mesos to re-offer resources after killing a task, the
executor running that task needs to send a terminal
(TASK_KILLED/TASK_FINISHED/TASK_LOST) status update.
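So the framework's status-update handler is where the running-task map gets cleaned up. A minimal sketch (the state names mirror Mesos task states; the map layout is an assumption):

```python
# Drop a task from the running map once a terminal status update arrives,
# since its resources are being returned to the offer pool.
TERMINAL = {"TASK_KILLED", "TASK_FINISHED", "TASK_LOST", "TASK_FAILED"}

def status_update(running, job, task_id, state):
    if state in TERMINAL and task_id in running.get(job, []):
        running[job].remove(task_id)
```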

--> Mesos also re-offers resources if an executor is terminated. These
are the resources used by the executor and all its constituent tasks.

--> Finally, if you don't want to write an executor (to begin with), Mesos
has a built-in command executor. This executor just wraps your shell
command, runs it, and exits when the command finishes.
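Using it amounts to setting a command on the TaskInfo instead of an executor. A hypothetical fragment in protobuf text format (the task name, ids, and shell command are made up for illustration):

```proto
name: "app-1"
task_id { value: "app-1" }
slave_id { value: "<slave id from the offer>" }
resources { name: "cpus" type: SCALAR scalar { value: 3 } }
command { value: "./run-app" }
```

Because `command` is set and no `executor` is, the slave runs the command with the built-in command executor.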


Hope that helps,

