Posted to user@mesos.apache.org by Christopher Hunt <ch...@lightbend.com> on 2017/06/19 02:09:21 UTC

Executors and CPU allocations

Hi there,

We have a framework that runs on Mesos and DC/OS. Our framework has a core and an agent, which equate to a Mesos scheduler and executor respectively. The executor is responsible for forking and managing processes w.r.t. our problem domain. Given that the executor is written in Scala and runs on the JVM, we find that it requires at least 1.9 CPUs to be allocated in order to function reasonably well. Because it is a JVM process, we also “warm up” the executors by starting one on each distinct node that we receive offers for. This keeps our domain of task management feeling responsive.

Our problem is that our executor will consume 1.9 CPUs even when we have no further tasks. Given that Mesos deducts 1.9 from the number of available CPUs on each node, our users quickly complain that there are no resources left to run anything else.

I’m hoping to solicit ideas on how we can manage our executor more effectively. Clearly, consuming 1.9 CPUs when effectively doing nothing is undesirable.

Some ideas:

* start the executor only when required - we tried this and the resulting experience felt sluggish given the overhead of starting the JVM-based executor
* start the executor with fewer CPU requirements (say, 1.0 CPUs), and then change its CPU share via ExecutorInfo when we have tasks to run - I’m not sure that this is possible - I think Mesos complains if ExecutorInfo changes after a previous task has supplied it
* given Mesos 1.3 and its support for multiple roles, have our framework register its own role so that the user has more control over where our executors are placed - at present we target all nodes where we receive an offer, i.e. “*”
* re-write the executor off the JVM, e.g. using Rust - this would be non-trivial

Thoughts/more ideas?

Thanks in advance.

Kind regards,
Christopher

Christopher Hunt
Technical Lead, Lightbend Enterprise Suite
@huntchr
UTC+10

Re: Executors and CPU allocations

Posted by Alex Rukletsov <al...@mesosphere.com>.

Regarding your second idea, you may have a "dummy" task with, say, 1.8 CPU
and "run" it iff there is at least one real task running, while
assigning 0.1 CPU for your executor. You can do some bookkeeping in the
executor to determine whether a certain executor is idle (and hence a
"dummy" task should be sent) when accepting an offer. This might be racy,
so you may want to "kill" the dummy task after a certain timeout on the
executor.

Similar to the above, you can also terminate executors from the scheduler
if you don't need them any more, or for a certain period of time.
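
The bookkeeping described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it places the bookkeeping on the scheduler side, it makes no Mesos API calls, and every name in it (DummyTaskBookkeeper, plan_launch, task_finished, dummies_to_kill) as well as the 30-second idle timeout is hypothetical. The 0.1 and 1.8 CPU figures are the ones suggested in this reply. Python is used purely for brevity; the same logic would carry over to a Scala scheduler.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

EXECUTOR_CPUS = 0.1  # allocation for the warm, idle executor (from the reply)
DUMMY_CPUS = 1.8     # allocation carried by the "dummy" task (from the reply)


@dataclass
class AgentState:
    real_tasks: Set[str] = field(default_factory=set)
    dummy_running: bool = False
    idle_since: Optional[float] = None  # when the last real task finished


class DummyTaskBookkeeper:
    """Hypothetical scheduler-side bookkeeping; not part of any Mesos API."""

    def __init__(self, idle_timeout_s: float = 30.0) -> None:
        self.idle_timeout_s = idle_timeout_s  # assumed value, tune as needed
        self.agents: Dict[str, AgentState] = {}

    def plan_launch(self, agent_id: str, task_id: str, task_cpus: float,
                    offered_cpus: float) -> Optional[List[Tuple[str, float]]]:
        """Return the (name, cpus) pairs to launch from this offer, or None
        if the offer is too small. The dummy task is added only when the
        executor on this agent is not already holding one."""
        state = self.agents.setdefault(agent_id, AgentState())
        plan = [(task_id, task_cpus)]
        if not state.dummy_running:
            plan.append(("dummy", DUMMY_CPUS))
        if sum(cpus for _, cpus in plan) > offered_cpus:
            return None
        state.real_tasks.add(task_id)
        state.dummy_running = True
        state.idle_since = None
        return plan

    def task_finished(self, agent_id: str, task_id: str, now: float) -> None:
        """Record a finished real task; start the idle clock if it was the last."""
        state = self.agents[agent_id]
        state.real_tasks.discard(task_id)
        if not state.real_tasks and state.dummy_running:
            state.idle_since = now

    def dummies_to_kill(self, now: float) -> List[str]:
        """Return agents whose dummy task has outlived the idle timeout and
        mark those dummies as no longer running."""
        expired = []
        for agent_id, state in self.agents.items():
            if (state.dummy_running and state.idle_since is not None
                    and now - state.idle_since >= self.idle_timeout_s):
                state.dummy_running = False
                state.idle_since = None
                expired.append(agent_id)
        return expired
```

The timeout before killing the dummy task is there to soften the race mentioned above: a new real task arriving shortly after the previous one finishes reuses the existing dummy instead of waiting for a fresh 1.8 CPU offer. Timestamps are passed in explicitly so the logic can be exercised without a running cluster.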
