Posted to user@mesos.apache.org by Asim <li...@gmail.com> on 2014/07/01 16:48:30 UTC

Re: Task serialization per machine?

Thanks for your response!

Yes, the executor's launchTask() gets only one task, which it executes
synchronously to completion. Since launchTask is a callback, my intuition
is that the scheduler should launch these tasks in parallel (even within a
single machine) after calculating the required resources. I could create a
new thread in the launchTask() callback and return immediately, but my
worry is that this causes a lost slave: the scheduler would assume the
task is finished while a zombie thread is still around. Hence, I am not
completely sure that creating new threads will solve this issue.

I am using the C++ framework API. Is there an example of how this is done
in existing frameworks? I looked at Spark, and it does not seem to do
anything special in its callbacks to ensure that multiple tasks on a
single machine execute in parallel.
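
For reference, here is roughly what my executor does today (a trimmed
sketch; the other Executor callbacks and the real task body are omitted).
The work runs inline in launchTask(), which I suspect is the synchronous
pattern you describe:

    #include <mesos/executor.hpp>

    using namespace mesos;

    class MyExecutor : public Executor {
    public:
      virtual void launchTask(ExecutorDriver* driver, const TaskInfo& task) {
        // Runs the task inline: this callback does not return until the
        // task is done, so no other callback is delivered meanwhile.
        doWork(task);  // stand-in for the actual task body

        TaskStatus status;
        status.mutable_task_id()->MergeFrom(task.task_id());
        status.set_state(TASK_FINISHED);
        driver->sendStatusUpdate(status);
      }

      // registered(), killTask(), shutdown(), etc. omitted
    };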

Thanks,
Asim

On Mon, Jun 30, 2014 at 4:48 PM, Sharma Podila <sp...@netflix.com> wrote:

> A likely scenario is that your executor is running the task synchronously
> inside the callback to launchTask(). If you make it instead run the task
> asynchronously (e.g., in a separate thread), that should resolve it.
>
>
> On Mon, Jun 30, 2014 at 12:48 PM, Asim <li...@gmail.com> wrote:
>
>> Hi,
>>
>> I want to launch t tasks across m machines (t >> m) so that they can
>> run simultaneously. Currently, I find that every machine processes its
>> tasks serially, one after another.
>>
>> I have written a framework with a scheduler and an executor. The
>> scheduler launches a task list on a bunch of machines (which show up as
>> offers). When I send a task list to run with
>> driver->launchTasks(offers[i].id(), tasks[i]), I find that every machine
>> picks up one task at a time (and only then goes on to the next). This
>> happens even though the offer can easily accommodate more than one task
>> from the list.
>>
>> Is there something that I am missing?
>>
>> Thanks,
>> Asim
>>
>>
>

Re: Task serialization per machine?

Posted by Asim <li...@gmail.com>.
I was able to figure this out in my C++ framework. I created a new thread
(pthread_create), detached it from the executor (pthread_detach), and then
sent a TASK_RUNNING status update to the framework. Thanks for the
explanation!
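
In case it helps someone else, the relevant part now looks roughly like
this (a sketch only; error handling and the other Executor callbacks are
omitted, and runMyTask is a stand-in for the real work):

    #include <pthread.h>

    #include <mesos/executor.hpp>

    using namespace mesos;

    struct TaskArg {
      ExecutorDriver* driver;
      TaskInfo task;
    };

    static void* runMyTask(void* data) {
      TaskArg* arg = static_cast<TaskArg*>(data);
      // ... the actual task work happens here ...

      TaskStatus status;
      status.mutable_task_id()->MergeFrom(arg->task.task_id());
      status.set_state(TASK_FINISHED);  // tell Mesos the task is done
      arg->driver->sendStatusUpdate(status);

      delete arg;
      return NULL;
    }

    virtual void launchTask(ExecutorDriver* driver, const TaskInfo& task) {
      TaskArg* arg = new TaskArg();
      arg->driver = driver;
      arg->task = task;

      pthread_t thread;
      pthread_create(&thread, NULL, &runMyTask, arg);
      pthread_detach(thread);  // don't join; let the task run on its own

      TaskStatus status;
      status.mutable_task_id()->MergeFrom(task.task_id());
      status.set_state(TASK_RUNNING);  // report RUNNING right away
      driver->sendStatusUpdate(status);
      // Returning now frees the driver to deliver the next callback.
    }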


On Tue, Jul 1, 2014 at 2:51 PM, Vinod Kone <vi...@gmail.com> wrote:

> What Sharma said.
>
> Both the scheduler and executor drivers are single-threaded, i.e., you will
> only get one callback at a time. In other words, until you return from one
> callback you won't get the next one.
>

Re: Task serialization per machine?

Posted by Vinod Kone <vi...@gmail.com>.
What Sharma said.

Both the scheduler and executor drivers are single-threaded, i.e., you will
only get one callback at a time. In other words, until you return from one
callback you won't get the next one.
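
To make that concrete, here's a toy model of the idea (not the actual
driver code): one thread drains a queue of callbacks, so a callback that
blocks stalls everything queued behind it.

    #include <functional>
    #include <queue>

    std::queue<std::function<void()> > pending;  // callbacks to deliver

    void dispatchLoop() {
      while (!pending.empty()) {
        std::function<void()> callback = pending.front();
        pending.pop();
        callback();  // a launchTask() that blocks here holds up every
                     // callback behind it, including the next launchTask()
      }
    }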


On Tue, Jul 1, 2014 at 10:03 AM, Sharma Podila <sp...@netflix.com> wrote:

> Hi Asim,
>
> I am using (developing) a Java executor. I see a similar strategy in the
> Mesos-Hadoop executor.
>
>
> https://github.com/mesos/hadoop/blob/master/src/main/java/org/apache/hadoop/mapred/MesosExecutor.java
>
> The executor's successful (asynchronous) launching of the task is usually
> followed immediately by a TaskState.TASK_RUNNING status message to the
> driver. It can then return from the launchTask method, but the executor
> process shouldn't exit; it has to remain running for at least the duration
> of the task. Upon completion of the task, the executor must notify Mesos
> of the completion. Mesos will report a task-lost status if the executor
> exits prematurely.
>
> My explanation comes from my understanding of Mesos as a user and
> framework developer; someone from the Mesos dev team may have a better way
> to explain this. I suspect framework callbacks, at least at the executor,
> aren't delivered concurrently, though I haven't looked into the details of
> why/how.

Re: Task serialization per machine?

Posted by Sharma Podila <sp...@netflix.com>.
Hi Asim,

I am using (developing) a Java executor. I see a similar strategy in the
Mesos-Hadoop executor.

https://github.com/mesos/hadoop/blob/master/src/main/java/org/apache/hadoop/mapred/MesosExecutor.java

The executor's successful (asynchronous) launching of the task is usually
followed immediately by a TaskState.TASK_RUNNING status message to the
driver. It can then return from the launchTask method, but the executor
process shouldn't exit; it has to remain running for at least the duration
of the task. Upon completion of the task, the executor must notify Mesos
of the completion. Mesos will report a task-lost status if the executor
exits prematurely.

My explanation comes from my understanding of Mesos as a user and
framework developer; someone from the Mesos dev team may have a better way
to explain this. I suspect framework callbacks, at least at the executor,
aren't delivered concurrently, though I haven't looked into the details of
why/how.
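
Since you're on the C++ API, in rough C++ terms (names like "succeeded"
are illustrative; error handling omitted) the status sequence I'm
describing is:

    // 1. Right after kicking the task off asynchronously:
    TaskStatus running;
    running.mutable_task_id()->MergeFrom(task.task_id());
    running.set_state(TASK_RUNNING);
    driver->sendStatusUpdate(running);
    // launchTask() can return now; the executor process keeps running.

    // 2. Later, when the task completes (e.g., from the worker thread):
    TaskStatus done;
    done.mutable_task_id()->MergeFrom(task.task_id());
    done.set_state(succeeded ? TASK_FINISHED : TASK_FAILED);
    driver->sendStatusUpdate(done);
    // Only exit the executor after this point; exiting early gets the
    // task reported as lost by Mesos.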

On Tue, Jul 1, 2014 at 7:48 AM, Asim <li...@gmail.com> wrote:

> Thanks for your response!
>
> Yes, the executor's launchTask() gets only one task, which it executes
> synchronously to completion. Since launchTask is a callback, my intuition
> is that the scheduler should launch these tasks in parallel (even within a
> single machine) after calculating the required resources. I could create a
> new thread in the launchTask() callback and return immediately, but my
> worry is that this causes a lost slave: the scheduler would assume the
> task is finished while a zombie thread is still around. Hence, I am not
> completely sure that creating new threads will solve this issue.
>
> I am using the C++ framework API. Is there an example of how this is done
> in existing frameworks? I looked at Spark, and it does not seem to do
> anything special in its callbacks to ensure that multiple tasks on a
> single machine execute in parallel.
>
> Thanks,
> Asim