Posted to user@mesos.apache.org by Philip Weaver <ph...@gmail.com> on 2015/07/17 21:26:50 UTC

High latency when scheduling and executing many tiny tasks.

I'm trying to understand the behavior of mesos: whether what I am observing
is typical or I'm doing something wrong, and what options I have for
improving how offers are made and how tasks are executed for my particular
use case.

I have written a Scheduler that has a queue of very small tasks (for
testing, they are "echo hello world", but in production many of them won't
be much more expensive than that). Each task is configured to use 1 cpu
resource. When resourceOffers is called, I launch as many tasks as I can in
the given offers; that is, one call to driver.launchTasks for each offer,
with a list of tasks that has one task for each cpu in that offer.
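
In rough Scala, that resourceOffers logic looks like the sketch below
(simplified; taskQueue and mkTask are placeholders for my real code, and the
other Scheduler callbacks are omitted):

    import scala.collection.JavaConverters._
    import org.apache.mesos.SchedulerDriver
    import org.apache.mesos.Protos._

    // Placeholder: a thread-safe queue of shell commands to run.
    val taskQueue = new java.util.concurrent.ConcurrentLinkedQueue[String]()

    // Launch one 1-cpu task per cpu in each offer; decline offers we can't use.
    override def resourceOffers(driver: SchedulerDriver,
                                offers: java.util.List[Offer]): Unit = {
      for (offer <- offers.asScala) {
        val cpus = offer.getResourcesList.asScala
          .find(_.getName == "cpus")
          .map(_.getScalar.getValue.toInt)
          .getOrElse(0)

        val tasks = (1 to cpus)
          .flatMap(_ => Option(taskQueue.poll()))
          .map(cmd => mkTask(offer, cmd))

        if (tasks.nonEmpty)
          driver.launchTasks(
            java.util.Collections.singletonList(offer.getId), tasks.asJava)
        else
          driver.declineOffer(offer.getId)
      }
    }

    // Build a 1-cpu command task (e.g. "echo hello world").
    def mkTask(offer: Offer, cmd: String): TaskInfo =
      TaskInfo.newBuilder()
        .setName(cmd)
        .setTaskId(TaskID.newBuilder().setValue(java.util.UUID.randomUUID.toString))
        .setSlaveId(offer.getSlaveId)
        .setCommand(CommandInfo.newBuilder().setValue(cmd))
        .addResources(Resource.newBuilder()
          .setName("cpus")
          .setType(Value.Type.SCALAR)
          .setScalar(Value.Scalar.newBuilder().setValue(1.0)))
        .build()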

On a cluster of 3 nodes with 4 cores each (12 total cores), it takes 120s to
execute 1000 tasks out of the queue. We are evaluating mesos because we want
to use it to replace our current homegrown cluster controller, which can
execute 1000 tasks in well under 120s.

I am seeing two things that concern me:

   - The time between driver.launchTasks and receiving a callback to
   statusUpdate when the task completes is typically 200-500ms, and sometimes
   even as high as 1000-2000ms.
   - The time between when a task completes and when I get an offer for the
   newly freed resource is another 500ms or so.

These latencies explain why I can only execute tasks at a rate of about 8/s:
at 8 tasks/s across 12 cores, each core turns over a task roughly every
1.5s, which matches the latencies above.

It looks like my offers always include all 4 cores on each machine, which
would indicate that mesos doesn't like to send an offer as soon as a single
resource is available, and prefers to delay and send an offer with more
resources in it. Is this true?

Thanks in advance for any advice you can offer!

- Philip

Re: High latency when scheduling and executing many tiny tasks.

Posted by Alexander Gallego <ag...@concord.io>.
I take back what I said about the executor in Scala. I just looked at the
source, and both PathExecutor and CommandExecutor proxy to mesos
TaskBuilder.setCommand:


    executor match {
      case CommandExecutor() =>
        builder.setCommand(TaskBuilder.commandInfo(app, Some(taskId), host, ports, envPrefix))
        containerProto.foreach(builder.setContainer)

      case PathExecutor(path) =>
        val executorId = f"marathon-${taskId.getValue}" // Fresh executor
        val executorPath = s"'$path'" // TODO: Really escape this.
        val cmd = app.cmd orElse app.args.map(_ mkString " ") getOrElse ""
        val shell = s"chmod ug+rx $executorPath && exec $executorPath $cmd"
        val command = TaskBuilder.commandInfo(app, Some(taskId), host, ports, envPrefix).toBuilder.setValue(shell)

        val info = ExecutorInfo.newBuilder()
          .setExecutorId(ExecutorID.newBuilder().setValue(executorId))
          .setCommand(command)
        containerProto.foreach(info.setContainer)
        builder.setExecutor(info)
        val binary = new ByteArrayOutputStream()
        mapper.writeValue(binary, app)
        builder.setData(ByteString.copyFrom(binary.toByteArray))
    }


The pattern of execvp'ing is still what I use and in fact what mesos uses:

        if (task.command().shell()) {
          execl(
              "/bin/sh",
              "sh",
              "-c",
              task.command().value().c_str(),
              (char*) NULL);
        } else {
          execvp(task.command().value().c_str(), argv);
        }


Sorry for the misinformation about the executor in Marathon.



--
Sincerely,
Alexander Gallego
Co Founder & CTO

Re: High latency when scheduling and executing many tiny tasks.

Posted by Benjamin Mahler <be...@gmail.com>.
I've filed a ticket to immediately re-offer recovered resources from
terminal tasks / executors:

https://issues.apache.org/jira/browse/MESOS-3078


Re: High latency when scheduling and executing many tiny tasks.

Posted by Philip Weaver <ph...@gmail.com>.
Your advice worked and made a huge difference. With
allocation_interval=50ms, the 1000 tasks now execute in 21s instead of
120s. Thanks.


Re: High latency when scheduling and executing many tiny tasks.

Posted by Philip Weaver <ph...@gmail.com>.
Ok, thanks!


Re: High latency when scheduling and executing many tiny tasks.

Posted by Alexander Gallego <ag...@concord.io>.
I use a similar pattern.

I have my own scheduler, as you do. I deploy my own executor, which
downloads a tar from storage and effectively `execvp(...)`s a proc. It
monitors the child proc and reports the child pid's exit status.
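
A stripped-down sketch of that shape, in case it helps (not my production
code: the tar download and real monitoring are elided, and ForkingExecutor
is a made-up name):

    import org.apache.mesos.{Executor, ExecutorDriver, MesosExecutorDriver}
    import org.apache.mesos.Protos._

    class ForkingExecutor extends Executor {
      override def launchTask(driver: ExecutorDriver, task: TaskInfo): Unit = {
        def update(state: TaskState): Unit =
          driver.sendStatusUpdate(TaskStatus.newBuilder()
            .setTaskId(task.getTaskId).setState(state).build())

        update(TaskState.TASK_RUNNING)
        new Thread(new Runnable {
          def run(): Unit = {
            // Fork the command and wait on the child's exit status.
            val exit = new ProcessBuilder("/bin/sh", "-c", task.getCommand.getValue)
              .inheritIO().start().waitFor()
            update(if (exit == 0) TaskState.TASK_FINISHED else TaskState.TASK_FAILED)
          }
        }).start()
      }

      // Remaining callbacks are no-op stubs in this sketch.
      override def registered(d: ExecutorDriver, e: ExecutorInfo,
                              f: FrameworkInfo, s: SlaveInfo): Unit = {}
      override def reregistered(d: ExecutorDriver, s: SlaveInfo): Unit = {}
      override def disconnected(d: ExecutorDriver): Unit = {}
      override def killTask(d: ExecutorDriver, t: TaskID): Unit = {}
      override def frameworkMessage(d: ExecutorDriver, m: Array[Byte]): Unit = {}
      override def shutdown(d: ExecutorDriver): Unit = {}
      override def error(d: ExecutorDriver, message: String): Unit = {}
    }

    object ForkingExecutor extends App {
      val status = new MesosExecutorDriver(new ForkingExecutor).run()
      System.exit(if (status == Status.DRIVER_STOPPED) 0 else 1)
    }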

Check out the Marathon code if you are writing in scala. It is an excellent
example for both scheduler and executor templates.

-ag


Re: High latency when scheduling and executing many tiny tasks.

Posted by Philip Weaver <ph...@gmail.com>.
Awesome, I suspected that was the case, but hadn't discovered the
--allocation_interval flag, so I will use that.

I installed from the mesosphere RPMs and didn't change any flags from
there. I will try to find some logs that provide some insight into the
execution times.

I am using a command task. I haven't looked into executors yet; I had a
hard time finding some examples in my language (Scala).


Re: High latency when scheduling and executing many tiny tasks.

Posted by Benjamin Mahler <be...@gmail.com>.
One other thing, do you use an executor to run many tasks? Or are you using
a command task?


Re: High latency when scheduling and executing many tiny tasks.

Posted by Benjamin Mahler <be...@gmail.com>.
Currently, recovered resources are not immediately re-offered, as you
noticed, and the default allocation interval is 1 second. I'd recommend
lowering it (e.g. --allocation_interval=50ms); that should improve the
second bullet you listed. That said, in your case it would be better to
immediately re-offer recovered resources (feel free to file a ticket for
supporting that).
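
For example, if you run the master directly (if you're on the mesosphere
packages, I believe the init scripts instead read per-flag files under
/etc/mesos-master, but double-check that on your install):

    # On the master command line:
    mesos-master --allocation_interval=50ms

    # Or, assuming the mesosphere per-flag file convention:
    echo '50ms' | sudo tee /etc/mesos-master/allocation_interval
    sudo service mesos-master restart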

For the first bullet, mind providing some more information? E.g. master
flags, slave flags, scheduler logs, master logs, slave logs, executor logs?
We would need to trace through a task launch to see where the latency is
being introduced.
