Posted to user@mesos.apache.org by As...@amat.com on 2017/05/14 09:16:26 UTC

RE: High performance, low latency framework over mesos

Hi again,

I changed my scheduler and executor to use the v0 API. I still get ~50ms latency per task, and it seems that now most of the latency is between the master sending the status update and the scheduler callback being invoked:
Please see the timeline below (master & scheduler):

MASTER -->       11:34:31.234208  1507 master.cpp:6154] Status update TASK_FINISHED (UUID: 59b02301-da69-436b-a11b-ce917b115aab) for task 5
MASTER -->       11:34:31.234323  1507 master.cpp:6222] Forwarding status update TASK_FINISHED (UUID: 59b02301-da69-436b-a11b-ce917b115aab) for task 5
MASTER -->       11:34:31.234489  1507 master.cpp:8302] Updating the state of task 5 of framework 0129000f-770d-4422-9f42-5b72f73cac38-0004 (latest state: TASK_FINISHED, status update state: TASK_FINISHED)
MASTER -->       11:34:31.237977  1513 master.cpp:6992] Sending 1 offers to framework 0129000f-770d-4422-9f42-5b72f73cac38-0004 (My Mesos Framework) at scheduler-7e57ec64-ad97-4e09-97a6-50d58c26540b@10.0.2.15:36844
SCHEDULER --> 11:34:31.259 INFO: statusUpdate for task 5
MASTER -->       11:34:31.262975  1498 master.cpp:5092] Processing ACKNOWLEDGE call 59b02301-da69-436b-a11b-ce917b115aab for task 5
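
(The SCHEDULER line above is logged at the top of the v0 statusUpdate callback, roughly as in the sketch below, so the ~25ms gap from the master's "Forwarding status update" line is spent in the library/transport rather than in my code; the logger setup is illustrative:)

import org.apache.mesos.Protos.TaskStatus;
import org.apache.mesos.SchedulerDriver;

// Inside the org.apache.mesos.Scheduler implementation: timestamp the
// callback as early as possible so the gap to the master's log line is
// directly visible.
@Override
public void statusUpdate(SchedulerDriver driver, TaskStatus status) {
    LOGGER.info("statusUpdate for task " + status.getTaskId().getValue());
}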

Please advise.
Thanks.

From: Anand Mazumdar [mailto:anand@apache.org]
Sent: Monday, March 20, 2017 7:18 PM
To: user@mesos.apache.org
Subject: Re: High performance, low latency framework over mesos

Thanks for the detailed timeline of the logs. The scheduler library does not create a new connection per call. I have a suspicion that the slowness might be related to https://issues.apache.org/jira/browse/MESOS-6405 instead. To confirm this, can you change your scheduler to use the `V0Mesos` class (https://github.com/apache/mesos/blob/master/src/java/src/org/apache/mesos/v1/scheduler/V0Mesos.java) and build an updated timeline of logs based on that?
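
The switch should be mechanical — roughly the sketch below (check V0Mesos.java above for the exact constructor; the v0 shim should need the FrameworkInfo up front the way the old driver does, and masterAddress/mySchedulerImpl/frameworkInfo stand in for your existing objects):

import org.apache.mesos.v1.scheduler.Mesos;
import org.apache.mesos.v1.scheduler.V0Mesos;

// Before (v1 HTTP API under the hood):
//   Mesos mesos = new V1Mesos(masterAddress, mySchedulerImpl);
// After (same v1 Mesos/Scheduler interfaces, backed by the old v0 driver):
Mesos mesos = new V0Mesos(masterAddress, mySchedulerImpl, frameworkInfo);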

-anand




On Mon, Mar 20, 2017 at 6:42 AM, <As...@amat.com> wrote:
Hi,

I added logs to my scheduler and executor and traced a specific task's timeline (task 1; the behavior is similar for all tasks).
It seems that most of the time (~30ms, lines 4-5 below) is ‘lost’ between the scheduler submitting the task (i.e. sending an ACCEPT call with the submitted task) and the master receiving the HTTP POST for this call (does the scheduler library create a new connection per call?).
After the master gets the ACCEPT call, it takes another ~10ms for the master-->slave-->executor-->slave-->master round trip (but maybe that's reasonable for 4 IPC network calls).

Later on, I also see that the scheduler's ACKNOWLEDGE call for task 1's status update (i.e. sending an ACKNOWLEDGE call with the uuid received in the status update) only shows up at the master ~35ms (lines 18-19 below) after the call is made. I'm starting to conclude that each call through the scheduler library (which actually involves an HTTP POST) takes ~40ms.

To sum it up, it seems that the main factor in the high latency I see here is the HTTP POST mechanism in the scheduler library. Is there a way to improve it? Is it possible to keep a scheduler-->master connection open at all times?

[inline image: numbered scheduler/master log timeline (the "lines 4-5" and "lines 18-19" referenced above); not preserved in this archive]

Thanks.

From: Benjamin Mahler [mailto:bmahler@apache.org]
Sent: Thursday, March 16, 2017 3:12 AM
To: user
Cc: Anand Mazumdar

Subject: Re: High performance, low latency framework over mesos

The breakdown of the 10ms between launching on the agent and getting TASK_FINISHED from the agent could be seen by looking at the agent logs. Do you have them?

The 40ms it takes from forwarding the update to receiving the acknowledgement seems excessive, since the scheduler has to do a lot less work than the agent, and the agent side only takes 10ms. It would be great to have the scheduler do some logging so that we can see whether there is network latency here or inefficient processing in the v1 scheduler library.

As a next step, I would suggest building the complete timeline of logs, which includes the scheduler (you likely need to add some logging there), master, and agent logs.
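
Even something as simple as timestamping both sides of the library boundary would help — e.g. (an illustrative sketch; the helper name is hypothetical):

import org.apache.mesos.v1.scheduler.Mesos;
import org.apache.mesos.v1.scheduler.Protos.Call;

// Log wall-clock time right before handing a call to the library, so it can
// be diffed against the master's "HTTP POST /master/api/v1/scheduler" line
// to separate library/network latency from master-side processing.
private void timedSend(Mesos mesos, Call call) {
    LOGGER.info("sending " + call.getType() + " at " + System.currentTimeMillis());
    mesos.send(call);
}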

On Tue, Mar 14, 2017 at 8:43 AM, <As...@amat.com> wrote:
Thanks Benjamin,

I looked into the logs and it seems the delay is between the master and the scheduler:
Master log:
I0314 18:23:59.409423 39743 master.cpp:3776] Processing ACCEPT call for offers: [ afd6b67b-cac0-4b9f-baf6-2a456f4e84fa-O25 ] on agent edbbafb6-4f7b-4da2-8782-8e01461906dc-S0 at slave(1)@10.201.98.16:5051 (hadoop-master) for framework afd6b67b-cac0-4b9f-baf6-2a456f4e84fa-0000 (Mesos BM Scheduler)
W0314 18:23:59.410166 39743 validation.cpp:1064] Executor 'MesosBMExecutorId' for task '8' uses less CPUs (None) than the minimum required (0.01). Please update your executor, as this will be mandatory in future releases.
W0314 18:23:59.410221 39743 validation.cpp:1076] Executor 'MesosBMExecutorId' for task '8' uses less memory (None) than the minimum required (32MB). Please update your executor, as this will be mandatory in future releases.
I0314 18:23:59.410292 39743 master.cpp:9053] Adding task 8 with resources cpus(*)(allocated: *):0.01 on agent edbbafb6-4f7b-4da2-8782-8e01461906dc-S0 at slave(1)@10.201.98.16:5051 (hadoop-master)
I0314 18:23:59.410331 39743 master.cpp:4426] Launching task 8 of framework afd6b67b-cac0-4b9f-baf6-2a456f4e84fa-0000 (Mesos BM Scheduler) with resources cpus(*)(allocated: *):0.01 on agent edbbafb6-4f7b-4da2-8782-8e01461906dc-S0 at slave(1)@10.201.98.16:5051 (hadoop-master)
I0314 18:23:59.411258 39738 hierarchical.cpp:807] Updated allocation of framework afd6b67b-cac0-4b9f-baf6-2a456f4e84fa-0000 on agent edbbafb6-4f7b-4da2-8782-8e01461906dc-S0 from cpus(*)(allocated: *):0.01 to cpus(*)(allocated: *):0.01
I0314 18:23:59.415060 39723 master.cpp:6992] Sending 1 offers to framework afd6b67b-cac0-4b9f-baf6-2a456f4e84fa-0000 (Mesos BM Scheduler)
I0314 18:23:59.420624 39757 master.cpp:6154] Status update TASK_FINISHED (UUID: 583ea071-de66-4050-9513-8ff432da605f) for task 8 of framework afd6b67b-cac0-4b9f-baf6-2a456f4e84fa-0000 from agent edbbafb6-4f7b-4da2-8782-8e01461906dc-S0 at slave(1)@10.201.98.16:5051 (hadoop-master)
I0314 18:23:59.420671 39757 master.cpp:6222] Forwarding status update TASK_FINISHED (UUID: 583ea071-de66-4050-9513-8ff432da605f) for task 8 of framework afd6b67b-cac0-4b9f-baf6-2a456f4e84fa-0000
I0314 18:23:59.420819 39757 master.cpp:8302] Updating the state of task 8 of framework afd6b67b-cac0-4b9f-baf6-2a456f4e84fa-0000 (latest state: TASK_FINISHED, status update state: TASK_FINISHED)
I0314 18:23:59.425354 39742 master.cpp:6992] Sending 1 offers to framework afd6b67b-cac0-4b9f-baf6-2a456f4e84fa-0000 (Mesos BM Scheduler)
I0314 18:23:59.459801 39741 http.cpp:420] HTTP POST for /master/api/v1/scheduler from 127.0.0.1:36100
W0314 18:23:59.459889 39741 master.cpp:3634] Implicitly declining offers: [ afd6b67b-cac0-4b9f-baf6-2a456f4e84fa-O26 ] in ACCEPT call for framework afd6b67b-cac0-4b9f-baf6-2a456f4e84fa-0000 as the launch operation specified no tasks
I0314 18:23:59.460055 39741 master.cpp:3776] Processing ACCEPT call for offers: [ afd6b67b-cac0-4b9f-baf6-2a456f4e84fa-O26 ] on agent edbbafb6-4f7b-4da2-8782-8e01461906dc-S0 at slave(1)@10.201.98.16:5051 (hadoop-master) for framework afd6b67b-cac0-4b9f-baf6-2a456f4e84fa-0000 (Mesos BM Scheduler)
I0314 18:23:59.460435 39741 http.cpp:420] HTTP POST for /master/api/v1/scheduler from 127.0.0.1:36100
I0314 18:23:59.460484 39741 master.cpp:5092] Processing ACKNOWLEDGE call 583ea071-de66-4050-9513-8ff432da605f for task 8 of framework afd6b67b-cac0-4b9f-baf6-2a456f4e84fa-0000 (Mesos BM Scheduler) on agent edbbafb6-4f7b-4da2-8782-8e01461906dc-S0

From the master log you can see that the time from the master receiving the offer accept until it forwards the TASK_FINISHED status to the scheduler is ~11ms.
For some reason the scheduler's acknowledgement arrives only after ~40ms, although I'm sending it immediately upon the status update. Moreover, I can't explain why the time from receiving an offer until accepting it (even with an empty accept) takes ~40ms. I think this might be related to the problem.

My scheduler is written in Java, implementing org.apache.mesos.v1.scheduler.Scheduler and passing it to org.apache.mesos.v1.scheduler.V1Mesos.
This is my scheduler implementation of processUpdate and processOffers (called from ‘received’ for Protos.Event.UPDATE/OFFERS respectively):
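
(These snippets assume the v1 binding imports, roughly as below, plus the framework's own Task/pendingTasks/submittedTasks bookkeeping, which is not shown:)

import java.util.List;
import org.apache.mesos.v1.Protos.*;
import org.apache.mesos.v1.scheduler.Mesos;
import org.apache.mesos.v1.scheduler.Protos.Call;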
    private void processOffers(Mesos mesos, List<Offer> offers) {
        for (Offer offer : offers) {
            Offer.Operation.Launch.Builder launch = Offer.Operation.Launch.newBuilder();

            double offerCpus = 0;
            for (Resource resource : offer.getResourcesList()){
                if (resource.getName().equals("cpus")){
                    offerCpus += resource.getScalar().getValue();
                }
            }

            LOGGER.info("Received offer " + offer.getId().getValue() + " from agent: " + offer.getHostname() + " [cpus: " + offerCpus + "]");

            // Greedily pack pending tasks (FIFO) into this offer while enough CPUs remain.
            while (!pendingTasks.isEmpty()) {
                Task task = pendingTasks.peek();
                if (task.requiredCpus <= offerCpus){
                    TaskID taskId = TaskID.newBuilder()
                            .setValue(Integer.toString(launchedTasks++)).build();

                    LOGGER.info("Launching task " + taskId.getValue() +
                            " using offer " + offer.getId().getValue());

                    TaskInfo taskInfo = TaskInfo.newBuilder()
                            .setName("task " + taskId.getValue())
                            .setTaskId(taskId)
                            .setAgentId(offer.getAgentId())
                            .addResources(Resource.newBuilder()
                                    .setName("cpus")
                                    .setType(Value.Type.SCALAR)
                                    .setScalar(Value.Scalar.newBuilder()
                                            .setValue(task.requiredCpus)
                                            .build()))
                            .setExecutor(executorInfo)
                            .build();

                    launch.addTaskInfos(taskInfo);

                    offerCpus -= task.requiredCpus;
                    submittedTasks.put(taskId, pendingTasks.poll());
                }
                else {
                    break;
                }
            }

            // Accept the offer with all batched launches; refuseSeconds(0) asks the
            // master to re-offer unused resources immediately.
            mesos.send(Call.newBuilder()
                    .setType(Call.Type.ACCEPT)
                    .setFrameworkId(frameworkId)
                    .setAccept(Call.Accept.newBuilder()
                            .addOfferIds(offer.getId())
                            .addOperations(Offer.Operation.newBuilder()
                                    .setType(Offer.Operation.Type.LAUNCH)
                                    .setLaunch(launch)
                                    .build())
                            .setFilters(Filters.newBuilder()
                                    .setRefuseSeconds(0)
                                    .build()))
                    .build());
        }

        LOGGER.info("Completed handling offers");
    }

    private void processUpdate(Mesos mesos, TaskStatus taskStatus) {
        LOGGER.info("Task " + taskStatus.getTaskId().getValue() + " status " + taskStatus.getState().name());

        // Only acknowledge updates that carry a uuid; updates without one
        // (e.g. master-generated) must not be acknowledged.
        if (taskStatus.hasUuid()) {
            mesos.send(Call.newBuilder()
                    .setType(Call.Type.ACKNOWLEDGE)
                    .setFrameworkId(frameworkId)
                    .setAcknowledge(Call.Acknowledge.newBuilder()
                            .setAgentId(taskStatus.getAgentId())
                            .setTaskId(taskStatus.getTaskId())
                            .setUuid(taskStatus.getUuid())
                            .build())
                    .build());
        }
    }

From: Benjamin Mahler [mailto:bmahler@apache.org]
Sent: Friday, March 10, 2017 12:52 AM
To: user
Subject: Re: High performance, low latency framework over mesos

Have you taken a look at the logs across your scheduler / master / agent to see where the latency is introduced? We can then discuss options to reduce the latency.

Ben

On Tue, Mar 7, 2017 at 5:03 AM, <As...@amat.com> wrote:
Hi,

I’m implementing my own framework (scheduler + executor) over mesos. The framework is oriented toward low-latency, short-running tasks (i.e. task time is hundreds of ms).
I wrote a simple scheduler and executor and measured the task submission RTT (round trip) – that is, the elapsed time from submitting the task at the scheduler (by accepting an offer) until the scheduler receives the TASK_FINISHED status.
The task on the executor side does nothing but immediately send a TASK_FINISHED status.
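
(In the v0 Java bindings that boils down to roughly the following — a sketch of the launchTask callback, not the exact code used here:)

import org.apache.mesos.ExecutorDriver;
import org.apache.mesos.Protos.TaskInfo;
import org.apache.mesos.Protos.TaskState;
import org.apache.mesos.Protos.TaskStatus;

// Inside the org.apache.mesos.Executor implementation: do no work and
// report TASK_FINISHED immediately, so the measured RTT is pure
// framework/transport overhead.
@Override
public void launchTask(ExecutorDriver driver, TaskInfo task) {
    driver.sendStatusUpdate(TaskStatus.newBuilder()
            .setTaskId(task.getTaskId())
            .setState(TaskState.TASK_FINISHED)
            .build());
}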

I see that the RTT is ~50ms, which is quite a lot for my scenario. I was wondering whether this is controllable, and why it takes so long to receive the status update.
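
(The RTT numbers are collected in the scheduler along these lines — a sketch; the map and method names are illustrative:)

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Record when each task's ACCEPT is sent and diff against the time the
// TASK_FINISHED update reaches the scheduler callback.
private final Map<String, Long> acceptSentNanos = new ConcurrentHashMap<>();

private void recordAcceptSent(String taskId) {
    acceptSentNanos.put(taskId, System.nanoTime());
}

private void recordFinished(String taskId) {
    Long sentAt = acceptSentNanos.remove(taskId);
    if (sentAt != null) {
        LOGGER.info("task " + taskId + " RTT " + (System.nanoTime() - sentAt) / 1_000_000 + "ms");
    }
}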

Thanks

Assaf.



Re: High performance, low latency framework over mesos

Posted by Benjamin Mahler <bm...@apache.org>.
Hm.. at this point I would investigate whether writing a C++ scheduler
using our C++ bindings has the same overhead, in order to eliminate JNI /
the JVM as the source of this latency.
