Posted to user@mesos.apache.org by Dario Rexin <dr...@apple.com> on 2016/10/16 06:41:34 UTC

Performance regression in v1 api vs v0

Hi all,

I recently did some performance testing on the v1 scheduler API and found that throughput is around 10x lower than for the v0 API. Using 1 connection, I don’t get a lot more than 1,500 calls per second, whereas the v0 API can do ~15,000. If I use multiple connections, throughput maxes out at 3 connections and ~2,500 calls / s. If I add any more connections, the throughput per connection drops and the total throughput stays around ~2,500 calls / s. Has anyone done performance testing on the v1 API before? It seems a little strange to me that it’s so much slower, given that the v0 API also uses HTTP (well, more or less). I would be thankful for any comments and experience reports from other users.

Thanks,
Dario
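
[Editor's note: the saturation pattern described above — total throughput flattening out while per-connection throughput drops — is the shape you would expect if every call funnels through a single serial consumer. A toy model, using the numbers reported in this thread purely for illustration, not new measurements:]

```python
def total_throughput(connections, per_conn_rate, actor_cap):
    """Toy model: clients offer connections * per_conn_rate calls/s,
    but a single serial consumer can process at most actor_cap calls/s."""
    offered = connections * per_conn_rate
    return min(offered, actor_cap)

# Illustrative numbers from the thread: ~1,500 calls/s over one v1
# connection, total capped around ~2,500 calls/s.
totals = {n: total_throughput(n, 1500, 2500) for n in (1, 2, 3, 10)}
```

Under this model the total plateaus at the consumer's rate while each connection's share shrinks, which matches the reported behavior; it is only a sketch of one hypothesis, not a claim about where the bottleneck actually is.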


Re: Performance regression in v1 api vs v0

Posted by Zhitao Li <zh...@gmail.com>.
+1

We are also quite interested in this topic.

On Mon, Oct 17, 2016 at 12:34 PM, Dario Rexin <dr...@apple.com> wrote:

> Hi Anand,
>
> thanks for creating the ticket. I will also investigate a bit more. I will
> probably be in SF on Thursday, so we could discuss in person.
>
> --
> Dario
>
> On Oct 17, 2016, at 12:19 PM, Anand Mazumdar <an...@apache.org> wrote:
>
> Dario,
>
> It's not immediately clear to me where the bottleneck might be. I filed
> MESOS-6405 <https://issues.apache.org/jira/browse/MESOS-6405> to write a
> benchmark that tries to mimic your test setup and then go about fixing the
> issues.
>
> -anand
>
> On Sun, Oct 16, 2016 at 6:20 PM, Dario Rexin <dr...@apple.com> wrote:
>
>> Hi Anand,
>>
>> I tested with and without pipelining and it doesn’t make a difference.
>> First of all, unlimited pipelining is not a good idea, because we still
>> have to handle the responses and need to be able to relate the request
>> and response upon return, i.e. store the context of the request until we
>> receive the response. Also, we want to know as soon as possible when an
>> error occurs, so early returns are very desirable. I agree that it
>> shouldn't make a difference in how fast events can be processed if they
>> are queued on the master vs. the client, but this observation made it
>> very apparent that throughput is a problem on the master. I did not make
>> any requests that would potentially block for a long time, so it’s even
>> weirder to me that the throughput is so low. One thing I don’t
>> understand, for example, is why all messages go through the master
>> process. The parsing, for example, could be done in a completely separate
>> process, and if every connected framework were backed by its own process,
>> the check whether a framework is connected could also be done there (not
>> to mention that this requirement exists only because we need to use
>> multiple connections). Requiring all messages to go through a single
>> process that can block indefinitely is obviously a huge bottleneck. I
>> understand that this problem is not limited to the HTTP API, but I think
>> it has to be fixed.
>>
>> —
>> Dario
>>
>> On Oct 16, 2016, at 5:52 PM, Anand Mazumdar <ma...@gmail.com>
>> wrote:
>>
>> Dario,
>>
>> Regarding:
>>
>> >This is especially concerning, as it means that accepting calls will
>> completely stall when a long running call (e.g. retrieving state.json) is
>> running.
>>
>> How does it help a client when it gets an early accepted response versus
>> when accepting of calls is stalled, i.e., queued up on the master actor?
>> The client does not need to wait for a response before pipelining its
>> next request to the master anyway. In your tests, do you send the next
>> REVIVE call only upon receiving the response to the current call? That
>> might explain the behavior you are seeing.
>>
>> -anand
>>
>> On Sun, Oct 16, 2016 at 11:58 AM, tommy xiao <xi...@gmail.com> wrote:
>>
>>> Interesting topic.
>>>
>>> 2016-10-17 2:51 GMT+08:00 Dario Rexin <dr...@apple.com>:
>>>
>>>> Hi Anand,
>>>>
>>>> I tested with current HEAD. After I saw low throughput on our own HTTP
>>>> API client, I wrote a small server that sends out fake events and
>>>> accepts calls, and our client was able to send a lot more calls to that
>>>> server. I also wrote a small tool that simply sends as many calls to
>>>> Mesos as possible without handling any events, and I get similar
>>>> results there. I also observe extremely high CPU usage: while my
>>>> sending tool is using ~10% CPU, Mesos runs at ~185%. The calls I send
>>>> for testing are all REVIVE and I don’t have any agents connected, so
>>>> there should be essentially nothing happening. One reason I could think
>>>> of for the reduced throughput is that all calls are processed in the
>>>> master process before it sends back an ACCEPTED, leading to effectively
>>>> single-threaded processing of HTTP calls, interleaved with all other
>>>> calls that are sent to the master process. Libprocess, however, just
>>>> forwards the messages to the master process and then immediately
>>>> returns ACCEPTED. It also handles all connections in separate
>>>> processes, whereas HTTP calls are effectively all handled by the master
>>>> process. This is especially concerning, as it means that accepting
>>>> calls will completely stall when a long running call (e.g. retrieving
>>>> state.json) is running.
>>>>
>>>> Thanks,
>>>> Dario
>>>>
>>>> On Oct 16, 2016, at 11:01 AM, Anand Mazumdar <an...@apache.org> wrote:
>>>>
>>>> Dario,
>>>>
>>>> Thanks for reporting this. Did you test this with 1.0 or the recent
>>>> HEAD? We had done performance testing prior to 1.0rc1 and had not found
>>>> any substantial discrepancy on the call ingestion path. Hence, we had
>>>> focused on fixing the performance issues around writing events on the
>>>> stream in MESOS-5222 <https://issues.apache.org/jira/browse/MESOS-5222>
>>>> and MESOS-5457 <https://issues.apache.org/jira/browse/MESOS-5457>.
>>>>
>>>> The numbers in the benchmark test pointed to by Haosdent (v0 vs v1)
>>>> differ due to the slowness of the client (scheduler library) in
>>>> processing the status update events. We should add another benchmark
>>>> that measures just the time taken by the master to write the events. I
>>>> will file an issue shortly to address this.
>>>>
>>>> Do you mind filing an issue with more details on your test setup?
>>>>
>>>> -anand
>>>>
>>>> On Sun, Oct 16, 2016 at 12:05 AM, Dario Rexin <dr...@apple.com> wrote:
>>>>
>>>>> Hi haosdent,
>>>>>
>>>>> thanks for the pointer! Your results show exactly what I’m
>>>>> experiencing. I think especially for bigger clusters this could be very
>>>>> problematic. It would be great to get some input from the folks working on
>>>>> the HTTP API, especially Anand.
>>>>>
>>>>> Thanks,
>>>>> Dario
>>>>>
>>>>> On Oct 16, 2016, at 12:01 AM, haosdent <ha...@gmail.com> wrote:
>>>>>
>>>>> Hmm, this is an interesting topic. @anandmazumdar created a benchmark
>>>>> test case to compare the v1 and v0 APIs earlier. You could run it via
>>>>>
>>>>> ```
>>>>> ./bin/mesos-tests.sh --benchmark --gtest_filter="*SchedulerReconcileTasks_BENCHMARK_Test*"
>>>>> ```
>>>>>
>>>>> Here is the result from running it on my machine.
>>>>>
>>>>> ```
>>>>> [ RUN      ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerLibrary/0
>>>>> Reconciling 1000 tasks took 386.451108ms using the scheduler library
>>>>> [       OK ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerLibrary/0 (479 ms)
>>>>> [ RUN      ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerLibrary/1
>>>>> Reconciling 10000 tasks took 3.389258444secs using the scheduler library
>>>>> [       OK ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerLibrary/1 (3435 ms)
>>>>> [ RUN      ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerLibrary/2
>>>>> Reconciling 50000 tasks took 16.624603964secs using the scheduler library
>>>>> [       OK ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerLibrary/2 (16737 ms)
>>>>> [ RUN      ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerLibrary/3
>>>>> Reconciling 100000 tasks took 33.134018718secs using the scheduler library
>>>>> [       OK ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerLibrary/3 (33333 ms)
>>>>> [ RUN      ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerDriver/0
>>>>> Reconciling 1000 tasks took 24.212092ms using the scheduler driver
>>>>> [       OK ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerDriver/0 (89 ms)
>>>>> [ RUN      ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerDriver/1
>>>>> Reconciling 10000 tasks took 316.115078ms using the scheduler driver
>>>>> [       OK ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerDriver/1 (385 ms)
>>>>> [ RUN      ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerDriver/2
>>>>> Reconciling 50000 tasks took 1.239050154secs using the scheduler driver
>>>>> [       OK ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerDriver/2 (1379 ms)
>>>>> [ RUN      ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerDriver/3
>>>>> Reconciling 100000 tasks took 2.38445672secs using the scheduler driver
>>>>> [       OK ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerDriver/3 (2711 ms)
>>>>> ```
>>>>>
>>>>> *SchedulerLibrary* is the HTTP API, *SchedulerDriver* is the old way
>>>>> based on libmesos.so.
>>>>>
>>>>> On Sun, Oct 16, 2016 at 2:41 PM, Dario Rexin <dr...@apple.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I recently did some performance testing on the v1 scheduler API and
>>>>>> found that throughput is around 10x lower than for the v0 API. Using 1
>>>>>> connection, I don’t get a lot more than 1,500 calls per second, where the
>>>>>> v0 API can do ~15,000. If I use multiple connections, throughput maxes out
>>>>>> at 3 connections and ~2,500 calls / s. If I add any more connections, the
>>>>>> throughput per connection drops and the total throughput stays around
>>>>>> ~2,500 calls / s. Has anyone done performance testing on the v1 API before?
>>>>>> It seems a little strange to me, that it’s so much slower, given that the
>>>>>> v0 API also uses HTTP (well, more or less). I would be thankful for any
>>>>>> comments and experience reports of other users.
>>>>>>
>>>>>> Thanks,
>>>>>> Dario
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Haosdent Huang
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Deshi Xiao
>>> Twitter: xds2000
>>> E-mail: xiaods(AT)gmail.com
>>>
>>
>>
>>
>> --
>> Anand Mazumdar
>>
>>
>>
>


-- 
Cheers,

Zhitao Li

Re: Performance regression in v1 api vs v0

Posted by Dario Rexin <dr...@apple.com>.
Hi Anand,

thanks for creating the ticket. I will also investigate a bit more. I will probably be in SF on Thursday, so we could discuss in person.

--
Dario


Re: Performance regression in v1 api vs v0

Posted by Anand Mazumdar <an...@apache.org>.
Dario,

It's not immediately clear to me where the bottleneck might be. I filed
MESOS-6405 <https://issues.apache.org/jira/browse/MESOS-6405> to write a
benchmark that tries to mimic your test setup and then go about fixing the
issues.

-anand
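
[Editor's note: the test setup such a benchmark would mimic — firing REVIVE calls at the master as fast as possible and measuring calls/s — might look roughly like this sketch. The payload shape follows the v1 scheduler API; the `sink` stand-in and framework id are placeholders, and a real Mesos benchmark would live in the gtest suite instead:]

```python
import json
import time

def make_revive(framework_id):
    # Build a v1 scheduler REVIVE call; framework_id is a placeholder.
    return json.dumps({
        "framework_id": {"value": framework_id},
        "type": "REVIVE",
    })

def benchmark(send, n_calls=10000, framework_id="test-framework"):
    """Send n_calls REVIVE payloads via `send` and return observed calls/s."""
    payload = make_revive(framework_id)
    start = time.perf_counter()
    for _ in range(n_calls):
        send(payload)
    elapsed = time.perf_counter() - start
    return n_calls / elapsed

# Stand-in for an HTTP POST; swap in a real client posting to the
# master's v1 scheduler endpoint to benchmark an actual master.
sink = []
rate = benchmark(sink.append, n_calls=1000)
```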


Re: Performance regression in v1 api vs v0

Posted by Dario Rexin <dr...@apple.com>.
Hi Anand,

I tested with and without pipelining and it doesn’t make a difference. First of all, unlimited pipelining is not a good idea, because we still have to handle the responses and need to be able to relate the request and response upon return, i.e. store the context of the request until we receive the response. Also, we want to know as soon as possible when an error occurs, so early returns are very desirable. I agree that it shouldn't make a difference in how fast events can be processed if they are queued on the master vs. the client, but this observation made it very apparent that throughput is a problem on the master. I did not make any requests that would potentially block for a long time, so it’s even weirder to me that the throughput is so low. One thing I don’t understand, for example, is why all messages go through the master process. The parsing, for example, could be done in a completely separate process, and if every connected framework were backed by its own process, the check whether a framework is connected could also be done there (not to mention that this requirement exists only because we need to use multiple connections). Requiring all messages to go through a single process that can block indefinitely is obviously a huge bottleneck. I understand that this problem is not limited to the HTTP API, but I think it has to be fixed.

—
Dario
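
[Editor's note: the bookkeeping described above — storing the context of each pipelined request until its response arrives — can be sketched with a FIFO queue, since HTTP/1.1 delivers responses in request order. The class and method names are illustrative, not part of any Mesos client API:]

```python
from collections import deque

class PipelinedConnection:
    """Correlate pipelined requests with their responses.

    HTTP/1.1 returns responses in the order requests were sent, so a
    FIFO of (call, context) pairs is enough to relate each incoming
    response back to the request that produced it.
    """

    def __init__(self):
        self._in_flight = deque()

    def send(self, call, context):
        # A real client would also write the request bytes here.
        self._in_flight.append((call, context))

    def on_response(self, response):
        # The oldest in-flight request owns this response.
        call, context = self._in_flight.popleft()
        return call, context, response

    def in_flight(self):
        return len(self._in_flight)

conn = PipelinedConnection()
conn.send("REVIVE", {"attempt": 1})
conn.send("REVIVE", {"attempt": 2})
first = conn.on_response("202 Accepted")
```

This also shows why unbounded pipelining is unattractive: every entry in the queue is context that must be held until its response (and any error) comes back.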

>>> [ RUN      ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerLibrary/1
>>> Reconciling 10000 tasks took 3.389258444secs using the scheduler library
>>> [       OK ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerLibrary/1 (3435 ms)
>>> [ RUN      ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerLibrary/2
>>> Reconciling 50000 tasks took 16.624603964secs using the scheduler library
>>> [       OK ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerLibrary/2 (16737 ms)
>>> [ RUN      ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerLibrary/3
>>> Reconciling 100000 tasks took 33.134018718secs using the scheduler library
>>> [       OK ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerLibrary/3 (33333 ms)
>>> [ RUN      ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerDriver/0
>>> Reconciling 1000 tasks took 24.212092ms using the scheduler driver
>>> [       OK ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerDriver/0 (89 ms)
>>> [ RUN      ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerDriver/1
>>> Reconciling 10000 tasks took 316.115078ms using the scheduler driver
>>> [       OK ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerDriver/1 (385 ms)
>>> [ RUN      ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerDriver/2
>>> Reconciling 50000 tasks took 1.239050154secs using the scheduler driver
>>> [       OK ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerDriver/2 (1379 ms)
>>> [ RUN      ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerDriver/3
>>> Reconciling 100000 tasks took 2.38445672secs using the scheduler driver
>>> [       OK ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerDriver/3 (2711 ms)
>>> ```
>>> 
>>> *SchedulerLibrary* is the HTTP API, *SchedulerDriver* is the old way based on libmesos.so.
>>> 
>>> On Sun, Oct 16, 2016 at 2:41 PM, Dario Rexin <drexin@apple.com <ma...@apple.com>> wrote:
>>> Hi all,
>>> 
>>> I recently did some performance testing on the v1 scheduler API and found that throughput is around 10x lower than for the v0 API. Using 1 connection, I don’t get a lot more than 1,500 calls per second, where the v0 API can do ~15,000. If I use multiple connections, throughput maxes out at 3 connections and ~2,500 calls / s. If I add any more connections, the throughput per connection drops and the total throughput stays around ~2,500 calls / s. Has anyone done performance testing on the v1 API before? It seems a little strange to me, that it’s so much slower, given that the v0 API also uses HTTP (well, more or less). I would be thankful for any comments and experience reports of other users.
>>> 
>>> Thanks,
>>> Dario
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Best Regards,
>>> Haosdent Huang
>> 
>> 
> 
> 
> 
> 
> -- 
> Deshi Xiao
> Twitter: xds2000
> E-mail: xiaods(AT)gmail.com <http://gmail.com/>
> 
> 
> -- 
> Anand Mazumdar


Re: Performance regression in v1 api vs v0

Posted by Anand Mazumdar <ma...@gmail.com>.
Dario,

Regarding:

>This is especially concerning, as it means that accepting calls will
completely stall when a long running call (e.g. retrieving state.json) is
running.

How does it help a client to get an early accepted response, versus having the call queued up on the master actor? The client does not need to wait for a response before pipelining its next request to the master anyway. In your tests, do you send the next REVIVE call only upon receiving the response to the current call? That might explain the behavior you are seeing.

-anand
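For what it's worth, the kind of pipelining described here can be sketched with Python's standard library. This is purely illustrative (a stdlib HTTP server stands in for the master; nothing below is Mesos API): the client writes a second request on the same connection before reading the first response.

```python
import socket
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler

class OkHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep-alive: one connection, many requests

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

server = HTTPServer(("127.0.0.1", 0), OkHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

sock = socket.create_connection(server.server_address)
request = b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n"
# Pipelining: both requests go out before any response is read.
sock.sendall(request + request)

data = b""
while data.count(b"ok") < 2:  # wait until both response bodies arrive
    data += sock.recv(4096)
sock.close()
server.shutdown()
print(data.count(b"HTTP/1.1 200"))
```

A pipelining client like this is not blocked on response latency, which is exactly the point of the question above: the sender's throughput need not depend on when the server acknowledges.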


-- 
Anand Mazumdar

Re: Performance regression in v1 api vs v0

Posted by tommy xiao <xi...@gmail.com>.
Interesting topic.


-- 
Deshi Xiao
Twitter: xds2000
E-mail: xiaods(AT)gmail.com

Re: Performance regression in v1 api vs v0

Posted by Dario Rexin <dr...@apple.com>.
Hi Anand,

I tested with current HEAD. After I saw low throughput with our own HTTP API client, I wrote a small server that sends out fake events and accepts calls, and our client was able to send far more calls to that server. I also wrote a small tool that simply sends as many calls to Mesos as possible without handling any events, and I get similar results there.

I also observe extremely high CPU usage: while my sending tool uses ~10% CPU, Mesos runs at ~185%. The calls I send for testing are all REVIVE and I don’t have any agents connected, so essentially nothing should be happening.

One reason I can think of for the reduced throughput is that all HTTP calls are processed in the master process before it sends back an ACCEPTED, leading to effectively single-threaded processing of HTTP calls, interleaved with all other calls sent to the master process. Libprocess, by contrast, just forwards messages to the master process and immediately returns ACCEPTED, and it handles all connections in separate processes, whereas HTTP calls are effectively all handled by the master process. This is especially concerning, as it means that accepting calls will stall completely while a long-running call (e.g. retrieving state.json) is in progress.

Thanks,
Dario
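The hypothesis above (acknowledging only after processing serializes everything on one actor, while acknowledging immediately frees the sender) can be illustrated with a toy model. This is not Mesos code; the 1 ms of "work" per call and the queue are hypothetical stand-ins for the two designs.

```python
import queue
import threading
import time

WORK_S = 0.001   # hypothetical 1 ms of processing per call
N_CALLS = 20

def process_then_ack():
    """Model of processing each call before acknowledging: the sender
    observes every call's processing time."""
    start = time.monotonic()
    for _ in range(N_CALLS):
        time.sleep(WORK_S)          # handle the call first; ack only after
    return time.monotonic() - start

def ack_then_enqueue():
    """Model of acknowledging immediately and queueing the call for a
    separate actor thread to drain."""
    q = queue.Queue()

    def worker():
        while q.get() is not None:  # None is the shutdown sentinel
            time.sleep(WORK_S)

    t = threading.Thread(target=worker)
    t.start()
    start = time.monotonic()
    for _ in range(N_CALLS):
        q.put(object())             # acknowledged right away
    ack_elapsed = time.monotonic() - start
    q.put(None)
    t.join()
    return ack_elapsed

slow = process_then_ack()
fast = ack_then_enqueue()
print(slow > fast)
```

In the toy model the enqueue-first design acknowledges all calls in a fraction of the time, since the sender never waits on processing.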



Re: Performance regression in v1 api vs v0

Posted by Anand Mazumdar <an...@apache.org>.
Dario,

Thanks for reporting this. Did you test with 1.0 or the recent HEAD? We did performance testing prior to 1.0-rc1 and found no substantial discrepancy on the call ingestion path. Hence, we focused on fixing the performance issues around writing events to the stream in MESOS-5222 <https://issues.apache.org/jira/browse/MESOS-5222> and MESOS-5457 <https://issues.apache.org/jira/browse/MESOS-5457>.

The numbers in the benchmark test pointed to by Haosdent (v0 vs v1) differ because the client (the scheduler library) is slow in processing the status update events. We should add another benchmark that measures just the time taken by the master to write the events; I will file an issue shortly to address this.

Do you mind filing an issue with more details about your test setup?

-anand


Re: Performance regression in v1 api vs v0

Posted by Dario Rexin <dr...@apple.com>.
Hi haosdent,

thanks for the pointer! Your results show exactly what I’m experiencing. I think this could be very problematic, especially for bigger clusters. It would be great to get some input from the folks working on the HTTP API, especially Anand.

Thanks,
Dario
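To put the gap in concrete terms, the 100000-task benchmark runs quoted in this thread imply roughly the following rates (simple arithmetic on the reported timings):

```python
# Rates implied by the 100000-task reconciliation benchmark runs
# reported in this thread.
tasks = 100_000
library_secs = 33.134018718  # v1 HTTP scheduler library
driver_secs = 2.38445672     # v0 scheduler driver

library_rate = tasks / library_secs
driver_rate = tasks / driver_secs
ratio = library_secs / driver_secs

print(round(library_rate))   # ~3018 reconciliations/s (v1)
print(round(driver_rate))    # ~41938 reconciliations/s (v0)
print(round(ratio, 1))       # v1 is ~13.9x slower in this benchmark
```

That ratio is in line with the ~10x throughput difference reported at the start of the thread.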



Re: Performance regression in v1 api vs v0

Posted by haosdent <ha...@gmail.com>.
Hmm, this is an interesting topic. @anandmazumdar created a benchmark test
case to compare the v1 and v0 APIs before. You could run it via

```
./bin/mesos-tests.sh --benchmark \
  --gtest_filter="*SchedulerReconcileTasks_BENCHMARK_Test*"
```

Here is the result of running it on my machine.

```
[ RUN      ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerLibrary/0
Reconciling 1000 tasks took 386.451108ms using the scheduler library
[       OK ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerLibrary/0 (479 ms)
[ RUN      ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerLibrary/1
Reconciling 10000 tasks took 3.389258444secs using the scheduler library
[       OK ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerLibrary/1 (3435 ms)
[ RUN      ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerLibrary/2
Reconciling 50000 tasks took 16.624603964secs using the scheduler library
[       OK ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerLibrary/2 (16737 ms)
[ RUN      ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerLibrary/3
Reconciling 100000 tasks took 33.134018718secs using the scheduler library
[       OK ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerLibrary/3 (33333 ms)
[ RUN      ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerDriver/0
Reconciling 1000 tasks took 24.212092ms using the scheduler driver
[       OK ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerDriver/0 (89 ms)
[ RUN      ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerDriver/1
Reconciling 10000 tasks took 316.115078ms using the scheduler driver
[       OK ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerDriver/1 (385 ms)
[ RUN      ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerDriver/2
Reconciling 50000 tasks took 1.239050154secs using the scheduler driver
[       OK ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerDriver/2 (1379 ms)
[ RUN      ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerDriver/3
Reconciling 100000 tasks took 2.38445672secs using the scheduler driver
[       OK ] Tasks/SchedulerReconcileTasks_BENCHMARK_Test.SchedulerDriver/3 (2711 ms)
```

*SchedulerLibrary* is the v1 HTTP API; *SchedulerDriver* is the old
driver-based path that uses libmesos.so.
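As a rough sanity check (my own back-of-the-envelope arithmetic, not part of the benchmark), the 100000-task timings above translate into throughput like this:

```python
# Throughput implied by the 100000-task benchmark runs above.
# The two timings are copied verbatim from the test output.
tasks = 100_000
library_secs = 33.134018718  # v1 scheduler library (HTTP API)
driver_secs = 2.38445672     # v0 scheduler driver (libmesos.so)

library_rate = tasks / library_secs  # roughly 3,000 reconciliations/s
driver_rate = tasks / driver_secs    # roughly 42,000 reconciliations/s
slowdown = driver_rate / library_rate

print(f"library: {library_rate:,.0f} tasks/s")
print(f"driver:  {driver_rate:,.0f} tasks/s")
print(f"driver is {slowdown:.1f}x faster")
```

That works out to roughly a 14x gap in this benchmark, in the same ballpark as the ~10x difference Dario measured against a live master.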

On Sun, Oct 16, 2016 at 2:41 PM, Dario Rexin <dr...@apple.com> wrote:

> Hi all,
>
> I recently did some performance testing on the v1 scheduler API and found
> that throughput is around 10x lower than for the v0 API. Using 1
> connection, I don’t get a lot more than 1,500 calls per second, where the
> v0 API can do ~15,000. If I use multiple connections, throughput maxes out
> at 3 connections and ~2,500 calls / s. If I add any more connections, the
> throughput per connection drops and the total throughput stays around
> ~2,500 calls / s. Has anyone done performance testing on the v1 API before?
> It seems a little strange to me, that it’s so much slower, given that the
> v0 API also uses HTTP (well, more or less). I would be thankful for any
> comments and experience reports of other users.
>
> Thanks,
> Dario
>
>


-- 
Best Regards,
Haosdent Huang