Posted to issues@mesos.apache.org by "Yanyan Hu (JIRA)" <ji...@apache.org> on 2016/05/23 01:58:12 UTC

[jira] [Commented] (MESOS-5425) Consider using IntervalSet for Port range resource math

    [ https://issues.apache.org/jira/browse/MESOS-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15295833#comment-15295833 ] 

Yanyan Hu commented on MESOS-5425:
----------------------------------

Hi Joseph, thanks for filing this JIRA. I have copied my test results from MESOS-3051 here:

Hi all, I'm trying to use Mesos to manage a large-scale container cluster. I'm running Mesos 0.25.0 with Marathon on top of it. When I tested this environment, I found that we still suffer from this issue when Marathon allocates port resources randomly.

In my test, three Mesos slaves were active, each with an available port range of [31000-37000]. I then tried to create more than 3000 tasks across the three slave nodes. When the number of tasks reached 3000, it took nearly 800 milliseconds to finish one calculation of "Resources available = slaves[slaveId].total - slaves[slaveId].allocated", which is performed in the HierarchicalAllocatorProcess::allocate() function:
https://github.com/apache/mesos/blob/0.25.0/src/master/allocator/mesos/hierarchical.hpp#L1284

Since I have three Mesos slaves, each invocation of allocate() takes more than 2 seconds in total, which makes the performance of the Mesos master very poor.

So I wrote a simple test to evaluate the performance of "Ranges" value calculations. I found that the performance of the subtraction operation is still not good:

e.g. res1 = [1-6000], res2 = [1-1, 3-3, 5-5, ...]
I varied the range_size of res2 and recorded the execution time of "res1 -= res2". The results are as follows:
(The test was run in an x86 VM with 4 processor cores and 16GB of memory.)

res2 range_size    execution time (seconds)
1                  0.003 (0.002 in kernel mode)
100                0.011
200                0.031
400                0.121
800                0.533
1600               2.157
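
For reference, the test inputs described above could be built and timed along these lines. This is only a minimal sketch under stated assumptions, not code from Mesos or from the original test: the `Range`/`Ranges` aliases and the helpers `makeContiguous`, `makeOddSingletons`, and `timeSubtraction` are hypothetical names introduced for illustration, and the subtraction under test is passed in as a callable rather than taken from src/common/values.cpp.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a "Ranges" value: sorted, disjoint,
// inclusive [begin, end] pairs. Not the Mesos protobuf type.
using Range = std::pair<std::uint64_t, std::uint64_t>;
using Ranges = std::vector<Range>;

// Builds res1 from the test: one contiguous range, e.g. [1-6000].
Ranges makeContiguous(std::uint64_t first, std::uint64_t last) {
  assert(first <= last);
  return Ranges{{first, last}};
}

// Builds res2 from the test: `count` single-port ranges
// [1-1], [3-3], [5-5], ...
Ranges makeOddSingletons(std::size_t count) {
  Ranges result;
  result.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    const std::uint64_t port = 2 * i + 1;
    result.emplace_back(port, port);
  }
  return result;
}

// Runs `subtract(left, right)` once and returns the elapsed wall-clock
// time in seconds. `left` is taken by value so the caller's copy is
// left untouched between runs.
template <typename Subtract>
double timeSubtraction(Ranges left, const Ranges& right, Subtract subtract) {
  const auto start = std::chrono::steady_clock::now();
  subtract(left, right);
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

Calling `timeSubtraction(makeContiguous(1, 6000), makeOddSingletons(n), f)` for n = 1, 100, 200, 400, 800, 1600 with some subtraction function `f` would mirror the rows of the table; absolute numbers will of course depend on the machine and on the implementation being measured.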

By comparison, the addition and comparison operations perform much better. So it looks like the current fix hasn't completely resolved this problem. Based on our tests, the Mesos master's performance suffers seriously from this issue when there are more than 10000 tasks on 20 active Mesos slave nodes.

I haven't tried the latest Mesos release, but after checking src/common/values.cpp in the master branch, I found that the implementation of the "Ranges" data type is almost the same as in the 0.25.0 release:
https://github.com/apache/mesos/blob/master/src/common/values.cpp
https://github.com/apache/mesos/blob/0.25.0/src/common/values.cpp

So I guess the problem is still there? Is there any way we can further optimize the implementation of the "Ranges" data type to avoid this performance bottleneck? Thanks.
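
The timings above roughly quadruple each time range_size doubles, which would be consistent with a subtraction whose cost grows quadratically. As one illustration of what a cheaper operation could look like, here is a minimal sketch of a single-pass subtraction over sorted, disjoint ranges. It is neither the Mesos implementation nor stout's IntervalSet: the `Range`/`Ranges` aliases and `subtractRanges` are hypothetical names, and the sketch assumes both inputs are already sorted and non-overlapping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical representation: inclusive [begin, end] pairs,
// assumed sorted by begin and pairwise disjoint.
using Range = std::pair<std::uint64_t, std::uint64_t>;
using Ranges = std::vector<Range>;

// Returns left - right with a single sweep over both inputs,
// i.e. O(|left| + |right|) instead of one pass per subtracted range.
Ranges subtractRanges(const Ranges& left, const Ranges& right) {
  Ranges result;
  std::size_t j = 0;  // index of the next candidate range in `right`

  for (const Range& range : left) {
    assert(range.first <= range.second);
    std::uint64_t begin = range.first;  // first port not yet handled
    const std::uint64_t end = range.second;
    bool covered = false;  // true once `right` covers up to `end`

    // Skip subtracted ranges that end before this range starts.
    while (j < right.size() && right[j].second < begin) {
      ++j;
    }

    // Carve out every subtracted range that overlaps [begin, end].
    while (j < right.size() && right[j].first <= end) {
      if (right[j].first > begin) {
        result.emplace_back(begin, right[j].first - 1);
      }
      if (right[j].second >= end) {
        // Keep j: this range may also overlap the next range in `left`.
        covered = true;
        break;
      }
      begin = right[j].second + 1;
      ++j;
    }

    if (!covered) {
      result.emplace_back(begin, end);
    }
  }
  return result;
}
```

For example, subtracting [1-1, 3-3, 5-5] from [1-10] with this sketch yields [2-2, 4-4, 6-10]. An interval-set container, as proposed in this ticket, is another way to get similar behaviour without hand-written sweeps.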

> Consider using IntervalSet for Port range resource math
> -------------------------------------------------------
>
>                 Key: MESOS-5425
>                 URL: https://issues.apache.org/jira/browse/MESOS-5425
>             Project: Mesos
>          Issue Type: Improvement
>          Components: allocation
>            Reporter: Joseph Wu
>              Labels: mesosphere
>
> Follow-up JIRA for comments raised in MESOS-3051 (see comments there).
> We should consider utilizing [{{IntervalSet}}|https://github.com/apache/mesos/blob/a0b798d2fac39445ce0545cfaf05a682cd393abe/3rdparty/stout/include/stout/interval.hpp] in [Port range resource math|https://github.com/apache/mesos/blob/a0b798d2fac39445ce0545cfaf05a682cd393abe/src/common/values.cpp#L143].


