Posted to issues@mesos.apache.org by "Meng Zhu (JIRA)" <ji...@apache.org> on 2019/05/08 16:17:00 UTC

[jira] [Commented] (MESOS-9725) Perform incremental sorting in the random sorter.

    [ https://issues.apache.org/jira/browse/MESOS-9725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835725#comment-16835725 ] 

Meng Zhu commented on MESOS-9725:
---------------------------------

Based on a recent internal test, the sort() call does not take much time, and this ticket would introduce some extra complexity.

The review above (https://reviews.apache.org/r/70497/) is mostly ready, except for one issue that still needs to be figured out. In the review, we used a hashmap with double as the key type. This worries us because of floating-point precision issues. One solution is to use rational numbers instead.

Given the limited benefit and the complexity of the patch, we decided to shelve it for now. Moving this ticket back to `accepted`.

> Perform incremental sorting in the random sorter.
> -------------------------------------------------
>
>                 Key: MESOS-9725
>                 URL: https://issues.apache.org/jira/browse/MESOS-9725
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Meng Zhu
>            Assignee: Meng Zhu
>            Priority: Major
>              Labels: performance, resource-management
>
> By doing random sampling each time the caller asks for the next client (see MESOS-9722), we could avoid the cost of a full shuffle and only pay as we go.
> While the hope is to do each random sample at O(1) cost, the presence of weights complicates the matter. We would need to pay O(log n) for every sample even with fancy data structures like segment trees or binary indexed trees (naive approaches cost O(n), since we need to look at every node's weight). And the current full shuffle is already optimal at O(n log n) if all nodes are picked.
> However, since the number of *distinct* weights is usually much smaller than the number of clients, we can minimize the sampling cost by picking a client in two steps:
> Step 1: randomly pick a group of clients that share the same weight, by generating a weighted random number.
> Step 2: once a group (vector) of clients is chosen, uniformly sample a specific client within it. Since all the clients in the chosen group have the same weight, we do not need to consider weights in this step.
>  
> Since the number of distinct weights is usually much smaller than the number of clients, this minimizes the cost of generating weighted random numbers, which is linear in the number of distinct weights.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)