Posted to common-issues@hadoop.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/12/03 17:49:00 UTC

[jira] [Work logged] (HADOOP-17408) Optimize NetworkTopology while sorting of block locations

     [ https://issues.apache.org/jira/browse/HADOOP-17408?focusedWorklogId=519789&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519789 ]

ASF GitHub Bot logged work on HADOOP-17408:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Dec/20 17:48
            Start Date: 03/Dec/20 17:48
    Worklog Time Spent: 10m 
      Work Description: amahussein opened a new pull request #2514:
URL: https://github.com/apache/hadoop/pull/2514


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HADOOP-XXXXX. Fix a typo in YYY.)
   For more details, please see https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 519789)
    Remaining Estimate: 0h
            Time Spent: 10m

> Optimize NetworkTopology while sorting of block locations
> ---------------------------------------------------------
>
>                 Key: HADOOP-17408
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17408
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: common, net
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In {{NetworkTopology}}, I noticed some low-hanging fruit for improving performance.
> Inside {{sortByDistance}}, {{Collections.shuffle}} is called on the list before {{secondarySort}} is invoked.
> {code:java}
> Collections.shuffle(list, r);
> if (secondarySort != null) {
>   secondarySort.accept(list);
> }
> {code}
> However, at several call sites, {{Collections.shuffle}} is itself passed as the {{secondarySort}} to {{sortByDistance}}. This means that the shuffle is executed twice on each list.
> Also, logically, it is useless to shuffle before applying a tie breaker, since the tie breaker might make the shuffle's work obsolete.
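
The double shuffle described above can be reproduced in isolation. The following is only a minimal sketch, not the Hadoop code: the class TieBreakSketch, its methods, and the shuffle counter are hypothetical names introduced for illustration. It contrasts the "always shuffle, then run the secondary sort" pattern with one possible alternative in which a caller-supplied secondary sort replaces the default shuffle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Consumer;

// Hypothetical stand-in for the tie-breaking step; not the NetworkTopology code.
final class TieBreakSketch {
  // Number of shuffles performed, tracked only to make the difference visible.
  static int shuffleCount = 0;

  static <T> void shuffle(List<T> list, Random r) {
    Collections.shuffle(list, r);
    shuffleCount++;
  }

  // Pattern quoted in the issue: always shuffle, then run the secondary sort.
  static <T> void tieBreakBefore(List<T> list, Random r,
      Consumer<List<T>> secondarySort) {
    shuffle(list, r);
    if (secondarySort != null) {
      secondarySort.accept(list);
    }
  }

  // One possible alternative (an assumption, not necessarily the actual fix):
  // the caller-supplied step replaces the default shuffle instead of following it.
  static <T> void tieBreakAfter(List<T> list, Random r,
      Consumer<List<T>> secondarySort) {
    if (secondarySort != null) {
      secondarySort.accept(list);
    } else {
      shuffle(list, r);
    }
  }

  public static void main(String[] args) {
    Random r = new Random(42);
    // A call site that passes a shuffle as the secondary sort.
    Consumer<List<String>> shuffleAsSecondary = l -> shuffle(l, r);
    List<String> nodes =
        new ArrayList<>(Arrays.asList("dn1", "dn2", "dn3", "dn4"));

    shuffleCount = 0;
    tieBreakBefore(nodes, r, shuffleAsSecondary);
    System.out.println("before: " + shuffleCount + " shuffle(s)");

    shuffleCount = 0;
    tieBreakAfter(nodes, r, shuffleAsSecondary);
    System.out.println("after: " + shuffleCount + " shuffle(s)");
  }
}
```

With a shuffle passed as the secondary sort, the first variant shuffles the list twice and the second shuffles it once.
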
> In addition, [~daryn] reported that:
> * topology is unnecessarily locking/unlocking to calculate the distance for every node
> * shuffling uses a seeded Random, which is heavily synchronized, instead of ThreadLocalRandom
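
The two points reported above can be sketched as follows. This is an illustration under stated assumptions, not the NetworkTopology implementation: the class DistanceSketch, its lock field, and the weigher parameter are hypothetical. It shows the read lock being taken once around the whole distance loop rather than once per node, and a shuffle driven by ThreadLocalRandom instead of a shared Random instance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.ToIntFunction;

// Hypothetical sketch; names and structure are assumptions for illustration.
final class DistanceSketch {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  // Computes one distance per node while holding the read lock once
  // for the whole loop, instead of locking and unlocking for every node.
  int[] distances(List<String> nodes, ToIntFunction<String> weigher) {
    int[] out = new int[nodes.size()];
    lock.readLock().lock();
    try {
      for (int i = 0; i < out.length; i++) {
        out[i] = weigher.applyAsInt(nodes.get(i));
      }
    } finally {
      lock.readLock().unlock();
    }
    return out;
  }

  // Shuffles with the calling thread's own generator, so concurrent
  // callers do not contend on a single shared Random.
  static <T> void shuffle(List<T> list) {
    Collections.shuffle(list, ThreadLocalRandom.current());
  }

  public static void main(String[] args) {
    DistanceSketch sketch = new DistanceSketch();
    List<String> nodes =
        new ArrayList<>(Arrays.asList("dn1", "dn22", "dn333"));
    // String length stands in for a real topology distance here.
    int[] d = sketch.distances(nodes, String::length);
    System.out.println(Arrays.toString(d));
    shuffle(nodes);
    System.out.println(nodes);
  }
}
```

One trade-off to keep in mind: ThreadLocalRandom cannot be seeded, so any caller that relies on a reproducible order (for example in tests) would still need a way to supply its own Random.
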



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org