Posted to common-issues@hadoop.apache.org by "Ahmed Hussein (Jira)" <ji...@apache.org> on 2020/12/03 15:30:00 UTC
[jira] [Created] (HADOOP-17408) Optimize NetworkTopology while sorting of block locations
Ahmed Hussein created HADOOP-17408:
--------------------------------------
Summary: Optimize NetworkTopology while sorting of block locations
Key: HADOOP-17408
URL: https://issues.apache.org/jira/browse/HADOOP-17408
Project: Hadoop Common
Issue Type: Improvement
Components: common, net
Reporter: Ahmed Hussein
Assignee: Ahmed Hussein
In {{NetworkTopology}}, I noticed some low-hanging fruit for improving performance.
Inside {{sortByDistance}}, {{Collections.shuffle}} is performed on the list before calling {{secondarySort}}.
{code:java}
Collections.shuffle(list, r);
if (secondarySort != null) {
  secondarySort.accept(list);
}
{code}
However, at several call sites, {{Collections.shuffle}} is passed as the {{secondarySort}} argument to {{sortByDistance}}. As a result, the shuffle is executed twice on each list.
Also, logically, it is pointless to shuffle before applying a tie breaker, because the tie breaker may discard the order that the shuffle produced.
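The redundancy can be reproduced with a small standalone model. The sketch below is not Hadoop code: {{DoubleShuffleSketch}}, {{tieBreakCurrent}}, {{tieBreakProposed}} and the shuffle counter are hypothetical names introduced for illustration. It mirrors the snippet above, counts the shuffles when the caller also passes a shuffle as the tie breaker, and shows one possible change, assuming the intent is to shuffle only when no tie breaker is supplied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

final class DoubleShuffleSketch {
  // Counts shuffle invocations so the redundancy is observable.
  static final AtomicInteger SHUFFLES = new AtomicInteger();

  static <T> void countingShuffle(List<T> list, Random r) {
    SHUFFLES.incrementAndGet();
    Collections.shuffle(list, r);
  }

  // Mirrors the snippet from the issue: always shuffle, then run the tie breaker.
  static <T> void tieBreakCurrent(List<T> list, Random r,
      Consumer<List<T>> secondarySort) {
    countingShuffle(list, r);
    if (secondarySort != null) {
      secondarySort.accept(list);
    }
  }

  // Assumed alternative: fall back to a shuffle only when no tie breaker is given.
  static <T> void tieBreakProposed(List<T> list, Random r,
      Consumer<List<T>> secondarySort) {
    if (secondarySort != null) {
      secondarySort.accept(list);
    } else {
      countingShuffle(list, r);
    }
  }

  // Returns how many shuffles run when the caller's tie breaker is itself a shuffle.
  static int shufflesFor(boolean proposed) {
    SHUFFLES.set(0);
    Random r = new Random(42);
    List<String> nodes = new ArrayList<>(Arrays.asList("dn1", "dn2", "dn3"));
    Consumer<List<String>> callerShuffle = l -> countingShuffle(l, r);
    if (proposed) {
      tieBreakProposed(nodes, r, callerShuffle);
    } else {
      tieBreakCurrent(nodes, r, callerShuffle);
    }
    return SHUFFLES.get();
  }
}
```

In this model, {{shufflesFor(false)}} counts two shuffles per list and {{shufflesFor(true)}} counts one.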
In addition, [~daryn] reported that:
* the topology unnecessarily locks and unlocks to calculate the distance for every node
* shuffling uses a seeded {{Random}}, which is heavily synchronized, instead of {{ThreadLocalRandom}}
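Both points can be illustrated with a minimal sketch. It makes several assumptions: {{TopologySketch}} and its methods are hypothetical, and a plain map of precomputed distances stands in for the real topology tree. It contrasts taking the read lock once per node with taking it once per list, and shuffles with {{ThreadLocalRandom.current()}}, which can be passed to {{Collections.shuffle}} because {{ThreadLocalRandom}} extends {{Random}}. The {{java.util.Random}} Javadoc notes that sharing one instance across threads may cause contention.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class TopologySketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // Hypothetical stand-in for the topology: node name -> distance from the reader.
  private final Map<String, Integer> distanceFromReader = new HashMap<>();
  private final AtomicInteger readLockAcquisitions = new AtomicInteger();

  void add(String node, int distance) {
    lock.writeLock().lock();
    try {
      distanceFromReader.put(node, distance);
    } finally {
      lock.writeLock().unlock();
    }
  }

  int readLockAcquisitions() {
    return readLockAcquisitions.get();
  }

  // Lookup without locking; callers must already hold the read lock.
  private int distanceUnlocked(String node) {
    return distanceFromReader.getOrDefault(node, Integer.MAX_VALUE);
  }

  // Pattern described in the report: lock and unlock once per node.
  int[] distancesPerNodeLock(List<String> nodes) {
    int[] out = new int[nodes.size()];
    for (int i = 0; i < out.length; i++) {
      lock.readLock().lock();
      readLockAcquisitions.incrementAndGet();
      try {
        out[i] = distanceUnlocked(nodes.get(i));
      } finally {
        lock.readLock().unlock();
      }
    }
    return out;
  }

  // Assumed alternative: hold the read lock once for the whole list.
  int[] distancesSingleLock(List<String> nodes) {
    int[] out = new int[nodes.size()];
    lock.readLock().lock();
    readLockAcquisitions.incrementAndGet();
    try {
      for (int i = 0; i < out.length; i++) {
        out[i] = distanceUnlocked(nodes.get(i));
      }
    } finally {
      lock.readLock().unlock();
    }
    return out;
  }

  // Shuffles a copy using the calling thread's own generator, so no Random is shared.
  static <T> List<T> shuffled(List<T> nodes) {
    List<T> copy = new ArrayList<>(nodes);
    Collections.shuffle(copy, ThreadLocalRandom.current());
    return copy;
  }
}
```

For a list of n nodes, the per-node variant acquires the read lock n times, while the single-lock variant acquires it once. {{ThreadLocalRandom}} cannot be seeded, so any code that relies on a fixed seed for reproducible ordering would need a separate path.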