You are viewing a plain text version of this content. The canonical link for it is here.

Posted to common-issues@hadoop.apache.org by "Inigo Goiri (JIRA)" <ji...@apache.org> on 2015/07/04 00:54:04 UTC

[jira] [Commented] (HADOOP-12185) NetworkTopology is not efficient adding/getting/removing nodes

    [ https://issues.apache.org/jira/browse/HADOOP-12185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613497#comment-14613497 ] 

Inigo Goiri commented on HADOOP-12185:
--------------------------------------

First, I realized that to get a node in NetworkTopology, it did a full linear search without stopping when the node was found:
{code}
      Node childnode = null;
      for(int i=0; i<children.size(); i++) {
        if (children.get(i).getName().equals(path[0])) {
          childnode = children.get(i);
        }
      }
      if (childnode == null) return null; // non-existing node
{code}

Adding a break when the child is found improves the performance a lot (2x in most cases).

After profiling, I realized that this linear search pattern in general was pretty bad and using a HashMap improved the performance significantly.
I did some testing adding, getting, and removing nodes, and these are the average results for 10 executions:
Large cluster: 100 racks with 40 machine per rack:
{color:gray}
  Default:  1067.0 1259.0 1157.3
  Break:    1305.2  602.3  116.7
  Fancy:     122.3   69.6  137.9
{color}

Very large cluster: 800 racks with 800 machine per rack:
{color:gray}
  Default:  22110  29600  17975
  Break:    21756  12816   1975
  Fancy:    6013   2362    2015
{color}

Medium cluster: 100 racks with 100 machine per rack:
{color:gray}
  Default: 57.2 52.9 50.6
  Break:   59.3 42.4 32.3
  Fancy:   40.1 24.6 41.2
{color}

Regular cluster: 10 racks with 10 machine per rack:
{color:gray}
  Default: 3.2 4.0 3.2
  Break:   3.1 4.1 3.9
  Fancy:   2.5 4.3 2.4
{color}

The results are average running time for ten executions in milliseconds.
The first column is addition of nodes, the second one is retrieving those nodes and the last one is removing them.
"Default" is what NetworkTopology does right now, "Break" is adding the break in the search, and "Fancy" is the one that includes the HashMap to do the search (I'll submit a patch with it).




> NetworkTopology is not efficient adding/getting/removing nodes
> --------------------------------------------------------------
>
>                 Key: HADOOP-12185
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12185
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 2.7.0
>            Reporter: Inigo Goiri
>            Assignee: Inigo Goiri
>
> NetworkToplogy uses nodes with a list of children. The access to these children is slow as it's a linear search.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)