Posted to common-dev@hadoop.apache.org by "Hairong Kuang (JIRA)" <ji...@apache.org> on 2009/01/26 19:53:59 UTC

[jira] Created: (HADOOP-5124) A few optimizations to FsNamesystem#RecentInvalidateSets

A few optimizations to FsNamesystem#RecentInvalidateSets
--------------------------------------------------------

                 Key: HADOOP-5124
                 URL: https://issues.apache.org/jira/browse/HADOOP-5124
             Project: Hadoop Core
          Issue Type: Improvement
          Components: dfs
            Reporter: Hairong Kuang
            Assignee: Hairong Kuang
             Fix For: 0.21.0


This jira proposes a few optimizations to FsNamesystem#RecentInvalidateSets:
1. When removing all replicas of a block, it does not traverse all nodes in the map; instead it traverses only the nodes where the block is located (see the sketch after this list).
2. When dispatching blocks to datanodes in ReplicationMonitor, it randomly chooses a predefined number of datanodes and dispatches blocks to those datanodes. This strategy provides fairness to all datanodes, whereas the current strategy always starts from the first datanode.
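For illustration only, here is a minimal, self-contained sketch of optimization 1, assuming a map from a datanode's storage ID to the set of block IDs pending invalidation on it. The class and method names (RecentInvalidateSetsSketch, removeFromInvalidates, addToInvalidates) are hypothetical and this is not the actual FSNamesystem code; the point is only that removal visits just the nodes known to hold a replica of the block.
{code}
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class RecentInvalidateSetsSketch {
  // storage ID of a datanode -> block IDs pending invalidation on that node
  private final Map<String, Set<Long>> recentInvalidateSets =
      new TreeMap<String, Set<Long>>();

  void addToInvalidates(long blockId, String storageID) {
    Set<Long> invalidateSet = recentInvalidateSets.get(storageID);
    if (invalidateSet == null) {
      invalidateSet = new TreeSet<Long>();
      recentInvalidateSets.put(storageID, invalidateSet);
    }
    invalidateSet.add(blockId);
  }

  // Optimization 1: visit only the nodes that hold a replica of the block,
  // not every entry in the map.
  void removeFromInvalidates(long blockId, Collection<String> replicaStorageIDs) {
    for (String storageID : replicaStorageIDs) {
      Set<Long> invalidateSet = recentInvalidateSets.get(storageID);
      if (invalidateSet != null && invalidateSet.remove(blockId)
          && invalidateSet.isEmpty()) {
        // drop the node's entry once it has no pending invalidations left
        recentInvalidateSets.remove(storageID);
      }
    }
  }
}
{code}
In this sketch the block's location list is simply passed in as a parameter; in the namenode it would come from the blocks map.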

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5124) A few optimizations to FsNamesystem#RecentInvalidateSets

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669800#action_12669800 ] 

Konstantin Shvachko commented on HADOOP-5124:
---------------------------------------------

# {{computeInvalidateWork()}}
## You probably want to use {{Math.min()}} when computing the value of {{nodesToProcess}}.
## I would rather go with
{{ArrayList<String> keyArray = new ArrayList<String>(recentInvalidateSets.keySet());}}
than {{String[] keyArray}}. You will then be able to use {{Collections.swap()}} instead of implementing the swap yourself (see the sketch after this list).
Ideally, of course, it would be better to just get a random element from the TreeMap and put it into the array list.
# {{invalidateWorkForOneNode()}}
{code}
    if(it.hasNext())
      recentInvalidateSets.put(firstNodeId, invalidateSet);
{code}
This is a no-op in your case, because {{recentInvalidateSets}} already contains {{firstNodeId}} mapped to exactly this {{invalidateSet}}, which was modified earlier in the loop.
The original variant of this code
{code}
    if(!it.hasNext())
      recentInvalidateSets.remove(nodeId);
{code}
makes more sense, since it removes the node's entry entirely once it has no invalid blocks left.
# Could you please run some tests showing how much of an optimization we can get with the randomized data-node selection?
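For reference, a minimal sketch of what points 1 and 2 amount to in isolation (the helper name selectNodes and the parameter nodesPerIteration are hypothetical; this is not the code in either patch): Math.min() clamps the number of nodes to process, the key set is copied into an ArrayList, and Collections.swap() moves a randomly chosen key into each slot.
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.SortedMap;

class RandomNodeSelection {
  static List<String> selectNodes(SortedMap<String, Set<Long>> recentInvalidateSets,
                                  int nodesPerIteration, Random random) {
    // point 1: clamp with Math.min instead of an explicit branch
    int nodesToProcess = Math.min(nodesPerIteration, recentInvalidateSets.size());

    // point 2: copy the keys into an ArrayList so Collections.swap can be used
    // instead of a hand-written swap over a String[]
    ArrayList<String> keyArray =
        new ArrayList<String>(recentInvalidateSets.keySet());
    for (int i = 0; i < nodesToProcess; i++) {
      // move a uniformly chosen key from the unpicked tail into slot i
      Collections.swap(keyArray, i, i + random.nextInt(keyArray.size() - i));
    }
    return keyArray.subList(0, nodesToProcess);
  }
}
{code}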

> A few optimizations to FsNamesystem#RecentInvalidateSets
> --------------------------------------------------------
>
>                 Key: HADOOP-5124
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5124
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: optimizeInvalidate.patch, optimizeInvalidate1.patch
>
>
> This jira proposes a few optimizations to FsNamesystem#RecentInvalidateSets:
> 1. When removing all replicas of a block, it does not traverse all nodes in the map; instead it traverses only the nodes where the block is located.
> 2. When dispatching blocks to datanodes in ReplicationMonitor, it randomly chooses a predefined number of datanodes and dispatches blocks to those datanodes. This strategy provides fairness to all datanodes, whereas the current strategy always starts from the first datanode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5124) A few optimizations to FsNamesystem#RecentInvalidateSets

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-5124:
-------------------------------------------

    Hadoop Flags: [Reviewed]

+1 patch looks good to me

> A few optimizations to FsNamesystem#RecentInvalidateSets
> --------------------------------------------------------
>
>                 Key: HADOOP-5124
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5124
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: optimizeInvalidate.patch, optimizeInvalidate1.patch, optimizeInvalidate2.patch
>
>
> This jira proposes a few optimizations to FsNamesystem#RecentInvalidateSets:
> 1. When removing all replicas of a block, it does not traverse all nodes in the map; instead it traverses only the nodes where the block is located.
> 2. When dispatching blocks to datanodes in ReplicationMonitor, it randomly chooses a predefined number of datanodes and dispatches blocks to those datanodes. This strategy provides fairness to all datanodes, whereas the current strategy always starts from the first datanode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5124) A few optimizations to FsNamesystem#RecentInvalidateSets

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-5124:
-------------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I committed this.  Thanks, Hairong!

> A few optimizations to FsNamesystem#RecentInvalidateSets
> --------------------------------------------------------
>
>                 Key: HADOOP-5124
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5124
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: optimizeInvalidate.patch, optimizeInvalidate1.patch, optimizeInvalidate2.patch
>
>
> This jira proposes a few optimizations to FsNamesystem#RecentInvalidateSets:
> 1. When removing all replicas of a block, it does not traverse all nodes in the map; instead it traverses only the nodes where the block is located.
> 2. When dispatching blocks to datanodes in ReplicationMonitor, it randomly chooses a predefined number of datanodes and dispatches blocks to those datanodes. This strategy provides fairness to all datanodes, whereas the current strategy always starts from the first datanode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5124) A few optimizations to FsNamesystem#RecentInvalidateSets

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669806#action_12669806 ] 

Konstantin Shvachko commented on HADOOP-5124:
---------------------------------------------

Sorry about (2). I thought {{if(!it.hasNext())}} was the old code. It is the new one, so it is absolutely correct. Please disregard (2).

> A few optimizations to FsNamesystem#RecentInvalidateSets
> --------------------------------------------------------
>
>                 Key: HADOOP-5124
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5124
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: optimizeInvalidate.patch, optimizeInvalidate1.patch
>
>
> This jira proposes a few optimizations to FsNamesystem#RecentInvalidateSets:
> 1. When removing all replicas of a block, it does not traverse all nodes in the map; instead it traverses only the nodes where the block is located.
> 2. When dispatching blocks to datanodes in ReplicationMonitor, it randomly chooses a predefined number of datanodes and dispatches blocks to those datanodes. This strategy provides fairness to all datanodes, whereas the current strategy always starts from the first datanode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5124) A few optimizations to FsNamesystem#RecentInvalidateSets

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-5124:
----------------------------------

    Attachment: optimizeInvalidate1.patch

This patch comes with a unit test case.

> A few optimizations to FsNamesystem#RecentInvalidateSets
> --------------------------------------------------------
>
>                 Key: HADOOP-5124
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5124
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: optimizeInvalidate.patch, optimizeInvalidate1.patch
>
>
> This jira proposes a few optimizations to FsNamesystem#RecentInvalidateSets:
> 1. When removing all replicas of a block, it does not traverse all nodes in the map; instead it traverses only the nodes where the block is located.
> 2. When dispatching blocks to datanodes in ReplicationMonitor, it randomly chooses a predefined number of datanodes and dispatches blocks to those datanodes. This strategy provides fairness to all datanodes, whereas the current strategy always starts from the first datanode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5124) A few optimizations to FsNamesystem#RecentInvalidateSets

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-5124:
----------------------------------

    Status: Patch Available  (was: Open)

> A few optimizations to FsNamesystem#RecentInvalidateSets
> --------------------------------------------------------
>
>                 Key: HADOOP-5124
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5124
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: optimizeInvalidate.patch, optimizeInvalidate1.patch, optimizeInvalidate2.patch
>
>
> This jira proposes a few optimizations to FsNamesystem#RecentInvalidateSets:
> 1. When removing all replicas of a block, it does not traverse all nodes in the map; instead it traverses only the nodes where the block is located.
> 2. When dispatching blocks to datanodes in ReplicationMonitor, it randomly chooses a predefined number of datanodes and dispatches blocks to those datanodes. This strategy provides fairness to all datanodes, whereas the current strategy always starts from the first datanode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5124) A few optimizations to FsNamesystem#RecentInvalidateSets

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670166#action_12670166 ] 

Hadoop QA commented on HADOOP-5124:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12399320/optimizeInvalidate2.patch
  against trunk revision 740237.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3792/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3792/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3792/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3792/console

This message is automatically generated.

> A few optimizations to FsNamesystem#RecentInvalidateSets
> --------------------------------------------------------
>
>                 Key: HADOOP-5124
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5124
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: optimizeInvalidate.patch, optimizeInvalidate1.patch, optimizeInvalidate2.patch
>
>
> This jira proposes a few optimizations to FsNamesystem#RecentInvalidateSets:
> 1. When removing all replicas of a block, it does not traverse all nodes in the map; instead it traverses only the nodes where the block is located.
> 2. When dispatching blocks to datanodes in ReplicationMonitor, it randomly chooses a predefined number of datanodes and dispatches blocks to those datanodes. This strategy provides fairness to all datanodes, whereas the current strategy always starts from the first datanode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5124) A few optimizations to FsNamesystem#RecentInvalidateSets

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-5124:
----------------------------------

    Attachment: optimizeInvalidate.patch

> A few optimizations to FsNamesystem#RecentInvalidateSets
> --------------------------------------------------------
>
>                 Key: HADOOP-5124
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5124
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: optimizeInvalidate.patch
>
>
> This jira proposes a few optimizations to FsNamesystem#RecentInvalidateSets:
> 1. When removing all replicas of a block, it does not traverse all nodes in the map; instead it traverses only the nodes where the block is located.
> 2. When dispatching blocks to datanodes in ReplicationMonitor, it randomly chooses a predefined number of datanodes and dispatches blocks to those datanodes. This strategy provides fairness to all datanodes, whereas the current strategy always starts from the first datanode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5124) A few optimizations to FsNamesystem#RecentInvalidateSets

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-5124:
----------------------------------

    Attachment: optimizeInvalidate2.patch

This patch incorporates Konstantin's first comment.

As for (3), I am not clear on how to evaluate the optimization. The goal of the new strategy is not to improve performance; instead it aims to give fairness to all nodes when computing invalidation work. The current strategy always favors the nodes at the beginning of the map, since recentInvalidateSets is a TreeMap and therefore sorted. Another flaw is that after the first node is scheduled, it is reinserted into the map if it still has remaining blocks. Since it then becomes the first node again, the next call of invalidateWorkForOneNode works on the same node again. The current strategy would only be fair if recentInvalidateSets were a FIFO queue.
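A tiny sketch of the behavior described above, using hypothetical names (pickNodeOldStyle is not the real ReplicationMonitor code): when the scheduler always takes firstKey() and the node's entry stays in (or is put back into) the map while blocks remain, the lexicographically smallest storage ID is selected on every call and the other nodes are starved until it drains.
{code}
import java.util.Set;
import java.util.SortedMap;

class FirstKeyStarvation {
  // Old-style selection: always the first key of the sorted map.
  static String pickNodeOldStyle(SortedMap<String, Set<Long>> recentInvalidateSets) {
    if (recentInvalidateSets.isEmpty()) {
      return null;
    }
    String firstNodeId = recentInvalidateSets.firstKey(); // same node every round
    // ... schedule up to the per-node limit of blocks for firstNodeId ...
    // If firstNodeId still has pending blocks, its entry remains (or is put back)
    // under the same key, so the next call selects firstNodeId again.
    return firstNodeId;
  }
}
{code}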

> A few optimizations to FsNamesystem#RecentInvalidateSets
> --------------------------------------------------------
>
>                 Key: HADOOP-5124
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5124
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: optimizeInvalidate.patch, optimizeInvalidate1.patch, optimizeInvalidate2.patch
>
>
> This jira proposes a few optimizations to FsNamesystem#RecentInvalidateSets:
> 1. When removing all replicas of a block, it does not traverse all nodes in the map; instead it traverses only the nodes where the block is located.
> 2. When dispatching blocks to datanodes in ReplicationMonitor, it randomly chooses a predefined number of datanodes and dispatches blocks to those datanodes. This strategy provides fairness to all datanodes, whereas the current strategy always starts from the first datanode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.