Posted to common-dev@hadoop.apache.org by "Koji Noguchi (JIRA)" <ji...@apache.org> on 2008/07/03 00:14:45 UTC

[jira] Created: (HADOOP-3685) Unbalanced replication target

Unbalanced replication target 
------------------------------

                 Key: HADOOP-3685
                 URL: https://issues.apache.org/jira/browse/HADOOP-3685
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.17.0
            Reporter: Koji Noguchi
            Priority: Critical


In HADOOP-3633, the namenode was assigning some datanodes to receive hundreds of blocks in a short period, which caused datanodes to run out of memory (threads).
Most of those blocks came from remote racks.

Looking at the code,

{noformat}
    chooseLocalRack(results.get(1), excludedNodes, blocksize,
                    maxNodesPerRack, results);
{noformat}

was sometimes not choosing the local rack of the writer (source).

As a result, when a datanode went down, other datanodes on the same rack received a large number of blocks from remote racks.
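The problem above can be illustrated with a deliberately simplified model (the real logic lives in the namenode's replication target chooser; the class and host/rack strings below are hypothetical): anchoring chooseLocalRack on the first existing replica instead of on the source means the new replica, and hence the block transfer, can land off the source's rack.

```java
// Hypothetical, simplified model of the rack-anchoring step quoted above.
public class RackChoiceDemo {
    // Nodes are modeled as "host:rack" strings; return the rack part.
    static String rackOf(String node) {
        return node.substring(node.indexOf(':') + 1);
    }

    public static void main(String[] args) {
        String source = "dn1:rackA";               // datanode holding the surviving replica
        String firstExistingReplica = "dn7:rackB"; // results.get(1): may be on a remote rack

        // Buggy anchor: the remote replica's rack, so the transfer crosses racks.
        String buggyRack = rackOf(firstExistingReplica);
        // Intended anchor for re-replication: the source's own rack.
        String fixedRack = rackOf(source);

        System.out.println(buggyRack); // rackB
        System.out.println(fixedRack); // rackA
    }
}
```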


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3685) Unbalanced replication target

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624801#action_12624801 ] 

Hudson commented on HADOOP-3685:
--------------------------------

Integrated in Hadoop-trunk #581 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/581/])



[jira] Updated: (HADOOP-3685) Unbalanced replication target

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-3685:
----------------------------------

    Status: Open  (was: Patch Available)



[jira] Assigned: (HADOOP-3685) Unbalanced replication target

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang reassigned HADOOP-3685:
-------------------------------------

    Assignee: Hairong Kuang



[jira] Updated: (HADOOP-3685) Unbalanced replication target

Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HADOOP-3685:
------------------------------------

         Priority: Blocker  (was: Critical)
    Fix Version/s: 0.18.0
                   0.17.1

Need patches for both 17 and 18, if different.



[jira] Commented: (HADOOP-3685) Unbalanced replication target

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610073#action_12610073 ] 

Hairong Kuang commented on HADOOP-3685:
---------------------------------------

This bug was introduced by HADOOP-2559. The change there works for choosing targets for a new block, but it does not work for re-replicating an under-replicated block.



[jira] Updated: (HADOOP-3685) Unbalanced replication target

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-3685:
----------------------------------

    Attachment: rereplicationPolicy.patch

This patch places the third replica on the rack where the source is located when re-replicating a block whose two existing replicas are on two different racks.
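The decision the patch makes can be sketched as follows. This is a hedged, self-contained model, not the real ReplicationTargetChooser code; the class name, the "host:rack" node encoding, and the "remoteRack" placeholder are all illustrative assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RereplicationTargetSketch {
    static String rackOf(String node) {
        return node.substring(node.indexOf(':') + 1);
    }

    // Choose the rack for the third replica during re-replication: if the two
    // existing replicas already span two racks, stay on the source's rack so
    // the transfer is rack-local; otherwise fall back to a remote rack.
    static String chooseThirdReplicaRack(String source, List<String> existing) {
        Set<String> racks = new HashSet<>();
        for (String n : existing) racks.add(rackOf(n));
        if (racks.size() >= 2) return rackOf(source);
        return "remoteRack"; // placeholder: real code picks a rack not yet used
    }

    public static void main(String[] args) {
        String source = "dn1:rackA";
        List<String> existing = Arrays.asList("dn1:rackA", "dn7:rackB");
        // Two existing replicas on two racks, so the third stays on rackA.
        System.out.println(chooseThirdReplicaRack(source, existing)); // rackA
    }
}
```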



[jira] Issue Comment Edited: (HADOOP-3685) Unbalanced replication target

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610295#action_12610295 ] 

hairong edited comment on HADOOP-3685 at 7/3/08 11:30 AM:
----------------------------------------------------------------

This patch places the third replica on the rack where the source is located when re-replicating a block whose two existing replicas are on two different racks. Since the source and the target are on the same rack, only the datanodes on that rack may choose to replicate an under-replicated block to this rack. Therefore at most 2 * (rack size - 1) block transfers may happen to a single target within a heartbeat interval.

      was (Author: hairong):
    This patch places a third replica on the rack where the source is located in case of rereplication when two existing replicas are on two different racks.
  
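The bound stated in the comment above can be checked with a small sketch: each of the (rack size - 1) peer datanodes on the target's rack may, per the comment, contribute at most two transfers to that target within a heartbeat interval. The class name and the sample rack size are illustrative assumptions.

```java
public class TransferBoundSketch {
    // With rack-local re-replication, only the (rackSize - 1) peers on the
    // same rack can pick a given node as a target, each contributing at most
    // two block transfers per heartbeat interval.
    static int maxTransfersPerHeartbeat(int rackSize) {
        return 2 * (rackSize - 1);
    }

    public static void main(String[] args) {
        // e.g. a 40-node rack bounds any single target at 78 transfers.
        System.out.println(maxTransfersPerHeartbeat(40)); // 78
    }
}
```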


[jira] Updated: (HADOOP-3685) Unbalanced replication target

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-3685:
----------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (HADOOP-3685) Unbalanced replication target

Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HADOOP-3685:
------------------------------------

    Fix Version/s: 0.17.2



[jira] Commented: (HADOOP-3685) Unbalanced replication target

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611379#action_12611379 ] 

Hadoop QA commented on HADOOP-3685:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12385220/rereplicationPolicy.patch
  against trunk revision 674645.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2804/console

This message is automatically generated.



[jira] Updated: (HADOOP-3685) Unbalanced replication target

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-3685:
----------------------------------

    Status: Open  (was: Patch Available)



[jira] Commented: (HADOOP-3685) Unbalanced replication target

Posted by "Lohit Vijayarenu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610409#action_12610409 ] 

Lohit Vijayarenu commented on HADOOP-3685:
------------------------------------------

+1 patch looks good. Should we document this someplace? I see that we missed changing hdfs_design.html



[jira] Commented: (HADOOP-3685) Unbalanced replication target

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611850#action_12611850 ] 

Hairong Kuang commented on HADOOP-3685:
---------------------------------------

Targets test-core and test-patch pass on my local machine under trunk, branch 17, and branch 18.

Here is the test-patch result:
     [exec] +1 overall.

     [exec]     +1 @author.  The patch does not contain any @author tags.

     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.

     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.

     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.



[jira] Resolved: (HADOOP-3685) Unbalanced replication target

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang resolved HADOOP-3685.
-----------------------------------

    Resolution: Fixed

I've just committed this.



[jira] Updated: (HADOOP-3685) Unbalanced replication target

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-3685:
----------------------------------

    Fix Version/s:     (was: 0.17.1)
     Hadoop Flags: [Reviewed]
           Status: Patch Available  (was: Open)



[jira] Updated: (HADOOP-3685) Unbalanced replication target

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-3685:
----------------------------------

    Attachment: rereplicationPolicy1.patch

Here is a patch that applies to the trunk.
