Posted to common-dev@hadoop.apache.org by "Hairong Kuang (JIRA)" <ji...@apache.org> on 2009/03/23 19:39:50 UTC

[jira] Created: (HADOOP-5556) A few improvements to DataNodeCluster

A few improvements to DataNodeCluster
-------------------------------------

                 Key: HADOOP-5556
                 URL: https://issues.apache.org/jira/browse/HADOOP-5556
             Project: Hadoop Core
          Issue Type: Bug
          Components: test
            Reporter: Hairong Kuang
             Fix For: 0.21.0


DataNodeCluster is a great tool for simulating a large-scale DFS cluster on a small set of machines. A few suggestions to improve it:
# DataNodeCluster uses MiniDFSCluster#startDataNode to start multiple DataNode instances on one machine. MiniDFSCluster sets each DataNode's address to 127.0.0.1. We should allow the address to be set to 0.0.0.0 so that DataNodes on different machines can communicate.
# Currently the size of the blocks injected into DataNodes and created in CreateEditsLog is hardcoded as 10. It would be more convenient if this were configurable. We also need to make sure that both use the same block size.
# If the replication factor of blocks is larger than 1, a DataNode in DataNodeCluster is currently injected with blocks multiple times and therefore sends block reports to the NameNode multiple times. The initial block reports contain only a portion of its blocks and may therefore cause unnecessary block replications. It would be cleaner if only one block report containing all its blocks were sent.
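
The third suggestion can be illustrated with a minimal sketch: work out the complete set of blocks for each simulated DataNode first, so that each node needs only one injection and one block report. This is not the DataNodeCluster code; the class name, the method name, and the round-robin placement are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the DataNodeCluster code: assigns each replica of
// each block to a simulated DataNode round-robin and collects the full
// per-node list up front, so each node can be injected (and report) once.
final class BlockPlacementSketch {
    private BlockPlacementSketch() {}

    /**
     * Returns one list of block ids per DataNode; list i holds every block
     * that DataNode i should receive in a single injection.
     */
    static List<List<Long>> planInjection(int numDataNodes, int numBlocks, int replication) {
        if (numDataNodes <= 0 || replication <= 0 || replication > numDataNodes) {
            throw new IllegalArgumentException("need 0 < replication <= numDataNodes");
        }
        List<List<Long>> perNode = new ArrayList<>();
        for (int i = 0; i < numDataNodes; i++) {
            perNode.add(new ArrayList<>());
        }
        int next = 0;
        for (long blockId = 0; blockId < numBlocks; blockId++) {
            // Consecutive replicas go to consecutive nodes, so the replicas of
            // one block never land on the same node while replication <= nodes.
            for (int r = 0; r < replication; r++) {
                perNode.get(next).add(blockId);
                next = (next + 1) % numDataNodes;
            }
        }
        return perNode;
    }
}
```

With a plan like this, a caller would inject `planInjection(...).get(i)` into DataNode i exactly once, rather than once per replica.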

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Assigned: (HADOOP-5556) A few improvements to DataNodeCluster

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang reassigned HADOOP-5556:
-------------------------------------

    Assignee: Hairong Kuang

[jira] Updated: (HADOOP-5556) A few improvements to DataNodeCluster

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-5556:
----------------------------------

    Attachment: DataNodeCluster.patch

This patch makes the changes suggested above. It also optimizes SimulatedFSDataset#injectBlocks by avoiding the allocation of a new block map.
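
The patch itself is not reproduced in this thread, so the following is only one plausible reading of "avoiding the allocation of a new block map": add the injected blocks directly to the existing map instead of building a merged copy and swapping it in. The class below is a hypothetical stand-in, not SimulatedFSDataset, and its field and method shapes are assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a simulated dataset; not SimulatedFSDataset itself.
final class SimulatedDatasetSketch {
    // Block id -> block length in bytes.
    private final Map<Long, Long> blockMap = new HashMap<>();

    /**
     * Adds blocks to the existing map in place rather than allocating a
     * merged copy. Ids are checked against the map first, so an injection
     * that collides with an existing block leaves the map unchanged.
     */
    void injectBlocks(List<Long> blockIds, long blockSize) {
        for (Long id : blockIds) {
            if (blockMap.containsKey(id)) {
                throw new IllegalStateException("Block already exists: " + id);
            }
        }
        for (Long id : blockIds) {
            blockMap.put(id, blockSize);
        }
    }

    int blockCount() {
        return blockMap.size();
    }
}
```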

[jira] Commented: (HADOOP-5556) A few improvements to DataNodeCluster

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688835#action_12688835 ] 

Steve Loughran commented on HADOOP-5556:
----------------------------------------

# Does CreateEditsLog have to create a default Configuration() instance, or could createEditsLog() be changed to take a Configuration as a parameter?
# It might be useful, at some point, to move the MiniDFSCluster and DataNodeCluster classes into a redistributable JAR, as they are handy for anyone trying to set up a test cluster in a single JVM, though we'd have to get rid of all the emergency System.exit() calls first.
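
To make the first question concrete, here is a rough sketch of the parameter-passing style, in which the caller supplies the configuration and the block size is read from it. `java.util.Properties` stands in for Hadoop's Configuration so the example is self-contained; the key name, the class, and the method signatures are all hypothetical, and only the default of 10 comes from the issue description.

```java
import java.util.Properties;

// Hypothetical sketch; java.util.Properties stands in for Hadoop's Configuration.
final class EditsLogSketch {
    // Assumed key name, for illustration only.
    static final String BLOCK_SIZE_KEY = "sketch.simulated.block.size";
    // The issue description says the size is currently hardcoded as 10.
    static final long DEFAULT_BLOCK_SIZE = 10L;

    private EditsLogSketch() {}

    /** Reads the block size from the caller's configuration instead of creating a default one. */
    static long blockSize(Properties conf) {
        String value = conf.getProperty(BLOCK_SIZE_KEY);
        return value == null ? DEFAULT_BLOCK_SIZE : Long.parseLong(value.trim());
    }

    /** Returns the total number of bytes the simulated files would occupy. */
    static long createEditsLog(Properties conf, int numFiles, int blocksPerFile) {
        return (long) numFiles * blocksPerFile * blockSize(conf);
    }
}
```

Passing the same configuration object to both the edits-log generator and the block injector would also be one way to keep the two block sizes in agreement.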
