Posted to common-dev@hadoop.apache.org by "He Yongqiang (JIRA)" <ji...@apache.org> on 2009/05/30 17:10:07 UTC

[jira] Created: (HADOOP-5945) Support running multiple DataNodes/TaskTrackers simultaneously in a single node

Support running multiple DataNodes/TaskTrackers simultaneously in a single node
-------------------------------------------------------------------------------

                 Key: HADOOP-5945
                 URL: https://issues.apache.org/jira/browse/HADOOP-5945
             Project: Hadoop Core
          Issue Type: New Feature
            Reporter: He Yongqiang


We should support running multiple datanodes/tasktrackers on the same node, provided they do not share ports, local fs dirs, etc. I think Hadoop can easily be adapted to do this.
I guess the first and major step is to modify the start script so that it supports starting multiple datanode/tasktracker daemons on the same node.
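
As a rough illustration of what such a script change could look like, here is a minimal Python sketch that builds one hadoop-daemon.sh invocation per instance. It relies on the existing --config option and on the HADOOP_IDENT_STRING, HADOOP_PID_DIR and HADOOP_LOG_DIR environment variables, so that instances do not overwrite each other's pid and log files. The install path, the conf-N directory layout and the "multi-N" ident naming are assumptions made up for the example, and each conf-N directory is assumed to already hold a configuration with its own ports and data dirs.

```python
def daemon_invocation(index, daemon="datanode",
                      hadoop_home="/opt/hadoop",        # assumed install path
                      conf_root="/etc/hadoop-multi"):   # assumed config layout
    """Return (argv, env_overrides) for starting the index-th instance."""
    env = {
        # hadoop-daemon.sh derives pid and log file names from these,
        # so each instance needs its own values.
        "HADOOP_IDENT_STRING": "multi-%d" % index,
        "HADOOP_PID_DIR": "/tmp/hadoop-pids/%d" % index,
        "HADOOP_LOG_DIR": "%s/logs/%d" % (hadoop_home, index),
    }
    argv = [
        "%s/bin/hadoop-daemon.sh" % hadoop_home,
        "--config", "%s/conf-%d" % (conf_root, index),
        "start", daemon,
    ]
    return argv, env


def all_invocations(count, daemon="datanode"):
    """Build the invocations for `count` instances of one daemon type."""
    return [daemon_invocation(i, daemon) for i in range(count)]
```

Each (argv, env) pair could then be handed to something like subprocess.call(argv, env=dict(os.environ, **env)). The sketch only builds the commands and does not start anything itself.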

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-5945) Support running multiple DataNodes/TaskTrackers simultaneously in a single node

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715756#action_12715756 ] 

Hairong Kuang commented on HADOOP-5945:
---------------------------------------

dfs already has a tool called DataNodeCluster that allows running multiple datanodes on a node. I have used it in many large-scale tests that had only a small set of nodes.

[jira] Commented: (HADOOP-5945) Support running multiple DataNodes/TaskTrackers simultaneously in a single node

Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714746#action_12714746 ] 

Jakob Homan commented on HADOOP-5945:
-------------------------------------

It's possible now to run multiple datanodes/tasktrackers on the same node, simply by writing separate configuration files for each.  I do it quite often during testing.  There is some limited discussion on the mailing lists of how to do this, but there certainly could be a more concise, step-by-step set of instructions.  Do you mean to do this on production machines?
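
A minimal sketch of the separate-configuration idea, in Python: it generates the per-instance overrides a second or third datanode would need. The property names (dfs.datanode.address, dfs.datanode.ipc.address, dfs.datanode.http.address, dfs.data.dir) and the default ports are those of Hadoop releases from this period as far as I know. The base directory and the "add 100 per instance" port scheme are assumptions chosen for the example, not anything Hadoop prescribes.

```python
from xml.sax.saxutils import escape

# Default datanode ports in Hadoop releases of this era (assumed here).
BASE_PORTS = {
    "dfs.datanode.address": 50010,
    "dfs.datanode.ipc.address": 50020,
    "dfs.datanode.http.address": 50075,
}


def datanode_overrides(index, base_dir="/tmp/hadoop-multi"):
    """Properties that must differ between datanodes sharing one machine."""
    # Shift every port by 100 per instance (an arbitrary, hypothetical scheme).
    props = dict((name, "0.0.0.0:%d" % (port + index * 100))
                 for name, port in BASE_PORTS.items())
    # Each instance also needs its own block storage directory.
    props["dfs.data.dir"] = "%s/dn%d/data" % (base_dir, index)
    return props


def to_site_xml(props):
    """Render the properties in the usual Hadoop *-site.xml layout."""
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name in sorted(props):
        lines.append("  <property>")
        lines.append("    <name>%s</name>" % escape(name))
        lines.append("    <value>%s</value>" % escape(props[name]))
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)
```

Writing to_site_xml(datanode_overrides(i)) into the site file of the i-th configuration directory gives each datanode its own ports and storage directory. Tasktrackers would need the same treatment for their own ports and local dirs.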

[jira] Commented: (HADOOP-5945) Support running multiple DataNodes/TaskTrackers simultaneously in a single node

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715974#action_12715974 ] 

He Yongqiang commented on HADOOP-5945:
--------------------------------------

>>The datanode shouldn't be using that much CPU/memory if you can help it, as that is for the tasks the task tracker starts. You are free to increase the number of slots for task trackers to use up all the spare RAM, CPU time you have.
@Steve, thanks. The situation is that the TT/JT are not used at all (they are not even started). Only hdfs is used, as a pure file server.

[jira] Commented: (HADOOP-5945) Support running multiple DataNodes/TaskTrackers simultaneously in a single node

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715909#action_12715909 ] 

Steve Loughran commented on HADOOP-5945:
----------------------------------------

>only one datanode server per node can not fully utilize the node's resources.

The datanode shouldn't be using that much CPU/memory if you can help it, as that is for the tasks the task tracker starts. You are free to increase the number of slots for task trackers to use up all the spare RAM, CPU time you have.

- If the server has spare storage, then you can add more directories to the list of storage dirs for the datanode to use.

@Hairong - yes, separate VMs are best. There are a few places where System.exit() can be called, and unless you are running under a security manager, a single VM will shut down without warning.


[jira] Commented: (HADOOP-5945) Support running multiple DataNodes/TaskTrackers simultaneously in a single node

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715842#action_12715842 ] 

He Yongqiang commented on HADOOP-5945:
--------------------------------------

>>The big issue is what do you gain by running >1 TT and DN per node, except on testing, where you want to give the master nodes more of a workload?
I talked with a friend, and he said hdfs is used in their production environment as a cartoon video store, and that only one datanode server per node cannot fully utilize the node's resources.
>>virtual machines
Good suggestion. With one datanode server per VM, I guess it needs special care to avoid all replicas of one file block being put on the same physical machine.
I think letting multiple tasktrackers run on the same node is not a good decision because of the memory problem. But if only hdfs is used, why not support multiple datanodes on a single node?
>>DataNodeCluster
Hairong, thanks. I will try it. I agree with Jakob that we should put it in the documentation. But it seems DataNodeCluster is only used for testing?
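
One possible way to get that "special care" for the VM case is the rack-awareness hook: Hadoop can call a user-supplied topology script (configured via topology.script.file.name in releases of this era) with one or more node addresses as arguments and read back one rack path per address. The Python sketch below reports the physical host as the "rack", so VMs on the same machine look like same-rack nodes to the namenode. As far as I understand the default placement policy, it tries to put replicas on more than one rack, which here would mean more than one physical machine. The address table and the host names are hypothetical.

```python
import sys

# Hypothetical mapping from VM address to the physical machine hosting it.
VM_TO_HOST = {
    "10.0.0.11": "phys-a",
    "10.0.0.12": "phys-a",
    "10.0.0.21": "phys-b",
    "10.0.0.22": "phys-b",
}

# Rack path Hadoop itself uses for nodes it cannot place.
DEFAULT_RACK = "/default-rack"


def rack_for(address):
    """Use the physical host as the rack so co-located VMs share one."""
    host = VM_TO_HOST.get(address)
    return "/" + host if host else DEFAULT_RACK


def resolve(addresses):
    """Return one rack path per address, in the order given."""
    return [rack_for(a) for a in addresses]


if __name__ == "__main__":
    # One rack path per argument, separated by whitespace.
    print(" ".join(resolve(sys.argv[1:])))
```

This only spreads replicas across physical machines; it does not stop two replicas of a block from sharing a machine when the replication factor is above two.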

[jira] Commented: (HADOOP-5945) Support running multiple DataNodes/TaskTrackers simultaneously in a single node

Posted by "Yiping Han (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715593#action_12715593 ] 

Yiping Han commented on HADOOP-5945:
------------------------------------

What I do for testing is create some virtual machines and run the TT & DN inside them. This looks easy to maintain and deploy.

[jira] Commented: (HADOOP-5945) Support running multiple DataNodes/TaskTrackers simultaneously in a single node

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715519#action_12715519 ] 

Steve Loughran commented on HADOOP-5945:
----------------------------------------

The big issue is: what do you gain by running >1 TT and DN per node, except in testing, where you want to give the master nodes more of a workload?

[jira] Commented: (HADOOP-5945) Support running multiple DataNodes/TaskTrackers simultaneously in a single node

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714805#action_12714805 ] 

He Yongqiang commented on HADOOP-5945:
--------------------------------------

Yeah, I often do that too. But every time, I start the extra datanode/tasktracker manually and let those unofficial instances heartbeat to the master instance to get added. So we could let the start script start multiple slave instances.
>>Are you meaning to do this on production machines? 
Why can't this be done on production machines?

[jira] Commented: (HADOOP-5945) Support running multiple DataNodes/TaskTrackers simultaneously in a single node

Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715759#action_12715759 ] 

Jakob Homan commented on HADOOP-5945:
-------------------------------------

>> dfs already has a tool called DataNodeCluster that allows to run multiple datanodes in a node. I have used it in many of the large scale tests using only a small set of nodes.
The DataNodeCluster is a neat tool that I didn't know about. It should be given greater prominence in the documentation....
