
Posted to common-user@hadoop.apache.org by "Barry, Sean F" <se...@intel.com> on 2012/04/09 18:22:00 UTC

multiple nodes one machine

Hi all,

I currently have a 2-node cluster up and running, but now I face a new issue. One of my nodes is running a DataNode and a TaskTracker on a 4-core machine, and in order to do a bit of proof-of-concept testing I would like to have 4 nodes running on that particular machine. Does this mean that I would need to set it up as a pseudo-distributed cluster, or do you have any other suggestions? And would I need to add 3 more DataNodes and 3 more TaskTrackers, or just one or the other?

Thanks
-SB

Re: multiple nodes one machine

Posted by Harsh J <ha...@cloudera.com>.

Hi,

With my configuration in place, I simply run:

"hadoop datanode > /tmp/datanode-$RANDOM.log 2>&1 &" the required number
of times. I then track the launched instances with "jps", so I can
send them quit signals when I want to tear them down again.

On Tue, Apr 10, 2012 at 1:52 AM, Barry, Sean F <se...@intel.com> wrote:
> Harsh,
> I am interested in adding datanodes just for testing.



-- 
Harsh J

RE: multiple nodes one machine

Posted by "Barry, Sean F" <se...@intel.com>.

Harsh, 
I am interested in adding datanodes just for testing.

I have a few more things I should have said earlier. 
My current cluster looks like this. I set it up exactly as in the tutorial at http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/ except that I am running SUSE 12.1 on both boxes.

Master - running NameNode, DataNode, TaskTracker, SecondaryNameNode and JobTracker
------
Slave - running DataNode and TaskTracker

My (4-core) slave machine is the one I would like to add three additional DataNodes to, but when I use the run-additionalDN.sh script I get "Usage: java DataNode [-rollback]".

Am I supposed to run the script on my master node or slave node?

-SB

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Monday, April 09, 2012 11:35 AM
To: common-user@hadoop.apache.org
Subject: Re: multiple nodes one machine

Barry,

Depends on what you'll be testing. If you want more daemons, then yes you need to add more nodes onto the same box (configs may be tweaked to achieve this). If you just want MR to provide more slots for tasks, then a specific task tracker property alone may be edited.

For more daemons, see http://search-hadoop.com/m/a4klk28NUr12 and a neat config I use for running them without too much config mess:
https://gist.github.com/2345300

For the latter, see:
http://wiki.apache.org/hadoop/FAQ#I_see_a_maximum_of_2_maps.2BAC8-reduces_spawned_concurrently_on_each_TaskTracker.2C_how_do_I_increase_that.3F

Alternatively, use the hadoop-test jar provided classes:
MiniDFSCluster and MiniMRCluster which can run from with a test suite itself (With multiple threads as daemons, to simply test around with).

On Mon, Apr 9, 2012 at 9:52 PM, Barry, Sean F <se...@intel.com> wrote:
> Hi all,
>
> I currently have a 2 node cluster up and running. But now I face a new issue, one of my nodes is running a Datanode and a Tasktracker on a 4 core machine and in order to do a bit of proof of concept testing I would like to have 4 nodes running on that particular machine. Does this mean that I would need to set that up as a pseudodistributed cluster? or do you have any other suggestions? And would I need to add 3 more datanodes and 3 more tasktrackers or either or?
>
> Thanks
> -SB

--
Harsh J

Re: multiple nodes one machine

Posted by Harsh J <ha...@cloudera.com>.

Barry,

It depends on what you'll be testing. If you want more daemons, then yes,
you need to run more nodes on the same box (the configs can be tweaked
to achieve this). If you just want MR to provide more task slots, then
editing a specific TaskTracker property is enough.

For more daemons, see http://search-hadoop.com/m/a4klk28NUr12 and a
neat config I use for running them without too much config mess:
https://gist.github.com/2345300
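The settings that typically have to differ between DataNodes sharing one host are the storage directory and the three listening ports. A minimal Python sketch that emits per-instance overrides, using the Hadoop 1.x property names and their default ports; the port offset and the directory layout are hypothetical choices, not taken from the linked gist:

```python
import xml.etree.ElementTree as ET

# Hadoop 1.x DataNode defaults: data transfer 50010, HTTP 50075, IPC 50020.
BASE_PORTS = {
    "dfs.datanode.address": 50010,
    "dfs.datanode.http.address": 50075,
    "dfs.datanode.ipc.address": 50020,
}


def datanode_overrides(instance, base_dir="/tmp/hadoop-extra", port_step=100):
    """Properties that must differ between DataNodes on the same host.

    Instance 0 keeps the defaults; instance k shifts every port by k * port_step.
    base_dir and port_step are hypothetical, not Hadoop defaults.
    """
    props = {"dfs.data.dir": f"{base_dir}/dn{instance}/data"}
    for name, port in BASE_PORTS.items():
        props[name] = f"0.0.0.0:{port + instance * port_step}"
    return props


def to_site_xml(props):
    """Render a dict as a Hadoop *-site.xml <configuration> element."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

Each rendered block would be merged into that instance's own hdfs-site.xml, in a conf directory selected through HADOOP_CONF_DIR, alongside the cluster's usual settings.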

For the latter, see:
http://wiki.apache.org/hadoop/FAQ#I_see_a_maximum_of_2_maps.2BAC8-reduces_spawned_concurrently_on_each_TaskTracker.2C_how_do_I_increase_that.3F
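For the slots route, that FAQ entry comes down to two per-TaskTracker properties in mapred-site.xml, both defaulting to 2. A minimal Python sketch that sizes them from the core count; the per-core ratios are an assumption for illustration, and the TaskTracker has to be restarted to pick up the change:

```python
import xml.etree.ElementTree as ET

# Hadoop 1.x TaskTracker slot limits; each defaults to 2 when unset.
MAP_SLOTS = "mapred.tasktracker.map.tasks.maximum"
REDUCE_SLOTS = "mapred.tasktracker.reduce.tasks.maximum"


def slot_config(cores, maps_per_core=1.0, reduces_per_core=0.5):
    """Return (slot counts, mapred-site.xml <configuration> string).

    The per-core ratios are hypothetical starting points, not Hadoop defaults;
    the right values depend on how much memory each task JVM needs.
    """
    slots = {
        MAP_SLOTS: max(1, int(cores * maps_per_core)),
        REDUCE_SLOTS: max(1, int(cores * reduces_per_core)),
    }
    root = ET.Element("configuration")
    for name, value in slots.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return slots, ET.tostring(root, encoding="unicode")
```

On the 4-core slave, slot_config(4) would give 4 map slots and 2 reduce slots from a single TaskTracker, with no extra daemons.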

Alternatively, use the MiniDFSCluster and MiniMRCluster classes provided
by the hadoop-test jar, which can run from within a test suite itself
(with the daemons as multiple threads, for simple experimentation).

On Mon, Apr 9, 2012 at 9:52 PM, Barry, Sean F <se...@intel.com> wrote:
> Hi all,
>
> I currently have a 2 node cluster up and running. But now I face a new issue, one of my nodes is running a Datanode and a Tasktracker on a 4 core machine and in order to do a bit of proof of concept testing I would like to have 4 nodes running on that particular machine. Does this mean that I would need to set that up as a pseudodistributed cluster? or do you have any other suggestions? And would I need to add 3 more datanodes and 3 more tasktrackers or either or?
>
> Thanks
> -SB

-- 
Harsh J