Posted to mapreduce-user@hadoop.apache.org by Ramya S <ra...@suntecgroup.com> on 2013/05/14 09:01:51 UTC

About configuring cluster setup

Hi,
 
Can we configure 1 node as both Name node and Data node ?

Re: About configuring cluster setup

Posted by Chris Embree <ce...@gmail.com>.

It's not a good idea for anything more than Proof of Concept or Sandbox
clusters.


On Tue, May 14, 2013 at 3:10 AM, Leonid Fedotov <lf...@hortonworks.com> wrote:

> No, it is not called "pseudo-distributed" mode. It is called "as you wish"
> mode...
> It is an absolutely normal configuration.
> You can distribute the roles across your nodes as you like; no one limits you.
> Just make sure you have enough memory on your nodes.
> If you have more questions, feel free to ask directly.
>
>
>
> On Tue, May 14, 2013 at 12:03 AM, Nitin Pawar <ni...@gmail.com> wrote:
>
>> Yes, you can. It's called pseudo-distributed mode.
>>
>>
>> On Tue, May 14, 2013 at 12:31 PM, Ramya S <ra...@suntecgroup.com> wrote:
>>
>>>  Hi,
>>>
>>> Can we configure 1 node as both Name node and Data node ?
>>>
>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>

Re: About configuring cluster setup

Posted by Leonid Fedotov <lf...@hortonworks.com>.

No, it is not called "pseudo-distributed" mode. It is called "as you wish"
mode...
It is an absolutely normal configuration.
You can distribute the roles across your nodes as you like; no one limits you.
Just make sure you have enough memory on your nodes.
If you have more questions, feel free to ask directly.
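
As a rough illustration of the memory check suggested above, the sketch below adds up JVM heap sizes for the daemons on a shared node plus the task slots, and reports what is left. The function name, the heap sizes, the slot count, and the OS reserve are all assumptions chosen for the example, not recommendations, and heap size understates a JVM's real footprint.

```python
# Rough memory budget for one node hosting several Hadoop 1.x daemons.
# Every figure below is an illustrative assumption, not a recommendation.

def memory_headroom_mb(total_ram_mb, daemon_heaps_mb, task_slots,
                       child_heap_mb, os_reserve_mb=1024):
    """Return the MB left after daemon heaps, task JVM heaps and an OS reserve."""
    daemons = sum(daemon_heaps_mb.values())   # NameNode, DataNode, TaskTracker...
    tasks = task_slots * child_heap_mb        # one child JVM per occupied slot
    return total_ram_mb - os_reserve_mb - daemons - tasks


# Hypothetical 16 GB node running NameNode, DataNode and TaskTracker together.
heaps = {"namenode": 4096, "datanode": 1024, "tasktracker": 1024}
headroom = memory_headroom_mb(total_ram_mb=16384, daemon_heaps_mb=heaps,
                              task_slots=8, child_heap_mb=512)
print(f"headroom: {headroom} MB")  # a negative value means the node is oversubscribed
```

If the headroom comes out negative, reducing the number of task slots on that node is usually the first knob to turn.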

On Tue, May 14, 2013 at 12:03 AM, Nitin Pawar <ni...@gmail.com> wrote:

> Yes, you can. It's called pseudo-distributed mode.
>
>
> On Tue, May 14, 2013 at 12:31 PM, Ramya S <ra...@suntecgroup.com> wrote:
>
>>  Hi,
>>
>> Can we configure 1 node as both Name node and Data node ?
>>
>
>
>
> --
> Nitin Pawar
>

Re: About configuring cluster setup

Posted by Nitin Pawar <ni...@gmail.com>.

Yes, you can. It's called pseudo-distributed mode.


On Tue, May 14, 2013 at 12:31 PM, Ramya S <ra...@suntecgroup.com> wrote:

>  Hi,
>
> Can we configure 1 node as both Name node and Data node ?
>



-- 
Nitin Pawar

RE: About configuring cluster setup

Posted by David Parks <da...@yahoo.com>.

We have a box that's a bit overpowered for just running our namenode and
jobtracker on a 10-node cluster, and, like you, we also wanted to make use of
the storage and processor resources of that node.

What we did is use LXC containers to segregate the different processes. LXC
is a very lightweight pseudo-virtualization platform for Linux (near-zero
overhead).

The key benefit of LXC, in this case, is that we can use Linux cgroups
(standard, simple config in LXC) to specify that the container/VM running
the namenode/jobtracker should get 10x the CPU and I/O resources of the
container that runs a tasktracker/datanode (though since LXC containers all
run under the same kernel, any "unused" resources are assigned to runnable
processes).

We run Cloudera Hadoop and deployed a slightly modified tasktracker
configuration on the shared box (fewer task slots so as not to overutilize
memory).

That tasktracker doesn't do as much work as the other dedicated nodes, but
it does a fair share, and the cgroup configurations (cpu.shares &
blkio.weight for the curious) ensure that the bulk processing doesn't
interfere with the critical namenode & jobtracker systems.
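
For readers who want to try something similar, here is a minimal sketch of how a 10:1 weighting might be written as LXC container settings. It assumes the cgroup v1 style keys `lxc.cgroup.cpu.shares` and `lxc.cgroup.blkio.weight`; the helper names, base values, and ratio are hypothetical and are not the values used on the cluster described above.

```python
# Hypothetical helper: prints LXC (cgroup v1) config lines for two containers
# so that the master container gets `ratio` times the CPU and block-I/O weight
# of the worker container. Base values are illustrative assumptions.

BLKIO_MIN, BLKIO_MAX = 10, 1000  # allowed range for blkio.weight in cgroup v1


def cgroup_lines(cpu_shares, blkio_weight):
    """Return the two LXC config lines for one container."""
    if not BLKIO_MIN <= blkio_weight <= BLKIO_MAX:
        raise ValueError(f"blkio.weight must be in [{BLKIO_MIN}, {BLKIO_MAX}]")
    return [
        f"lxc.cgroup.cpu.shares = {cpu_shares}",
        f"lxc.cgroup.blkio.weight = {blkio_weight}",
    ]


def weighted_pair(ratio=10, base_cpu=1024, base_blkio=100):
    """Return (master_lines, worker_lines) with a ratio:1 weighting."""
    master = cgroup_lines(base_cpu * ratio, base_blkio * ratio)
    worker = cgroup_lines(base_cpu, base_blkio)
    return master, worker


if __name__ == "__main__":
    master, worker = weighted_pair()
    print("# container running namenode/jobtracker")
    print("\n".join(master))
    print("# container running tasktracker/datanode")
    print("\n".join(worker))
```

Both settings are relative weights rather than hard caps, which matches the behaviour described above: when the master container is idle, the worker container can still use the spare CPU and I/O.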

 

 

From: Robert Dyer [mailto:psybers@gmail.com] 
Sent: Tuesday, May 14, 2013 11:23 PM
To: user@hadoop.apache.org
Subject: Re: About configuring cluster setup

 

You can. However, note that unless you also run a TaskTracker on that node
(a bad idea), any blocks replicated to this node won't be available as
node-local input to MapReduce jobs, and you lower the odds of having
data locality on those blocks.

 

On Tue, May 14, 2013 at 2:01 AM, Ramya S <ra...@suntecgroup.com> wrote:

Hi,

 

Can we configure 1 node as both Name node and Data node ?

Re: About configuring cluster setup

Posted by Robert Dyer <ps...@gmail.com>.

You can. However, note that unless you also run a TaskTracker on that node
(a bad idea), any blocks replicated to this node won't be available as
node-local input to MapReduce jobs, and you lower the odds of having
data locality on those blocks.

On Tue, May 14, 2013 at 2:01 AM, Ramya S <ra...@suntecgroup.com> wrote:

>  Hi,
>
> Can we configure 1 node as both Name node and Data node ?
>
