Posted to hdfs-user@hadoop.apache.org by Chao Huang <ch...@gmail.com> on 2012/05/24 13:30:06 UTC

Newbie questions re: namenode and masters

Hello experts,

I'm new to HDFS/Hadoop.  After reading the HDFS documentation, I'm confused
about the difference between a namenode and a master server.  My
understanding is that the namenode is responsible for managing metadata,
while the master-replica group (which is composed of a number of
datanodes) stores the actual data blocks.  In the master-replica group, the
master server accepts read/write requests and load-balances (or routes)
read requests to the appropriate replica.  In other words, we should
configure the namenode and the master server on two different physical
machines in a production environment.  Is this assumption correct?

One other question about HDFS cluster setup:

- Requirements:  one namenode, replication factor = 3, in a production
environment.

What would the topology look like?  Can I configure it as follows?


in conf/core-site.xml:
    fs.default.name = hdfs://machineAAA:54321/

in conf/masters:
    machineBBB

in conf/slaves:
    machineCCC
    machineDDD


Can someone please confirm and/or comment?

Sorry for the newbie questions. Thanks for the help.

Chao

Re: Newbie questions re: namenode and masters

Posted by Chao Huang <ch...@gmail.com>.

Hi Ravi,

Thanks for your reply. It's very helpful.

No, I was talking about this:
http://hadoop.apache.org/common/docs/r1.0.3/hdfs_design.html (hadoop
version:  r1.0.3). It says:

HDFS has a master/slave architecture. An HDFS cluster consists of a single
NameNode, a master server that manages the file system namespace and
regulates access to files by clients. In addition, there are a number of
DataNodes, usually one per node in the cluster, which manage storage
attached to the nodes that they run on.

This part confused me and made me think that the NameNode and the master
server should run on two separate nodes. Based on your explanation, I now
understand that "master" refers to the NameNode, while "slaves" refers to
the DataNodes. I hope this is correct now.  :)

Best regards,
Chao


On Thu, May 24, 2012 at 7:58 PM, Ravi Prakash <ra...@gmail.com> wrote:

> Hi Chao,
>
> What documentation are you reading? This is pretty accurate:
> http://hadoop.apache.org/common/docs/r0.20.203.0/hdfs_design.html
>
> The NameNode is indeed responsible for the metadata, and all the DataNodes
> report to the NameNode (so they are all slaves). You are right, the data
> blocks are stored on the DataNodes. Perhaps I am lacking knowledge of the
> history, but as of now there's no separate "master server". Clients send
> file read/write requests to the NameNode first, which points them to the
> appropriate DataNodes holding the blocks. The block data itself then moves
> directly between the client and those DataNodes, not through the NameNode.
>
> So your configuration for replication factor 3 would look like:
>
> in conf/core-site.xml:
>     fs.default.name = hdfs://machineAAA:54321/
>
> in conf/slaves:
>     machineBBB
>     machineCCC
>     machineDDD
>     machineEEE
>     ....possibly a lot more
>
>
>
>
> Hope this helps
> Ravi
>
>

Re: Newbie questions re: namenode and masters

Posted by Ravi Prakash <ra...@gmail.com>.

Hi Chao,

What documentation are you reading? This is pretty accurate:
http://hadoop.apache.org/common/docs/r0.20.203.0/hdfs_design.html

The NameNode is indeed responsible for the metadata, and all the DataNodes
report to the NameNode (so they are all slaves). You are right, the data
blocks are stored on the DataNodes. Perhaps I am lacking knowledge of the
history, but as of now there's no separate "master server". Clients send
file read/write requests to the NameNode first, which points them to the
appropriate DataNodes holding the blocks. The block data itself then moves
directly between the client and those DataNodes, not through the NameNode.

So your configuration for replication factor 3 would look like:

in conf/core-site.xml:
    fs.default.name = hdfs://machineAAA:54321/

in conf/slaves:
    machineBBB
    machineCCC
    machineDDD
    machineEEE
    ....possibly a lot more
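
For reference, here is a minimal sketch of how the settings above might be
written in the actual XML property files, assuming Hadoop 1.x property
names. The host name and port are the placeholders used in this thread.
The dfs.replication entry in conf/hdfs-site.xml is an addition here to make
the replication factor of 3 explicit (3 is also the default value):

```xml
<!-- conf/core-site.xml: tells clients and DataNodes where the NameNode is -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://machineAAA:54321/</value>
  </property>
</configuration>

<!-- conf/hdfs-site.xml (a separate file): replicas to keep per block -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```

Note that a replication factor of 3 needs at least three DataNodes in
conf/slaves, since HDFS does not place two replicas of the same block on
one DataNode; with only two slaves, blocks would stay under-replicated.
Also, as far as I know, in Hadoop 1.x conf/masters names the host(s) for
the SecondaryNameNode, so it does not determine where the NameNode runs.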

Hope this helps
Ravi
