Posted to user@ambari.apache.org by rammohan ganapavarapu <ra...@gmail.com> on 2016/06/09 23:06:29 UTC

Hadoop: dfs.namenode.name.dir and dfs.datanode.data.dir

Hi,

I am trying to understand how these two properties behave when I use multiple
disks/mount points.

For example, I have a server with three 100 GB disks mounted on
/data1, /data2 and /data3. If I use them for both data.dir and name.dir, do I
get a total of ~300 GB of disk space for the data, or do I only get 100 GB,
with the other two disks used for redundancy only?

This is the description I got from the Hadoop docs:
dfs.namenode.name.dir:

Determines where on the local filesystem the DFS name node should store the
name table(fsimage). If this is a comma-delimited list of directories then
the name table is replicated in all of the directories, for redundancy.

dfs.datanode.data.dir:

Determines where on the local filesystem an DFS data node should store its
blocks. If this is a comma-delimited list of directories, then data will be
stored in all named directories, typically on different devices. The
directories should be tagged with corresponding storage types
([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS storage policies. The default
storage type will be DISK if the directory does not have a storage type
tagged explicitly. Directories that do not exist will be created if local
filesystem permission allows.

From the above description I understand that only the namenode name table gets
replicated across the 3 disks, but I am not sure how it works if I have
multiple disks for the data dir.

I want to use all available disks (3 x 100 GB = 300 GB) in a server for data,
so can I just use a comma-separated dir list, or should I use RAID or LVM to
combine those disks?

Thanks,
Ram

Re: Hadoop: dfs.namenode.name.dir and dfs.datanode.data.dir

Posted by Andrew Stadtler <an...@phdata.io>.

This should probably go to the user@hadoop.apache.org mailing list, since it's an HDFS-specific question and not Ambari related.

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html

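That being said, they are two different settings. The namenode name dir (dfs.namenode.name.dir) just stores metadata about the filesystem, and every directory in the list holds a full copy of it, so listing three disks there adds redundancy, not space. The datanode data dir (dfs.datanode.data.dir) stores the actual HDFS blocks, and those are spread across the listed directories: if you have three 100 GB directories you will have about 300 GB of raw DFS space on that node. Keep in mind that by default all blocks are replicated 3 times (on different datanodes), so the usable space is roughly a third of the cluster's raw total. You don't want to use LVM or RAID, just raw disks.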
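
As a rough illustration of the arithmetic, here is a minimal Python sketch. The mount points and sizes come from the question; the "hdfs/data" subdirectory and the helper names are made up for the example and are not part of Hadoop. The estimate also ignores reserved and non-DFS space, so treat the numbers as an upper bound.

```python
# Mount points and sizes from the question: three 100 GB disks.
disks_gb = {"/data1": 100, "/data2": 100, "/data3": 100}


def data_dir_value(mounts, subdir="hdfs/data"):
    """Build a comma-separated value for dfs.datanode.data.dir.

    The subdirectory name is an assumption for this example only.
    """
    return ",".join(f"{m.rstrip('/')}/{subdir}" for m in mounts)


def raw_capacity_gb(disks):
    """Raw space: the listed data directories add up, they do not mirror."""
    return sum(disks.values())


def effective_capacity_gb(raw_gb, replication=3):
    """Rough usable space once every block is stored `replication` times.

    Replicas live on different datanodes, so this is only meaningful
    when applied to the raw total of the whole cluster.
    """
    return raw_gb / replication


value = data_dir_value(disks_gb)
raw = raw_capacity_gb(disks_gb)
print("dfs.datanode.data.dir =", value)
print("raw GB:", raw, "usable GB at replication 3:", effective_capacity_gb(raw))
```

The resulting string is what would go into the dfs.datanode.data.dir property; no RAID or LVM layer is needed for the three disks to add up.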
