Posted to common-user@hadoop.apache.org by Sai Sai <sa...@yahoo.in> on 2013/04/12 10:30:32 UTC

Re: Will HDFS refer to the memory of NameNode & DataNode or is it a separate machine

A few basic questions:

Does HDFS use the memory of the NameNode and DataNodes, or does it run on a separate machine?

For the NameNode, DataNode and others there is a process associated with each of them.
But there is no process for HDFS itself, and I wonder why. I understand that the fsImage holds the metadata of HDFS, so when the NameNode, a DataNode or the JobTracker/TaskTracker needs file information, do they just look into the fsImage?

When we put a file in HDFS, is it possible to find out on which node (NN/DN) it physically sits?

Any help is appreciated.
Thanks
Sai

Re: Will HDFS refer to the memory of NameNode & DataNode or is it a separate machine

Posted by Nitin Pawar <ni...@gmail.com>.
HDFS stands for Hadoop Distributed File System.
As the name says, it is a file system, so the first basic question to look
into is: do you need a process to run a file system?
Once you have an answer to that, the second question is: will a single
process be enough for a distributed system, where sub-components of the
system may exist on different machines?

The NameNode and the DataNodes combined make up HDFS. Taken together,
their processes are HDFS.

The NameNode is the master for HDFS and keeps the file system image in
memory. When it starts, it loads the image into memory and serves all
requests from memory from then on. There are steps taken to save the
FSImage to disk; you can read about them in detail in the HDFS
architecture documentation.
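
To make the "load once, serve from memory, save to disk now and then" idea
concrete, here is a toy sketch in Python. It is not Hadoop code and does
not reflect the real FSImage format: the class ToyNameNode, its method
names, the JSON image file and the path /user/sai/a.txt are all made up for
illustration.

```python
import json
import os
import tempfile


class ToyNameNode:
    """Toy model: metadata lives in memory, with an on-disk image as backup."""

    def __init__(self, image_path):
        self.image_path = image_path
        self.namespace = {}  # file path -> list of block ids, held in memory
        # On startup, load the saved image (if any) into memory.
        if os.path.exists(image_path):
            with open(image_path) as f:
                self.namespace = json.load(f)

    def add_file(self, path, block_ids):
        # Only metadata is recorded here; file contents never pass through.
        self.namespace[path] = list(block_ids)

    def get_blocks(self, path):
        # Requests are answered from memory, not by re-reading the image.
        return self.namespace[path]

    def save_image(self):
        # Write to a temporary file first, then swap it into place.
        tmp_path = self.image_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(self.namespace, f)
        os.replace(tmp_path, self.image_path)


if __name__ == "__main__":
    # Hypothetical image location and file name, for the demo only.
    image = os.path.join(tempfile.mkdtemp(), "toy_fsimage.json")
    nn = ToyNameNode(image)
    nn.add_file("/user/sai/a.txt", ["blk_1", "blk_2"])
    nn.save_image()
    restarted = ToyNameNode(image)  # simulates a restart
    print(restarted.get_blocks("/user/sai/a.txt"))
```

Other daemons would ask the running process for file information rather
than open the image file themselves, which is what get_blocks stands in
for here.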

When you put a file in HDFS, it may or may not go to a single machine.
The NameNode never stores the data files; it stores only the metadata for
HDFS.
So when you load a file, the data goes to the DataNodes and the file
information goes to the NameNode. Depending on its size, the file is
split into multiple blocks, and those blocks may land on multiple
DataNodes. If your file size is less than or equal to the block size, the
file fits in one block and you can find out which DataNode it is located
on. Otherwise, in fully distributed mode, there is no guarantee that the
file will be on a single node only.
PS: This is my understanding. Others may correct me as well.


-- 
Nitin Pawar
