
Posted to common-user@hadoop.apache.org by kang_min82 <ka...@yahoo.com> on 2009/02/02 00:09:00 UTC

How can HDFS spread the data across the data nodes ?

Hi everyone, 

I'm completely new to HDFS. I hope you can take a little time to answer my
question :).

I have 3 nodes in total in my cluster: one reserved for the master (Namenode
and JobTracker) and the other two as slaves (Datanodes).

I tried to "copy" a file to HDFS with the following command:

kang@vn:~/v-0.18.0$ hadoop-0.18.0/bin/hadoop fs -put test_file /

If I run the command on the master, HDFS spreads my file across all the data
nodes. That seems fine! But when I run the command on any data node, HDFS
doesn't spread the file: the whole file is written only to that data node.
Is this a bug?

My question is: how does HDFS manage this, and which Java class is
involved?

I read the bin/hadoop script and know that the class FsShell.java and its
copyFromLocal method are involved. But I can't see how the master decides
which data nodes a file is written to.

Any help is appreciated, thanks so much.

Kang

-- 
View this message in context: http://www.nabble.com/How-can-HDFS-spread-the-data-across-the-data-nodes---tp21781703p21781703.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.

Re: How can HDFS spread the data across the data nodes ?

Posted by Tim Wintle <ti...@teamrubber.com>.

I believe the standard advice is to write to the cluster from a computer
that is not running a Hadoop daemon itself. Otherwise the first replica of
each block is written to the local datanode to avoid congestion on the
network, so with a replication factor of 1 the whole file stays on that
node.

Tim



Re: How can HDFS spread the data across the data nodes ?

Posted by jason hadoop <ja...@gmail.com>.

If the write takes place on a datanode then, by design, one replica of each
block is written to that datanode. The other replicas are written to
different nodes.

When you write from the namenode, which generally is not also a datanode,
Hadoop allocates the replica blocks pseudo-randomly across all of your
datanodes.
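
The behaviour described above can be illustrated with a small standalone
Java sketch. This is not Hadoop's actual code: the class PlacementSketch,
the method chooseTargets and the host names are hypothetical, and rack
awareness is ignored entirely. It only shows the rule "first replica on the
writer's node if the writer is a datanode, remaining replicas on other
nodes chosen pseudo-randomly".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical, simplified model of replica placement for a single block.
// It is NOT the namenode's real implementation and ignores racks.
class PlacementSketch {

    // Returns the datanodes that would hold the replicas of one block.
    static List<String> chooseTargets(String writer, List<String> dataNodes,
                                      int replication, Random rng) {
        List<String> candidates = new ArrayList<>(dataNodes);
        Collections.shuffle(candidates, rng);   // pseudo-random order

        List<String> targets = new ArrayList<>();
        // If the writer is itself a datanode, the first replica stays local.
        if (replication > 0 && candidates.remove(writer)) {
            targets.add(writer);
        }
        // Remaining replicas go to other nodes, at most one per node.
        for (String node : candidates) {
            if (targets.size() >= replication) {
                break;
            }
            targets.add(node);
        }
        return targets;
    }

    public static void main(String[] args) {
        List<String> dataNodes = Arrays.asList("slave1", "slave2");
        Random rng = new Random();
        for (int block = 0; block < 4; block++) {
            System.out.println("block " + block
                + "  written from slave1: "
                + chooseTargets("slave1", dataNodes, 1, rng)
                + "  written from master: "
                + chooseTargets("master", dataNodes, 1, rng));
        }
    }
}
```

With a replication factor of 1, every block written from slave1 lands on
slave1, while blocks written from the master (not a datanode in this setup)
are scattered over both slaves. That matches what Kang observed, so it is
expected behaviour rather than a bug.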
