Posted to user@hadoop.apache.org by Lukas Kairies <lu...@googlemail.com> on 2013/07/26 13:44:53 UTC
HDFS block placement
Hey,
I am a bit confused about block placement in Hadoop. Assume that there is
no replication and a task (map or reduce) writes a file to HDFS. Will all
blocks be stored on the same local node (the node on which the task runs)?
I think yes, but I am not sure.
Kind Regards,
Lukas Kairies
RE: HDFS block placement
Posted by German Florez-Larrahondo <ge...@samsung.com>.
Lukas,
That is my understanding: the default strategy avoids a network transfer
by placing the first replica on the same server that runs the HDFS client
code (in your case, the map or reduce task). If writing to the 'local'
node is not possible, then I believe a random node will be chosen.
If you want to learn more about this, I suggest looking at the policies:
hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyWithNodeGroup.java
Also, there is now a way to create your own policy via
dfs.block.replicator.classname. I'm not familiar with this, but you can
read about it in https://issues.apache.org/jira/browse/HDFS-385
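As a rough illustration, once such a class is on the NameNode's classpath,
plugging it in is just a configuration change in hdfs-site.xml. A minimal
sketch; the class name below is a hypothetical placeholder:

  <!-- hdfs-site.xml: use a custom block placement policy (see HDFS-385). -->
  <property>
    <name>dfs.block.replicator.classname</name>
    <value>com.example.hdfs.MyBlockPlacementPolicy</value>
  </property>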
-----Original Message-----
From: Lukas Kairies [mailto:lukas.xtreemfs@googlemail.com]
Sent: Friday, July 26, 2013 6:45 AM
To: user@hadoop.apache.org
Subject: HDFS block placement
Hey,
I am a bit confused about block placement in Hadoop. Assume that there is
no replication and a task (map or reduce) writes a file to HDFS. Will all
blocks be stored on the same local node (the node on which the task runs)?
I think yes, but I am not sure.
Kind Regards,
Lukas Kairies
Re: HDFS block placement
Posted by Harsh J <ha...@cloudera.com>.
Your thought is correct. If space is available on the local DataNode, each
block is automatically stored locally.
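If you want to verify this yourself, here is a minimal sketch using the
standard FileSystem API (run it on a node that also hosts a DataNode; the
path and block size below are arbitrary choices for illustration):

  import java.net.InetAddress;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.BlockLocation;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class LocalityCheck {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);

      // Write a multi-block file with a single replica.
      Path p = new Path("/tmp/locality-check"); // hypothetical test path
      FSDataOutputStream out = fs.create(p, true,
          conf.getInt("io.file.buffer.size", 4096),
          (short) 1,         // replication = 1, as in the question
          8 * 1024 * 1024);  // small 8 MB block size to get several blocks
      byte[] buf = new byte[1024 * 1024];
      for (int i = 0; i < 32; i++) {
        out.write(buf);      // 32 MB total -> four 8 MB blocks
      }
      out.close();

      // Ask the NameNode where each block actually landed.
      FileStatus st = fs.getFileStatus(p);
      BlockLocation[] blocks = fs.getFileBlockLocations(st, 0, st.getLen());
      String local = InetAddress.getLocalHost().getHostName();
      for (BlockLocation b : blocks) {
        for (String host : b.getHosts()) {
          System.out.println("block at offset " + b.getOffset() + " is on "
              + host + (host.equals(local) ? " (local)" : ""));
        }
      }
    }
  }

If every block reports the local hostname, you are seeing the writer-local
placement described above.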
On Fri, Jul 26, 2013 at 5:14 PM, Lukas Kairies
<lu...@googlemail.com> wrote:
> Hey,
>
> I am a bit confused about block placement in Hadoop. Assume that there is
> no replication and a task (map or reduce) writes a file to HDFS. Will all
> blocks be stored on the same local node (the node on which the task runs)?
> I think yes, but I am not sure.
>
> Kind Regards,
> Lukas Kairies
--
Harsh J