Posted to user@hadoop.apache.org by Ruhua Jiang <ru...@gmail.com> on 2016/05/19 14:29:56 UTC

HDFS Block placement policy

Hi all,

I have a question related to the HDFS block placement policy. The default
is: "The default block placement policy is as follows: Place the first
replica somewhere – either a random node (if the HDFS client is outside the
Hadoop/DataNode cluster) or on the local node (if the HDFS client is
running on a node inside the cluster). Place the second replica in a
different rack".

Let's consider the situation where the data sit on *one DataNode's local
disk*, and an *hdfs dfs -put* command is run on that DataNode (so the HDFS
client is on a DataNode) to ingest the data into HDFS.
- What happens (in terms of block placement) if this DataNode's local
disk is full?
- Is there a list of alternative block placement policies already
implemented, and can hdfs dfs -put use one of them just by changing the
hdfs-site.xml config? I noticed the
https://issues.apache.org/jira/browse/HDFS-385 JIRA ticket, but it does
not seem to be what we want. (A config sketch follows this list.)
- I understand that placing the first replica on the local machine can
improve performance, and that we can use the HDFS balancer to fix the
imbalance afterwards. However, I just want to explore alternative
solutions that avoid this problem from the beginning.
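
For illustration, the kind of config change I have in mind would look
something like the snippet below in hdfs-site.xml. I am assuming a release
that ships AvailableSpaceBlockPlacementPolicy; the property name
dfs.block.replicator.classname is what I see in hdfs-default.xml, and as
far as I understand it is read by the NameNode rather than by the client
doing the -put, so it would not be a per-command switch.

    <!-- hdfs-site.xml on the NameNode: swap in an alternative block
         placement policy (class name assumes a release that includes it) -->
    <property>
      <name>dfs.block.replicator.classname</name>
      <value>org.apache.hadoop.hdfs.server.blockmanagement.AvailableSpaceBlockPlacementPolicy</value>
    </property>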


Thanks

Ruhua Jiang

Re: HDFS Block placement policy

Posted by Gurmukh Singh <gu...@yahoo.com.INVALID>.
The best practice is to have an Edge/Gateway node, so that there is no
local copy of the data on a DataNode. It is also good from a security
perspective.
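
For example (the paths here are just placeholders), you copy the data to
the edge node and run the client from there, so the NameNode picks a
random DataNode for the first replica instead of the local disk:

    # run from the edge/gateway node, which is not a DataNode
    hdfs dfs -put /staging/events.csv /user/ruhua/events.csv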

I think this video of mine can help you understand this better: 
https://www.youtube.com/watch?v=t20niJDO1f4

Regards
Gurmukh

On 20/05/16 12:29 AM, Ruhua Jiang wrote:
> Hi all,
>
> I have a question related to the HDFS block placement policy. The 
> default is: "The default block placement policy is as follows: Place 
> the first replica somewhere – either a random node (if the HDFS client 
> is outside the Hadoop/DataNode cluster) or on the local node (if the 
> HDFS client is running on a node inside the cluster). Place the second 
> replica in a different rack".
>
> Let's consider the situation where the data sit on *one DataNode's 
> local disk*, and an *hdfs dfs -put* command is run on that DataNode 
> (so the HDFS client is on a DataNode) to ingest the data into HDFS.
> - What happens (in terms of block placement) if this DataNode's local 
> disk is full?
> - Is there a list of alternative block placement policies already 
> implemented, and can hdfs dfs -put use one of them just by changing 
> the hdfs-site.xml config? I noticed the 
> https://issues.apache.org/jira/browse/HDFS-385 JIRA ticket, but it 
> does not seem to be what we want.
> - I understand that placing the first replica on the local machine 
> can improve performance, and that we can use the HDFS balancer to fix 
> the imbalance afterwards. However, I just want to explore alternative 
> solutions that avoid this problem from the beginning.
>
>
> Thanks
>
> Ruhua Jiang

-- 
Thanks and Regards

Gurmukh Singh