Posted to user@hadoop.apache.org by Ruhua Jiang <ru...@gmail.com> on 2016/05/19 14:29:56 UTC
HDFS Block placement policy
Hi all,
I have a question about the HDFS block placement policy. The default is described as:
"The default block placement policy is as follows: Place the first replica
somewhere – either a random node (if the HDFS client is outside the
Hadoop/DataNode cluster) or on the local node (if the HDFS client is
running on a node inside the cluster). Place the second replica in a
different rack"
Let's consider the situation where the data sit on the *local disk of one
datanode* and an *hdfs dfs -put* command is used to ingest this data into HDFS
(which means the HDFS client is running on a datanode).
- What will happen (in terms of block placement) if this datanode's local
disk is full?
- Is there a list of alternative block placement policies already
implemented, which hdfs dfs -put can use just by changing the hdfs-site.xml
config? I noticed the https://issues.apache.org/jira/browse/HDFS-385 JIRA
ticket, but it does not seem to be what we want.
- I understand that placing the first replica on the local machine can improve
performance, and that we can use the HDFS balancer to fix the imbalance
afterwards. However, I want to explore alternative solutions that avoid
this problem from the beginning.
Thanks
Ruhua Jiang
Re: HDFS Block placement policy
Posted by Gurmukh Singh <gu...@yahoo.com.INVALID>.
The best practice is to have an Edge/Gateway node, so that the client does
not run on a datanode and there is no local copy of the data. It is also
good from a security perspective.
I think this video of mine can help you understand this better:
https://www.youtube.com/watch?v=t20niJDO1f4
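
The effect of writing from an edge node can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the actual HDFS implementation: it models the first replica only, ignores racks, and treats "enough free space" as a simple counter. The function names choose_first_replica and ingest, the node names, and the capacities are all made up for illustration.

```python
import random
from collections import Counter


def choose_first_replica(client, nodes, free_space, block_size, rng):
    """Toy model of first-replica placement (hypothetical helper).

    client is a datanode name when the writer runs inside the cluster,
    or None when it runs on an edge/gateway node.
    """
    # Only datanodes with room for the block are valid targets.
    candidates = [n for n in nodes if free_space[n] >= block_size]
    if not candidates:
        raise RuntimeError("no datanode has room for the block")
    # Writer on a datanode with free space: prefer the local node.
    if client in candidates:
        return client
    # Writer outside the cluster, or local node full: pick another node.
    # (Simplification: this model has no racks, so the choice is random.)
    return rng.choice(candidates)


def ingest(client, nodes, capacity, num_blocks, seed=0):
    """Write num_blocks one-unit blocks and count first replicas per node."""
    rng = random.Random(seed)
    free_space = {n: capacity for n in nodes}
    placed = Counter()
    for _ in range(num_blocks):
        target = choose_first_replica(client, nodes, free_space, 1, rng)
        free_space[target] -= 1
        placed[target] += 1
    return placed


nodes = ["dn1", "dn2", "dn3", "dn4"]

# Client on dn1 with plenty of space: every first replica lands on dn1.
print(ingest("dn1", nodes, capacity=100, num_blocks=50))

# Client on dn1 with a small disk: dn1 fills up, the rest spill elsewhere.
print(ingest("dn1", nodes, capacity=10, num_blocks=20))

# Client on an edge node: first replicas are spread over the datanodes.
print(ingest(None, nodes, capacity=100, num_blocks=50))
```

In this model a full local disk does not make the write fail; the block simply goes to another node, and writing from an edge node avoids piling first replicas onto one datanode in the first place.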
Regards
Gurmukh
--
Thanks and Regards
Gurmukh Singh