Posted to common-dev@hadoop.apache.org by Giovanni Marzulli <gi...@ba.infn.it> on 2012/03/14 17:24:56 UTC

Questions about HDFS’s placement policy

Hello,

I'm trying HDFS on a small test cluster and I need to clarify some 
doubts about Hadoop's behaviour.

Some details of my cluster:
Hadoop version: 0.20.2
I have two racks (rack1, rack2). Three datanodes for every rack.
Replication factor is set to 3.

"HDFS’s placement policy is to put one replica on one node in the local 
rack, another on a node in a different (remote) rack, and the last on a 
different node in the same remote rack."
Instead, I noticed that sometimes a few blocks of files are stored as 
follows: two replicas in the local rack and one replica in a different 
rack. Are there exceptions that cause behaviour different from the 
default placement policy?
Likewise, at times some blocks are read from nodes in the remote rack 
instead of nodes in the local rack. Why does this happen?
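For reference, here is my understanding of the default policy as an 
illustrative sketch (this is NOT Hadoop's actual 
BlockPlacementPolicyDefault code; node and rack names are invented):

```python
import random

def choose_targets(writer_rack, racks):
    """Sketch of the default HDFS placement for replication factor 3:
    replica 1 on a node in the writer's (local) rack, replica 2 on a
    node in one remote rack, replica 3 on a different node in that
    same remote rack."""
    first = (writer_rack, random.choice(racks[writer_rack]))
    remote_rack = random.choice([r for r in racks if r != writer_rack])
    # two distinct nodes in the same remote rack
    node2, node3 = random.sample(racks[remote_rack], 2)
    return [first, (remote_rack, node2), (remote_rack, node3)]

racks = {"rack1": ["dn1", "dn2", "dn3"],
         "rack2": ["dn4", "dn5", "dn6"]}
targets = choose_targets("rack1", racks)
# e.g. one replica on rack1, two on distinct rack2 nodes
```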

Another thing: if I have two datacenters with two racks in each of them 
(so a hierarchical network topology), where are the two remote replicas 
stored? Does Hadoop consider the hierarchy and store one replica in 
the local datacenter and two replicas in the other datacenter? Or are 
the two replicas stored in a totally random rack?
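To be concrete, I mean a setup where the topology script (configured 
via topology.script.file.name in 0.20) returns two-level paths. A 
hypothetical mapping (all hostnames and paths invented) might look 
like this:

```shell
# Hypothetical rack-awareness script: Hadoop passes hostnames/IPs as
# arguments and expects one network path printed per host.
map_host() {
  case "$1" in
    dn1|dn2|dn3) echo "/dc1/rack1" ;;
    dn4|dn5|dn6) echo "/dc1/rack2" ;;
    dn7|dn8|dn9) echo "/dc2/rack3" ;;
    *)           echo "/default-rack" ;;
  esac
}

for host in "$@"; do
  map_host "$host"
done
```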

Thanks
Gianni