Posted to user@hadoop.apache.org by Hilmi Egemen Ciritoğlu <hi...@gmail.com> on 2017/08/02 11:31:36 UTC

Replication Factor Details

Hi guys,

I have spent a lot of time reading about setting the replication factor and
about block placement, but I still wonder how the setrep command works
under the hood.

I am looking for answers to the following questions:

If you have a single rack and increase or decrease the replication factor,
is the block distribution randomised, or is it based on disk usage etc.
(apart from, or after, rack-awareness considerations)?

And what if I have 5 racks and a replication factor of 4? I am looking for
corner cases so that I can understand this completely.

I would really appreciate it if you could answer my questions and explain
the code side a bit more too.

Regards,
Egemen

Re: Replication Factor Details

Posted by Ravi Prakash <ra...@gmail.com>.

Hi Hilmi!

The topology script / DNSToSwitchMapping tells the NameNode about the
topology of the cluster:
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/RackAwareness.html

You can trace through
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java#L805
to find out how re-replications are ordered. (If you start the NameNode
with the environment variable
export HADOOP_NAMENODE_OPTS='-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=1049'
set, you can connect a debugger to it on port 1049.)

You might want to set a breakpoint in
BlockManager.updateNeededReconstructions() (
https://github.com/apache/hadoop/blob/48899134d2a77935a821072b5388ab1b1b7b399c/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L4148)
and
BlockManager.computeDatanodeWork() (
https://github.com/apache/hadoop/blob/48899134d2a77935a821072b5388ab1b1b7b399c/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L4508
)

I suspect most of what you are looking for is in
BlockPlacementPolicyDefault.chooseTarget() (
https://github.com/apache/hadoop/blob/48899134d2a77935a821072b5388ab1b1b7b399c/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java#L134
)
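
For the 5-rack, replication-4 corner case, one thing worth looking for in
that class is the per-rack cap. As far as I recall, the default policy
limits how many replicas of a block may share a rack using roughly the
arithmetic below. This is a standalone sketch written from memory, not the
Hadoop source: the class and method names (RackLimitSketch,
maxNodesPerRack) are made up for illustration, and the exact formula may
differ in your branch, so please verify it there.

```java
// Hypothetical, standalone sketch; NOT copied from Hadoop.
// Assumption: the default placement policy caps replicas per rack at
// (totalReplicas - 1) / numRacks + 2, lowered by one if that would
// allow every replica to land on the same rack.
final class RackLimitSketch {

    static int maxNodesPerRack(int totalReplicas, int numRacks) {
        // With a single rack (or a single replica) there is nothing to
        // spread out, so the cap is simply the replica count.
        if (numRacks <= 1 || totalReplicas <= 1) {
            return totalReplicas;
        }
        // Integer division: the cap grows slowly with the replica count.
        int max = (totalReplicas - 1) / numRacks + 2;
        // Avoid allowing all replicas on one rack when several exist.
        if (max == totalReplicas) {
            max--;
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println("5 racks, replication 4: "
                + maxNodesPerRack(4, 5));
        System.out.println("1 rack, replication 4: "
                + maxNodesPerRack(4, 1));
    }
}
```

Under that assumption, 4 replicas across 5 racks gives a cap of 2 replicas
per rack, while a single rack has no cap beyond the replication factor
itself.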

Also, please be aware that the code has changed a lot over different
versions thanks to incredible contributions from the community. If you're
trying to debug something, please make sure to find the right links in the
right branch.

HTH
Ravi
