Posted to hdfs-dev@hadoop.apache.org by Amit Gupta <gu...@gmail.com> on 2011/04/24 01:30:09 UTC

Queries about HDFS block migration

Hi,

I've been trying to understand how the Balancer class works. I know
the basics of the threshold guidelines and the general motivation
for running the balancer.

I had a few questions about block migrations:

1. I noticed there is a Source and a Target, but there is also a ProxySource.
I haven't been able to grasp the conceptual difference between a source and a
proxy source.

2. I'm also having trouble understanding the actual mechanics of a block
being relocated from one datanode to another. I understand that communication
with the namenode occurs to update the metadata of the relocated block
accordingly, but I have not understood what performs the actual copy of the
block itself.

I'd appreciate it if a developer could shed some light on this for me and
point me in the right direction (for example, which classes/functions I should
look at to understand this).

Thanks,
-- 
Amit Gupta

Re: Queries about HDFS block migration

Posted by Harsh J <ha...@cloudera.com>.
Hello Amit,

On Sun, Apr 24, 2011 at 5:00 AM, Amit Gupta <gu...@gmail.com> wrote:
> 1. I noticed there is a Source and a Target, but there is also a ProxySource.
> I haven't been able to grasp the conceptual difference between a source and a
> proxy source.

The 'proxy' source is a DN that holds a replica of the given block, and
the block is copied from it rather than from the source itself. This
evens out the load across the DNs in the network: it would not be a good
idea to put too much transfer load onto the DN that is breaching the
threshold when you can copy off a replica instead.
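
To make the idea concrete, here is a minimal sketch of picking a proxy
source among the DNs that hold a replica. It is not the Balancer's own
code: the class and method names, the per-node transfer limit, and the
selection rule (prefer a non-busy replica holder on the target's rack,
otherwise any non-busy holder) are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model; these are not the real Balancer classes.
final class ProxySourceSketch {

    /** A datanode as this sketch sees it: name, rack, and in-flight transfers. */
    static final class Node {
        final String name;
        final String rack;
        final int activeTransfers;

        Node(String name, String rack, int activeTransfers) {
            this.name = name;
            this.rack = rack;
            this.activeTransfers = activeTransfers;
        }
    }

    // Assumed limit on concurrent transfers per node (illustrative value).
    static final int MAX_TRANSFERS = 5;

    static boolean isBusy(Node n) {
        return n.activeTransfers >= MAX_TRANSFERS;
    }

    /**
     * Picks the node the block bytes are read from.
     * First choice: a non-busy replica holder on the target's rack.
     * Second choice: any non-busy replica holder.
     * Returns null when every replica holder is busy.
     */
    static Node chooseProxy(List<Node> replicaHolders, Node target) {
        for (Node n : replicaHolders) {
            if (!isBusy(n) && n.rack.equals(target.rack)) {
                return n;
            }
        }
        for (Node n : replicaHolders) {
            if (!isBusy(n)) {
                return n;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Node source = new Node("dn1", "/rack-a", 5);  // over-utilized and busy
        Node replica = new Node("dn2", "/rack-b", 0); // idle replica holder
        Node target = new Node("dn3", "/rack-b", 0);  // under-utilized node
        Node proxy = chooseProxy(Arrays.asList(source, replica), target);
        System.out.println("copy from " + proxy.name + " to " + target.name);
    }
}
```

In this example the bytes are read from dn2 even though dn1 is the node
that gives up its replica, so the over-utilized node takes on no extra
transfer load.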

> 2. I'm also having trouble understanding the actual mechanics of a block
> being relocated from one datanode to another. I understand that communication
> with the namenode occurs to update the metadata of the relocated block
> accordingly, but I have not understood what performs the actual copy of the
> block itself.

The block is moved directly from one DN to another; the transfer starts
when the DN receives a command from the NameNode to do so (commands are
delivered as part of the heartbeat mechanism).
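
As a rough mental model of a move, independent of how the command
arrives, the toy sketch below separates the two things involved: the
block bytes going from one DN to another, and the block-to-location
metadata being updated afterwards. Everything in it (class name, maps,
method names) is hypothetical and in-memory only; it does not use or
mirror the real HDFS classes or wire protocol.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of a block move; not real HDFS code.
final class BlockMoveSketch {
    // DN name -> (block id -> bytes): stands in for each datanode's disk.
    final Map<String, Map<String, byte[]>> storage = new HashMap<>();
    // Block id -> DN names: stands in for the NameNode's block metadata.
    final Map<String, Set<String>> locations = new HashMap<>();

    /** Stores a replica on a DN and records its location in the metadata. */
    void addReplica(String dn, String blockId, byte[] data) {
        storage.computeIfAbsent(dn, k -> new HashMap<>()).put(blockId, data.clone());
        locations.computeIfAbsent(blockId, k -> new HashSet<>()).add(dn);
    }

    /** Moves one replica of blockId off source and onto target, reading from proxy. */
    void move(String blockId, String source, String proxy, String target) {
        Map<String, byte[]> proxyBlocks = storage.get(proxy);
        if (proxyBlocks == null || !proxyBlocks.containsKey(blockId)) {
            throw new IllegalArgumentException(proxy + " holds no replica of " + blockId);
        }
        // Step 1: DN-to-DN copy; the metadata map is not touched here.
        byte[] data = proxyBlocks.get(blockId).clone();
        storage.computeIfAbsent(target, k -> new HashMap<>()).put(blockId, data);
        // Step 2: metadata learns about the new replica on the target.
        locations.get(blockId).add(target);
        // Step 3: the replica on the source is dropped, which frees its space.
        Map<String, byte[]> sourceBlocks = storage.get(source);
        if (sourceBlocks != null) {
            sourceBlocks.remove(blockId);
        }
        locations.get(blockId).remove(source);
    }

    public static void main(String[] args) {
        BlockMoveSketch cluster = new BlockMoveSketch();
        cluster.addReplica("dn1", "blk_1", new byte[] {1, 2, 3});
        cluster.addReplica("dn2", "blk_1", new byte[] {1, 2, 3});
        cluster.move("blk_1", "dn1", "dn2", "dn3");
        System.out.println("blk_1 now on " + cluster.locations.get("blk_1"));
    }
}
```

The replica count stays the same before and after the move; only the set
of DNs holding the block changes.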

-- 
Harsh J