Posted to hdfs-dev@hadoop.apache.org by Harsh J <ha...@cloudera.com> on 2012/09/06 15:05:59 UTC

Re: Transfer blocks from one datanode to another

Hi,

Please do not use the general@ lists for development/usage questions.
This list is meant for project-level discussions alone. Thanks! :)

I've moved this mail to hdfs-dev@hadoop.apache.org. Please use this
list when replying, going forward.

My reply inline:

On Thu, Sep 6, 2012 at 3:12 PM, Adrian (Xinyu) Liu <ad...@sas.com> wrote:
> Hi All,
>
> Nowadays, I am working with HDFS and implementing some functionalities base on HDFS API.
> As I knew, one specific file is divided into several blocks and distributed into different datanode with certain replication numbers.
> And I want to find out a series of HDFS API which can meet the following requirement:
>
>
> 1. Given a specific filename and related information that already uploaded into the HDFS, retrieve: how many blocks are there, each datanode contain which blocks, etc.

Not all of this is available if you're using only the simple public APIs.

The FileSystem#getFileBlockLocations will tell you what hosts are
carrying the blocks of a file (a list of hosts for each block in the
file). See http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.Path,%20long,%20long)
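
As a rough sketch of how that per-block host list can be turned into a
per-datanode view, the snippet below inverts it. It is plain Java with no
Hadoop dependency: the String[][] input is only a stand-in for calling
getHosts() on each BlockLocation returned by getFileBlockLocations, and the
class name, method name and host names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class BlocksByHost {
    /**
     * hostsPerBlock[i] lists the hosts carrying block i of the file
     * (a stand-in for BlockLocation#getHosts() on the i-th entry).
     * Returns, for each host, the indexes of the blocks it carries.
     */
    static Map<String, List<Integer>> invert(String[][] hostsPerBlock) {
        // TreeMap keeps hosts sorted so the printed result is stable.
        Map<String, List<Integer>> byHost = new TreeMap<>();
        for (int block = 0; block < hostsPerBlock.length; block++) {
            for (String host : hostsPerBlock[block]) {
                byHost.computeIfAbsent(host, h -> new ArrayList<>()).add(block);
            }
        }
        return byHost;
    }

    public static void main(String[] args) {
        // Hypothetical host names for a 3-block file with replication 2.
        String[][] hostsPerBlock = {
            {"dn1", "dn2"},
            {"dn2", "dn3"},
            {"dn1", "dn3"},
        };
        System.out.println(invert(hostsPerBlock));
    }
}
```

Note that the indexes here are positions of blocks within the file, not
HDFS block IDs; the number of blocks is simply the length of the array.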

For the list of block IDs, you'd have to pull them from a DFSClient
instance, which calls the (NameNode-side) ClientProtocol's
getBlockLocations(…) method. See the interface at
http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java?view=markup

> 2. Given a specific filename, source datanode, specific block id, destination datanode and related information to transfer the block from source node to destination node.

This needs to be done via the DataTransferProtocol, specifically its
replaceBlock(…) method. See the interface at
http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferProtocol.java?view=markup

> I've read several materials and API reference about HDFS and can't find appropriate ones, the objective of this mail is to make sure
> if such  API existed, and if so, what are they (especially the second one, transfer a specific block of a specific file from a certain datanode to another)
>
> There is a tool called Balancer already existed in HDFS package, I am reading the source code, but it's too intricate to track the line, can anyone help me?

In the Balancer sources, see the final replaceBlock(…) call made at
L376 at http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java?view=markup,
and then trace backwards from there to see how the call is built up.

Feel free to send across any more questions you have!

-- 
Harsh J