Posted to general@hadoop.apache.org by "Adrian (Xinyu) Liu" <ad...@sas.com> on 2012/09/06 11:42:47 UTC

Transfer blocks from one datanode to another

Hi All,

I am currently working with HDFS and implementing some functionality on top of the HDFS API.
As I understand it, a file is divided into several blocks, which are distributed across different datanodes according to the replication factor.
I would like to find HDFS APIs that meet the following requirements:

1. Given the name of a file (and related information) already uploaded into HDFS, retrieve how many blocks it has, which blocks each datanode holds, and so on.

2. Given a filename, a source datanode, a specific block ID, a destination datanode, and related information, transfer that block from the source node to the destination node.

I've read several materials and the HDFS API reference but couldn't find appropriate APIs. The objective of this mail is to confirm whether such APIs exist and, if so, what they are (especially for the second requirement: transferring a specific block of a specific file from one datanode to another).

There is a tool called Balancer that already exists in the HDFS package. I am reading its source code, but it's too intricate to follow. Can anyone help me?


Thanks & Best Regards,
Adrian(Xinyu) Liu
Customer Competence Division, SAS R&D Beijing
TEL: +86 10 83193849   MOBILE: +86 186 1064 1590
SAS® ... THE POWER TO KNOW®



Re: Transfer blocks from one datanode to another

Posted by Harsh J <ha...@cloudera.com>.
Hi,

Please do not use the general@ lists for development/usage questions.
This list is meant for project-level discussions alone. Thanks! :)

I've moved this mail to hdfs-dev@hadoop.apache.org. When replying,
please use that list instead, going forward.

My reply inline:

On Thu, Sep 6, 2012 at 3:12 PM, Adrian (Xinyu) Liu <ad...@sas.com> wrote:
> Hi All,
>
> I am currently working with HDFS and implementing some functionality on top of the HDFS API.
> As I understand it, a file is divided into several blocks, which are distributed across different datanodes according to the replication factor.
> I would like to find HDFS APIs that meet the following requirements:
>
> 1. Given the name of a file (and related information) already uploaded into HDFS, retrieve how many blocks it has, which blocks each datanode holds, and so on.

Not all of this is available through the simple public APIs alone.

The FileSystem#getFileBlockLocations method will tell you which hosts
carry the blocks of a file (a list of hosts for each block in the
file). See http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.Path,%20long,%20long)

For the list of block IDs, you'd have to pull from a DFSClient
instance, which calls the (NameNode-side) ClientProtocol's
getBlockLocations(…) method call. See the interface at
http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java?view=markup
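A sketch of that private-API route (DFSClient is not a stable public interface, so method names may shift between releases; the NameNode URI and path are placeholders):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSClient;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;
import org.apache.hadoop.hdfs.protocol.LocatedBlocks;

public class ListBlockIds {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder NameNode URI
    DFSClient client = new DFSClient(new URI("hdfs://namenode:8020"), conf);
    String src = "/user/adrian/input.txt"; // hypothetical path
    long len = client.getFileInfo(src).getLen();
    // Internally calls ClientProtocol#getBlockLocations on the NameNode
    LocatedBlocks lbs = client.getLocatedBlocks(src, 0, len);
    for (LocatedBlock lb : lbs.getLocatedBlocks()) {
      System.out.print("blockId=" + lb.getBlock().getBlockId() + " on:");
      for (DatanodeInfo dn : lb.getLocations()) {
        System.out.print(" " + dn.getHostName());
      }
      System.out.println();
    }
    client.close();
  }
}
```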

> 2. Given a filename, a source datanode, a specific block ID, a destination datanode, and related information, transfer that block from the source node to the destination node.

This needs to be done via the DataTransferProtocol, and its specific
method of replaceBlock(…). See the interface at
http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferProtocol.java?view=markup

> I've read several materials and the HDFS API reference but couldn't find appropriate APIs. The objective of this mail is to confirm whether such APIs exist and, if so, what they are (especially for the second requirement: transferring a specific block of a specific file from one datanode to another).
>
> There is a tool called Balancer that already exists in the HDFS package. I am reading its source code, but it's too intricate to follow. Can anyone help me?

In the Balancer sources, see the final replaceBlock(…) call made at
L376 of http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java?view=markup,
and then trace backwards from that point to see how the request is
built up.
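For orientation while tracing, here is a heavily simplified sketch of the shape of that dispatch path. Treat it as an assumption-laden outline, not working code: the replaceBlock(…) signature differs across branches (newer branch-2 releases add a StorageType argument), the delete-hint accessor varies by version, and real code must handle block tokens, timeouts, and the response protobuf.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.ExtendedBlock;
import org.apache.hadoop.hdfs.protocol.datatransfer.Sender;
import org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier;
import org.apache.hadoop.security.token.Token;

public class MoveBlockSketch {
  // Ask 'destination' to copy 'block' from 'source'; the NameNode later
  // schedules deletion of the surplus replica identified by the delete hint.
  static void moveBlock(ExtendedBlock block,
                        Token<BlockTokenIdentifier> token,
                        DatanodeInfo source,
                        DatanodeInfo destination) throws Exception {
    Socket sock = new Socket();
    sock.connect(new InetSocketAddress(destination.getIpAddr(),
                                       destination.getXferPort()), 60000);
    try {
      DataOutputStream out = new DataOutputStream(sock.getOutputStream());
      DataInputStream in = new DataInputStream(sock.getInputStream());
      // OP_REPLACE_BLOCK: the destination datanode fetches the block
      // replica from 'source' over DataTransferProtocol.
      new Sender(out).replaceBlock(block, token,
          source.getDatanodeUuid(), source);
      out.flush();
      // A real implementation reads a BlockOpResponseProto from 'in' here
      // and checks its status before declaring the move successful.
    } finally {
      sock.close();
    }
  }
}
```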

Feel free to send across any more questions you have!

-- 
Harsh J
