Posted to hdfs-issues@hadoop.apache.org by "ZanderXu (Jira)" <ji...@apache.org> on 2022/08/11 03:49:00 UTC

[jira] [Comment Edited] (HDFS-2139) Fast copy for HDFS.

    [ https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578236#comment-17578236 ] 

ZanderXu edited comment on HDFS-2139 at 8/11/22 3:48 AM:
---------------------------------------------------------

[~weichiu] [~ferhui] Thanks for planning to push this feature forward.
{quote}Some questions I have as I wasn't involved in this from the beginning. How is this different from other similar features? E.g. HDFS-3370 HDFS-15294 (federation rename/balance)
{quote}
HDFS-3370 proposes hard-linking a file within one NameService, so that the link shares the same block list.

HDFS-15294 proposes a solution for balancing files across different NameServices using DistCp.

HDFS-2139 proposes a high-performance data migration tool, FastCp. If the source DN belongs to both the source NameService and the target NameService, we can use hard links instead of copying the data, which improves performance.
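
To illustrate the idea, here is a minimal Python sketch on a local file system. It is not the FastCp code: `place_block` and its `datanode_serves_both` flag are hypothetical names, and the sketch only shows the choice between a hard link and a byte copy.

```python
import os
import shutil


def place_block(src_block, dst_block, datanode_serves_both):
    """Hypothetical helper: make dst_block available from src_block.

    Hard-links when the same DataNode serves both NameServices (assumed to
    be signalled by the caller), otherwise copies the bytes.
    Returns the method that was actually used.
    """
    if datanode_serves_both:
        try:
            os.link(src_block, dst_block)  # new directory entry, no data copied
            return "hardlink"
        except OSError:
            pass  # e.g. paths on different file systems; fall back to a copy
    shutil.copyfile(src_block, dst_block)
    return "copy"
```

A hard link only adds a directory entry for the existing block file, so its cost does not depend on the block size, whereas the copy path reads and writes every byte.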

!image-2022-08-11-11-48-17-994.png|width=849,height=300!

I have some practical experience with it. I'd like to take over and push this feature forward if I can. [~ferhui] [~weichiu] [~hexiaoqiao] 


was (Author: xuzq_zander):
[~weichiu] [~ferhui] Thanks for planning to push this feature forward.
{quote}Some questions I have as I wasn't involved in this from the beginning. How is this different from other similar features? E.g. HDFS-3370 HDFS-15294 (federation rename/balance)
{quote}
HDFS-3370 proposes hard-linking a file within one NameService, so that the link shares the same block list.

HDFS-15294 proposes a solution for balancing files across different NameServices using DistCp.

HDFS-2139 proposes a high-performance data migration tool, FastCp. If the source DN belongs to both the source NameService and the target NameService, we can use hard links instead of copying the data, which improves performance.

!SeaTalk_IMG_1660188087.png|width=1175,height=415!

I have some practical experience with it. I'd like to take over and push this feature forward if I can. [~ferhui] [~weichiu] [~hexiaoqiao] 

> Fast copy for HDFS.
> -------------------
>
>                 Key: HDFS-2139
>                 URL: https://issues.apache.org/jira/browse/HDFS-2139
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Pritam Damania
>            Assignee: Rituraj
>            Priority: Major
>         Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> There is a need to perform fast file copy on HDFS. The fast copy mechanism for a file works as
> follows:
> 1) Query metadata for all blocks of the source file.
> 2) For each block 'b' of the file, find out its datanode locations.
> 3) For each block of the file, add an empty block to the namesystem for
> the destination file.
> 4) For each location of the block, instruct the datanode to make a local
> copy of that block.
> 5) Once each datanode has copied over its respective blocks, it
> reports to the namenode about it.
> 6) Wait for all blocks to be copied and exit.
> This would speed up the copying process considerably by removing
> top-of-rack data transfers.
> Note: An extra improvement would be to instruct the datanode to create a
> hardlink of the block file if we are copying a block on the same datanode.
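
The six steps above can be sketched as a small in-memory simulation. This is a toy model under stated assumptions, not HDFS code: the `NameNode` and `DataNode` classes, `add_empty_block`, `copy_block_locally`, and `fast_copy` are all hypothetical names, and the copy runs synchronously rather than waiting on asynchronous DataNode reports.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    name: str
    blocks: dict = field(default_factory=dict)  # block id -> bytes

    def copy_block_locally(self, src_id, dst_id):
        # Step 4: local copy; a hard link could replace this on a real DataNode.
        self.blocks[dst_id] = self.blocks[src_id]
        return dst_id  # Step 5: the block id reported back to the NameNode


@dataclass
class NameNode:
    files: dict = field(default_factory=dict)      # path -> list of block ids
    locations: dict = field(default_factory=dict)  # block id -> list of DataNodes
    next_id: int = 0

    def add_empty_block(self, path):
        block_id = f"blk_{self.next_id}"
        self.next_id += 1
        self.files.setdefault(path, []).append(block_id)
        self.locations[block_id] = []
        return block_id


def fast_copy(nn, src, dst):
    """Copy src to dst without moving block data between DataNodes."""
    nn.files.setdefault(dst, [])
    for src_block in list(nn.files[src]):         # step 1: blocks of the source
        datanodes = list(nn.locations[src_block]) # step 2: their locations
        dst_block = nn.add_empty_block(dst)       # step 3: empty destination block
        for dn in datanodes:                      # step 4: local copy on each replica
            reported = dn.copy_block_locally(src_block, dst_block)
            nn.locations[reported].append(dn)     # step 5: record the report
    # Step 6: this toy is synchronous, so every copy has finished by now.
    return nn.files[dst]
```

Because each replica is copied on the DataNode that already holds it, the destination file ends up with the same replica placement as the source and no block data crosses the network.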


