Posted to common-dev@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2018/02/05 18:33:00 UTC
[jira] [Created] (HADOOP-15209) PoC: DistCp to eliminate needless
deletion of files under deleted directories
Steve Loughran created HADOOP-15209:
---------------------------------------
Summary: PoC: DistCp to eliminate needless deletion of files under deleted directories
Key: HADOOP-15209
URL: https://issues.apache.org/jira/browse/HADOOP-15209
Project: Hadoop Common
Issue Type: Improvement
Components: tools/distcp
Affects Versions: 2.9.0
Reporter: Steve Loughran
DistCp issues a delete(file) request even if the file is underneath an already deleted directory. This generates needless load on filesystems and object stores and, if the store throttles deletes, can dramatically slow down the delete operation.
If the distcp delete operation can build a history of deleted directories, then it will know when it does not need to issue those deletes.
Care is needed here to make sure that whatever structure is created does not overload the heap of the process.
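One way the idea could look is sketched below. This is only an illustration, not the DistCp implementation: the class DeletedDirHistory, its method names, and the maxEntries limit are all hypothetical. It assumes paths are plain "/"-separated strings and keeps the history in a bounded LRU map so that heap use stays capped. If an entry is evicted, the only cost is that a redundant delete gets issued again, which is safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of DistCp): remembers recently deleted
 * directories so callers can skip deletes of paths beneath them.
 */
class DeletedDirHistory {
    private final Map<String, Boolean> deletedDirs;

    /** @param maxEntries assumed cap on remembered directories, to bound heap use. */
    DeletedDirHistory(final int maxEntries) {
        // An access-ordered LinkedHashMap drops the least recently used entry
        // once the cap is exceeded.
        this.deletedDirs = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Record that a directory has been (recursively) deleted. */
    void recordDeletedDir(String dir) {
        deletedDirs.put(normalize(dir), Boolean.TRUE);
    }

    /** Returns true if any ancestor of the path is a remembered deleted directory. */
    boolean isUnderDeletedDir(String path) {
        String p = normalize(path);
        int slash = p.lastIndexOf('/');
        // Walk up one parent at a time; the root "/" itself is never checked.
        while (slash > 0) {
            p = p.substring(0, slash);
            if (deletedDirs.get(p) != null) {
                return true;
            }
            slash = p.lastIndexOf('/');
        }
        return false;
    }

    /** Strip a single trailing slash so "/a/b/" and "/a/b" compare equal. */
    private static String normalize(String path) {
        return (path.length() > 1 && path.endsWith("/"))
                ? path.substring(0, path.length() - 1)
                : path;
    }
}
```

A caller would check isUnderDeletedDir(path) before issuing each delete, skip the request when it returns true, and call recordDeletedDir(path) after each successful directory delete. Matching on whole parent paths rather than string prefixes avoids treating /data/older as a child of /data/old.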
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)