Posted to common-issues@hadoop.apache.org by "Jason Cwik (JIRA)" <ji...@apache.org> on 2018/04/12 20:58:00 UTC

[jira] [Commented] (HADOOP-14698) Make copyFromLocal's -t option available for put as well

    [ https://issues.apache.org/jira/browse/HADOOP-14698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16436296#comment-16436296 ] 

Jason Cwik commented on HADOOP-14698:
-------------------------------------

As mentioned above in https://issues.apache.org/jira/browse/HADOOP-14698?focusedCommentId=16107552&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16107552, the current threading model only applies to the leaf nodes.  In deep or wide tree structures, the enumeration itself can take a significant amount of time, especially with FileSystem implementations such as S3A or other object store connectors.  I started a patch in HDFS-13398 to address this (especially for the `ls` and `du` commands), but it could likely be combined with this effort to parallelize the FsShell module in general.

So far, we've tried two approaches.  The first simply creates another executor in the base class and enqueues the child operations in processPaths.  The second uses a ForkJoinPool (FJP) to crawl the tree and process subtrees in parallel.  Currently, we have FJP working with `ls` and `du`, but not with other operations.  I think FJP is the best route, since it would let us do things like wait to delete a directory until all of its children have been deleted.  Doing this properly, however, might require a significant refactoring of the whole FsShell module to implement the correct ForkJoinTask structure.
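
To make the "delete a directory only after all its children" ordering concrete, here is a minimal sketch of the fork/join shape.  It is not the HDFS-13398 patch and does not touch FsShell: it assumes plain java.nio.file on the local file system, and the names ParallelDelete, DeleteTask and deleteTree are hypothetical, chosen only for this illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical sketch: uses java.nio.file on the local file system,
// not Hadoop's FileSystem or FsShell classes.
final class ParallelDelete {

    /** Deletes one path; a directory is removed only after all of its children. */
    static final class DeleteTask extends RecursiveAction {
        private final Path path;

        DeleteTask(Path path) {
            this.path = path;
        }

        @Override
        protected void compute() {
            try {
                // Do not follow symlinks: a link to a directory is deleted as a link.
                if (Files.isDirectory(path, LinkOption.NOFOLLOW_LINKS)) {
                    List<DeleteTask> children = new ArrayList<>();
                    try (DirectoryStream<Path> entries = Files.newDirectoryStream(path)) {
                        for (Path entry : entries) {
                            children.add(new DeleteTask(entry));
                        }
                    }
                    // Run every subtree task, possibly in parallel, and wait for all of them.
                    invokeAll(children);
                }
                // Reached only after every child task has completed.
                Files.delete(path);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Deletes the whole tree under root using a pool of the given parallelism. */
    static void deleteTree(Path root, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            pool.invoke(new DeleteTask(root));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small throwaway tree and delete it in parallel.
        Path root = Files.createTempDirectory("fjp-demo");
        Files.createDirectories(root.resolve("a/b"));
        Files.createFile(root.resolve("a/b/leaf.txt"));
        Files.createFile(root.resolve("a/other.txt"));
        deleteTree(root, 4);
        System.out.println("root still exists: " + Files.exists(root));
    }
}
```

The point of the sketch is that the parent's delete sits after invokeAll, so the ordering falls out of the task structure rather than from extra bookkeeping.  Against an object store the listing and delete calls block on network I/O, so a real implementation would probably need to account for blocking inside the pool; that is left out here.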

Thoughts?


> Make copyFromLocal's -t option available for put as well
> --------------------------------------------------------
>
>                 Key: HADOOP-14698
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14698
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Andras Bokor
>            Assignee: Andras Bokor
>            Priority: Major
>         Attachments: HADOOP-14698.01.patch, HADOOP-14698.02.patch, HADOOP-14698.03.patch, HADOOP-14698.04.patch, HADOOP-14698.05.patch, HADOOP-14698.06.patch, HADOOP-14698.07.patch, HADOOP-14698.08.patch
>
>
> After HDFS-11786, copyFromLocal and put are no longer identical.
> I do not see any reason not to add the new feature to put as well.
> Having the two commands differ makes them harder to understand and use from the user's point of view.


