Posted to dev@reef.apache.org by "Markus Weimer (JIRA)" <ji...@apache.org> on 2017/10/13 05:53:02 UTC

[jira] [Commented] (REEF-1892) HDFS File Copy only uses local HDFS

    [ https://issues.apache.org/jira/browse/REEF-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16203093#comment-16203093 ] 

Markus Weimer commented on REEF-1892:
-------------------------------------

PR #1383 "fixed" this by undoing earlier changes that were made for another kind of cluster, so we can no longer support those clusters. It seems to me that the right strategy could be to make this _configurable_: Whenever we need to convert a {{URL}} to a {{string}}, we use an instance of a yet-to-be-defined {{IURLFormatter}} to do so. We can then use Tang to make different implementations of that interface available, depending on the cluster.
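
To make the idea concrete, here is a rough sketch of the two behaviors such a formatter would need to cover. It is written in Python purely for brevity; the real thing would be a C# interface in REEF.NET bound through Tang. All names below ({{UrlFormatter}}, {{PathOnlyFormatter}}, {{FullUrlFormatter}}, the cluster keys and the example URL) are hypothetical, and the dictionary lookup only stands in for a Tang binding.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlsplit


class UrlFormatter(ABC):
    """Hypothetical stand-in for the proposed IURLFormatter."""

    @abstractmethod
    def format(self, url: str) -> str:
        """Turn a URL into the string handed to the file system command."""


class PathOnlyFormatter(UrlFormatter):
    """Keeps only the path, so the default file system is assumed."""

    def format(self, url: str) -> str:
        return urlsplit(url).path


class FullUrlFormatter(UrlFormatter):
    """Keeps the URL as given, so a prefix such as wasb:// survives."""

    def format(self, url: str) -> str:
        return url


# Stand-in for a per-cluster Tang binding (keys are made up for illustration).
FORMATTERS = {
    "vanilla-hdfs": PathOnlyFormatter(),
    "azure-blob": FullUrlFormatter(),
}

if __name__ == "__main__":
    remote = "wasb://container@account.blob.core.windows.net/my/file"
    for cluster, formatter in FORMATTERS.items():
        print(cluster, "->", formatter.format(remote))
```

The point of the split is that the copy code would only ever call {{format}} and would not need to know which kind of cluster it is running on.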

[~rogan], [~shouhengyi], WDYT?

> HDFS File Copy only uses local HDFS
> -----------------------------------
>
>                 Key: REEF-1892
>                 URL: https://issues.apache.org/jira/browse/REEF-1892
>             Project: REEF
>          Issue Type: Bug
>          Components: REEF.NET IO
>    Affects Versions: 0.17
>            Reporter: Rogan Carr
>
> In REEF-1827 [1], the URIs used to specify remote and local files were changed to use the "AbsolutePath". [2]
> This means that a file specified as "hdfs://my/file" becomes "/my/file" and the hdfs:// is assumed by the `dfs` command.
> This is fine if you are using vanilla HDFS, but for cases like Blob Storage in Azure, there is a special prefix, `wasb://`, that is used instead of `hdfs://`. This means that the AbsolutePath method trims off the `wasb://` prefix, and this Copy() function attempts to download the file from the local HDFS instead of WASB.
> We need to revisit this issue and keep the full path for copies while also keeping proper casing in the path.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)