Posted to issues@spark.apache.org by "Thomas Graves (JIRA)" <ji...@apache.org> on 2015/10/08 18:27:27 UTC

[jira] [Commented] (SPARK-10858) YARN: archives/jar/files rename with # doesn't work unless scheme given

    [ https://issues.apache.org/jira/browse/SPARK-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948941#comment-14948941 ] 

Thomas Graves commented on SPARK-10858:
---------------------------------------

Sorry for the delay on this; I didn't have time to look at it.  Not sure why you are seeing something different from me.

Thanks for looking into this.  I agree it's in the parsing in resolveURI, where it's calling new File(path).getAbsoluteFile().toURI().

When I don't specify file://:
15/10/08 15:35:56 INFO Client: local uri is: file:/homes/tgraves/R_install/R_install.tgz%23R_installation

with file://
15/10/08 15:38:27 INFO Client: local uri is: file:/homes/tgraves/R_install/R_install.tgz#R_installation

That is coming back with the %23 encoded instead of the #.  When I originally wrote that code it wasn't calling Utils.resolveURIs.
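
Quick Scala sketch of that difference (REPL-style, using the example path above; just an illustration, not Spark code):

    import java.io.File
    import java.net.URI

    val plainPath = "/homes/tgraves/R_install/R_install.tgz#R_installation"

    // Going through File: the '#' is treated as part of the file name and
    // gets percent-encoded to %23 in the resulting URI.
    val viaFile = new File(plainPath).getAbsoluteFile().toURI()
    println(viaFile)            // file:/homes/tgraves/R_install/R_install.tgz%23R_installation

    // Parsing the string as a URI directly: the '#' is recognized as the
    // fragment separator and preserved.
    val viaUri = new URI("file://" + plainPath)
    println(viaUri.getFragment) // R_installation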

 Looking at the actual code for File.toURI(), you will see it's not parsing the fragment out before calling URI(), which I think is the problem:

    public URI toURI() {
        try {
            File f = getAbsoluteFile();
            String sp = slashify(f.getPath(), f.isDirectory());
            if (sp.startsWith("//"))
                sp = "//" + sp;
            return new URI("file", null, sp, null);
        } catch (URISyntaxException x) {
            throw new Error(x);         // Can't happen
        }
    }


It seems like a bad idea to call this given that the string might already be in URI format.  So we are now going from a possible URI to a File and back to a URI.  When we convert it to a File, it's not expecting the string to already be a URI with a fragment, so it treats the fragment as part of the path.
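
One possible way to handle it (just a rough Scala sketch of the idea, not the actual Spark code): try parsing the string as a URI first, and only fall back to the File conversion when there is no scheme, re-attaching any fragment so it doesn't get encoded as part of the path.

    import java.io.File
    import java.net.{URI, URISyntaxException}

    def resolveURI(path: String): URI = {
      try {
        val uri = new URI(path)
        if (uri.getScheme != null) {
          // Already a full URI (e.g. file:///..., hdfs://...), leave it alone.
          return uri
        }
        if (uri.getFragment != null) {
          // Bare path with a '#fragment': convert only the path portion to an
          // absolute file URI, then put the fragment back instead of letting
          // File.toURI() percent-encode the '#'.
          val absolute = new File(uri.getPath).getAbsoluteFile().toURI()
          return new URI(absolute.getScheme, absolute.getHost, absolute.getPath, uri.getFragment)
        }
      } catch {
        case _: URISyntaxException => // fall through to the File-based handling below
      }
      new File(path).getAbsoluteFile().toURI()
    }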


> YARN: archives/jar/files rename with # doesn't work unless scheme given
> -----------------------------------------------------------------------
>
>                 Key: SPARK-10858
>                 URL: https://issues.apache.org/jira/browse/SPARK-10858
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.5.1
>            Reporter: Thomas Graves
>            Priority: Minor
>
> The YARN distributed cache feature with --jars, --archives, and --files, where you can rename the file/archive using a # symbol, only works if you explicitly include the scheme in the path:
> works:
> --jars file:///home/foo/my.jar#renamed.jar
> doesn't work:
> --jars /home/foo/my.jar#renamed.jar
> Exception in thread "main" java.io.FileNotFoundException: File file:/home/foo/my.jar#renamed.jar does not exist
>         at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
>         at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:416)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
>         at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:240)
>         at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:329)
>         at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6$$anonfun$apply$2.apply(Client.scala:393)
>         at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6$$anonfun$apply$2.apply(Client.scala:392)
>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)


