Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2014/10/02 22:49:34 UTC

[jira] [Commented] (SPARK-3769) SparkFiles.get gives me the wrong fully qualified path

    [ https://issues.apache.org/jira/browse/SPARK-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157156#comment-14157156 ] 

Sean Owen commented on SPARK-3769:
----------------------------------

My understanding is that you execute:

{code}
sc.addFile("/opt/tom/SparkFiles.sas");
...
SparkFiles.get("SparkFiles.sas");
{code}

I would not expect remote workers to need to know the location on the driver that the file came from. The path may not be absolute in all cases anyway. I can see the argument that it feels like both should use the same "key", but the key being set is really the file name, not the path.

You don't have to parse it by hand, though. Usually you would do something like this anyway:

{code}
File myFile = new File(args[1]);
sc.addFile(myFile.getAbsolutePath()); // register the file with Spark; the key is its base name
String fileName = myFile.getName();   // e.g. "SparkFiles.sas"
...
SparkFiles.get(fileName);             // on the workers, resolve by file name only
{code}

AFAIK this is as intended.
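
For completeness, here is a rough end-to-end sketch against the 1.x Java API. It assumes an existing JavaSparkContext named sc and a JavaRDD<String> named lines (both placeholders for illustration); it just returns the resolved local path from each partition to show where the file lands:

{code}
import java.util.Collections;
import java.util.Iterator;

import org.apache.spark.SparkFiles;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;

// driver side: register the file; workers will see it under its base name
sc.addFile("/opt/tom/SparkFiles.sas");

// executor side: resolve by file name only, inside mapPartitions
JavaRDD<String> resolved = lines.mapPartitions(
    new FlatMapFunction<Iterator<String>, String>() {
      @Override
      public Iterable<String> call(Iterator<String> it) {
        // this lands under SparkFiles.getRootDirectory() on each worker
        String localPath = SparkFiles.get("SparkFiles.sas");
        return Collections.singletonList(localPath);
      }
    });
{code}

Whether you pass an absolute path or just a name to addFile(), the lookup key on the workers is the base file name.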

> SparkFiles.get gives me the wrong fully qualified path
> ------------------------------------------------------
>
>                 Key: SPARK-3769
>                 URL: https://issues.apache.org/jira/browse/SPARK-3769
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 1.0.2, 1.1.0
>         Environment: linux host, and linux grid.
>            Reporter: Tom Weber
>            Priority: Minor
>
> My Spark program runs on my host, submitting work to my grid.
> JavaSparkContext sc = new JavaSparkContext(conf);
> final String path = args[1];
> sc.addFile(path); /* args[1] = /opt/tom/SparkFiles.sas */
> The log shows:
> 14/10/02 16:07:14 INFO Utils: Copying /opt/tom/SparkFiles.sas to /tmp/spark-4c661c3f-cb57-4c9f-a0e9-c2162a89db77/SparkFiles.sas
> 14/10/02 16:07:15 INFO SparkContext: Added file /opt/tom/SparkFiles.sas at http://10.20.xx.xx:49587/files/SparkFiles.sas with timestamp 1412280434986
> Those are paths on my host machine. The location where this file ends up on the grid nodes is:
> /opt/tom/spark-1.1.0-bin-hadoop2.4/work/app-20141002160704-0002/1/SparkFiles.sas
> The call that gets the path in my code, which runs in my mapPartitions function on the grid nodes, is:
> String pgm = SparkFiles.get(path);
> And this returns the following string:
> /opt/tom/spark-1.1.0-bin-hadoop2.4/work/app-20141002160704-0002/1/./opt/tom/SparkFiles.sas
> So, am I expected to take the qualified path I passed in, parse out just the file name at the end, and concatenate that onto the result of the SparkFiles.getRootDirectory() call in order to get this to work?
> Or pass only the parsed file name to the SparkFiles.get() method? It seems as though I should be able to pass the same file specification to both sc.addFile() and SparkFiles.get() and get the correct location of the file.
> Thanks,
> Tom



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org