Posted to issues@spark.apache.org by "Kun Liu (JIRA)" <ji...@apache.org> on 2016/06/15 18:53:09 UTC
[jira] [Updated] (SPARK-15969) FileNotFoundException: Multiple
arguments for py-files flag, (also jars) for spark-submit
[ https://issues.apache.org/jira/browse/SPARK-15969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kun Liu updated SPARK-15969:
----------------------------
Remaining Estimate: 120h (was: 168h)
Original Estimate: 120h (was: 168h)
> FileNotFoundException: Multiple arguments for py-files flag, (also jars) for spark-submit
> -----------------------------------------------------------------------------------------
>
> Key: SPARK-15969
> URL: https://issues.apache.org/jira/browse/SPARK-15969
> Project: Spark
> Issue Type: Bug
> Components: Spark Submit
> Affects Versions: 1.5.0, 1.6.1
> Environment: Mac OS X 10.11.5
> Reporter: Kun Liu
> Priority: Minor
> Original Estimate: 120h
> Remaining Estimate: 120h
>
> This is my first JIRA issue and I am new to the Spark community, so please correct me if I am wrong. Thanks.
> A java.io.FileNotFoundException is thrown when multiple comma-separated arguments are passed to the --py-files flag (and, I believe, the --jars flag) of spark-submit.
> I searched for a while but only found a similar issue on Windows OS: https://issues.apache.org/jira/browse/SPARK-6435
> My test environment was Mac OS X with Spark versions 1.5.0 and 1.6.1.
> 1.1 Observations:
> 1) Quoting the arguments makes no difference; the result is always the same.
> 2) The first path (before the first comma) is never a problem as long as it is valid, whether it is absolute or relative.
> 3) The second and subsequent py-files paths are not a problem if ALL of them are:
> a. relative paths under the working directory ($PWD); OR
> b. specified with an environment variable at the beginning, e.g. $ENV_VAR/path/to/file; OR
> c. preprocessed by $(echo path/to/*.py | tr ' ' ','), whether absolute or relative, as long as they are valid.
> 4) The path of the driver program, assuming it is valid, does not matter, as it is a single file.
> 1.2 Experiments:
> Assuming main.py calls functions from helper1.py and helper2.py, and all paths below are valid.
> ~/Desktop/testpath: main.py, helper1.py, helper2.py
> $SPARK_HOME/testpath: helper1.py, helper2.py
> 1) Successful output:
> a. Multiple Python paths are relative paths under the working directory
> cd $SPARK_HOME
> bin/spark-submit --py-files testpath/helper1.py,testpath/helper2.py ~/Desktop/testpath/main.py
> cd ~/Desktop
> $SPARK_HOME/bin/spark-submit --py-files testpath/helper1.py,testpath/helper2.py testpath/main.py
> b. Multiple Python paths are specified using an environment variable
> export TEST_DIR=~/Desktop/testpath
> cd ~
> $SPARK_HOME/bin/spark-submit --py-files $TEST_DIR/helper1.py,$TEST_DIR/helper2.py ~/Desktop/testpath/main.py
>
> cd ~/Documents
> $SPARK_HOME/bin/spark-submit --py-files $TEST_DIR/helper1.py,$TEST_DIR/helper2.py ~/Desktop/testpath/main.py
> c. Multiple paths (absolute or relative) after being preprocessed:
> $SPARK_HOME/bin/spark-submit --py-files $(echo $SPARK_HOME/testpath/helper*.py | tr ' ' ',') ~/Desktop/testpath/main.py
> cd ~/Desktop
> $SPARK_HOME/bin/spark-submit --py-files $(echo testpath/helper*.py | tr ' ' ',') ~/Desktop/testpath/main.py
> (reference link: http://stackoverflow.com/questions/24855368/spark-throws-classnotfoundexception-when-using-jars-option)
> 2) Failure output: the second Python path is an absolute path; the same problem occurs for any further paths
> cd ~/Documents
> $SPARK_HOME/bin/spark-submit --py-files ~/Desktop/testpath/helper1.py,~/Desktop/testpath/helper2.py ~/Desktop/testpath/main.py
> py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
> : java.io.FileNotFoundException: Added file file:/Users/kunliu/Documents/~/Desktop/testpath/helper2.py does not exist.
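One possible reading of the path in the exception: a POSIX shell expands `~` only at the start of a word, so in `~/a.py,~/b.py` only the first tilde becomes the home directory and the second one reaches spark-submit literally. The sketch below is plain Python, not Spark code. `resolve_entry` is a hypothetical helper, and the rule that a non-absolute entry is joined to the working directory is an assumption, chosen because it reproduces the reported path.

```python
import posixpath

def resolve_entry(entry, cwd):
    """Hypothetical stand-in for how a submitted path might be resolved.

    Assumption: absolute entries are kept, anything else is joined to cwd.
    """
    if posixpath.isabs(entry):
        return entry
    return posixpath.join(cwd, entry)

# Values taken from the report: home is /Users/kunliu, cwd is ~/Documents.
home = "/Users/kunliu"
cwd = home + "/Documents"

# What the shell hands over: only the leading "~" was expanded.
py_files = home + "/Desktop/testpath/helper1.py,~/Desktop/testpath/helper2.py"

resolved = [resolve_entry(e, cwd) for e in py_files.split(",")]
for path in resolved:
    print(path)
```

The second printed path has the same `Documents/~/Desktop` shape as the path in the FileNotFoundException above.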
> 1.3 Conclusions
> I suggest that the --py-files flag of spark-submit support absolute paths for all of its arguments, not just relative paths under the working directory.
> If necessary, I would like to submit a pull request and start working on it as my first contribution to the Spark community.
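Until something like this is supported, a caller-side workaround might look like the sketch below. It expands `~` and makes every entry absolute before the value is handed to --py-files. `build_py_files_arg` is a hypothetical helper name, not part of Spark.

```python
import os

def build_py_files_arg(paths):
    """Return a comma-separated value with every entry expanded and absolute."""
    expanded = []
    for p in paths:
        p = os.path.expanduser(p)   # "~/x.py" -> "<home>/x.py"
        p = os.path.abspath(p)      # relative entries are resolved against the cwd
        expanded.append(p)
    return ",".join(expanded)

if __name__ == "__main__":
    # Paths from the failing experiment in section 1.2
    print(build_py_files_arg(["~/Desktop/testpath/helper1.py",
                              "~/Desktop/testpath/helper2.py"]))
```

The printed value contains no literal `~`, so it no longer depends on how the shell treats a tilde after a comma.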
> 1.4 Note
> 1) I think the same issue will occur when multiple comma-delimited jar files are passed to the --jars flag for Java applications.
> 2) I suggest that wildcard path arguments also be supported, as indicated by https://issues.apache.org/jira/browse/SPARK-3451
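As a rough illustration of what wildcard support could mean from the caller's side, the sketch below expands patterns with Python's glob module and joins the matches with commas, similar to the $(echo ... | tr ' ' ',') preprocessing in observation 3c. `expand_patterns` is a hypothetical helper, not a Spark API.

```python
import glob
import os

def expand_patterns(patterns):
    """Expand each wildcard pattern into a comma-separated list of absolute paths."""
    matches = []
    for pattern in patterns:
        # sorted() keeps the order stable; glob returns matches in arbitrary order
        for path in sorted(glob.glob(os.path.expanduser(pattern))):
            matches.append(os.path.abspath(path))
    return ",".join(matches)

if __name__ == "__main__":
    # Pattern modeled on the experiment layout in section 1.2
    print(expand_patterns(["~/Desktop/testpath/helper*.py"]))
```

A pattern that matches nothing contributes no entries, so the result can be an empty string; a caller would probably want to check for that before invoking spark-submit.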
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)