Posted to issues@spark.apache.org by "Kun Liu (JIRA)" <ji...@apache.org> on 2016/10/30 01:17:58 UTC

[jira] [Closed] (SPARK-15969) FileNotFoundException: Multiple arguments for py-files flag, (also jars) for spark-submit

     [ https://issues.apache.org/jira/browse/SPARK-15969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kun Liu closed SPARK-15969.
---------------------------
    Resolution: Done

This seems to be working now, so I am closing this JIRA.

> FileNotFoundException: Multiple arguments for py-files flag, (also jars) for spark-submit
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-15969
>                 URL: https://issues.apache.org/jira/browse/SPARK-15969
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Submit
>    Affects Versions: 1.5.0, 1.6.1
>         Environment: Mac OS X 10.11.5
>            Reporter: Kun Liu
>            Priority: Minor
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> This is my first time opening a JIRA issue and I am new to the Spark community, so please correct me if I am wrong. Thanks.
> A java.io.FileNotFoundException is thrown when multiple arguments are specified for the --py-files (and likewise the --jars) flag.
> I searched for a while but only found a similar issue on Windows OS: https://issues.apache.org/jira/browse/SPARK-6435
> My experiments were run on Mac OS X with Spark 1.5.0 and 1.6.1.
> 1.1 Observations:
> 1) Quoting the arguments does not make any difference; the result is always the same
> 2) The first path (before the comma), as long as it is valid, is never a problem, whether it is an absolute or a relative path
> 3) The second and any further py-files paths are not a problem if ALL of them are:
> 	a. relative paths under the working directory ($PWD); OR
> 	b. specified with an environment variable at the beginning, e.g. $ENV_VAR/path/to/file; OR
> 	c. preprocessed by $(echo path/to/*.py | tr ' ' ','), whether absolute or relative, as long as they are valid
> 4) The path of the driver program, assuming it is valid, does not matter, as it is a single file
> 1.2 Experiments:
> Assume that main.py calls functions from helper1.py and helper2.py, and that all paths below are valid.
> ~/Desktop/testpath: main.py, helper1.py, helper2.py
> $SPARK_HOME/testpath: helper1.py, helper2.py
> 1) Successful cases:
> 	a. Multiple Python paths given as relative paths under the working directory
> 	cd $SPARK_HOME
> 	bin/spark-submit --py-files testpath/helper1.py,testpath/helper2.py ~/Desktop/testpath/main.py
> 	cd ~/Desktop
> 	$SPARK_HOME/bin/spark-submit --py-files testpath/helper1.py,testpath/helper2.py testpath/main.py
> 	b. Multiple Python paths specified via an environment variable
> 	export TEST_DIR=~/Desktop/testpath
> 	cd ~
> 	$SPARK_HOME/bin/spark-submit --py-files $TEST_DIR/helper1.py,$TEST_DIR/helper2.py ~/Desktop/testpath/main.py
> 	
> 	cd ~/Documents
> 	$SPARK_HOME/bin/spark-submit --py-files $TEST_DIR/helper1.py,$TEST_DIR/helper2.py ~/Desktop/testpath/main.py
> 	c. Multiple paths (absolute or relative) after being preprocessed:
> 	$SPARK_HOME/bin/spark-submit --py-files $(echo $SPARK_HOME/testpath/helper*.py | tr ' ' ',') ~/Desktop/testpath/main.py 
> 	cd ~/Desktop
> 	$SPARK_HOME/bin/spark-submit --py-files $(echo testpath/helper*.py | tr ' ' ',') ~/Desktop/testpath/main.py 
> 	(reference link: http://stackoverflow.com/questions/24855368/spark-throws-classnotfoundexception-when-using-jars-option)
> 2) Failure case: the submission fails if the second Python path is an absolute one; the same problem happens for any further paths
> 	cd ~/Documents
> 	$SPARK_HOME/bin/spark-submit --py-files ~/Desktop/testpath/helper1.py,~/Desktop/testpath/helper2.py ~/Desktop/testpath/main.py 
> 	py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
> 	: java.io.FileNotFoundException: Added file file:/Users/kunliu/Documents/~/Desktop/testpath/helper2.py does not exist.
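> Note that the error above shows the literal ~ prepended to the working directory for the second path, which suggests the shell only performs tilde expansion at the start of a word and not after the comma, so the second path may never reach spark-submit in absolute form. As a sketch of a workaround (not a fix in Spark itself), spelling the later paths with $HOME instead of ~ appears to avoid the problem, since the shell expands $HOME anywhere inside a word:
> 	cd ~/Documents
> 	# $HOME is expanded even after the comma, unlike ~
> 	$SPARK_HOME/bin/spark-submit --py-files $HOME/Desktop/testpath/helper1.py,$HOME/Desktop/testpath/helper2.py ~/Desktop/testpath/main.py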
> 1.3 Conclusions
> I would suggest that the --py-files flag of spark-submit support all absolute path arguments, not just relative paths under the working directory.
> If necessary, I would like to submit a pull request and start working on it as my first contribution to the Spark community.
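> In the meantime, a sketch of a user-side workaround (plain shell, not Spark functionality; PY_DEPS is just an illustrative variable name) is to normalize every dependency path to an absolute one before building the comma-separated list:
> 	# expand each path to an absolute form, then join them with commas
> 	PY_DEPS=""
> 	for f in ~/Desktop/testpath/helper1.py ~/Desktop/testpath/helper2.py; do
> 	    PY_DEPS="$PY_DEPS$(cd "$(dirname "$f")" && pwd)/$(basename "$f"),"
> 	done
> 	PY_DEPS=${PY_DEPS%,}   # drop the trailing comma
> 	$SPARK_HOME/bin/spark-submit --py-files "$PY_DEPS" ~/Desktop/testpath/main.py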
> 1.4 Note
> 1) I think the same issue happens when multiple jar files, delimited by commas, are passed to the --jars flag for Java applications (see the sketch after this list).
> 2) I suggest that wildcard path arguments also be supported, as indicated by https://issues.apache.org/jira/browse/SPARK-3451
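> For reference, here is a sketch of the analogous --jars invocation that I expect (but have not verified above) to hit the same problem; the class name and jar names are only placeholders:
> 	cd ~/Documents
> 	# hypothetical application and dependency jars, only to illustrate the flag usage
> 	$SPARK_HOME/bin/spark-submit --class com.example.Main --jars ~/Desktop/testpath/dep1.jar,~/Desktop/testpath/dep2.jar ~/Desktop/testpath/app.jar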



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
