Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2018/05/24 18:17:17 UTC
[GitHub] spark pull request #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files...
GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/21426
[SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correctly into PythonRunner in submit with client mode in spark-submit
## What changes were proposed in this pull request?
In client mode, a .py file passed via `--py-files` cannot be imported before context initialization when the application itself is a Python file. See below:
```
$ cat /home/spark/tmp.py
def testtest():
    return 1
```
This works:
```
$ cat app.py
import pyspark
pyspark.sql.SparkSession.builder.getOrCreate()
import tmp
print("************************%s" % tmp.testtest())
$ ./bin/spark-submit --master yarn --deploy-mode client --py-files /home/spark/tmp.py app.py
...
************************1
```
but this doesn't:
```
$ cat app.py
import pyspark
import tmp
pyspark.sql.SparkSession.builder.getOrCreate()
print("************************%s" % tmp.testtest())
$ ./bin/spark-submit --master yarn --deploy-mode client --py-files /home/spark/tmp.py app.py
Traceback (most recent call last):
  File "/home/spark/spark/app.py", line 2, in <module>
    import tmp
ImportError: No module named tmp
```
### How did it happen?
In client mode specifically, the paths are added to PythonRunner as they are:
https://github.com/apache/spark/blob/628c7b517969c4a7ccb26ea67ab3dd61266073ca/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L430
https://github.com/apache/spark/blob/628c7b517969c4a7ccb26ea67ab3dd61266073ca/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala#L49-L88
The problem is that a .py file shouldn't be added as is, since `PYTHONPATH` expects each entry to be a directory or an archive such as a zip or egg file.
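To illustrate why a bare .py path is ignored, here is a minimal Python sketch (not part of the patch). It assumes CPython 3's default import machinery; the module name `tmp_spark24384_demo` and the helper `can_import` are hypothetical names used only for this illustration.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module name, chosen only for this illustration.
MODULE_NAME = "tmp_spark24384_demo"


def can_import(path_entry):
    """Return True if MODULE_NAME imports with path_entry prepended to sys.path."""
    sys.modules.pop(MODULE_NAME, None)
    importlib.invalidate_caches()
    sys.path.insert(0, path_entry)
    try:
        importlib.import_module(MODULE_NAME)
        return True
    except ImportError:
        return False
    finally:
        # Restore the interpreter state so the two checks do not affect each other.
        sys.path.remove(path_entry)
        sys.modules.pop(MODULE_NAME, None)


workdir = tempfile.mkdtemp()
module_file = os.path.join(workdir, MODULE_NAME + ".py")
with open(module_file, "w") as f:
    f.write("def testtest():\n    return 1\n")

file_entry_works = can_import(module_file)  # path entry is the .py file itself
dir_entry_works = can_import(workdir)       # path entry is the containing directory
print(file_entry_works, dir_entry_works)
```

The file entry is skipped by the import system, while the directory entry makes the module importable, which matches the `ImportError` shown above.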
### How does this PR fix it?
We shouldn't simply add the file's parent directory, because every other file in that directory would then also become importable through `PYTHONPATH` in client mode before context initialization.
Therefore, we copy the .py files into a dedicated temp directory and add that directory to `PYTHONPATH`.
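The copy-to-temp-directory idea can be sketched in Python as follows. This is only an illustration of the approach, not the Scala code in the patch; the function name `resolve_py_files` is a hypothetical name, and the sketch assumes that all entries are already local paths.

```python
import os
import shutil
import tempfile


def resolve_py_files(py_files):
    """Copy readable local .py files into one temp directory.

    Returns the path entries to put on PYTHONPATH: the temp directory
    (at most once) plus every non-.py entry unchanged.
    """
    dest = None  # created lazily, only if there is at least one .py file to copy
    resolved = []
    for py_file in py_files:
        if py_file.endswith(".py"):
            if os.path.isfile(py_file) and os.access(py_file, os.R_OK):
                if dest is None:
                    dest = tempfile.mkdtemp(prefix="localPyFiles")
                shutil.copy(py_file, dest)
                if dest not in resolved:
                    resolved.append(dest)
            # Missing or unreadable .py files are skipped in this sketch.
        else:
            # Directories and archives (zip, egg) are valid entries as is.
            resolved.append(py_file)
    return resolved
```

Because only the listed files are copied, unrelated modules that sit next to them in the original directory do not leak onto the import path.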
## How was this patch tested?
Unit tests were added, and the change was manually tested in both standalone and yarn client modes with spark-submit.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark SPARK-24384
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21426.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21426
----
commit b76854dc58b4cd5c73933cff2b8b7d8e3ffb23ac
Author: hyukjinkwon <gu...@...>
Date: 2018-05-24T17:34:31Z
Add .py files correctly into PythonRunner in submit with client mode in spark-submit
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21426#discussion_r190683345
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -372,8 +376,27 @@ private[spark] class SparkSubmit extends Logging {
localJars = Option(args.jars).map {
downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
}.orNull
- localPyFiles = Option(args.pyFiles).map {
- downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
+ localPyFiles = Option(args.pyFiles).map { pyFiles =>
+ if (isClientPythonSubmit) {
+ // In case of client with submit, the python paths should be set before context
+ // initialization.
+ // In case of shell, the context initialization is done ahead so we are
+ // fine but in case of client with submit, the context initialization can be done later.
+ // We will copy the local .py files because .py file shouldn't be added
+ // alone but its parent directory. See SPARK-24384.
+ localPyFilesTargetDir = Utils.createTempDir(namePrefix = "localPyFiles")
+ Utils.stringToSeq(pyFiles).map { pyFile =>
--- End diff --
This logic is copied from `downloadFileList`.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21426
**[Test build #91185 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91185/testReport)** for PR 21426 at commit [`90b38b9`](https://github.com/apache/spark/commit/90b38b9ed395bca7c1a872a1ceeac536e8196550).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91140/
Test FAILed.
---
[GitHub] spark pull request #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21426#discussion_r190778033
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -372,8 +376,27 @@ private[spark] class SparkSubmit extends Logging {
localJars = Option(args.jars).map {
downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
}.orNull
- localPyFiles = Option(args.pyFiles).map {
- downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
+ localPyFiles = Option(args.pyFiles).map { pyFiles =>
+ if (isClientPythonSubmit) {
--- End diff --
Yup, it can be. Will try.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21426
**[Test build #91249 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91249/testReport)** for PR 21426 at commit [`f015e0d`](https://github.com/apache/spark/commit/f015e0d587c8d9f8cd359fecc325a19362a59c55).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21426#discussion_r191039095
--- Diff: core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala ---
@@ -153,4 +154,30 @@ object PythonRunner {
.map { p => formatPath(p, testWindows) }
}
+ /**
+ * Resolves the ".py" files. ".py" file should not be added as is because PYTHONPATH does
+ * not expect a file. This method creates a temporary directory and puts the ".py" files
+ * if exist in the given paths.
+ */
+ private def resolvePyFiles(pyFiles: Array[String]): Array[String] = {
+ val dest = Utils.createTempDir(namePrefix = "localPyFiles")
+ pyFiles.flatMap { pyFile =>
+ // In case of client with submit, the python paths should be set before context
+ // initialization because the context initialization can be done later.
+ // We will copy the local ".py" files because ".py" file shouldn't be added
+ // alone but its parent directory in PYTHONPATH. See SPARK-24384.
+ if (pyFile.endsWith(".py")) {
+ val source = new File(pyFile)
+ if (source.exists() && source.canRead) {
--- End diff --
@vanzin, do you mean that this should be checked ahead of time (for example in SparkSubmit), before we reach this logic?
Just for clarification, this is just a sanity check. Previously the path was added but ignored; with this change the path is not added at all.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3662/
Test PASSed.
---
[GitHub] spark pull request #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files...
Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21426#discussion_r191855502
--- Diff: core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala ---
@@ -153,4 +154,30 @@ object PythonRunner {
.map { p => formatPath(p, testWindows) }
}
+ /**
+ * Resolves the ".py" files. ".py" file should not be added as is because PYTHONPATH does
+ * not expect a file. This method creates a temporary directory and puts the ".py" files
+ * if exist in the given paths.
+ */
+ private def resolvePyFiles(pyFiles: Array[String]): Array[String] = {
+ lazy val dest = Utils.createTempDir(namePrefix = "localPyFiles")
+ pyFiles.flatMap { pyFile =>
+ // In case of client with submit, the python paths should be set before context
+ // initialization because the context initialization can be done later.
+ // We will copy the local ".py" files because ".py" file shouldn't be added
+ // alone but its parent directory in PYTHONPATH. See SPARK-24384.
+ if (pyFile.endsWith(".py")) {
+ val source = new File(pyFile)
+ if (source.exists() && source.isFile && source.canRead) {
--- End diff --
Using both `exists` and `isFile` is redundant, but no biggie.
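The redundancy can be shown with a Python analogue (an illustration only; the patch itself uses `java.io.File`, whose `isFile` is documented to return true only when the path exists and is a normal file). Python's `os.path.isfile` behaves the same way, so the combined check and the single check agree for every path. The file names are hypothetical.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
existing = os.path.join(workdir, "tmp.py")      # a regular file
missing = os.path.join(workdir, "missing.py")   # never created
with open(existing, "w") as f:
    f.write("")

# Compare "exists and isfile" against "isfile" alone for a file,
# a missing path, and a directory.
results = []
for path in [existing, missing, workdir]:
    both = os.path.exists(path) and os.path.isfile(path)
    only_isfile = os.path.isfile(path)
    results.append((both, only_isfile))
    print(os.path.basename(path), both, only_isfile)
```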
---
[GitHub] spark pull request #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files...
Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21426#discussion_r191010819
--- Diff: core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala ---
@@ -153,4 +154,30 @@ object PythonRunner {
.map { p => formatPath(p, testWindows) }
}
+ /**
+ * Resolves the ".py" files. ".py" file should not be added as is because PYTHONPATH does
+ * not expect a file. This method creates a temporary directory and puts the ".py" files
+ * if exist in the given paths.
+ */
+ private def resolvePyFiles(pyFiles: Array[String]): Array[String] = {
+ val dest = Utils.createTempDir(namePrefix = "localPyFiles")
+ pyFiles.flatMap { pyFile =>
+ // In case of client with submit, the python paths should be set before context
+ // initialization because the context initialization can be done later.
+ // We will copy the local ".py" files because ".py" file shouldn't be added
+ // alone but its parent directory in PYTHONPATH. See SPARK-24384.
+ if (pyFile.endsWith(".py")) {
+ val source = new File(pyFile)
+ if (source.exists() && source.canRead) {
--- End diff --
`source.isFile() && source.canRead()`
re: unreadable files, is there a check for it anywhere else? If not, that should be added, or the app might fail with some hard to debug exception.
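An up-front check of the kind suggested here could look like the following Python sketch. It is an assumption about what such a check might do, not code from Spark; `check_py_files` is a hypothetical helper, and entries are assumed to be local paths already (remote files would need to be downloaded first).

```python
import os
import tempfile


def check_py_files(py_files):
    """Raise a descriptive error for .py entries that are not readable regular files."""
    problems = []
    for py_file in py_files:
        if not py_file.endswith(".py"):
            continue  # directories and archives are not checked in this sketch
        if not os.path.isfile(py_file):
            problems.append("%s: not an existing regular file" % py_file)
        elif not os.access(py_file, os.R_OK):
            problems.append("%s: not readable" % py_file)
    if problems:
        raise ValueError("Invalid --py-files entries:\n  " + "\n  ".join(problems))


workdir = tempfile.mkdtemp()
good = os.path.join(workdir, "tmp.py")
with open(good, "w") as f:
    f.write("def testtest():\n    return 1\n")
missing = os.path.join(workdir, "missing.py")

check_py_files([good, "deps.zip"])  # passes silently
try:
    check_py_files([good, missing])
    error_message = None
except ValueError as e:
    error_message = str(e)
print(error_message)
```

Failing early with the offending path in the message avoids a later import failure that is harder to trace back to the submit arguments.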
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91139/
Test FAILed.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3559/
Test PASSed.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21426
**[Test build #91185 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91185/testReport)** for PR 21426 at commit [`90b38b9`](https://github.com/apache/spark/commit/90b38b9ed395bca7c1a872a1ceeac536e8196550).
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21426
**[Test build #91139 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91139/testReport)** for PR 21426 at commit [`15d6ae2`](https://github.com/apache/spark/commit/15d6ae219ac134a277a74f5e4884e4ebc6cfcf34).
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21426
retest this please
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3654/
Test PASSed.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21426
**[Test build #91140 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91140/testReport)** for PR 21426 at commit [`39b10c5`](https://github.com/apache/spark/commit/39b10c5656a48f813a95d48d752e2d44ccb2c0d9).
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21426
@vanzin and @jerryshao, thank you so much.
---
[GitHub] spark pull request #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files...
Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/21426#discussion_r190778192
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -372,8 +376,27 @@ private[spark] class SparkSubmit extends Logging {
localJars = Option(args.jars).map {
downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
}.orNull
- localPyFiles = Option(args.pyFiles).map {
- downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
+ localPyFiles = Option(args.pyFiles).map { pyFiles =>
+ if (isClientPythonSubmit) {
--- End diff --
Agreed with @vanzin, we can move this logic to the Python-related code.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21426
**[Test build #91139 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91139/testReport)** for PR 21426 at commit [`15d6ae2`](https://github.com/apache/spark/commit/15d6ae219ac134a277a74f5e4884e4ebc6cfcf34).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21426
**[Test build #91118 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91118/testReport)** for PR 21426 at commit [`b76854d`](https://github.com/apache/spark/commit/b76854dc58b4cd5c73933cff2b8b7d8e3ffb23ac).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Merged build finished. Test FAILed.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91249/
Test PASSed.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21426
**[Test build #91240 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91240/testReport)** for PR 21426 at commit [`f015e0d`](https://github.com/apache/spark/commit/f015e0d587c8d9f8cd359fecc325a19362a59c55).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21426
**[Test build #91118 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91118/testReport)** for PR 21426 at commit [`b76854d`](https://github.com/apache/spark/commit/b76854dc58b4cd5c73933cff2b8b7d8e3ffb23ac).
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21426
cc @vanzin and @jerryshao.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/21426
Did you try remote .py files? Do they have a similar issue?
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91118/
Test PASSed.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3583/
Test PASSed.
---
[GitHub] spark pull request #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files...
Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/21426#discussion_r190803966
--- Diff: core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala ---
@@ -153,4 +154,25 @@ object PythonRunner {
.map { p => formatPath(p, testWindows) }
}
+ /**
+ * Resolves the ".py" files. ".py" file should not be added as is because PYTHONPATH does
+ * not expect a file. This method creates a temporary directory and puts the ".py" files
+ * if exist in the given paths.
+ */
+ private def resolvePyFiles(pyFiles: Array[String]): Array[String] = {
+ val dest = Utils.createTempDir(namePrefix = "localPyFiles")
+ pyFiles.map { pyFile =>
+ // In case of client with submit, the python paths should be set before context
+ // initialization because the context initialization can be done later.
+ // We will copy the local ".py" files because ".py" file shouldn't be added
+ // alone but its parent directory in PYTHONPATH. See SPARK-24384.
+ if (pyFile.endsWith(".py")) {
+ val source = new File(pyFile)
--- End diff --
Shall we check whether the file exists?
---
[GitHub] spark pull request #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21426#discussion_r190808099
--- Diff: core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala ---
@@ -153,4 +154,25 @@ object PythonRunner {
.map { p => formatPath(p, testWindows) }
}
+ /**
+ * Resolves the ".py" files. ".py" file should not be added as is because PYTHONPATH does
+ * not expect a file. This method creates a temporary directory and puts the ".py" files
+ * if exist in the given paths.
+ */
+ private def resolvePyFiles(pyFiles: Array[String]): Array[String] = {
+ val dest = Utils.createTempDir(namePrefix = "localPyFiles")
+ pyFiles.map { pyFile =>
+ // In case of client with submit, the python paths should be set before context
+ // initialization because the context initialization can be done later.
+ // We will copy the local ".py" files because ".py" file shouldn't be added
+ // alone but its parent directory in PYTHONPATH. See SPARK-24384.
+ if (pyFile.endsWith(".py")) {
+ val source = new File(pyFile)
--- End diff --
Yeap
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91240/
Test FAILed.
---
[GitHub] spark pull request #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/21426
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21426
**[Test build #91150 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91150/testReport)** for PR 21426 at commit [`3db9bad`](https://github.com/apache/spark/commit/3db9bad9375594b01916da5311273f41cb571b76).
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21426
I tested:
submit with yarn client: .py local
submit with yarn client: .py remote
submit with standalone client: .py local
submit with standalone client: .py remote
They all work fine.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Merged build finished. Test FAILed.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Merged build finished. Test FAILed.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91150/
Test PASSed.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Merged build finished. Test FAILed.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21426
**[Test build #91150 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91150/testReport)** for PR 21426 at commit [`3db9bad`](https://github.com/apache/spark/commit/3db9bad9375594b01916da5311273f41cb571b76).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3572/
Test FAILed.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21426
@vanzin, for https://github.com/apache/spark/pull/21426#discussion_r191010819, do you mind if we proceed in a separate ticket? From what I can see, addressing this comment needs some changes to verify it. I don't think we can simply raise an exception, since from `deploy.PythonRunner`'s perspective we can't tell whether that file was downloaded or not.
The most appropriate place seems to be `SparkSubmit` and `DependencyUtils.downloadFile`. It seems we should add some code in `DependencyUtils.downloadFile`, since that's where we know the original path and where we download the file locally when needed, and I would like to avoid adding such changes here. It probably needs another review iteration, and the current change doesn't really target or change the previous behaviour.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3609/
Test PASSed.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91184/
Test PASSed.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91185/
Test PASSed.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3574/
Test PASSed.
---
[GitHub] spark pull request #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files...
Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21426#discussion_r190761869
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -372,8 +376,27 @@ private[spark] class SparkSubmit extends Logging {
localJars = Option(args.jars).map {
downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
}.orNull
- localPyFiles = Option(args.pyFiles).map {
- downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
+ localPyFiles = Option(args.pyFiles).map { pyFiles =>
+ if (isClientPythonSubmit) {
--- End diff --
Couldn't this logic be in `PythonRunner`? That's basically what SparkSubmit runs when the conditions you use to create `isClientPythonSubmit` are met.
This class is already pretty hard to navigate; it'd be better to avoid adding more special cases to it.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21426
**[Test build #91184 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91184/testReport)** for PR 21426 at commit [`e0e9e00`](https://github.com/apache/spark/commit/e0e9e002039f65dac09ce38c5e5d94cdf9014333).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21426
**[Test build #91249 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91249/testReport)** for PR 21426 at commit [`f015e0d`](https://github.com/apache/spark/commit/f015e0d587c8d9f8cd359fecc325a19362a59c55).
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21426
**[Test build #91240 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91240/testReport)** for PR 21426 at commit [`f015e0d`](https://github.com/apache/spark/commit/f015e0d587c8d9f8cd359fecc325a19362a59c55).
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21426
**[Test build #91140 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91140/testReport)** for PR 21426 at commit [`39b10c5`](https://github.com/apache/spark/commit/39b10c5656a48f813a95d48d752e2d44ccb2c0d9).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21426
**[Test build #91184 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91184/testReport)** for PR 21426 at commit [`e0e9e00`](https://github.com/apache/spark/commit/e0e9e002039f65dac09ce38c5e5d94cdf9014333).
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files...
Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21426#discussion_r191491127
--- Diff: core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala ---
@@ -153,4 +154,30 @@ object PythonRunner {
.map { p => formatPath(p, testWindows) }
}
+ /**
+ * Resolves the ".py" files. ".py" file should not be added as is because PYTHONPATH does
+ * not expect a file. This method creates a temporary directory and puts the ".py" files
+ * if exist in the given paths.
+ */
+ private def resolvePyFiles(pyFiles: Array[String]): Array[String] = {
+ val dest = Utils.createTempDir(namePrefix = "localPyFiles")
+ pyFiles.flatMap { pyFile =>
+ // In case of client with submit, the python paths should be set before context
+ // initialization because the context initialization can be done later.
+ // We will copy the local ".py" files because ".py" file shouldn't be added
+ // alone but its parent directory in PYTHONPATH. See SPARK-24384.
+ if (pyFile.endsWith(".py")) {
+ val source = new File(pyFile)
+ if (source.exists() && source.canRead) {
--- End diff --
I think providing a non-existent file to spark-submit should result in an error. Whether the error happens here or somewhere else doesn't really matter.
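For illustration only, the behaviour discussed in this comment can be sketched in Python rather than in the Scala under review. The sketch copies each local ".py" file into one temporary directory, returns that directory in place of the file so that it is usable on `PYTHONPATH`, and fails fast when a ".py" file does not exist. The name `resolve_py_files` and the choice of `FileNotFoundError` are assumptions made for this example; they are not Spark APIs and not necessarily what the PR ended up doing.

```python
import os
import shutil
import tempfile


def resolve_py_files(py_files):
    """Return PYTHONPATH-ready entries for the given paths (hypothetical helper).

    Each ".py" file is copied into a single temporary directory, and that
    directory is returned once in place of the files. Other entries (zip,
    egg, directory) are returned unchanged. A missing ".py" file is an error.
    """
    dest = tempfile.mkdtemp(prefix="localPyFiles")
    resolved = []
    for py_file in py_files:
        if py_file.endswith(".py"):
            if not os.path.isfile(py_file):
                # Assumption: surface a missing file as an error instead of
                # silently skipping it.
                raise FileNotFoundError("Python file not found: %s" % py_file)
            shutil.copy(py_file, dest)
            if dest not in resolved:
                resolved.append(dest)
        else:
            resolved.append(py_file)
    return resolved
```

Putting the returned directory on `sys.path` (or `PYTHONPATH`) makes the copied module importable, which a bare ".py" path would not.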
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21426
I haven't tried it yet, but I believe so, since it downloads the file locally. The `deploy.PythonRunner` side also assumes that the file is local. I will check to be doubly sure.
---
[GitHub] spark pull request #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files...
Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21426#discussion_r191010406
--- Diff: core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala ---
@@ -153,4 +154,30 @@ object PythonRunner {
.map { p => formatPath(p, testWindows) }
}
+ /**
+ * Resolves the ".py" files. ".py" file should not be added as is because PYTHONPATH does
+ * not expect a file. This method creates a temporary directory and puts the ".py" files
+ * if exist in the given paths.
+ */
+ private def resolvePyFiles(pyFiles: Array[String]): Array[String] = {
+ val dest = Utils.createTempDir(namePrefix = "localPyFiles")
--- End diff --
`lazy`?
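One way to read this suggestion, sketched in Python for illustration (the code under review is Scala, where a `lazy val` would play this role): create the temporary directory only when the first ".py" file is actually seen, so that a submission without ".py" files leaves no empty directory behind. The name `resolve_py_files_lazily` is hypothetical, and passing unreadable or non-".py" entries through unchanged is an assumption of this sketch.

```python
import os
import shutil
import tempfile


def resolve_py_files_lazily(py_files):
    """Copy existing ".py" files into a temp directory created on first use."""
    dest = None  # not created until a ".py" file needs it
    resolved = []
    for py_file in py_files:
        if py_file.endswith(".py") and os.path.isfile(py_file):
            if dest is None:
                dest = tempfile.mkdtemp(prefix="localPyFiles")
                resolved.append(dest)
            shutil.copy(py_file, dest)
        else:
            # Assumption: everything else is passed through as is.
            resolved.append(py_file)
    return resolved
```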
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3573/
Test PASSed.
---
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21426
Merged build finished. Test PASSed.
---