Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2018/05/08 08:11:35 UTC
[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...
GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/21267
[SPARK-21945][YARN][PYTHON] Make --py-files work in PySpark shell in Yarn client mode
## What changes were proposed in this pull request?
### Problem
When we run the _PySpark shell in Yarn client mode_, files specified via `--py-files` are not recognised on the _driver side_.
Here are the steps I took to check:
```bash
$ cat /home/spark/tmp.py
def testtest():
    return 1
```
```bash
$ ./bin/pyspark --master yarn --deploy-mode client --py-files /home/spark/tmp.py
```
```python
>>> def test():
...     import tmp
...     return tmp.testtest()
...
>>> spark.range(1).rdd.map(lambda _: test()).collect() # executor side
[1]
>>> test() # driver side
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in test
ImportError: No module named tmp
```
### How did it happen?
Unlike Yarn cluster mode, and Yarn client mode with spark-submit, the PySpark shell in Yarn client mode specifically works as follows:
1. It first runs Python shell via:
https://github.com/apache/spark/blob/3cb82047f2f51af553df09b9323796af507d36f8/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java#L158 as pointed out by @tgravescs in the JIRA.
2. This triggers shell.py, which submits another application to launch a Py4J gateway:
https://github.com/apache/spark/blob/209b9361ac8a4410ff797cff1115e1888e2f7e66/python/pyspark/java_gateway.py#L45-L60
3. It runs a Py4J gateway:
https://github.com/apache/spark/blob/3cb82047f2f51af553df09b9323796af507d36f8/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L425
4. It copies `--py-files` into a local temp directory:
https://github.com/apache/spark/blob/3cb82047f2f51af553df09b9323796af507d36f8/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L365-L376
and these paths are then set in `spark.submit.pyFiles`.
5. The Py4J JVM is launched, and the Python paths are then set via:
https://github.com/apache/spark/blob/7013eea11cb32b1e0038dc751c485da5c94a484b/python/pyspark/context.py#L209-L216
However, these paths are not actually set, because the files were copied into a temp directory in step 4, whereas this code path looks in `SparkFiles.getRootDirectory`, where files are stored only when `SparkContext.addFile()` is called.
In other cluster modes, `spark.files` is set via:
https://github.com/apache/spark/blob/3cb82047f2f51af553df09b9323796af507d36f8/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L554-L555
and those files are explicitly added via:
https://github.com/apache/spark/blob/ecb8b383af1cf1b67f3111c148229e00c9c17c40/core/src/main/scala/org/apache/spark/SparkContext.scala#L395
So we are fine in other modes.
In the case of Yarn client and cluster mode with _submit_, these are handled manually. In particular, https://github.com/apache/spark/pull/6360 added most of the logic. In this case, the Python path appears to be set manually via, for example, `deploy.PythonRunner`. We don't use `spark.files` here.
### How does the PR fix the problem?
I tried to make the approach as isolated as possible: simply copy the .py or .zip files into `SparkFiles.getRootDirectory()` on the driver side if they do not already exist there. Another possible way is to set `spark.files`, but that triggers unnecessary work along the way and seems a bit invasive.
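The gist of that copy-into-root approach can be sketched in plain Python. This is a simplified, standalone illustration: `root_dir` and `py_files_conf` are hypothetical stand-ins for `SparkFiles.getRootDirectory()` and the `spark.submit.pyFiles` value, not the real Spark APIs.

```python
import os
import shutil
import sys
import tempfile

# Hypothetical stand-ins: root_dir plays the role of SparkFiles.getRootDirectory(),
# and py_files_conf the value of 'spark.submit.pyFiles' after SparkSubmit copied
# the files into a local temp directory (step 4 above).
src_dir = tempfile.mkdtemp()
root_dir = tempfile.mkdtemp()

# A dummy module mirroring /home/spark/tmp.py from the reproduction steps.
module_path = os.path.join(src_dir, "tmp.py")
with open(module_path, "w") as f:
    f.write("def testtest():\n    return 1\n")
py_files_conf = module_path

for path in py_files_conf.split(","):
    if path != "":
        (dirname, filename) = os.path.split(path)
        filepath = os.path.join(root_dir, filename)
        if not os.path.exists(filepath):
            # Gist of the fix: the file was never added via SparkContext.addFile,
            # so copy it into the root directory on the driver side.
            shutil.copyfile(path, filepath)

# With the root directory on sys.path, the copied .py file becomes importable,
# which is what makes `import tmp` work on the driver after the fix.
sys.path.insert(1, root_dir)
import tmp
print(tmp.testtest())  # 1
```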
### Before
```python
>>> def test():
...     import tmp
...     return tmp.testtest()
...
>>> spark.range(1).rdd.map(lambda _: test()).collect()
[1]
>>> test()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in test
ImportError: No module named tmp
```
### After
```python
>>> def test():
...     import tmp
...     return tmp.testtest()
...
>>> spark.range(1).rdd.map(lambda _: test()).collect()
[1]
>>> test()
1
```
## How was this patch tested?
I manually tested in standalone and yarn cluster modes with the PySpark shell. .zip and .py files were also tested with steps similar to those above. It's difficult to add an automated test.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark SPARK-21945
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21267.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21267
----
commit 68be3baef22d8b7aa58a432cb5bd12437c07feb7
Author: hyukjinkwon <gu...@...>
Date: 2018-05-08T07:36:31Z
Make --py-files work in PySpark shell in Yarn client mode
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21267
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21267
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90616/
Test PASSed.
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/21267
Does it only happen in the yarn client PySpark shell? I would suggest fixing this on the SparkSubmit side, treating this as a special case and setting the proper config.
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21267
(I have tried to explain why it's specific to the PySpark shell in Yarn client mode in the PR description.)
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21267
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3213/
Test FAILed.
---
[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/21267#discussion_r187274278
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
for path in self._conf.get("spark.submit.pyFiles", "").split(","):
if path != "":
(dirname, filename) = os.path.split(path)
- if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
- self._python_includes.append(filename)
- sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+ try:
+ filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+ if not os.path.exists(filepath):
+ # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+ # not added via SparkContext.addFile. Here we check if the file exists,
+ # try to copy and then add it to the path. See SPARK-21945.
+ shutil.copyfile(path, filepath)
--- End diff --
Are 'spark.submit.pyFiles' files only missing on the driver side? I mean, if they are not added by `SparkContext.addFile`, shouldn't they also be missing on the executors?
---
[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21267#discussion_r186650331
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
for path in self._conf.get("spark.submit.pyFiles", "").split(","):
if path != "":
(dirname, filename) = os.path.split(path)
- if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
- self._python_includes.append(filename)
- sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+ try:
+ filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+ if not os.path.exists(filepath):
+ # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+ # not added via SparkContext.addFile. Here we check if the file exists,
+ # try to copy and then add it to the path. See SPARK-21945.
+ shutil.copyfile(path, filepath)
+ if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
+ self._python_includes.append(filename)
+ sys.path.insert(1, filepath)
+ except Exception as e:
+ from pyspark import util
+ warnings.warn(
--- End diff --
Log was also tested manually:
```
.../python/pyspark/context.py:230: RuntimeWarning: Python file [/home/spark/tmp.py] specified in 'spark.submit.pyFiles' failed to be added in the Python path, excluding this in the Python path.
: ...
```
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21267
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3214/
Test PASSed.
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21267
**[Test build #90614 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90614/testReport)** for PR 21267 at commit [`b9e312e`](https://github.com/apache/spark/commit/b9e312ecfd0215c669e1826e891ccbaa5937ea49).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21267
**[Test build #90364 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90364/testReport)** for PR 21267 at commit [`68be3ba`](https://github.com/apache/spark/commit/68be3baef22d8b7aa58a432cb5bd12437c07feb7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21267#discussion_r187216038
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
for path in self._conf.get("spark.submit.pyFiles", "").split(","):
if path != "":
(dirname, filename) = os.path.split(path)
- if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
- self._python_includes.append(filename)
- sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+ try:
+ filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+ if not os.path.exists(filepath):
+ # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+ # not added via SparkContext.addFile. Here we check if the file exists,
+ # try to copy and then add it to the path. See SPARK-21945.
+ shutil.copyfile(path, filepath)
--- End diff --
That's the initial approach I tried. The thing is, for a .py file in the configuration, we would need to add its parent directory (not the .py file itself) to the path, and that would also add any other .py files in that directory.
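The concern above can be illustrated with a small hypothetical sketch: putting the *parent directory* of a `--py-files` entry on `sys.path` also makes its sibling .py files importable, which copying the single file into the root directory avoids. The module names here (`wanted`, `unwanted`) are made up for the illustration.

```python
import os
import sys
import tempfile

# Create a directory containing the intended module and an unrelated sibling.
d = tempfile.mkdtemp()
for name in ("wanted", "unwanted"):
    with open(os.path.join(d, name + ".py"), "w") as f:
        f.write("VALUE = %r\n" % name)

sys.path.insert(1, d)  # meant to expose only wanted.py ...
import unwanted        # ... but the sibling module leaks onto the path too
print(unwanted.VALUE)  # 'unwanted'
```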
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21267
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21267
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90704/
Test PASSed.
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21267
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90565/
Test PASSed.
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21267
Hm .. @jerryshao, it seems a bit difficult to do so. The simplest way would be just to copy the files into the directory from `SparkFiles.getRootDirectory`; however, `SparkEnv` is inaccessible at that stage in `SparkSubmit` ..
Another way might be to set `spark.files` so that the files are added via `addFile` later, which puts them in `SparkFiles.getRootDirectory` on the driver side too, but .. I wonder if it makes sense to set a config that Yarn doesn't use.
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21267
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21267
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21267
**[Test build #90616 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90616/testReport)** for PR 21267 at commit [`ef3555e`](https://github.com/apache/spark/commit/ef3555e389ea36159e9a1dfd076e9f6afbaf3f35).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...
Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21267#discussion_r187133079
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
for path in self._conf.get("spark.submit.pyFiles", "").split(","):
if path != "":
(dirname, filename) = os.path.split(path)
- if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
- self._python_includes.append(filename)
- sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+ try:
+ filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+ if not os.path.exists(filepath):
+ # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+ # not added via SparkContext.addFile. Here we check if the file exists,
+ # try to copy and then add it to the path. See SPARK-21945.
+ shutil.copyfile(path, filepath)
--- End diff --
Is this copy necessary? Couldn't you just add `path` to `sys.path` (instead of adding `filepath`) and that would solve the problem?
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21267
Merged to master.
---
[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/21267
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21267
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21267
Yea, this is specific to the yarn client PySpark shell. In the case of yarn client and cluster mode with submit, they are specially handled via #6360, but I think the PySpark shell in yarn client mode was missed. The way of launching it diverges, if I understood correctly.
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21267
**[Test build #90565 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90565/testReport)** for PR 21267 at commit [`68be3ba`](https://github.com/apache/spark/commit/68be3baef22d8b7aa58a432cb5bd12437c07feb7).
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21267
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3184/
Test PASSed.
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21267
retest this please
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21267
**[Test build #90565 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90565/testReport)** for PR 21267 at commit [`68be3ba`](https://github.com/apache/spark/commit/68be3baef22d8b7aa58a432cb5bd12437c07feb7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21267
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21267
**[Test build #90704 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90704/testReport)** for PR 21267 at commit [`ef3555e`](https://github.com/apache/spark/commit/ef3555e389ea36159e9a1dfd076e9f6afbaf3f35).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21267#discussion_r187274825
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
for path in self._conf.get("spark.submit.pyFiles", "").split(","):
if path != "":
(dirname, filename) = os.path.split(path)
- if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
- self._python_includes.append(filename)
- sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+ try:
+ filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+ if not os.path.exists(filepath):
+ # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+ # not added via SparkContext.addFile. Here we check if the file exists,
+ # try to copy and then add it to the path. See SPARK-21945.
+ shutil.copyfile(path, filepath)
--- End diff --
Yup, they're only missing on the driver side in this mode specifically. Yarn doesn't add them since `spark.files` is not set, if I understood correctly. They are specially handled in the case of submit, but the shell case seems to have been missed.
I described this a bit in the PR description too:
> In case of Yarn client and cluster with submit, these are manually being handled. In particular #6360 added most of the logics. In this case, the Python path looks manually set via, for example, deploy.PythonRunner. We don't use spark.files here.
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21267
**[Test build #90614 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90614/testReport)** for PR 21267 at commit [`b9e312e`](https://github.com/apache/spark/commit/b9e312ecfd0215c669e1826e891ccbaa5937ea49).
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21267
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90614/
Test PASSed.
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21267
**[Test build #90616 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90616/testReport)** for PR 21267 at commit [`ef3555e`](https://github.com/apache/spark/commit/ef3555e389ea36159e9a1dfd076e9f6afbaf3f35).
---
[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/21267#discussion_r186670486
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
for path in self._conf.get("spark.submit.pyFiles", "").split(","):
if path != "":
(dirname, filename) = os.path.split(path)
- if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
- self._python_includes.append(filename)
- sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+ try:
+ filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+ if not os.path.exists(filepath):
+ # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+ # not added via SparkContext.addFile. Here we check if the file exists,
+ # try to copy and then add it to the path. See SPARK-21945.
+ shutil.copyfile(path, filepath)
+ if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
--- End diff --
Am I missing anything? Looks like `PACKAGE_EXTENSIONS = ('.zip', '.egg', '.jar')`. So `.py` seems not to be in that?
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work in PySp...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21267
cc @vanzin, @jerryshao and @tgravescs, could you take a look and see if it makes sense please?
---
[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/21267#discussion_r187259682
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
         for path in self._conf.get("spark.submit.pyFiles", "").split(","):
             if path != "":
                 (dirname, filename) = os.path.split(path)
-                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
-                    self._python_includes.append(filename)
-                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+                try:
+                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+                    if not os.path.exists(filepath):
+                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+                        # not added via SparkContext.addFile. Here we check if the file exists,
+                        # try to copy and then add it to the path. See SPARK-21945.
+                        shutil.copyfile(path, filepath)
+                    if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
--- End diff --
Oh, I see.
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21267
retest this please
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work in PySp...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21267
**[Test build #90364 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90364/testReport)** for PR 21267 at commit [`68be3ba`](https://github.com/apache/spark/commit/68be3baef22d8b7aa58a432cb5bd12437c07feb7).
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21267
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90364/
Test PASSed.
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21267
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3277/
Test PASSed.
---
[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21267#discussion_r187264822
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
         for path in self._conf.get("spark.submit.pyFiles", "").split(","):
             if path != "":
                 (dirname, filename) = os.path.split(path)
-                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
-                    self._python_includes.append(filename)
-                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+                try:
+                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+                    if not os.path.exists(filepath):
+                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+                        # not added via SparkContext.addFile. Here we check if the file exists,
+                        # try to copy and then add it to the path. See SPARK-21945.
+                        shutil.copyfile(path, filepath)
--- End diff --
I don't think so, but that's already being done in the other cluster/client modes. In those modes the copies are made via addFile; in this case specifically the files are not copied at all. I think we had better copy consistently.
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21267
**[Test build #90704 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90704/testReport)** for PR 21267 at commit [`ef3555e`](https://github.com/apache/spark/commit/ef3555e389ea36159e9a1dfd076e9f6afbaf3f35).
---
[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21267#discussion_r188144573
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,22 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
         for path in self._conf.get("spark.submit.pyFiles", "").split(","):
             if path != "":
                 (dirname, filename) = os.path.split(path)
-                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
-                    self._python_includes.append(filename)
-                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+                try:
+                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+                    if not os.path.exists(filepath):
+                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+                        # not added via SparkContext.addFile. Here we check if the file exists,
+                        # try to copy and then add it to the path. See SPARK-21945.
+                        shutil.copyfile(path, filepath)
+                    if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
+                        self._python_includes.append(filename)
+                        sys.path.insert(1, filepath)
+                except Exception:
+                    from pyspark import util
+                    warnings.warn(
--- End diff --
Likewise, I checked the warning manually:
```
.../pyspark/context.py:229: RuntimeWarning: Failed to add file [/home/spark/tmp.py] specified in 'spark.submit.pyFiles' to Python path:
...
/usr/lib64/python27.zip
/usr/lib64/python2.7
...
```
---
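The warning shown above can also be exercised with the stdlib warnings machinery rather than by inspection. This is a rough sketch only; the helper name and the exact message wording are made up here to mirror the shape of the output:

```python
import warnings

def warn_failed_py_file(path, current_sys_path):
    # Emit a RuntimeWarning shaped like the one checked manually above;
    # the sys.path entries are joined onto the message for context
    warnings.warn(
        "Failed to add file [%s] specified in 'spark.submit.pyFiles' to "
        "Python path:\n  %s" % (path, "\n  ".join(current_sys_path)),
        RuntimeWarning)
```

A test can then assert on the captured warning instead of eyeballing console output.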
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/21267
Looks good aside from the log message.
---
[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/21267#discussion_r187259493
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
         for path in self._conf.get("spark.submit.pyFiles", "").split(","):
             if path != "":
                 (dirname, filename) = os.path.split(path)
-                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
-                    self._python_includes.append(filename)
-                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+                try:
+                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+                    if not os.path.exists(filepath):
+                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+                        # not added via SparkContext.addFile. Here we check if the file exists,
+                        # try to copy and then add it to the path. See SPARK-21945.
+                        shutil.copyfile(path, filepath)
--- End diff --
For file types in `PACKAGE_EXTENSIONS`, do we need to copy?
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21267
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21267
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3036/
Test PASSed.
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21267
Merged build finished. Test FAILed.
---
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21267
Will try to put this into SparkSubmit.
---
[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21267#discussion_r186673789
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
         for path in self._conf.get("spark.submit.pyFiles", "").split(","):
             if path != "":
                 (dirname, filename) = os.path.split(path)
-                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
-                    self._python_includes.append(filename)
-                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+                try:
+                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+                    if not os.path.exists(filepath):
+                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+                        # not added via SparkContext.addFile. Here we check if the file exists,
+                        # try to copy and then add it to the path. See SPARK-21945.
+                        shutil.copyfile(path, filepath)
+                    if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
--- End diff --
The root directory is added to the path above. A .py file needs its parent directory to be on the path, so copying it into the root directory is what makes it importable.
---
[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21267#discussion_r186920316
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
         for path in self._conf.get("spark.submit.pyFiles", "").split(","):
             if path != "":
                 (dirname, filename) = os.path.split(path)
-                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
-                    self._python_includes.append(filename)
-                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+                try:
+                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+                    if not os.path.exists(filepath):
+                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+                        # not added via SparkContext.addFile. Here we check if the file exists,
+                        # try to copy and then add it to the path. See SPARK-21945.
+                        shutil.copyfile(path, filepath)
+                    if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
+                        self._python_includes.append(filename)
+                        sys.path.insert(1, filepath)
+                except Exception as e:
+                    from pyspark import util
+                    warnings.warn(
--- End diff --
BTW, this should now be safer in any case, since we no longer put non-existent files on the path and we print out warnings instead.
---
[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...
Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21267#discussion_r188037259
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
         for path in self._conf.get("spark.submit.pyFiles", "").split(","):
             if path != "":
                 (dirname, filename) = os.path.split(path)
-                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
-                    self._python_includes.append(filename)
-                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+                try:
+                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+                    if not os.path.exists(filepath):
+                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+                        # not added via SparkContext.addFile. Here we check if the file exists,
+                        # try to copy and then add it to the path. See SPARK-21945.
+                        shutil.copyfile(path, filepath)
+                    if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
+                        self._python_includes.append(filename)
+                        sys.path.insert(1, filepath)
+                except Exception as e:
+                    from pyspark import util
+                    warnings.warn(
+                        "Python file [%s] specified in 'spark.submit.pyFiles' failed "
--- End diff --
Simplify this message?
"Failed to add file [%s] specified in 'spark.submit.pyFiles' to Python path:\n %s"
---
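Pulling the whole hunk together, the patched loop can be sketched as a standalone function. This is only a sketch under stated assumptions: `root_dir` stands in for `SparkFiles.getRootDirectory()`, the function name and the `search_path`/`python_includes` parameters are invented here, and the warning text follows the simplified wording discussed above:

```python
import os
import shutil
import sys
import warnings

PACKAGE_EXTENSIONS = ('.zip', '.egg', '.jar')

def add_submit_py_files(py_files, root_dir, python_includes, search_path=None):
    # search_path stands in for sys.path; injectable so it can be tested
    if search_path is None:
        search_path = sys.path
    for path in py_files:
        if path == "":
            continue
        # dirname is unused, mirroring the original code's tuple unpacking
        (dirname, filename) = os.path.split(path)
        try:
            filepath = os.path.join(root_dir, filename)
            if not os.path.exists(filepath):
                # In YARN client shell mode the files were not staged via
                # SparkContext.addFile, so copy them locally first (SPARK-21945)
                shutil.copyfile(path, filepath)
            if filename[-4:].lower() in PACKAGE_EXTENSIONS:
                python_includes.append(filename)
                search_path.insert(1, filepath)
        except Exception:
            warnings.warn(
                "Failed to add file [%s] specified in 'spark.submit.pyFiles' "
                "to Python path:\n  %s" % (path, "\n  ".join(search_path)),
                RuntimeWarning)
```

Note that a `.py` file only gets copied: it becomes importable because the root directory is already on the path, while archives are additionally inserted into the path and recorded in the includes list.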