Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/02/14 21:58:41 UTC
[GitHub] vanzin opened a new pull request #23793: [SPARK-24736][k8s] Let spark-submit handle dependency resolution.
URL: https://github.com/apache/spark/pull/23793
Before this change, there was some code in the k8s backend to deal
with how to resolve dependencies and make them available to the
Spark application. It turns out that none of that code is necessary,
since spark-submit already handles all that for applications started
in client mode - like the k8s driver that is run inside a Spark-created
pod.
For that reason, specifically for pyspark, there's no need for the
k8s backend to deal with PYTHONPATH; or, in general, to change the URIs
provided by the user at all. spark-submit takes care of that.
For testing, I created a pyspark script that depends on another module
that is shipped with --py-files. Then I used:
- --py-files http://.../dep.py http://.../test.py
- --py-files local:/.../dep.py local:/.../test.py
In both cases the driver now sees all the needed files, whereas before
this change the driver would not see the dependency in the http case.
After this patch, the application completes successfully in the
first case, but only because k8s apps currently download files to
the working dir, which makes it possible for the pyspark app to
load them without PYTHONPATH tricks.
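As a rough illustration of why files landing in one directory can be
enough, here is a minimal sketch that uses only the Python standard
library and no Spark at all. It relies on the fact that Python puts
the directory of the script being run at the front of sys.path. The
file contents, the greet() function and the run_script() and demo()
helpers are all hypothetical stand-ins for the dep.py / test.py pair
described above; this is not the script used for testing, and it
only approximates what happens inside the driver pod.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-ins for the dependency and the main script.
DEP_SOURCE = "def greet():\n    return 'hello from dep'\n"
TEST_SOURCE = "import dep\nprint(dep.greet())\n"


def run_script(script_dir, dep_dir):
    """Write test.py into script_dir and dep.py into dep_dir, then run
    test.py with PYTHONPATH removed from the environment.

    Returns (exit code, stripped stdout) of the child interpreter.
    """
    with open(os.path.join(dep_dir, "dep.py"), "w") as f:
        f.write(DEP_SOURCE)
    script = os.path.join(script_dir, "test.py")
    with open(script, "w") as f:
        f.write(TEST_SOURCE)

    # Drop PYTHONPATH so only Python's default search path applies.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
    result = subprocess.run(
        [sys.executable, script],
        cwd=script_dir,
        env=env,
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout.strip()


def demo():
    # Case 1: both files in the same directory (assumed to resemble
    # everything being downloaded to the working dir).
    with tempfile.TemporaryDirectory() as work:
        same_dir = run_script(work, work)

    # Case 2: the dependency lives somewhere the script cannot see.
    with tempfile.TemporaryDirectory() as work, \
            tempfile.TemporaryDirectory() as elsewhere:
        separate_dirs = run_script(work, elsewhere)

    return same_dir, separate_dirs


if __name__ == "__main__":
    same_dir, separate_dirs = demo()
    print("same directory:", same_dir)
    print("separate directories: exit code", separate_dirs[0])
```

In the first case the import works with no PYTHONPATH set; in the
second the child interpreter exits with a non-zero code because the
import of dep fails.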
The app itself in the second case did not work before this change,
and continues to not work. That's because there's no code in Spark
to properly make local: files available in the executor's PYTHONPATH.
I'm leaving that as a separate issue.
I also tested a Scala app using the main jar from an http server.