Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/02/14 21:58:41 UTC

[GitHub] vanzin opened a new pull request #23793: [SPARK-24736][k8s] Let spark-submit handle dependency resolution.

URL: https://github.com/apache/spark/pull/23793
 
 
   Before this change, the k8s backend contained code to resolve
   dependencies and make them available to the Spark application.
   It turns out that none of that code is necessary, since
   spark-submit already handles all of that for applications started
   in client mode, such as the k8s driver that runs inside a
   Spark-created pod.
   
   For that reason, specifically for pyspark, there's no need for the
   k8s backend to deal with PYTHONPATH; or, in general, to change the URIs
   provided by the user at all. spark-submit takes care of that.
   
   For testing, I created a pyspark script that depends on another module
   that is shipped with --py-files. Then I used:
   
   - --py-files http://.../dep.py http://.../test.py
   - --py-files local:/.../dep.py local:/.../test.py
   
   In both cases the driver now sees all the needed files, whereas before
   the driver would not see the dependency in the http case.
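
The two test invocations differ only in the URI scheme. As a rough illustration (not code from this patch), the sketch below classifies URIs by scheme, following the documented Spark convention that `local:` URIs are expected to already exist on each node while `http:` URIs have to be fetched. The function name `classify_uri` and the example URIs are hypothetical.

```python
from urllib.parse import urlsplit


def classify_uri(uri):
    """Classify a dependency URI by its scheme (illustrative only)."""
    # A bare path has no scheme; treat it as a plain file path.
    scheme = urlsplit(uri).scheme or "file"
    if scheme == "local":
        # Expected to already be present on the node; nothing to download.
        return "already-on-node"
    if scheme in ("http", "https", "ftp"):
        # Must be downloaded before the application can use it.
        return "fetch"
    return "other"


if __name__ == "__main__":
    # Hypothetical URIs standing in for the elided ones used in testing.
    for uri in ("http://example.com/dep.py", "local:/opt/app/dep.py"):
        print(uri, "->", classify_uri(uri))
```

This only mirrors the distinction between the two test cases; the actual resolution logic lives in spark-submit.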
   
   The application completes successfully after this patch in the
   first case, although that is only because k8s apps currently
   download files to the working dir, making it possible for the
   pyspark app to load them without PYTHONPATH tricks.
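
A minimal sketch of why the download location matters, using only the Python standard library: a module is importable only if its directory is on the module search path. The module name `hypothetical_dep_module` and the helper names are assumptions for illustration. The temporary directory stands in for the working dir, on the assumption that the working dir ends up on `sys.path` (for example because the main script sits in the same directory).

```python
import importlib
import sys
import tempfile
from pathlib import Path


def try_import(module_name, search_dir=None):
    """Return True if module_name imports, optionally with search_dir on sys.path."""
    saved_path = list(sys.path)
    try:
        if search_dir is not None:
            sys.path.insert(0, str(search_dir))
        importlib.invalidate_caches()
        sys.modules.pop(module_name, None)
        importlib.import_module(module_name)
        return True
    except ModuleNotFoundError:
        return False
    finally:
        # Restore global state so the demo has no side effects.
        sys.path[:] = saved_path
        sys.modules.pop(module_name, None)


def demo():
    """Write a dependency to a temp dir and try importing it both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        dep_dir = Path(tmp)
        (dep_dir / "hypothetical_dep_module.py").write_text("VALUE = 42\n")
        without_path = try_import("hypothetical_dep_module")
        with_path = try_import("hypothetical_dep_module", dep_dir)
    return without_path, with_path


if __name__ == "__main__":
    print(demo())
```

The first import attempt fails because the directory is not on the search path. The second succeeds once it is, which is analogous to the dependency landing next to the main script.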
   
   The app itself in the second case did not work before this change,
   and continues to not work. That's because there's no code in Spark
   to properly make local: files available in the executor's PYTHONPATH.
   I'm leaving that as a separate issue.
   
   I also tested a Scala app using the main jar from an http server.
