Posted to issues@spark.apache.org by "Matt Mould (JIRA)" <ji...@apache.org> on 2018/06/14 13:22:00 UTC

[jira] [Comment Edited] (SPARK-13587) Support virtualenv in PySpark

    [ https://issues.apache.org/jira/browse/SPARK-13587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512433#comment-16512433 ] 

Matt Mould edited comment on SPARK-13587 at 6/14/18 1:21 PM:
-------------------------------------------------------------

What is the current status of this ticket please? This [article|https://community.hortonworks.com/articles/104947/using-virtualenv-with-pyspark.html] suggests that it's done, but it doesn't work for me with the following command.
{code:java}
spark-submit --deploy-mode cluster --master yarn \
  --py-files parallelisation_hack-0.1-py2.7.egg \
  --conf spark.pyspark.virtualenv.enabled=true \
  --conf spark.pyspark.virtualenv.type=native \
  --conf spark.pyspark.virtualenv.requirements=requirements.txt \
  --conf spark.pyspark.virtualenv.bin.path=virtualenv \
  --conf spark.pyspark.python=python3 \
  pyspark_poc_runner.py
{code}


> Support virtualenv in PySpark
> -----------------------------
>
>                 Key: SPARK-13587
>                 URL: https://issues.apache.org/jira/browse/SPARK-13587
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark
>    Affects Versions: 1.6.3, 2.0.2, 2.1.2, 2.2.1, 2.3.0
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>            Priority: Major
>
> Currently, it's not easy for users to add third-party Python packages in PySpark.
> * One way is to use --py-files (suitable for simple dependencies, but not for complicated ones, especially those with transitive dependencies)
> * Another way is to install packages manually on each node (time-consuming, and not easy to switch between environments)
> Python now has two different virtualenv implementations: one is native virtualenv, the other is through conda. This JIRA aims to migrate these two tools to a distributed environment.
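
As a rough illustration of the --py-files option mentioned in the description, a pure-Python dependency can be zipped and shipped as a single archive, since Python can import directly from a zip file on its search path. The sketch below is not from the ticket: the helper names zip_package and demo and the package name mydeps are hypothetical, and it only mimics the import locally by putting the archive on sys.path. Packages with compiled extensions or deep transitive dependencies are the cases this approach handles poorly, which is the motivation the description gives.

```python
import os
import sys
import tempfile
import zipfile


def zip_package(package_dir, archive_path):
    """Zip the .py files of a package directory into archive_path."""
    parent = os.path.dirname(os.path.abspath(package_dir))
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                if name.endswith(".py"):
                    full = os.path.join(root, name)
                    # Store paths relative to the package's parent directory so
                    # that "import <package>" resolves from the archive root.
                    zf.write(full, os.path.relpath(full, parent))
    return archive_path


def demo():
    """Create a hypothetical package 'mydeps', zip it, and import it from the zip."""
    work = tempfile.mkdtemp()
    pkg = os.path.join(work, "mydeps")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write("def double(x):\n    return 2 * x\n")
    archive = zip_package(pkg, os.path.join(work, "mydeps.zip"))
    # Local stand-in for shipping the archive: put the zip itself on sys.path.
    sys.path.insert(0, archive)
    import mydeps
    return archive, mydeps.double(21)
```

The resulting archive could then be passed to spark-submit with --py-files mydeps.zip, in the same position as the .egg file in the command above.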



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)