Posted to issues@spark.apache.org by "Yesha Vora (JIRA)" <ji...@apache.org> on 2017/01/06 01:58:58 UTC
[jira] [Updated] (SPARK-19095) virtualenv example does not work in yarn cluster mode

     [ https://issues.apache.org/jira/browse/SPARK-19095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yesha Vora updated SPARK-19095:
-------------------------------
    Description: 
Spark version: 2
Steps:
* Install virtualenv on all nodes.
* Create the requirements file /tmp/requirements1.txt containing the single line "numpy".
* Run the kmeans.py application in yarn-cluster mode:
{code}
spark-submit --master yarn --deploy-mode cluster \
  --conf "spark.pyspark.virtualenv.enabled=true" \
  --conf "spark.pyspark.virtualenv.type=native" \
  --conf "spark.pyspark.virtualenv.requirements=/tmp/requirements1.txt" \
  --conf "spark.pyspark.virtualenv.bin.path=/usr/bin/virtualenv" \
  --jars /usr/hdp/current/hadoop-client/lib/hadoop-lzo.jar \
  kmeans.py /tmp/in/kmeans_data.txt 3
{code}
The application fails to find numpy.
{code}
LogType:stdout
Log Upload Time:Thu Jan 05 20:05:49 +0000 2017
LogLength:134
Log Contents:
Traceback (most recent call last):
  File "kmeans.py", line 27, in <module>
    import numpy as np
ImportError: No module named numpy

End of LogType:stdout
{code}

  was:
Steps:
* Install virtualenv on all nodes.
* Create the requirements file /tmp/requirements1.txt containing the single line "numpy".
* Run the kmeans.py application in yarn-cluster mode:
{code}
spark-submit --master yarn --deploy-mode cluster \
  --conf "spark.pyspark.virtualenv.enabled=true" \
  --conf "spark.pyspark.virtualenv.type=native" \
  --conf "spark.pyspark.virtualenv.requirements=/tmp/requirements1.txt" \
  --conf "spark.pyspark.virtualenv.bin.path=/usr/bin/virtualenv" \
  --jars /usr/hdp/current/hadoop-client/lib/hadoop-lzo.jar \
  kmeans.py /tmp/in/kmeans_data.txt 3
{code}
The application fails to find numpy.
{code}
LogType:stdout
Log Upload Time:Thu Jan 05 20:05:49 +0000 2017
LogLength:134
Log Contents:
Traceback (most recent call last):
  File "kmeans.py", line 27, in <module>
    import numpy as np
ImportError: No module named numpy

End of LogType:stdout
{code}
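
One way to narrow down the ImportError reported above might be to record which Python interpreter the failing process runs under before numpy is imported. The snippet below is only a sketch: describe_environment is a hypothetical helper, not part of Spark or of kmeans.py, and it assumes the script can be edited to call it ahead of the "import numpy as np" line. It uses only the standard library and is written to run on both Python 2 and Python 3.

```python
from __future__ import print_function

import sys


def describe_environment(module_name="numpy"):
    """Report the running interpreter and whether module_name can be imported.

    Hypothetical diagnostic helper; it does not call any Spark API.
    """
    info = {
        "executable": sys.executable,  # interpreter binary running this process
        "prefix": sys.prefix,          # install or virtualenv root of that interpreter
        "module": module_name,
    }
    try:
        module = __import__(module_name)
    except ImportError as exc:
        info["found"] = False
        info["detail"] = str(exc)
    else:
        info["found"] = True
        # Built-in modules such as sys have no __file__ attribute.
        info["detail"] = getattr(module, "__file__", "<built-in>")
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_environment().items()):
        print("%s: %s" % (key, value))
```

If the printed executable and prefix point at the system Python rather than a directory created by /usr/bin/virtualenv, that would suggest the failing process was not started inside the virtualenv. The output would appear in the same YARN stdout log as the traceback.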


> virtualenv example does not work in yarn cluster mode
> -----------------------------------------------------
>
>                 Key: SPARK-19095
>                 URL: https://issues.apache.org/jira/browse/SPARK-19095
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Yesha Vora
>            Priority: Critical


