Posted to issues@spark.apache.org by "Yesha Vora (JIRA)" <ji...@apache.org> on 2017/01/06 02:02:58 UTC

[jira] [Reopened] (SPARK-19096) Kmeans.py application fails with virtualenv due to a parse error

     [ https://issues.apache.org/jira/browse/SPARK-19096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yesha Vora reopened SPARK-19096:
--------------------------------

> Kmeans.py application fails with virtualenv due to a parse error
> -----------------------------------------------------------------
>
>                 Key: SPARK-19096
>                 URL: https://issues.apache.org/jira/browse/SPARK-19096
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>            Reporter: Yesha Vora
>
> Spark version: 2
> Steps:
> * Install virtualenv (pip install virtualenv)
> * Create requirements.txt (pip freeze > /tmp/requirements.txt)
> * Start the kmeans.py application in yarn-client mode (a reproduction sketch follows the list).
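> A minimal reproduction sketch for the last step. Note this is a sketch only: the spark.pyspark.virtualenv.* keys are assumed from the SPARK-13587 virtualenv proposal that this feature is based on, and may differ in the build under test.
> {code:title=repro sketch (assumed config keys)}
> # Sketch: the virtualenv config keys below come from the SPARK-13587
> # proposal and are an assumption, not confirmed configuration.
> from pyspark import SparkConf, SparkContext
>
> conf = (SparkConf()
>         .setAppName("kmeans-virtualenv-repro")
>         .set("spark.pyspark.virtualenv.enabled", "true")
>         .set("spark.pyspark.virtualenv.type", "native")
>         .set("spark.pyspark.virtualenv.requirements", "/tmp/requirements.txt")
>         .set("spark.pyspark.virtualenv.bin.path", "/usr/bin/virtualenv"))
> # Submit with: spark-submit --master yarn --deploy-mode client this_file.py
> sc = SparkContext(conf=conf)
> # Any Python task forces a worker to start, which builds the virtualenv
> # and runs pip install -r requirements.txt on each executor.
> print(sc.parallelize(range(10)).map(lambda x: x * x).collect())
> sc.stop()
> {code}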
> The application fails with a RuntimeException:
> {code:title=app log}
> 17/01/05 19:49:59 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
> 17/01/05 19:49:59 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
> Invalid requirement: 'pip freeze'
> Traceback (most recent call last):
>   File "/grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1483592608863_0006/container_1483592608863_0006_01_000002/virtualenv_application_1483592608863_0006_0/lib/python2.7/site-packages/pip/req/req_install.py", line 82, in __init__
>     req = Requirement(req)
>   File "/grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1483592608863_0006/container_1483592608863_0006_01_000002/virtualenv_application_1483592608863_0006_0/lib/python2.7/site-packages/pip/_vendor/packaging/requirements.py", line 96, in __init__
>     requirement_string[e.loc:e.loc + 8]))
> InvalidRequirement: Invalid requirement, parse error at "u'freeze'"
> 17/01/05 19:50:03 WARN BlockManager: Putting block rdd_3_0 failed due to an exception
> 17/01/05 19:50:03 WARN BlockManager: Block rdd_3_0 could not be removed as it was not found on disk or in memory
> 17/01/05 19:50:03 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> {code}
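> The pip traceback above shows that the literal string "pip freeze" was parsed as a requirement line, which pip's vendored parser rejects. The same parse error can be reproduced outside Spark with the standalone packaging library (pip vendors the same parser); a sketch:
> {code:title=parse-error demo (standalone packaging library)}
> # Sketch: reproduce the InvalidRequirement from the app log using the
> # standalone `packaging` library (pip vendors the same parser).
> from packaging.requirements import Requirement, InvalidRequirement
>
> try:
>     Requirement("pip freeze")  # a requirements.txt line with this text
> except InvalidRequirement as e:
>     print(e)  # e.g.: Invalid requirement, parse error at "'freeze'"
> {code}
> On the client side the same pip failure surfaces as a RuntimeException when the executor tries to set up its virtualenv: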
> {code:title=job client log}
> 17/01/05 19:50:07 WARN TaskSetManager: Lost task 0.1 in stage 0.0 (TID 2, xxx.site, executor 1): java.lang.RuntimeException: Fail to run command: virtualenv_application_1483592608863_0006_1/bin/python -m pip --cache-dir /home/yarn install -r requirements.txt
> 	at org.apache.spark.api.python.PythonWorkerFactory.execCommand(PythonWorkerFactory.scala:142)
> 	at org.apache.spark.api.python.PythonWorkerFactory.setupVirtualEnv(PythonWorkerFactory.scala:128)
> 	at org.apache.spark.api.python.PythonWorkerFactory.<init>(PythonWorkerFactory.scala:70)
> 	at org.apache.spark.SparkEnv$$anonfun$createPythonWorker$1.apply(SparkEnv.scala:117)
> 	at org.apache.spark.SparkEnv$$anonfun$createPythonWorker$1.apply(SparkEnv.scala:117)
> 	at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:194)
> 	at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80)
> 	at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:116)
> 	at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:128)
> 	at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> 	at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336)
> 	at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334)
> 	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957)
> 	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
> 	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
> 	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
> 	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
> 	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
> 	at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:99)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745){code}
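> The stack trace shows the failure originates in PythonWorkerFactory.setupVirtualEnv, which runs pip install -r requirements.txt on each executor before any Python worker can start, so every task attempt fails the same way. One workaround is to validate the requirements file before submitting; a sketch using the standalone packaging parser (the file path and helper name are illustrative):
> {code:title=validate requirements.txt before submit (sketch)}
> # Sketch: pre-check each non-comment line of a requirements file with
> # the standalone `packaging` parser. Plain requirement lines only;
> # pip options such as -r/-e/--index-url are not handled here.
> from packaging.requirements import Requirement, InvalidRequirement
>
> def invalid_lines(path):
>     bad = []
>     with open(path) as f:
>         for n, line in enumerate(f, 1):
>             line = line.strip()
>             if not line or line.startswith("#"):
>                 continue
>             try:
>                 Requirement(line)
>             except InvalidRequirement as e:
>                 bad.append((n, line, str(e)))
>     return bad
>
> print(invalid_lines("/tmp/requirements.txt"))
> {code}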



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
