Posted to issues@spark.apache.org by "Lisbeth Ron (JIRA)" <ji...@apache.org> on 2015/05/06 13:32:00 UTC

[jira] [Updated] (SPARK-7369) Spark Python 1.3.1 Mllib dataframe random forest problem

     [ https://issues.apache.org/jira/browse/SPARK-7369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lisbeth Ron updated SPARK-7369:
-------------------------------
    Attachment: random_forest_dataframe_spark_30042015.py

Hi Sean,

I still have problems with Python Spark. Below are the errors and a
simplified sketch of the code that I'm using (the full script is attached).
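
In outline, the pipeline looks like this. This is a simplified sketch, not
the attached script verbatim: "label", "f1", "f2" and the input path are
placeholders.

from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.tree import RandomForest

sc = SparkContext(appName="rf_sketch")
sqlContext = SQLContext(sc)

# Placeholder input: any DataFrame with a numeric label column and
# numeric feature columns.
df = sqlContext.jsonFile("/path/to/data.json")

# Build LabeledPoints from Rows; in Spark 1.3, DataFrame.map delegates to .rdd.
data = df.map(lambda row: LabeledPoint(row.label, [row.f1, row.f2]))

trainingData, testData = data.randomSplit([0.7, 0.3])
print trainingData.count()   # the count at script line 79 that aborts below

model = RandomForest.trainClassifier(trainingData, numClasses=2,
                                     categoricalFeaturesInfo={},
                                     numTrees=10)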

Thanks

Lisbeth



15/05/06 13:14:24 INFO ContextCleaner: Cleaned broadcast 1
15/05/06 13:14:24 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on node001.ca-innovation.fr:47882 (size: 11.0 KB, free: 8.3 GB)
15/05/06 13:14:24 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on node006.ca-innovation.fr:50830 (size: 11.0 KB, free: 8.3 GB)
15/05/06 13:14:25 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 5, node001.ca-innovation.fr): java.lang.NullPointerException
        at org.apache.spark.api.python.SerDeUtil$$anonfun$toJavaArray$1.apply(SerDeUtil.scala:106)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:123)
        at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:114)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:114)
        at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:421)
        at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:243)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
        at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:205)

15/05/06 13:14:25 INFO TaskSetManager: Starting task 0.1 in stage 3.0 (TID 7, node001.ca-innovation.fr, NODE_LOCAL, 1409 bytes)
15/05/06 13:14:25 INFO TaskSetManager: Lost task 1.0 in stage 3.0 (TID 6) on executor node006.ca-innovation.fr: java.lang.NullPointerException (null) [duplicate 1]
15/05/06 13:14:25 INFO TaskSetManager: Starting task 1.1 in stage 3.0 (TID 8, node001.ca-innovation.fr, NODE_LOCAL, 1409 bytes)
15/05/06 13:14:26 INFO TaskSetManager: Lost task 0.1 in stage 3.0 (TID 7) on executor node001.ca-innovation.fr: java.lang.NullPointerException (null) [duplicate 2]
15/05/06 13:14:26 INFO TaskSetManager: Starting task 0.2 in stage 3.0 (TID 9, node006.ca-innovation.fr, NODE_LOCAL, 1409 bytes)
15/05/06 13:14:26 INFO TaskSetManager: Lost task 1.1 in stage 3.0 (TID 8) on executor node001.ca-innovation.fr: java.lang.NullPointerException (null) [duplicate 3]
15/05/06 13:14:26 INFO TaskSetManager: Starting task 1.2 in stage 3.0 (TID 10, node001.ca-innovation.fr, NODE_LOCAL, 1409 bytes)
15/05/06 13:14:27 INFO TaskSetManager: Lost task 0.2 in stage 3.0 (TID 9) on executor node006.ca-innovation.fr: java.lang.NullPointerException (null) [duplicate 4]
15/05/06 13:14:27 INFO TaskSetManager: Starting task 0.3 in stage 3.0 (TID 11, node006.ca-innovation.fr, NODE_LOCAL, 1409 bytes)
15/05/06 13:14:27 INFO TaskSetManager: Lost task 1.2 in stage 3.0 (TID 10) on executor node001.ca-innovation.fr: java.lang.NullPointerException (null) [duplicate 5]
15/05/06 13:14:27 INFO TaskSetManager: Starting task 1.3 in stage 3.0 (TID 12, node001.ca-innovation.fr, NODE_LOCAL, 1409 bytes)
15/05/06 13:14:28 INFO TaskSetManager: Lost task 0.3 in stage 3.0 (TID 11) on executor node006.ca-innovation.fr: java.lang.NullPointerException (null) [duplicate 6]
15/05/06 13:14:28 ERROR TaskSetManager: Task 0 in stage 3.0 failed 4 times; aborting job
15/05/06 13:14:28 INFO TaskSchedulerImpl: Cancelling stage 3
15/05/06 13:14:28 INFO TaskSchedulerImpl: Stage 3 was cancelled
15/05/06 13:14:28 INFO DAGScheduler: Stage 3 (count at /mapr/MapR-Cluster/casarisk/data/POCGRO/Codes/Spark_python/RF_Python_Spark_30042015/random_forest_dataframe_spark_30042015.py:79) failed in 4.025 s
15/05/06 13:14:28 INFO DAGScheduler: Job 3 failed: count at /mapr/MapR-Cluster/casarisk/data/POCGRO/Codes/Spark_python/RF_Python_Spark_30042015/random_forest_dataframe_spark_30042015.py:79, took 4.052326 s
Traceback (most recent call last):
  File "/mapr/MapR-Cluster/casarisk/data/POCGRO/Codes/Spark_python/RF_Python_Spark_30042015/random_forest_dataframe_spark_30042015.py", line 79, in <module>
    print trainingData.count()
  File "/opt/mapr/spark/spark-1.3.1-bin-mapr4/python/pyspark/rdd.py", line 932, in count
    return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
  File "/opt/mapr/spark/spark-1.3.1-bin-mapr4/python/pyspark/rdd.py", line 923, in sum
    return self.mapPartitions(lambda x: [sum(x)]).reduce(operator.add)
  File "/opt/mapr/spark/spark-1.3.1-bin-mapr4/python/pyspark/rdd.py", line 739, in reduce
    vals = self.mapPartitions(func).collect()
  File "/opt/mapr/spark/spark-1.3.1-bin-mapr4/python/pyspark/rdd.py", line 713, in collect
    port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
  File "/opt/mapr/spark/spark-1.3.1-bin-mapr4/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/opt/mapr/spark/spark-1.3.1-bin-mapr4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 11, node006.ca-innovation.fr): java.lang.NullPointerException
        at org.apache.spark.api.python.SerDeUtil$$anonfun$toJavaArray$1.apply(SerDeUtil.scala:106)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:123)
        at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:114)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:114)
        at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:421)
        at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:243)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
        at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:205)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
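
For what it's worth, the stack trace shows the NullPointerException is thrown
on the JVM side while rows are pickled for the Python workers
(SerDeUtil.toJavaArray, reached from PythonRDD.writeIteratorToStream), not in
my Python code. One guess I can test, an assumption rather than a confirmed
cause, is that some rows contain nulls; a check along these lines should show
whether that is the trigger:

# Hypothesis check: drop rows with null fields before building LabeledPoints.
# DataFrame.dropna() exists in PySpark as of 1.3.1.
clean = df.dropna()
data = clean.map(lambda row: LabeledPoint(row.label, [row.f1, row.f2]))
print data.count()   # if this succeeds, nulls were tripping the pickler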








-- 
*Lisbeth*


> Spark Python 1.3.1 Mllib dataframe random forest problem
> --------------------------------------------------------
>
>                 Key: SPARK-7369
>                 URL: https://issues.apache.org/jira/browse/SPARK-7369
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib, PySpark
>    Affects Versions: 1.3.1
>            Reporter: Lisbeth Ron
>              Labels: hadoop
>         Attachments: random_forest_dataframe_spark_30042015.py
>
>
> I'm working with DataFrames to train a random forest with MLlib,
> and I get this error:
>   File "/opt/mapr/spark/spark-1.3.1-bin-mapr4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o58.sql.
> Can somebody help me?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org