Posted to issues@spark.apache.org by "Lisbeth Ron (JIRA)" <ji...@apache.org> on 2015/05/06 13:32:00 UTC
[jira] [Updated] (SPARK-7369) Spark Python 1.3.1 Mllib dataframe random forest problem
[ https://issues.apache.org/jira/browse/SPARK-7369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lisbeth Ron updated SPARK-7369:
-------------------------------
Attachment: random_forest_dataframe_spark_30042015.py
Hi Sean,
I still have problems with Python Spark. Here are the errors, and also the code that I'm using (attached).
Thanks,
Lisbeth
15/05/06 13:14:24 INFO ContextCleaner: Cleaned broadcast 1
15/05/06 13:14:24 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on node001.ca-innovation.fr:47882 (size: 11.0 KB, free: 8.3 GB)
15/05/06 13:14:24 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on node006.ca-innovation.fr:50830 (size: 11.0 KB, free: 8.3 GB)
15/05/06 13:14:25 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 5, node001.ca-innovation.fr): java.lang.NullPointerException
    at org.apache.spark.api.python.SerDeUtil$$anonfun$toJavaArray$1.apply(SerDeUtil.scala:106)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:123)
    at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:114)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:114)
    at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:421)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:243)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
    at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:205)
15/05/06 13:14:25 INFO TaskSetManager: Starting task 0.1 in stage 3.0 (TID 7, node001.ca-innovation.fr, NODE_LOCAL, 1409 bytes)
15/05/06 13:14:25 INFO TaskSetManager: Lost task 1.0 in stage 3.0 (TID 6) on executor node006.ca-innovation.fr: java.lang.NullPointerException (null) [duplicate 1]
15/05/06 13:14:25 INFO TaskSetManager: Starting task 1.1 in stage 3.0 (TID 8, node001.ca-innovation.fr, NODE_LOCAL, 1409 bytes)
15/05/06 13:14:26 INFO TaskSetManager: Lost task 0.1 in stage 3.0 (TID 7) on executor node001.ca-innovation.fr: java.lang.NullPointerException (null) [duplicate 2]
15/05/06 13:14:26 INFO TaskSetManager: Starting task 0.2 in stage 3.0 (TID 9, node006.ca-innovation.fr, NODE_LOCAL, 1409 bytes)
15/05/06 13:14:26 INFO TaskSetManager: Lost task 1.1 in stage 3.0 (TID 8) on executor node001.ca-innovation.fr: java.lang.NullPointerException (null) [duplicate 3]
15/05/06 13:14:26 INFO TaskSetManager: Starting task 1.2 in stage 3.0 (TID 10, node001.ca-innovation.fr, NODE_LOCAL, 1409 bytes)
15/05/06 13:14:27 INFO TaskSetManager: Lost task 0.2 in stage 3.0 (TID 9) on executor node006.ca-innovation.fr: java.lang.NullPointerException (null) [duplicate 4]
15/05/06 13:14:27 INFO TaskSetManager: Starting task 0.3 in stage 3.0 (TID 11, node006.ca-innovation.fr, NODE_LOCAL, 1409 bytes)
15/05/06 13:14:27 INFO TaskSetManager: Lost task 1.2 in stage 3.0 (TID 10) on executor node001.ca-innovation.fr: java.lang.NullPointerException (null) [duplicate 5]
15/05/06 13:14:27 INFO TaskSetManager: Starting task 1.3 in stage 3.0 (TID 12, node001.ca-innovation.fr, NODE_LOCAL, 1409 bytes)
15/05/06 13:14:28 INFO TaskSetManager: Lost task 0.3 in stage 3.0 (TID 11) on executor node006.ca-innovation.fr: java.lang.NullPointerException (null) [duplicate 6]
15/05/06 13:14:28 ERROR TaskSetManager: Task 0 in stage 3.0 failed 4 times; aborting job
15/05/06 13:14:28 INFO TaskSchedulerImpl: Cancelling stage 3
15/05/06 13:14:28 INFO TaskSchedulerImpl: Stage 3 was cancelled
15/05/06 13:14:28 INFO DAGScheduler: Stage 3 (count at /mapr/MapR-Cluster/casarisk/data/POCGRO/Codes/Spark_python/RF_Python_Spark_30042015/random_forest_dataframe_spark_30042015.py:79) failed in 4.025 s
15/05/06 13:14:28 INFO DAGScheduler: Job 3 failed: count at /mapr/MapR-Cluster/casarisk/data/POCGRO/Codes/Spark_python/RF_Python_Spark_30042015/random_forest_dataframe_spark_30042015.py:79, took 4.052326 s
Traceback (most recent call last):
  File "/mapr/MapR-Cluster/casarisk/data/POCGRO/Codes/Spark_python/RF_Python_Spark_30042015/random_forest_dataframe_spark_30042015.py", line 79, in <module>
    print trainingData.count()
  File "/opt/mapr/spark/spark-1.3.1-bin-mapr4/python/pyspark/rdd.py", line 932, in count
    return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
  File "/opt/mapr/spark/spark-1.3.1-bin-mapr4/python/pyspark/rdd.py", line 923, in sum
    return self.mapPartitions(lambda x: [sum(x)]).reduce(operator.add)
  File "/opt/mapr/spark/spark-1.3.1-bin-mapr4/python/pyspark/rdd.py", line 739, in reduce
    vals = self.mapPartitions(func).collect()
  File "/opt/mapr/spark/spark-1.3.1-bin-mapr4/python/pyspark/rdd.py", line 713, in collect
    port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
  File "/opt/mapr/spark/spark-1.3.1-bin-mapr4/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/opt/mapr/spark/spark-1.3.1-bin-mapr4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 11, node006.ca-innovation.fr): java.lang.NullPointerException
    at org.apache.spark.api.python.SerDeUtil$$anonfun$toJavaArray$1.apply(SerDeUtil.scala:106)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:123)
    at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:114)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:114)
    at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:421)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:243)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
    at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:205)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
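For context: a NullPointerException inside SerDeUtil.toJavaArray during Python-to-JVM serialization, as in the trace above, often means that null/None values (or raw Row objects rather than LabeledPoint instances) reached MLlib, whose RandomForest API in Spark 1.3.1 expects an RDD of LabeledPoint, not a DataFrame. The attached script isn't visible here, so the following is only a minimal sketch of a defensive row-to-point mapper, assuming a hypothetical row layout of (label, feature_1, ..., feature_n); the PySpark wiring is shown in comments:

```python
# Hypothetical PySpark wiring (not from the attached script):
#   from pyspark.mllib.regression import LabeledPoint
#   from pyspark.mllib.tree import RandomForest
#   points = df.rdd.map(row_to_point).filter(lambda p: p is not None) \
#               .map(lambda p: LabeledPoint(p[0], p[1]))
#   model = RandomForest.trainClassifier(points, numClasses=2,
#                                        categoricalFeaturesInfo={},
#                                        numTrees=10)

def row_to_point(row):
    """Convert one row to a (label, features) pair, dropping bad rows.

    Returning None for rows containing nulls (and filtering them out
    before training) keeps None values from crossing the Python/JVM
    serialization boundary, a common source of the NPE shown above.
    """
    if row is None or any(v is None for v in row):
        return None
    label = float(row[0])
    features = [float(v) for v in row[1:]]
    return (label, features)

# Pure-Python check of the mapper, no Spark required:
good = row_to_point((1, 2.5, 3.0))   # -> (1.0, [2.5, 3.0])
bad = row_to_point((1, None, 3.0))   # -> None, row would be filtered out
```

The filter step is the point of the sketch: counting or training on the cleaned RDD should surface how many rows were dropped instead of failing the whole stage.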
--
*Lisbeth*
> Spark Python 1.3.1 Mllib dataframe random forest problem
> --------------------------------------------------------
>
> Key: SPARK-7369
> URL: https://issues.apache.org/jira/browse/SPARK-7369
> Project: Spark
> Issue Type: Bug
> Components: MLlib, PySpark
> Affects Versions: 1.3.1
> Reporter: Lisbeth Ron
> Labels: hadoop
> Attachments: random_forest_dataframe_spark_30042015.py
>
>
> I'm working with DataFrames to train a random forest with MLlib,
> and I get this error:
> File "/opt/mapr/spark/spark-1.3.1-bin-mapr4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o58.sql.
> Can somebody help me?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org