Posted to issues@spark.apache.org by "Alex Sobrino (JIRA)" <ji...@apache.org> on 2015/07/28 10:03:04 UTC

[jira] [Commented] (SPARK-9089) Failing to run simple job on Spark Standalone Cluster

    [ https://issues.apache.org/jira/browse/SPARK-9089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14644033#comment-14644033 ] 

Alex Sobrino commented on SPARK-9089:
-------------------------------------

Hi, we're running into this same issue. Any progress?
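
In case it helps others, the trace quoted below aborts during task serialization, while TorrentBroadcast instantiates the compression codec (CompressionCodec$.createCodec throws java.lang.reflect.InvocationTargetException). On Spark 1.4 the default codec is snappy, so one hypothesis is that the snappy native library fails to load on the driver or workers. A minimal diagnostic sketch, assuming that hypothesis; the master URL and app name here are placeholders:

    # Switch spark.io.compression.codec away from the default (snappy)
    # and re-run the failing count. If this succeeds, the codec's native
    # library, not the cluster setup, is the likely culprit.
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setMaster("spark://your-master:7077")  # placeholder master URL
            .setAppName("codec-check")              # placeholder app name
            .set("spark.io.compression.codec", "lzf"))
    sc = SparkContext(conf=conf)
    print(sc.parallelize([1, 2, 3]).count())  # expect 3 if the codec was the problem

The same setting can also be passed when launching the shell, e.g. pyspark --conf spark.io.compression.codec=lzf.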

> Failing to run simple job on Spark Standalone Cluster
> -----------------------------------------------------
>
>                 Key: SPARK-9089
>                 URL: https://issues.apache.org/jira/browse/SPARK-9089
>             Project: Spark
>          Issue Type: Question
>          Components: PySpark
>    Affects Versions: 1.4.0
>         Environment: Staging
>            Reporter: Amar Goradia
>            Priority: Critical
>
> We are trying out Spark and, as part of that, we have set up a standalone Spark cluster. To test things out, we simply opened the PySpark shell and ran this simple job: a=sc.parallelize([1,2,3]).count()
> As a result, we get the error below. We tried googling the error but haven't been able to find an explanation for why we end up in this state. Can somebody please help us look into this issue further and advise us on what we are missing here?
> Here is the full error stack:
> >>> a=sc.parallelize([1,2,3]).count()
> 15/07/16 00:52:15 INFO SparkContext: Starting job: count at <stdin>:1
> 15/07/16 00:52:15 INFO DAGScheduler: Got job 5 (count at <stdin>:1) with 2 output partitions (allowLocal=false)
> 15/07/16 00:52:15 INFO DAGScheduler: Final stage: ResultStage 5(count at <stdin>:1)
> 15/07/16 00:52:15 INFO DAGScheduler: Parents of final stage: List()
> 15/07/16 00:52:15 INFO DAGScheduler: Missing parents: List()
> 15/07/16 00:52:15 INFO DAGScheduler: Submitting ResultStage 5 (PythonRDD[12] at count at <stdin>:1), which has no missing parents
> 15/07/16 00:52:15 INFO TaskSchedulerImpl: Cancelling stage 5
> 15/07/16 00:52:15 INFO DAGScheduler: ResultStage 5 (count at <stdin>:1) failed in Unknown s
> 15/07/16 00:52:15 INFO DAGScheduler: Job 5 failed: count at <stdin>:1, took 0.004963 s
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/opt/spark/spark-1.4.0-bin-hadoop2.4/python/pyspark/rdd.py", line 972, in count
>     return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
>   File "/opt/spark/spark-1.4.0-bin-hadoop2.4/python/pyspark/rdd.py", line 963, in sum
>     return self.mapPartitions(lambda x: [sum(x)]).reduce(operator.add)
>   File "/opt/spark/spark-1.4.0-bin-hadoop2.4/python/pyspark/rdd.py", line 771, in reduce
>     vals = self.mapPartitions(func).collect()
>   File "/opt/spark/spark-1.4.0-bin-hadoop2.4/python/pyspark/rdd.py", line 745, in collect
>     port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
>   File "/opt/spark/spark-1.4.0-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/opt/spark/spark-1.4.0-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.reflect.InvocationTargetException
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:68)
> org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:60)
> org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$setConf(TorrentBroadcast.scala:73)
> org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:80)
> org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
> org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
> org.apache.spark.SparkContext.broadcast(SparkContext.scala:1289)
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:874)
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:815)
> org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:799)
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1419)
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
> org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> 	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1266)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1257)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1256)
> 	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> 	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> 	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1256)
> 	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:884)
> 	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:815)
> 	at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:799)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1419)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
> 	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)



