Posted to issues@spark.apache.org by "Bouke van der Bijl (JIRA)" <ji...@apache.org> on 2014/05/11 00:12:20 UTC

[jira] [Comment Edited] (SPARK-1764) EOF reached before Python server acknowledged

    [ https://issues.apache.org/jira/browse/SPARK-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992994#comment-13992994 ] 

Bouke van der Bijl edited comment on SPARK-1764 at 5/8/14 6:30 PM:
-------------------------------------------------------------------

I can semi-reliably recreate this by just running this code:

```
while True:
    sc.parallelize(range(100)).map(lambda n: n * 2).collect()
```
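
For completeness, here is the same loop as a self-contained script (a minimal sketch; the Mesos master URL and app name are placeholders, and `sc` is created explicitly rather than taken from the PySpark shell):

```
from pyspark import SparkContext

# Placeholder master URL; in the PySpark shell, sc already exists and this setup is not needed.
sc = SparkContext("mesos://<mesos-master>:5050", "SPARK-1764-repro")

while True:
    # Each iteration submits a small job; after enough jobs the driver
    # eventually fails with "EOF reached before Python server acknowledged".
    sc.parallelize(range(100)).map(lambda n: n * 2).collect()
```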

Running this on Mesos will eventually crash with:

Py4JJavaError: An error occurred while calling o1142.collect.
: org.apache.spark.SparkException: Job 101 cancelled as part of cancellation of all jobs
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
	at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:998)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply$mcVI$sp(DAGScheduler.scala:499)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:499)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:499)
	at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
	at org.apache.spark.scheduler.DAGScheduler.doCancelAllJobs(DAGScheduler.scala:499)
	at org.apache.spark.scheduler.DAGSchedulerActorSupervisor$$anonfun$2.applyOrElse(DAGScheduler.scala:1151)
	at org.apache.spark.scheduler.DAGSchedulerActorSupervisor$$anonfun$2.applyOrElse(DAGScheduler.scala:1147)
	at akka.actor.SupervisorStrategy.handleFailure(FaultHandling.scala:295)
	at akka.actor.dungeon.FaultHandling$class.handleFailure(FaultHandling.scala:253)
	at akka.actor.ActorCell.handleFailure(ActorCell.scala:338)
	at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:423)
	at akka.actor.ActorCell.systemInvoke(ActorCell.scala:447)
	at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:262)
	at akka.dispatch.Mailbox.run(Mailbox.scala:218)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


I0508 18:29:03.623627  7868 sched.cpp:730] Stopping framework '20140508-173240-16842879-5050-24645-0032'
14/05/08 18:29:04 ERROR OneForOneStrategy: EOF reached before Python server acknowledged
org.apache.spark.SparkException: EOF reached before Python server acknowledged
	at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:416)
	at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:387)
	at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:71)
	at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:279)
	at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:277)
	at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
	at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
	at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
	at org.apache.spark.Accumulators$.add(Accumulators.scala:277)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:818)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1204)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
	at akka.actor.ActorCell.invoke(ActorCell.scala:456)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
	at akka.dispatch.Mailbox.run(Mailbox.scala:219)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
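
For context, the second trace points at PythonAccumulatorParam.addInPlace (PythonRDD.scala:416): the JVM driver forwards accumulator updates over a local socket to the Python process and then blocks for a short acknowledgement, and the exception is thrown when the socket hits end-of-stream before that acknowledgement arrives, which suggests the Python-side server has already gone away. A rough Python sketch of that kind of handshake (illustrative only; the function name and wire format are made up here, not the actual PySpark protocol):

```
import socket

def send_updates_and_wait_for_ack(host, port, payload):
    # Illustrative only: "send the updates, then block for a 1-byte ack".
    with socket.create_connection((host, port)) as sock:
        sock.sendall(payload)
        ack = sock.recv(1)
        if not ack:
            # recv() returning b"" means the peer closed the connection (EOF),
            # which is the condition the SparkException above describes.
            raise RuntimeError("EOF reached before Python server acknowledged")
        return ack
```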


> EOF reached before Python server acknowledged
> ---------------------------------------------
>
>                 Key: SPARK-1764
>                 URL: https://issues.apache.org/jira/browse/SPARK-1764
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos, PySpark
>    Affects Versions: 1.0.0
>            Reporter: Bouke van der Bijl
>            Priority: Critical
>              Labels: mesos, pyspark
>
> I'm getting "EOF reached before Python server acknowledged" while using PySpark on Mesos. The error manifests itself in multiple ways. One is:
> 14/05/08 18:10:40 ERROR DAGSchedulerActorSupervisor: eventProcesserActor failed due to the error EOF reached before Python server acknowledged; shutting down SparkContext
> And the other has a full stacktrace:
> 14/05/08 18:03:06 ERROR OneForOneStrategy: EOF reached before Python server acknowledged
> org.apache.spark.SparkException: EOF reached before Python server acknowledged
> 	at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:416)
> 	at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:387)
> 	at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:71)
> 	at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:279)
> 	at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:277)
> 	at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> 	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> 	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> 	at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
> 	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
> 	at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
> 	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
> 	at org.apache.spark.Accumulators$.add(Accumulators.scala:277)
> 	at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:818)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1204)
> 	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> 	at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> 	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> 	at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> 	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> 	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> 	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> 	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> 	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> This error causes the SparkContext to shut down. I have not been able to reliably reproduce this bug; it seems to happen randomly, but if you run enough tasks on a SparkContext it will happen eventually.


