Posted to issues@spark.apache.org by "Sujith Jay Nair (JIRA)" <ji...@apache.org> on 2018/01/03 09:42:00 UTC
[jira] [Commented] (SPARK-22714) Spark API Not responding when Fatal exception occurred in event loop
[ https://issues.apache.org/jira/browse/SPARK-22714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309366#comment-16309366 ]
Sujith Jay Nair commented on SPARK-22714:
-----------------------------------------
Hi [~todesking], is this reproducible outside of Spark REPL? Trying to understand if this is specific to Spark shell.
> Spark API Not responding when Fatal exception occurred in event loop
> --------------------------------------------------------------------
>
> Key: SPARK-22714
> URL: https://issues.apache.org/jira/browse/SPARK-22714
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.2.0
> Reporter: todesking
> Priority: Critical
>
> To reproduce, make Spark throw an OutOfMemoryError in the event loop:
> {noformat}
> scala> spark.sparkContext.getConf.get("spark.driver.memory")
> res0: String = 1g
> scala> val a = new Array[Int](4 * 1000 * 1000)
> scala> val ds = spark.createDataset(a)
> scala> ds.rdd.zipWithIndex
> [Stage 0:> (0 + 0) / 3]Exception in thread "dispatcher-event-loop-1" java.lang.OutOfMemoryError: Java heap space
> [Stage 0:> (0 + 0) / 3]
> // Spark is not responding
> {noformat}
> While not responding, Spark is waiting on a Promise that is never completed.
> The promise depends on work done in the event-loop thread, but that thread is already dead because the fatal exception killed it.
> {noformat}
> "main" #1 prio=5 os_prio=31 tid=0x00007ffc9300b000 nid=0x1703 waiting on condition [0x0000700000216000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000007ad978eb8> (a scala.concurrent.impl.Promise$CompletionLatch)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
> at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
> at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)
> at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:619)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
> at org.apache.spark.rdd.ZippedWithIndexRDD.<init>(ZippedWithIndexRDD.scala:50)
> at org.apache.spark.rdd.RDD$$anonfun$zipWithIndex$1.apply(RDD.scala:1293)
> at org.apache.spark.rdd.RDD$$anonfun$zipWithIndex$1.apply(RDD.scala:1293)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
> at org.apache.spark.rdd.RDD.zipWithIndex(RDD.scala:1292)
> {noformat}
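The stack trace above shows the driver's main thread parked on a Promise's CompletionLatch. The failure mode can be reproduced outside Spark with a minimal sketch (names like DeadThreadPromiseDemo are mine, not Spark's): a thread that was supposed to complete a Promise dies with a fatal error before doing so, and any caller awaiting that Promise blocks forever. A timeout is used here only so the demo terminates instead of hanging.

```scala
import scala.concurrent.{Await, Promise, TimeoutException}
import scala.concurrent.duration._

object DeadThreadPromiseDemo {
  // Returns what the awaiting side observes.
  def run(): String = {
    val p = Promise[Int]() // only the worker thread could complete this

    val worker = new Thread(() => {
      // Simulate a fatal error thrown before the promise is completed,
      // analogous to the OutOfMemoryError in the dispatcher event loop.
      throw new OutOfMemoryError("simulated")
    })
    // Suppress the default stack-trace printing; the thread still dies.
    worker.setUncaughtExceptionHandler((_, _) => ())
    worker.start()
    worker.join()

    try {
      // Without a timeout this parks forever on the CompletionLatch,
      // exactly like the "main" thread in the dump above.
      Await.ready(p.future, 1.second)
      "completed"
    } catch {
      case _: TimeoutException => "promise never completed"
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

In Spark the awaiting side has no such timeout, which is why the shell appears to hang indefinitely.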
> I don't know how to fix it properly, but it seems we need to add fatal-error handling to EventLoop.run() in core/EventLoop.scala.
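One possible shape of that handling, sketched as a standalone class rather than Spark's actual EventLoop (SketchEventLoop and its members are hypothetical names; the real class lives in core/EventLoop.scala): route fatal throwables to onError and stop the loop, so that callers waiting on the dead dispatcher can be failed instead of parking forever.

```scala
import java.util.concurrent.{LinkedBlockingDeque, TimeUnit}
import scala.util.control.NonFatal

// Hypothetical sketch, not Spark's code: an event loop whose run()
// reports fatal errors instead of letting the thread die silently.
abstract class SketchEventLoop[E](name: String) {
  private val queue = new LinkedBlockingDeque[E]()
  @volatile private var stopped = false

  private val thread = new Thread(name) {
    override def run(): Unit = {
      while (!stopped) {
        // Poll with a timeout so the loop can observe the stopped flag.
        val event = queue.poll(100, TimeUnit.MILLISECONDS)
        if (event != null) {
          try onReceive(event)
          catch {
            case NonFatal(e) => onError(e) // recoverable: report, keep going
            case t: Throwable =>           // fatal (e.g. OutOfMemoryError):
              onError(t)                   // report it, then stop the loop so
              stopped = true               // waiters can be failed, not hung
          }
        }
      }
    }
  }

  def start(): Unit = thread.start()
  def stop(): Unit = { stopped = true; thread.join() }
  def post(event: E): Unit = queue.put(event)
  def isStopped: Boolean = stopped

  protected def onReceive(event: E): Unit
  protected def onError(e: Throwable): Unit
}
```

A subclass would then override onError to fail any outstanding job promises, which is the piece Spark's dispatcher is missing when the event-loop thread dies.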
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org