You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Hieu Tri Huynh (JIRA)" <ji...@apache.org> on 2018/07/31 18:22:00 UTC

[jira] [Created] (SPARK-24981) ShutdownHook timeout causes job to fail when succeeded when SparkContext stop() not called

Hieu Tri Huynh created SPARK-24981:
--------------------------------------

             Summary: ShutdownHook timeout causes job to fail when succeeded when SparkContext stop() not called
                 Key: SPARK-24981
                 URL: https://issues.apache.org/jira/browse/SPARK-24981
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.3.1
            Reporter: Hieu Tri Huynh


When user does not stop the SparkContext at the end of their program, ShutdownHookManger will stop the SparkContext. However, each shutdown hook is only given 10s to run, it will be interrupted and cancelled after that given time. In case stopping spark context takes longer than 10s, InterruptedException will be thrown, and the job will fail even though it succeeded before. An example of this is shown below.

I think there are a few ways to fix this, below are the 2 ways that I have now: 

1. After user program finished, we can check if user program stoped SparkContext or not. If user didn't stop the SparkContext, we can stop it before finishing the userThread. By doing so, SparkContext.stop() can take as much time as it needed.

2. We can just catch the InterruptedException thrown by ShutdownHookManger while we are stopping the SparkContext, and ignoring all the things that we haven't stopped inside the SparkContext. Since we are shutting down, I think it will be okay to ignore those things.

 
{code:java}
18/07/31 17:11:49 ERROR Utils: Uncaught exception in thread pool-4-thread-1
java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at java.lang.Thread.join(Thread.java:1249)
	at java.lang.Thread.join(Thread.java:1323)
	at org.apache.spark.scheduler.AsyncEventQueue.stop(AsyncEventQueue.scala:136)
	at org.apache.spark.scheduler.LiveListenerBus$$anonfun$stop$1.apply(LiveListenerBus.scala:219)
	at org.apache.spark.scheduler.LiveListenerBus$$anonfun$stop$1.apply(LiveListenerBus.scala:219)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at org.apache.spark.scheduler.LiveListenerBus.stop(LiveListenerBus.scala:219)
	at org.apache.spark.SparkContext$$anonfun$stop$6.apply$mcV$sp(SparkContext.scala:1922)
	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1360)
	at org.apache.spark.SparkContext.stop(SparkContext.scala:1921)
	at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:573)
	at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1991)
	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
	at scala.util.Try$.apply(Try.scala:192)
	at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
	at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
18/07/31 17:11:49 WARN ShutdownHookManager: ShutdownHook '$anon$2' timeout, java.util.concurrent.TimeoutException
java.util.concurrent.TimeoutException
	at java.util.concurrent.FutureTask.get(FutureTask.java:205)
	at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:67)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org