Posted to issues@spark.apache.org by "Nan Zhu (JIRA)" <ji...@apache.org> on 2017/06/23 22:55:00 UTC

[jira] [Updated] (SPARK-21197) Tricky use case makes dead application struggle for a long duration

     [ https://issues.apache.org/jira/browse/SPARK-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nan Zhu updated SPARK-21197:
----------------------------
    Summary: Tricky use case makes dead application struggle for a long duration  (was: Tricky use cases makes dead application struggle for a long duration)

> Tricky use case makes dead application struggle for a long duration
> -------------------------------------------------------------------
>
>                 Key: SPARK-21197
>                 URL: https://issues.apache.org/jira/browse/SPARK-21197
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams, Spark Core
>    Affects Versions: 2.0.2, 2.1.1
>            Reporter: Nan Zhu
>
> The use case is in Spark Streaming while the root cause is in DAGScheduler, so I set the components to both DStreams and Spark Core.
> Use case: 
> The user has a thread that periodically triggers Spark jobs, and in the same application they retrieve data from some source through Spark Streaming. The Streaming logic throws an exception, so the whole application is supposed to shut down and let YARN restart it.
> The user observed that after the exception propagated to Spark Core and SparkContext.stop() was called, the application was still running 18 hours later.
> The root cause is that DAGScheduler.stop() waits for the eventLoop thread to finish (https://github.com/apache/spark/blob/03eb6117affcca21798be25706a39e0d5a2f7288/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1704 and https://github.com/apache/spark/blob/03eb6117affcca21798be25706a39e0d5a2f7288/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L40).
> Since another thread keeps periodically pushing events to DAGScheduler's event queue, the event loop thread never finishes.
> A potential solution is to let EventLoop interrupt its thread directly in some cases (e.g. this one) while still allowing a graceful shutdown in others (e.g. the ListenerBus case).
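The hang and the proposed two-mode shutdown can be illustrated with a simplified, hypothetical event loop. SimpleEventLoop, EventLoopShutdownSketch, stop(interrupt, timeoutMs) and all timings are assumptions made for this sketch; it is not Spark's EventLoop, which differs in detail. In graceful mode the loop exits only after its queue is drained, so a producer that keeps posting events (standing in for the user's periodic job thread) keeps the stop from completing. Interrupting the thread lets it exit regardless, at the cost of dropping queued events.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, simplified event loop for illustration only; NOT Spark's EventLoop.
public class EventLoopShutdownSketch {

    static final class SimpleEventLoop {
        private final BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        private final AtomicBoolean stopped = new AtomicBoolean(false);
        private final Thread thread = new Thread(this::run, "simple-event-loop");

        SimpleEventLoop() { thread.setDaemon(true); }

        void start() { thread.start(); }

        void post(int event) { queue.offer(event); }

        private void run() {
            try {
                // Graceful semantics: exit only once stop was requested AND the queue is drained.
                while (!stopped.get() || !queue.isEmpty()) {
                    Integer event = queue.poll(10, TimeUnit.MILLISECONDS);
                    if (event != null) {
                        Thread.sleep(5); // stand-in for handling the event
                    }
                }
            } catch (InterruptedException e) {
                // Interrupt semantics: leave at once, dropping any events still queued.
            }
        }

        // Requests a stop, optionally interrupting the loop thread, then waits up to timeoutMs.
        // Returns true if the loop thread has terminated.
        boolean stop(boolean interrupt, long timeoutMs) throws InterruptedException {
            stopped.set(true);
            if (interrupt) {
                thread.interrupt();
            }
            thread.join(timeoutMs);
            return !thread.isAlive();
        }
    }

    // Returns {finished after graceful stop, finished after interrupting stop}.
    static boolean[] demo() throws InterruptedException {
        SimpleEventLoop loop = new SimpleEventLoop();
        AtomicBoolean producing = new AtomicBoolean(true);
        // Plays the role of the user's thread that keeps triggering jobs.
        Thread producer = new Thread(() -> {
            int i = 0;
            while (producing.get()) {
                for (int k = 0; k < 10; k++) {
                    loop.post(i++);
                }
                try {
                    Thread.sleep(1);
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "periodic-producer");
        producer.setDaemon(true);
        loop.start();
        producer.start();
        Thread.sleep(100); // let a backlog build up

        // Graceful stop: the queue never empties while the producer runs, so join times out.
        boolean gracefulFinished = loop.stop(false, 300);
        // Interrupting stop: the loop thread exits even though events are still queued.
        boolean interruptFinished = loop.stop(true, 1000);

        producing.set(false);
        return new boolean[] {gracefulFinished, interruptFinished};
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = demo();
        System.out.println("graceful stop finished: " + r[0]);
        System.out.println("interrupting stop finished: " + r[1]);
    }
}
```

Running main should report that the graceful stop did not finish within its timeout while the interrupting stop did. Putting both modes behind one flag mirrors the idea of choosing per use case (interrupting for a DAGScheduler-style loop, draining for a ListenerBus-style one). The join timeout is an added assumption so the sketch always terminates.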



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
