Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/12/20 09:50:58 UTC

[jira] [Resolved] (SPARK-18881) Spark never finishes jobs and stages, JobProgressListener fails

     [ https://issues.apache.org/jira/browse/SPARK-18881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-18881.
-------------------------------
    Resolution: Duplicate

> Spark never finishes jobs and stages, JobProgressListener fails
> ---------------------------------------------------------------
>
>                 Key: SPARK-18881
>                 URL: https://issues.apache.org/jira/browse/SPARK-18881
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 2.0.2
>         Environment: yarn, deploy-mode = client
>            Reporter: Mathieu D
>
> We have a Spark application that continuously processes a large number of incoming jobs. Several jobs are processed in parallel, on multiple threads.
> During intensive workloads, at some point, we start to get hundreds of warnings like this:
> {code}
> 16/12/14 21:04:03 WARN JobProgressListener: Task end for unknown stage 147379
> 16/12/14 21:04:03 WARN JobProgressListener: Job completed for unknown job 64610
> 16/12/14 21:04:04 WARN JobProgressListener: Task start for unknown stage 147405
> 16/12/14 21:04:04 WARN JobProgressListener: Task end for unknown stage 147406
> 16/12/14 21:04:04 WARN JobProgressListener: Job completed for unknown job 64622
> {code}
> From that point on, the performance of the app plummets, and most stages and jobs never finish. In the Spark UI, I can see figures like 13000 pending jobs.
> I can't clearly see another related exception happening before that. Maybe this one, but it concerns another listener:
> {code}
> 16/12/14 21:03:54 ERROR LiveListenerBus: Dropping SparkListenerEvent because no remaining room in event queue. This likely means one of the SparkListeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.
> 16/12/14 21:03:54 WARN LiveListenerBus: Dropped 1 SparkListenerEvents since Thu Jan 01 01:00:00 CET 1970
> {code}
> This is very problematic for us, since it's hard to detect and requires an app restart.
> *EDIT:*
> I confirm the sequence:
> 1- ERROR LiveListenerBus: Dropping SparkListenerEvent because no remaining room in event queue
> then
> 2- JobProgressListener loses track of jobs and stages.
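
The sequence reported above (an event dropped from a full queue, then a listener warning about an unknown stage) can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyListenerBus`, `ToyProgressListener`, the event names, and the capacity of 2 are all hypothetical and are not Spark's actual LiveListenerBus or JobProgressListener code. It only shows how a listener that tracks state from events might lose track once a "stage submitted" event is dropped.

```python
import queue


class ToyListenerBus:
    """Toy bounded event bus (hypothetical; NOT Spark's LiveListenerBus)."""

    def __init__(self, capacity):
        self.events = queue.Queue(maxsize=capacity)
        self.dropped = 0

    def post(self, event):
        # If the queue is full, the event is dropped instead of blocking.
        try:
            self.events.put_nowait(event)
        except queue.Full:
            self.dropped += 1

    def drain(self, listener):
        # Deliver every queued event to the listener, in order.
        while True:
            try:
                event = self.events.get_nowait()
            except queue.Empty:
                return
            listener.on_event(event)


class ToyProgressListener:
    """Toy listener that tracks active stages (hypothetical)."""

    def __init__(self):
        self.active_stages = set()
        self.warnings = []

    def on_event(self, event):
        kind, stage_id = event
        if kind == "stage_submitted":
            self.active_stages.add(stage_id)
        elif kind == "task_end":
            if stage_id not in self.active_stages:
                self.warnings.append(f"Task end for unknown stage {stage_id}")
        elif kind == "stage_completed":
            self.active_stages.discard(stage_id)


def simulate():
    bus = ToyListenerBus(capacity=2)
    listener = ToyProgressListener()
    # The listener is "slow": nothing is drained while these are posted.
    bus.post(("stage_submitted", 1))
    bus.post(("task_end", 1))
    bus.post(("stage_submitted", 2))  # queue is full, so this one is dropped
    bus.drain(listener)
    # The listener never saw stage 2 being submitted.
    bus.post(("task_end", 2))
    bus.drain(listener)
    return bus.dropped, listener.warnings
```

In this toy run, one event is dropped and the listener later records a warning for stage 2, which mirrors the order of the two log excerpts in the description. Whether the same mechanism explains the stuck jobs in the real application is not established here.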



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)