Posted to user@spark.apache.org by Haopu Wang <HW...@qilinsoft.com> on 2014/07/10 10:21:18 UTC

All of the tasks have been completed but the Stage is still shown as "Active"?

I've been running an app for hours on a standalone cluster. Judging from the
data injector and the "Streaming" tab of the web UI, it is running well.

However, I see quite a lot of Active stages in the web UI, even though some
of them have all of their tasks completed.

I've attached a screenshot for your reference.

Have you ever seen this kind of behavior?


Re: All of the tasks have been completed but the Stage is still shown as "Active"?

Posted by Surendranauth Hiraman <su...@velos.io>.
The History Server is also very helpful.






-- 

SUREN HIRAMAN, VP TECHNOLOGY
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR
NEW YORK, NY 10001
O: (917) 525-2466 ext. 105
F: 646.349.4063
E: suren.hiraman@velos.io
W: www.velos.io

Re: All of the tasks have been completed but the Stage is still shown as "Active"?

Posted by Tathagata Das <ta...@gmail.com>.
It seems to be related. The PRs that Andrew mentioned may fix this issue.



RE: All of the tasks have been completed but the Stage is still shown as "Active"?

Posted by Haopu Wang <HW...@qilinsoft.com>.
I saw exceptions like this one in the driver log. Can you shed some light on it? Is it related to the behavior?

 

14/07/11 20:40:09 ERROR LiveListenerBus: Listener JobProgressListener threw an exception
java.util.NoSuchElementException: key not found: 64019
         at scala.collection.MapLike$class.default(MapLike.scala:228)
         at scala.collection.AbstractMap.default(Map.scala:58)
         at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
         at org.apache.spark.ui.jobs.JobProgressListener.onStageCompleted(JobProgressListener.scala:78)
         at org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$2.apply(SparkListenerBus.scala:48)
         at org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$2.apply(SparkListenerBus.scala:48)
         at org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:81)
         at org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:79)
         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
         at org.apache.spark.scheduler.SparkListenerBus$class.foreachListener(SparkListenerBus.scala:79)
         at org.apache.spark.scheduler.SparkListenerBus$class.postToAll(SparkListenerBus.scala:48)
         at org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:32)
         at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56)
         at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56)
         at scala.Option.foreach(Option.scala:236)
         at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:56)
         at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
         at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1160)
         at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:46)
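
(Editor's sketch.) The trace shows a strict map lookup (HashMap.apply) failing inside onStageCompleted. The toy model below, in Python rather than Scala, illustrates one way such a failure could leave a stage listed as active: if the lookup throws before the stage is removed from the active table, the removal never happens. Everything here is hypothetical (ToyStageListener, stage_pool, deliver); it is not Spark's JobProgressListener, and whether this is the actual cause in this thread is an assumption.

```python
class ToyStageListener:
    """Toy stand-in for a UI listener; NOT Spark's JobProgressListener."""

    def __init__(self):
        self.stage_pool = {}   # stage_id -> pool name (auxiliary bookkeeping)
        self.active = {}       # stage_id -> stage name
        self.completed = []    # stage names, in completion order

    def on_stage_submitted(self, stage_id, name, pool="default"):
        self.stage_pool[stage_id] = pool
        self.active[stage_id] = name

    def on_stage_completed(self, stage_id, strict=True):
        if strict:
            # Raises KeyError if the auxiliary entry is missing, the Python
            # analogue of Scala's HashMap.apply throwing NoSuchElementException.
            _pool = self.stage_pool[stage_id]
        # Reached only when the lookup above did not raise.
        name = self.active.pop(stage_id, None)
        if name is not None:
            self.completed.append(name)


def deliver(listener, stage_id, strict):
    """Post one completion event; log listener errors instead of crashing."""
    try:
        listener.on_stage_completed(stage_id, strict=strict)
        return True
    except KeyError as err:
        print(f"ERROR: listener threw an exception: key not found: {err}")
        return False


listener = ToyStageListener()
listener.on_stage_submitted(64019, "toy stage")
del listener.stage_pool[64019]            # simulate the missing map entry

ok_strict = deliver(listener, 64019, strict=True)
stuck = dict(listener.active)             # snapshot: stage still listed as active
ok_lenient = deliver(listener, 64019, strict=False)
```

With the strict lookup the event is dropped and the stage stays in `active`; with the lenient path the same event moves it to `completed`.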

 


 


RE: All of the tasks have been completed but the Stage is still shown as "Active"?

Posted by Haopu Wang <HW...@qilinsoft.com>.
I didn't keep the driver's log. Lesson learned.

I will run it again to see whether the problem recurs.

 


 


Re: All of the tasks have been completed but the Stage is still shown as "Active"?

Posted by Tathagata Das <ta...@gmail.com>.
Do you see any errors in the logs of the driver?


>

Re: All of the tasks have been completed but the Stage is still shown as "Active"?

Posted by "anthonyjschulte@gmail.com" <an...@gmail.com>.
Similarly, I am seeing tasks moved to the "completed" section which
apparently haven't finished all of their elements (succeeded/total < 1).
Is this related?





Re: All of the tasks have been completed but the Stage is still shown as "Active"?

Posted by Andrew Or <an...@databricks.com>.
Yes, there are a few bugs in the UI in the event of a node failure.

The duplicated stages in both the active and completed tables should be
fixed by this PR: https://github.com/apache/spark/pull/1262
The fact that the progress bar on the stages page displays an overflow
(e.g. 5/4) is still an open issue, but a related PR fixed the tasks page
side of it: https://github.com/apache/spark/pull/1236 (merged)

Keep reporting any additional anomalies you observe (or better yet, file a
JIRA here <https://issues.apache.org/jira/browse/SPARK>)!



Re: All of the tasks have been completed but the Stage is still shown as "Active"?

Posted by Daniel Siegmann <da...@velos.io>.
One thing to keep in mind is that the progress bar doesn't take into
account tasks which are rerun. If you see 4/4 but the stage is still
active, click the stage name and look at the task list. That will show you
if any are actually running. When rerun tasks complete, it can result in
the number of successful tasks being greater than the number of total
tasks; e.g. the progress bar might display 5/4.
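
(Editor's sketch.) The 5/4 display can be reproduced with a toy counter, written in Python for brevity. It assumes the UI simply increments a success counter on every successful task attempt; the names (ToyStageProgress, success_events, finished_indices) are made up for illustration and are not Spark's.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStageProgress:
    """Toy progress tracker for one stage; not Spark's UI code."""
    total_tasks: int
    success_events: int = 0                              # naive: every success counts
    finished_indices: set = field(default_factory=set)   # distinct tasks finished

    def on_task_success(self, task_index):
        self.success_events += 1
        self.finished_indices.add(task_index)

    def naive_label(self):
        # Counts rerun tasks again, so it can exceed the total.
        return f"{self.success_events}/{self.total_tasks}"

    def deduplicated_label(self):
        # Counts each task index at most once.
        return f"{len(self.finished_indices)}/{self.total_tasks}"


stage = ToyStageProgress(total_tasks=4)
for index in [0, 1, 2, 3]:
    stage.on_task_success(index)
stage.on_task_success(2)   # task 2 is rerun (e.g. after a node failure) and succeeds again

print(stage.naive_label())          # 5/4
print(stage.deduplicated_label())   # 4/4
```

Counting distinct task indices instead of success events keeps the ratio at or below 1, at the cost of hiding the fact that work was repeated.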

Another bug is that a stage might complete and be moved to the completed
list, but if tasks are then rerun it will appear in both the completed and
active stages list. If it completes again, you will see that stage *twice*
in the completed stages list.
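
(Editor's sketch.) The duplicate-stage behavior can be modeled the same way. The Python toy below assumes the completed table is an append-only list and that a rerun resubmits the stage under the same id; ToyStageTables and its methods are hypothetical names, not Spark's.

```python
class ToyStageTables:
    """Toy active/completed tables; not Spark's UI code."""

    def __init__(self):
        self.active = []      # stage ids currently shown as active
        self.completed = []   # stage ids shown as completed (append-only)

    def on_stage_submitted(self, stage_id):
        self.active.append(stage_id)

    def on_stage_completed(self, stage_id):
        if stage_id in self.active:
            self.active.remove(stage_id)
        # Appends without checking whether the id is already listed.
        self.completed.append(stage_id)


tables = ToyStageTables()
tables.on_stage_submitted(7)
tables.on_stage_completed(7)     # first completion
tables.on_stage_submitted(7)     # tasks are rerun: the stage is submitted again
in_both = 7 in tables.active and 7 in tables.completed
tables.on_stage_completed(7)     # second completion

print(in_both)            # True
print(tables.completed)   # [7, 7]
```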

Of course, you should only be seeing this behavior if things are going
wrong; a node failing, for example.


>


-- 
Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
E: daniel.siegmann@velos.io W: www.velos.io