Posted to issues@spark.apache.org by "Baoxu Shi (JIRA)" <ji...@apache.org> on 2014/06/22 00:27:24 UTC
[jira] [Updated] (SPARK-2228) onStageSubmitted does not properly called so NoSuchElement will throw in onStageCompleted

     [ https://issues.apache.org/jira/browse/SPARK-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Baoxu Shi updated SPARK-2228:
-----------------------------

    Description: 
We use `saveAsObjectFile` and `objectFile` to cut off lineage during iterative computation, but after several hundred iterations a `NoSuchElementException` is thrown. We checked the code and traced the problem to `org.apache.spark.ui.jobs.JobProgressListener`: when `onStageCompleted` is called, the `stageId` cannot be found in `stageIdToPool`, although it does exist in the other HashMaps. We therefore think `onStageSubmitted` is not being called properly. Spark did add the stage but failed to send the submission message to the listeners, so the error occurs when the completion message is sent to them.
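
A minimal sketch of the failure mode we suspect, using only the Scala standard library. `SketchListener` and `SketchDemo` are hypothetical names, and this is not the actual `JobProgressListener` code. It only shows that looking up a `stageId` in a mutable `HashMap` that `onStageSubmitted` never populated throws `NoSuchElementException`:

```scala
import scala.collection.mutable
import scala.util.Try

// Hypothetical, simplified listener for illustration only;
// NOT Spark's JobProgressListener.
class SketchListener {
  // Populated when a stage is submitted, read when it completes.
  val stageIdToPool = mutable.HashMap[Int, String]()
  val activeStages = mutable.HashSet[Int]()

  def onStageSubmitted(stageId: Int, pool: String): Unit = {
    stageIdToPool(stageId) = pool
    activeStages += stageId
  }

  // Returns the pool the completed stage belonged to.
  def onStageCompleted(stageId: Int): String = {
    // HashMap.apply throws NoSuchElementException when the key is missing.
    val pool = stageIdToPool(stageId)
    activeStages -= stageId
    stageIdToPool -= stageId
    pool
  }
}

object SketchDemo {
  // Expected sequence: the stage is submitted, then completed.
  def normalCase(): Try[String] = {
    val listener = new SketchListener
    listener.onStageSubmitted(1, "default")
    Try(listener.onStageCompleted(1))
  }

  // Suspected faulty sequence: completion arrives without a submission.
  def missingSubmit(): Try[String] = {
    val listener = new SketchListener
    Try(listener.onStageCompleted(2))
  }
}
```

In this sketch `SketchDemo.normalCase()` succeeds, while `SketchDemo.missingSubmit()` fails with the same kind of exception we see in the listener.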

This problem causes a large number of stages to remain listed as `active stages` in `SparkUI`, which is really annoying. It may not affect the final result, though, judging by the output of my test code.

I'm willing to help solve this problem. Any idea which part I should change? I assume `org.apache.spark.scheduler.SparkListenerBus` has something to do with it, but it looks fine to me.

FYI, here is the test code that reproduces the problem. I do not know how to post code here with highlighting, so I put the code in a gist to keep the issue clean.

https://gist.github.com/bxshi/b5c0fe0ae089c75a39bd

  was:
We use `saveAsObjectFile` and `objectFile` to cut off lineage during iterative computation, but after several hundred iterations a `NoSuchElementException` is thrown. We checked the code and traced the problem to `org.apache.spark.ui.jobs.JobProgressListener`: when `onStageCompleted` is called, the `stageId` cannot be found in `stageIdToPool`, although it does exist in the other HashMaps. We therefore think `onStageSubmitted` is not being called properly. Spark did add the stage but failed to send the submission message to the listeners, so the error occurs when the completion message is sent to them.

This problem causes a large number of stages to remain listed as `active stages` in `SparkUI`, which is really annoying. It may not affect the final result, though, judging by the output of my test code.

I'm willing to help solve this problem. Any idea which part I should change? I assume `org.apache.spark.scheduler.SparkListenerBus` has something to do with it, but it looks fine to me.

FYI, here is the test code that reproduces the problem. I do not see a code field in the system, so I put the code in a gist.

https://gist.github.com/bxshi/b5c0fe0ae089c75a39bd


> onStageSubmitted does not properly called so NoSuchElement will throw in onStageCompleted
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-2228
>                 URL: https://issues.apache.org/jira/browse/SPARK-2228
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Baoxu Shi



--
This message was sent by Atlassian JIRA
(v6.2#6252)