Posted to reviews@spark.apache.org by CodingCat <gi...@git.apache.org> on 2017/01/11 00:26:46 UTC

[GitHub] spark pull request #16542: [SPARK-18905] Fix the issue of removing a failed ...

GitHub user CodingCat opened a pull request:

    https://github.com/apache/spark/pull/16542

    [SPARK-18905] Fix the issue of removing a failed jobset from JobScheduler.jobSets

    ## What changes were proposed in this pull request?
    
    The current implementation of Spark Streaming considers a batch completed regardless of the results of its jobs (https://github.com/apache/spark/blob/1169db44bc1d51e68feb6ba2552520b2d660c2c0/streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobScheduler.scala#L203).
    Consider the following case:
    A micro-batch contains 2 jobs, each reading from a different Kafka topic. One of these jobs fails because of a problem in the user-defined logic, after the other one has finished successfully.
    1. The main thread in the Spark Streaming application executes the line mentioned above,
    2. and another thread (the checkpoint writer) writes a checkpoint file immediately after this line is executed.
    3. Then, due to the current error-handling mechanism in Spark Streaming, the StreamingContext is closed (https://github.com/apache/spark/blob/1169db44bc1d51e68feb6ba2552520b2d660c2c0/streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobScheduler.scala#L214).
    When the user then recovers from the checkpoint file, the data being processed by the failed job is never reprocessed, because the JobSet containing the failed job was removed (treated as completed) before the checkpoint was constructed.
    
    This PR fixes the problem by removing a jobset from JobScheduler.jobSets only when all jobs in the jobset have finished successfully.
    
    ## How was this patch tested?
    
    existing tests
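
The scenario and fix described in this pull request can be illustrated with a small, self-contained model. This is only a sketch: `JobSet` and `Scheduler` below are hypothetical, simplified stand-ins rather than Spark's classes, and the `removeOnlyOnSuccess` flag exists purely so the old and new behaviour can be compared side by side. It models the ordering from the description, where the failing job ends after the successful one.

```scala
import scala.collection.mutable
import scala.util.{Failure, Success, Try}

// Hypothetical, simplified stand-ins for Spark's JobSet and JobScheduler.
object JobSetRemovalSketch {
  final class JobSet(val time: Long, numJobs: Int) {
    private var finished = 0
    def handleJobCompletion(): Unit = finished += 1
    // True once every job has ended, whether it succeeded or failed.
    def hasCompleted: Boolean = finished == numJobs
  }

  final class Scheduler(removeOnlyOnSuccess: Boolean) {
    val jobSets = mutable.Map.empty[Long, JobSet]

    def submit(jobSet: JobSet): Unit = jobSets(jobSet.time) = jobSet

    def handleJobCompletion(time: Long, result: Try[Unit]): Unit = {
      val jobSet = jobSets(time)
      jobSet.handleJobCompletion()
      // Old behaviour: always eligible for removal.
      // New behaviour: eligible only when the job that just ended succeeded.
      val mayRemove = !removeOnlyOnSuccess || result.isSuccess
      if (mayRemove && jobSet.hasCompleted) jobSets.remove(time)
    }
  }

  // One batch with two jobs: the first succeeds, the second fails afterwards.
  // Returns true if the batch is still tracked in jobSets at the end.
  def batchStillPending(removeOnlyOnSuccess: Boolean): Boolean = {
    val scheduler = new Scheduler(removeOnlyOnSuccess)
    scheduler.submit(new JobSet(time = 1000L, numJobs = 2))
    scheduler.handleJobCompletion(1000L, Success(()))
    scheduler.handleJobCompletion(1000L, Failure(new RuntimeException("user logic failed")))
    scheduler.jobSets.contains(1000L)
  }
}
```

In this model, `batchStillPending(removeOnlyOnSuccess = false)` drops the batch once both jobs have ended, so a checkpoint taken afterwards would no longer list it, while `batchStillPending(removeOnlyOnSuccess = true)` keeps it tracked.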


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/CodingCat/spark SPARK-18905

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16542.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16542
    
----
commit 24bfa382a43c2fbbf54b24bb8f03766910216490
Author: CodingCat <zh...@gmail.com>
Date:   2016-03-07T14:37:37Z

    improve the doc for "spark.memory.offHeap.size"

commit 2209e345df4636f8fa881b3ad45084b75f9fe3eb
Author: CodingCat <zh...@gmail.com>
Date:   2016-03-07T19:00:16Z

    fix

commit 65623f4408ab6152719046c55093c70435da82c8
Author: Nan Zhu <zh...@gmail.com>
Date:   2017-01-11T00:04:55Z

    Merge branch 'master' of https://github.com/apache/spark

commit a8646acf826cfbbabdf0d20129e62a14be404c3b
Author: Nan Zhu <zh...@gmail.com>
Date:   2017-01-11T00:24:02Z

    do not remove a jobset with any failed job from jobset to prevent data loss

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16542: [SPARK-18905][STREAMING] Fix the issue of removing a fai...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16542
  
    **[Test build #71172 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71172/testReport)** for PR 16542 at commit [`a8646ac`](https://github.com/apache/spark/commit/a8646acf826cfbbabdf0d20129e62a14be404c3b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #16542: [SPARK-18905][STREAMING] Fix the issue of removin...

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16542#discussion_r95935489
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobScheduler.scala ---
    @@ -200,19 +200,19 @@ class JobScheduler(val ssc: StreamingContext) extends Logging {
         job.setEndTime(completedTime)
         listenerBus.post(StreamingListenerOutputOperationCompleted(job.toOutputOperationInfo))
         logInfo("Finished job " + job.id + " from job set of time " + jobSet.time)
    -    if (jobSet.hasCompleted) {
    -      jobSets.remove(jobSet.time)
    -      jobGenerator.onBatchCompletion(jobSet.time)
    -      logInfo("Total delay: %.3f s for time %s (execution: %.3f s)".format(
    -        jobSet.totalDelay / 1000.0, jobSet.time.toString,
    -        jobSet.processingDelay / 1000.0
    -      ))
    -      listenerBus.post(StreamingListenerBatchCompleted(jobSet.toBatchInfo))
    -    }
         job.result match {
           case Failure(e) =>
             reportError("Error running job " + job, e)
           case _ =>
    +        if (jobSet.hasCompleted) {
    +          jobSets.remove(jobSet.time)
    +          jobGenerator.onBatchCompletion(jobSet.time)
    +          logInfo("Total delay: %.3f s for time %s (execution: %.3f s)".format(
    +            jobSet.totalDelay / 1000.0, jobSet.time.toString,
    +            jobSet.processingDelay / 1000.0
    +          ))
    +          listenerBus.post(StreamingListenerBatchCompleted(jobSet.toBatchInfo))
    --- End diff --
    
    sure




[GitHub] spark issue #16542: [SPARK-18905][STREAMING] Fix the issue of removing a fai...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16542
  
    **[Test build #71288 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71288/testReport)** for PR 16542 at commit [`465ccc6`](https://github.com/apache/spark/commit/465ccc68368da50579c10fa1daf7f46809411670).




[GitHub] spark issue #16542: [SPARK-18905][STREAMING] Fix the issue of removing a fai...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16542
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71288/
    Test PASSed.




[GitHub] spark issue #16542: [SPARK-18905] Fix the issue of removing a failed jobset ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16542
  
    **[Test build #71172 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71172/testReport)** for PR 16542 at commit [`a8646ac`](https://github.com/apache/spark/commit/a8646acf826cfbbabdf0d20129e62a14be404c3b).




[GitHub] spark pull request #16542: [SPARK-18905][STREAMING] Fix the issue of removin...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16542




[GitHub] spark issue #16542: [SPARK-18905] Fix the issue of removing a failed jobset ...

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the issue:

    https://github.com/apache/spark/pull/16542
  
    @zsxwing 




[GitHub] spark pull request #16542: [SPARK-18905][STREAMING] Fix the issue of removin...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16542#discussion_r95864979
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobScheduler.scala ---
    @@ -200,19 +200,19 @@ class JobScheduler(val ssc: StreamingContext) extends Logging {
         job.setEndTime(completedTime)
         listenerBus.post(StreamingListenerOutputOperationCompleted(job.toOutputOperationInfo))
         logInfo("Finished job " + job.id + " from job set of time " + jobSet.time)
    -    if (jobSet.hasCompleted) {
    -      jobSets.remove(jobSet.time)
    -      jobGenerator.onBatchCompletion(jobSet.time)
    -      logInfo("Total delay: %.3f s for time %s (execution: %.3f s)".format(
    -        jobSet.totalDelay / 1000.0, jobSet.time.toString,
    -        jobSet.processingDelay / 1000.0
    -      ))
    -      listenerBus.post(StreamingListenerBatchCompleted(jobSet.toBatchInfo))
    -    }
         job.result match {
           case Failure(e) =>
             reportError("Error running job " + job, e)
           case _ =>
    +        if (jobSet.hasCompleted) {
    +          jobSets.remove(jobSet.time)
    +          jobGenerator.onBatchCompletion(jobSet.time)
    +          logInfo("Total delay: %.3f s for time %s (execution: %.3f s)".format(
    +            jobSet.totalDelay / 1000.0, jobSet.time.toString,
    +            jobSet.processingDelay / 1000.0
    +          ))
    +          listenerBus.post(StreamingListenerBatchCompleted(jobSet.toBatchInfo))
    --- End diff --
    
    Could you also post this event for failure jobSet? Otherwise, the web UI cannot show it.
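
One possible reading of this suggestion, as a sketch only: post the batch-level event whenever the last job of the batch ends, but keep the removal on the success path. The `Scheduler` class, the `tracked` flag and the string events below are hypothetical stand-ins for `JobScheduler`, `jobSets` and `listenerBus`; this is not the code that was merged.

```scala
import scala.collection.mutable
import scala.util.{Failure, Success, Try}

// Hypothetical, simplified model; events are plain strings, not Spark listener events.
object BatchEventSketch {
  final class Scheduler(numJobs: Int) {
    private var finished = 0
    var tracked = true                              // stands in for membership in jobSets
    val events = mutable.ArrayBuffer.empty[String]  // stands in for listenerBus

    def handleJobCompletion(result: Try[Unit]): Unit = {
      finished += 1
      val completed = finished == numJobs
      // Post the batch-level event whenever the last job ends, even on failure,
      // so that a listener such as a UI can still display the batch.
      if (completed) events += "BatchCompleted"
      result match {
        case Failure(e) => events += s"Error: ${e.getMessage}"
        case Success(_) => if (completed) tracked = false // remove only on success
      }
    }
  }

  // Feeds the job results to a fresh scheduler, in order, and returns
  // the posted events together with whether the batch is still tracked.
  def run(results: Seq[Try[Unit]]): (List[String], Boolean) = {
    val scheduler = new Scheduler(results.size)
    results.foreach(scheduler.handleJobCompletion)
    (scheduler.events.toList, scheduler.tracked)
  }
}
```

Under these assumptions a batch whose last job fails still produces a `BatchCompleted` event but remains tracked, whereas a fully successful batch produces the event and is removed.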




[GitHub] spark issue #16542: [SPARK-18905][STREAMING] Fix the issue of removing a fai...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16542
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #16542: [SPARK-18905][STREAMING] Fix the issue of removing a fai...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/16542
  
    LGTM. Thanks! Merging to master and 2.1.




[GitHub] spark issue #16542: [SPARK-18905][STREAMING] Fix the issue of removing a fai...

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the issue:

    https://github.com/apache/spark/pull/16542
  
    Thanks




[GitHub] spark issue #16542: [SPARK-18905][STREAMING] Fix the issue of removing a fai...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16542
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #16542: [SPARK-18905][STREAMING] Fix the issue of removing a fai...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16542
  
    **[Test build #71288 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71288/testReport)** for PR 16542 at commit [`465ccc6`](https://github.com/apache/spark/commit/465ccc68368da50579c10fa1daf7f46809411670).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #16542: [SPARK-18905][STREAMING] Fix the issue of removing a fai...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16542
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71172/
    Test PASSed.

