You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by ahmed-mahran <gi...@git.apache.org> on 2016/07/12 00:59:00 UTC

[GitHub] spark pull request #14145: [SPARK-16487] [STREAMING] Fix some batches might ...

GitHub user ahmed-mahran opened a pull request:

    https://github.com/apache/spark/pull/14145

    [SPARK-16487] [STREAMING] Fix some batches might not get marked as fully processed in JobGenerator

    ## What changes were proposed in this pull request?
    
    In `JobGenerator`, the code reads like that some batches might not get marked as fully processed. In the following flowchart, the batch should get marked fully processed before endpoint C however it is not. Currently, this does not actually cause an issue, as the condition `(time - zeroTime) is multiple of checkpoint duration?` always evaluates to `true` as the `checkpoint duration` is always set to be equal to the `batch duration`.
    
    ![Flowchart](https://s31.postimg.org/udy9lti2j/spark_streaming_job_generator.png)
    
    This PR fixes this issue so as to improve code readability and to avoid any potential issue in case there is any future change making checkpoint duration to be set different from batch duration.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ahmed-mahran/spark b-mark-batch-fully-processed

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14145.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14145
    
----
commit 4df7e367fa11578767f2c2fa23b40fa2cb6714c7
Author: Ahmed Mahran <ah...@mashin.io>
Date:   2016-07-12T00:43:29Z

    Mark uncheckpointed batch as fully processed
    
    The code reads like that some batches might not get marked as fully
    processed. Currently, this does not actually cause an issue, as the
    condition '(time - zeroTime) is multiple of checkpoint duration?'
    always evaluates to true as the checkpoint duration is always set
    to be equal to the batch duration.

commit 585870f04cf93fcc663fd2b910c6bb1df3ef1dc5
Author: Ahmed Mahran <ah...@mashin.io>
Date:   2016-07-12T00:46:38Z

    Fix comment typo

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14145: [SPARK-16487] [STREAMING] Fix some batches might not get...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14145
  
    Jenkins test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14145: [SPARK-16487] [STREAMING] Fix some batches might not get...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14145
  
    **[Test build #62666 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62666/consoleFull)** for PR 14145 at commit [`585870f`](https://github.com/apache/spark/commit/585870f04cf93fcc663fd2b910c6bb1df3ef1dc5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14145: [SPARK-16487] [STREAMING] Fix some batches might not get...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14145
  
    @tdas WDYT for `master`? maybe @zsxwing ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14145: [SPARK-16487] [STREAMING] Fix some batches might not get...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14145
  
    I think you're right. `doCheckpoint` gets called with `clearCheckpointDataLater = false` when it means "do a checkpoint, but, the batch isn't actually done yet". In those cases, yes you don't mark the batch complete. When `clearCheckpointDataLater = true` it means "checkpoint, but this batch is also done, so you can do more cleanup like cleaning old checkpoints". But yes that means that the batch should be marked completed when `clearCheckpointDataLater = true` in all cases, and if it's not going to actually checkpoint then it's done.
    CC @tdas 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14145: [SPARK-16487] [STREAMING] Fix some batches might not get...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14145
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14145: [SPARK-16487] [STREAMING] Fix some batches might not get...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14145
  
    Merged to master


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14145: [SPARK-16487] [STREAMING] Fix some batches might not get...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14145
  
    **[Test build #62666 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62666/consoleFull)** for PR 14145 at commit [`585870f`](https://github.com/apache/spark/commit/585870f04cf93fcc663fd2b910c6bb1df3ef1dc5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14145: [SPARK-16487] [STREAMING] Fix some batches might not get...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14145
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62666/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14145: [SPARK-16487] [STREAMING] Fix some batches might ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/14145


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14145: [SPARK-16487] [STREAMING] Fix some batches might not get...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14145
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org