You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by mwws <gi...@git.apache.org> on 2016/02/06 06:22:28 UTC

[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

GitHub user mwws opened a pull request:

    https://github.com/apache/spark/pull/11101

    [SPARK-13222][Streaming][WIP]make sure latest status of stateful RDD be checkpointed on gracefulshutdown

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mwws/spark SPARK-CheckpointOnShutdown

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11101.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11101
    
----
commit e1977fe70a95ee646d3b9e6330edf6039687ac06
Author: mwws <we...@intel.com>
Date:   2016-02-06T04:28:47Z

    make sure latest status of stateful RDD be checkpointed on gracefulshutdown

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11101#discussion_r52098146
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/DStreamGraph.scala ---
    @@ -131,6 +138,10 @@ final private[streaming] class DStreamGraph extends Serializable with Logging {
         logDebug("Cleared old metadata for time " + time)
       }
     
    +  def isCheckpointMissedLastTime() : Boolean = {
    --- End diff --
    
    Nit: remove the space before `:`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11101#issuecomment-180703548
  
    **[Test build #50863 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50863/consoleFull)** for PR 11101 at commit [`4f2efe4`](https://github.com/apache/spark/commit/4f2efe4962c01a83de600d2651463c1fb63038cb).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11101#issuecomment-180696141
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11101#discussion_r53118993
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala ---
    @@ -123,6 +126,12 @@ class JobGenerator(jobScheduler: JobScheduler) extends Logging {
             timedOut
           }
     
    +      // generate one more bacth to make sure RDD in lastJob is checkpointed.
    +      if (!jobScheduler.receiverTracker.hasUnallocatedBlocks &&
    +        ssc.graph.isCheckpointMissedLastTime) {
    +        Thread.sleep(ssc.graph.batchDuration.milliseconds)
    --- End diff --
    
    What about if the last batch execution takes longer time than the batchDuration? Probably we need a notification with a timeout?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11101#discussion_r54167619
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala ---
    @@ -123,6 +126,15 @@ class JobGenerator(jobScheduler: JobScheduler) extends Logging {
             timedOut
           }
     
    +      // generate one more batch to make sure RDD in lastJob is checkpointed. As an performance
    +      // optimization, if the latest info has been checkpointed in last batch, there is no need
    +      // to run another round. "isCheckpointMissedLastTime" method here is in charge of collect
    +      // such information from every DStream recursively.
    +      if (!jobScheduler.receiverTracker.hasUnallocatedBlocks &&
    +        ssc.graph.isCheckpointMissedLastTime) {
    +        Thread.sleep(ssc.graph.batchDuration.milliseconds)
    +      }
    --- End diff --
    
    I just wonder if we can add a double check if the last mini batch finish, otherwise, at least we can add a warning log.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11101#issuecomment-180696618
  
    **[Test build #50863 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50863/consoleFull)** for PR 11101 at commit [`4f2efe4`](https://github.com/apache/spark/commit/4f2efe4962c01a83de600d2651463c1fb63038cb).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/11101#issuecomment-185005411
  
    Some of users claims that they will use Kafka, but without the WAL for performance concern, see comments at https://github.com/apache/spark/pull/10252#issuecomment-184515856
    
    People have workaround solution, but it's better if Spark Streaming can support that internally.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11101#issuecomment-180703559
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50863/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11101#issuecomment-180703558
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11101#issuecomment-188666648
  
    **[Test build #51947 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51947/consoleFull)** for PR 11101 at commit [`cef511c`](https://github.com/apache/spark/commit/cef511c497d664ac33a87b5f1b40d07dfe695cdf).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11101#discussion_r52098184
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/DStreamGraph.scala ---
    @@ -131,6 +138,10 @@ final private[streaming] class DStreamGraph extends Serializable with Logging {
         logDebug("Cleared old metadata for time " + time)
       }
     
    +  def isCheckpointMissedLastTime() : Boolean = {
    +    outputStreams.foldLeft(false)((value, next) => value || next.isCheckpointMissedLastTime)
    --- End diff --
    
    call the `outputStreams.isCheckpointMissedLastTime()` directly?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11101#discussion_r52098195
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala ---
    @@ -123,6 +126,12 @@ class JobGenerator(jobScheduler: JobScheduler) extends Logging {
             timedOut
           }
     
    +      // generate one more bacth to make sure RDD in lastJob is checkpointed.
    +      if (!jobScheduler.receiverTracker.hasUnallocatedBlocks &&
    +        ssc.graph.isCheckpointMissedLastTime) {
    --- End diff --
    
    Can you add more comment to explain why we need to check the `ssc.graph.isCheckpointMissedLastTime`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

Posted by mwws <gi...@git.apache.org>.
Github user mwws commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11101#discussion_r53335941
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala ---
    @@ -123,6 +126,12 @@ class JobGenerator(jobScheduler: JobScheduler) extends Logging {
             timedOut
           }
     
    +      // generate one more bacth to make sure RDD in lastJob is checkpointed.
    +      if (!jobScheduler.receiverTracker.hasUnallocatedBlocks &&
    +        ssc.graph.isCheckpointMissedLastTime) {
    +        Thread.sleep(ssc.graph.batchDuration.milliseconds)
    --- End diff --
    
    If task execution time is longer than batchDuration, it usually means that the streaming application is not healhy and user need to adjust parameters to avoid such scenario.
    
    And the "thread.sleep" here is to make sure a new job could be generated, it's nothing to do with job execution.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11101#issuecomment-188689319
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11101#issuecomment-188688811
  
    **[Test build #51947 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51947/consoleFull)** for PR 11101 at commit [`cef511c`](https://github.com/apache/spark/commit/cef511c497d664ac33a87b5f1b40d07dfe695cdf).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11101#issuecomment-188689322
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51947/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11101#discussion_r52098138
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/DStreamGraph.scala ---
    @@ -100,6 +100,13 @@ final private[streaming] class DStreamGraph extends Serializable with Logging {
     
       def getOutputStreams(): Array[DStream[_]] = this.synchronized { outputStreams.toArray }
     
    +  def readyToShutdown() : Unit = this.synchronized {
    +    this.synchronized {
    --- End diff --
    
    typo? remove the extra `this.synchronized {`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/11101#issuecomment-181620645
  
    I'm not sure if this is really helpful. In general, if the user doesn't enable WAL and also doesn't use a replayable source, I don't think the user will care about losing data.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

Posted by mwws <gi...@git.apache.org>.
Github user mwws commented on the pull request:

    https://github.com/apache/spark/pull/11101#issuecomment-183796117
  
    @zsxwing, we have real use case that data integrity is important, but at the same time, we can not afford extra time cost of WAL with limited data processing power/host. (Batch processing time is already very closed to BatchInterval at peak time.) 
    
    And anyway, this PR is making the best effort to avoid data losing with little overhead. I think it's worth to do.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

Posted by mwws <gi...@git.apache.org>.
Github user mwws closed the pull request at:

    https://github.com/apache/spark/pull/11101


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11101#discussion_r53417276
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala ---
    @@ -123,6 +126,12 @@ class JobGenerator(jobScheduler: JobScheduler) extends Logging {
             timedOut
           }
     
    +      // generate one more bacth to make sure RDD in lastJob is checkpointed.
    +      if (!jobScheduler.receiverTracker.hasUnallocatedBlocks &&
    +        ssc.graph.isCheckpointMissedLastTime) {
    +        Thread.sleep(ssc.graph.batchDuration.milliseconds)
    --- End diff --
    
    ok, I was wondering it will be more graceful if this thread can be woke up as soon as the batch job finished, other than sleep with a fixed length of time.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/11101#issuecomment-185790044
  
    Sorry, I mean we'd better try our best to save the stateful data as much as we can, no matters whether if it's the replayable data source or WAL enabled etc.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11101#discussion_r53118838
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala ---
    @@ -490,6 +494,19 @@ abstract class DStream[T: ClassTag] (
         logDebug("Cleared checkpoint data")
       }
     
    +  private[streaming] def readyToShutdown(): Unit = {
    +    _readyToShutdown = true
    +    dependencies.foreach(_.readyToShutdown())
    +    logDebug("Ready to shutdown")
    --- End diff --
    
    yes I know, I mean you have one https://github.com/apache/spark/pull/11101/files#diff-221bc6301915f7a476786c794b855b21R105 already which print out exactly the same log, probably we can add more information in this log? for example: creationSite?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11101#issuecomment-180696142
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50862/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11101#discussion_r52098173
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala ---
    @@ -490,6 +494,19 @@ abstract class DStream[T: ClassTag] (
         logDebug("Cleared checkpoint data")
       }
     
    +  private[streaming] def readyToShutdown(): Unit = {
    +    _readyToShutdown = true
    +    dependencies.foreach(_.readyToShutdown())
    +    logDebug("Ready to shutdown")
    --- End diff --
    
    Will it cause many log output since it's called in a recursive way? Since we have one in `DStreamGraph`, can we remove this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

Posted by mwws <gi...@git.apache.org>.
Github user mwws commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11101#discussion_r52098190
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala ---
    @@ -490,6 +494,19 @@ abstract class DStream[T: ClassTag] (
         logDebug("Cleared checkpoint data")
       }
     
    +  private[streaming] def readyToShutdown(): Unit = {
    +    _readyToShutdown = true
    +    dependencies.foreach(_.readyToShutdown())
    +    logDebug("Ready to shutdown")
    --- End diff --
    
    It's in debug level, and you can refer to other method in this file, they all have the same behavior. I think it should be fine.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org