Posted to reviews@spark.apache.org by rekhajoshm <gi...@git.apache.org> on 2017/10/01 07:02:58 UTC

[GitHub] spark pull request #19407: [SPARK-21667][Streaming] ConsoleSink should not f...

GitHub user rekhajoshm opened a pull request:

    https://github.com/apache/spark/pull/19407

    [SPARK-21667][Streaming] ConsoleSink should not fail streaming query with checkpointLocation option

    ## What changes were proposed in this pull request?
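    Allow the console sink to recover from a checkpoint location, so that a streaming query with the `checkpointLocation` option no longer fails with a checkpoint exception.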
    
    ## How was this patch tested?
    Existing tests.
    Manual tests (reproduced the error, then confirmed there is no checkpoint error after the fix).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rekhajoshm/spark SPARK-21667

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19407.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19407
    
----
commit e3677c9fa9697e0d34f9df52442085a6a481c9e9
Author: Rekha Joshi <re...@gmail.com>
Date:   2015-05-05T23:10:08Z

    Merge pull request #1 from apache/master
    
    Pulling functionality from apache spark

commit 106fd8eee8f6a6f7c67cfc64f57c1161f76d8f75
Author: Rekha Joshi <re...@gmail.com>
Date:   2015-05-08T21:49:09Z

    Merge pull request #2 from apache/master
    
    pull latest from apache spark

commit 0be142d6becba7c09c6eba0b8ea1efe83d649e8c
Author: Rekha Joshi <re...@gmail.com>
Date:   2015-06-22T00:08:08Z

    Merge pull request #3 from apache/master
    
    Pulling functionality from apache spark

commit 6c6ee12fd733e3f9902e10faf92ccb78211245e3
Author: Rekha Joshi <re...@gmail.com>
Date:   2015-09-17T01:03:09Z

    Merge pull request #4 from apache/master
    
    Pulling functionality from apache spark

commit b123c601e459d1ad17511fd91dd304032154882a
Author: Rekha Joshi <re...@gmail.com>
Date:   2015-11-25T18:50:32Z

    Merge pull request #5 from apache/master
    
    pull request from apache/master

commit c73c32aadd6066e631956923725a48d98a18777e
Author: Rekha Joshi <re...@gmail.com>
Date:   2016-03-18T19:13:51Z

    Merge pull request #6 from apache/master
    
    pull latest from apache spark

commit 7dbf7320057978526635bed09dabc8cf8657a28a
Author: Rekha Joshi <re...@gmail.com>
Date:   2016-04-05T20:26:40Z

    Merge pull request #8 from apache/master
    
    pull latest from apache spark

commit 5e9d71827f8e2e4d07027281b80e4e073e7fecd1
Author: Rekha Joshi <re...@gmail.com>
Date:   2017-05-01T23:00:30Z

    Merge pull request #9 from apache/master
    
    Pull apache spark

commit 63d99b3ce5f222d7126133170a373591f0ac67dd
Author: Rekha Joshi <re...@gmail.com>
Date:   2017-09-30T22:26:44Z

    Merge pull request #10 from apache/master
    
    pull latest apache spark

commit 57e0e26474b66afd3bd54be061a5982836e28792
Author: rjoshi2 <re...@gmail.com>
Date:   2017-10-01T06:57:12Z

    [SPARK-21667][Streaming] ConsoleSink should not fail streaming query with checkpointLocation option

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19407: [SPARK-21667][Streaming] ConsoleSink should not f...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19407


---



[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19407
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19407
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/19407
  
    Thanks! Merging to master and 2.2.


---



[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19407
  
    **[Test build #82370 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82370/testReport)** for PR 19407 at commit [`57e0e26`](https://github.com/apache/spark/commit/57e0e26474b66afd3bd54be061a5982836e28792).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19407
  
    **[Test build #83704 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83704/testReport)** for PR 19407 at commit [`788fbf3`](https://github.com/apache/spark/commit/788fbf309261f1b003d5047ad4c86039de2fe16e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #19407: [SPARK-21667][Streaming] ConsoleSink should not f...

Posted by rekhajoshm <gi...@git.apache.org>.
Github user rekhajoshm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19407#discussion_r142023140
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala ---
    @@ -269,7 +269,7 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
         } else {
           val (useTempCheckpointLocation, recoverFromCheckpointLocation) =
             if (source == "console") {
    -          (true, false)
    +          (true, true)
    --- End diff --
    
    Good point, @jaceklaskowski. Updated, thanks.


---



[GitHub] spark pull request #19407: [SPARK-21667][Streaming] ConsoleSink should not f...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19407#discussion_r150314230
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala ---
    @@ -267,11 +267,12 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
             useTempCheckpointLocation = true,
             trigger = trigger)
         } else {
    -      val (useTempCheckpointLocation, recoverFromCheckpointLocation) =
    +      val recoverFromCheckpointLocation = true
    +      val useTempCheckpointLocation =
    --- End diff --
    
    nit: `val useTempCheckpointLocation = source == "console"`
    
    You can also just update the statement below to:
    ```
          df.sparkSession.sessionState.streamingQueryManager.startQuery(
            extraOptions.get("queryName"),
            extraOptions.get("checkpointLocation"),
            df,
            dataSource.createSink(outputMode),
            outputMode,
            useTempCheckpointLocation = source == "console",
            recoverFromCheckpointLocation = true,
            trigger = trigger)
    ```
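    
    A minimal sketch of why this simplification is safe, using plain Scala only. `CheckpointFlags`, `tupleForm`, and `simplifiedForm` are hypothetical names for illustration and are not part of `DataStreamWriter`; each returns the pair `(useTempCheckpointLocation, recoverFromCheckpointLocation)`:
    
    ```scala
    object CheckpointFlags {
      // Tuple-based form from the earlier revision of this PR:
      // both branches already return `true` for the recovery flag.
      def tupleForm(source: String): (Boolean, Boolean) =
        if (source == "console") (true, true) else (false, true)

      // Simplified form: the temp-checkpoint flag is the comparison itself,
      // and the recovery flag is the constant `true`.
      def simplifiedForm(source: String): (Boolean, Boolean) =
        (source == "console", true)
    }
    ```
    
    Both functions return the same pair for any `source`, which is why the two flags can be passed inline as named arguments, as in the statement above.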


---



[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19407
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82370/
    Test FAILed.


---



[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19407
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83704/
    Test PASSed.


---



[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19407
  
    **[Test build #82371 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82371/testReport)** for PR 19407 at commit [`44b6ce2`](https://github.com/apache/spark/commit/44b6ce23dfc32c19928533ab7d3f6916a4268562).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19407
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82371/
    Test FAILed.


---



[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19407
  
    **[Test build #82370 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82370/testReport)** for PR 19407 at commit [`57e0e26`](https://github.com/apache/spark/commit/57e0e26474b66afd3bd54be061a5982836e28792).


---



[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19407
  
    **[Test build #83704 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83704/testReport)** for PR 19407 at commit [`788fbf3`](https://github.com/apache/spark/commit/788fbf309261f1b003d5047ad4c86039de2fe16e).


---



[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19407
  
    **[Test build #82371 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82371/testReport)** for PR 19407 at commit [`44b6ce2`](https://github.com/apache/spark/commit/44b6ce23dfc32c19928533ab7d3f6916a4268562).


---



[GitHub] spark pull request #19407: [SPARK-21667][Streaming] ConsoleSink should not f...

Posted by rekhajoshm <gi...@git.apache.org>.
Github user rekhajoshm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19407#discussion_r150330668
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala ---
    @@ -267,11 +267,12 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
             useTempCheckpointLocation = true,
             trigger = trigger)
         } else {
    -      val (useTempCheckpointLocation, recoverFromCheckpointLocation) =
    +      val recoverFromCheckpointLocation = true
    +      val useTempCheckpointLocation =
    --- End diff --
    
    Done. Thanks, @zsxwing.


---



[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/19407
  
    LGTM pending tests


---



[GitHub] spark pull request #19407: [SPARK-21667][Streaming] ConsoleSink should not f...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19407#discussion_r142022819
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala ---
    @@ -269,7 +269,7 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
         } else {
           val (useTempCheckpointLocation, recoverFromCheckpointLocation) =
             if (source == "console") {
    -          (true, false)
    +          (true, true)
    --- End diff --
    
    Is there any source that runs with `recoverFromCheckpointLocation` disabled? What's the use case, if any?
    
    Remove `recoverFromCheckpointLocation` here, as it's always `true`, and make that explicit.
    
    The JIRA issue is to fix the exception and then clean up the code that was only needed in the past.


---



[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19407
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #19407: [SPARK-21667][Streaming] ConsoleSink should not f...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19407#discussion_r151838606
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala ---
    @@ -267,11 +267,12 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
             useTempCheckpointLocation = true,
             trigger = trigger)
         } else {
    -      val (useTempCheckpointLocation, recoverFromCheckpointLocation) =
    +      val recoverFromCheckpointLocation = true
    +      val useTempCheckpointLocation =
             if (source == "console") {
    -          (true, true)
    +          true
             } else {
    -          (false, true)
    +          false
    --- End diff --
    
    Do we really need the `if` anymore, since the whole expression is just `source == "console"`?


---
