Posted to reviews@spark.apache.org by rekhajoshm <gi...@git.apache.org> on 2017/10/01 07:02:58 UTC
[GitHub] spark pull request #19407: [SPARK-21667][Streaming] ConsoleSink should not f...
GitHub user rekhajoshm opened a pull request:
https://github.com/apache/spark/pull/19407
[SPARK-21667][Streaming] ConsoleSink should not fail streaming query with checkpointLocation option
## What changes were proposed in this pull request?
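Allow the console sink to recover from a checkpoint location, so that a streaming query started with the `checkpointLocation` option no longer fails with a checkpoint exception.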
## How was this patch tested?
Existing tests.
Manual tests (reproduced the error, then confirmed that no checkpoint error occurs after the fix).
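To illustrate the failure mode this change addresses, here is a toy model in plain Scala; it is not Spark's code. It assumes that a query started with recovery disabled is rejected when its checkpoint directory already holds progress (modelled here as an `offsets` subdirectory), which is the situation on restart of a console query with a fixed `checkpointLocation`. `CheckpointRecoveryModel` and `startQuery` are hypothetical names used only for this sketch.

```scala
import java.nio.file.{Files, Path}

object CheckpointRecoveryModel {
  // Hypothetical stand-in for starting a streaming query; not Spark's API.
  // Assumption: existing progress is detected via an "offsets" subdirectory.
  def startQuery(
      checkpointDir: Path,
      recoverFromCheckpointLocation: Boolean): Either[String, String] = {
    val offsets = checkpointDir.resolve("offsets")
    if (!recoverFromCheckpointLocation && Files.exists(offsets)) {
      // Recovery is disabled but earlier progress exists: refuse to start.
      Left(s"query does not support recovering from checkpoint location: $offsets")
    } else {
      // Either a fresh start or an allowed recovery; record progress.
      Files.createDirectories(offsets)
      Right("started")
    }
  }

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("console-checkpoint")
    // Console sink before the fix: recovery disabled.
    println(startQuery(dir, recoverFromCheckpointLocation = false)) // first run
    println(startQuery(dir, recoverFromCheckpointLocation = false)) // restart
    // Console sink after the fix: recovery enabled.
    println(startQuery(dir, recoverFromCheckpointLocation = true))  // restart
  }
}
```

In this model the first run succeeds, the restart with recovery disabled is rejected, and the restart with recovery enabled succeeds, which mirrors the behaviour change described above.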
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/rekhajoshm/spark SPARK-21667
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19407.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19407
----
commit e3677c9fa9697e0d34f9df52442085a6a481c9e9
Author: Rekha Joshi <re...@gmail.com>
Date: 2015-05-05T23:10:08Z
Merge pull request #1 from apache/master
Pulling functionality from apache spark
commit 106fd8eee8f6a6f7c67cfc64f57c1161f76d8f75
Author: Rekha Joshi <re...@gmail.com>
Date: 2015-05-08T21:49:09Z
Merge pull request #2 from apache/master
pull latest from apache spark
commit 0be142d6becba7c09c6eba0b8ea1efe83d649e8c
Author: Rekha Joshi <re...@gmail.com>
Date: 2015-06-22T00:08:08Z
Merge pull request #3 from apache/master
Pulling functionality from apache spark
commit 6c6ee12fd733e3f9902e10faf92ccb78211245e3
Author: Rekha Joshi <re...@gmail.com>
Date: 2015-09-17T01:03:09Z
Merge pull request #4 from apache/master
Pulling functionality from apache spark
commit b123c601e459d1ad17511fd91dd304032154882a
Author: Rekha Joshi <re...@gmail.com>
Date: 2015-11-25T18:50:32Z
Merge pull request #5 from apache/master
pull request from apache/master
commit c73c32aadd6066e631956923725a48d98a18777e
Author: Rekha Joshi <re...@gmail.com>
Date: 2016-03-18T19:13:51Z
Merge pull request #6 from apache/master
pull latest from apache spark
commit 7dbf7320057978526635bed09dabc8cf8657a28a
Author: Rekha Joshi <re...@gmail.com>
Date: 2016-04-05T20:26:40Z
Merge pull request #8 from apache/master
pull latest from apache spark
commit 5e9d71827f8e2e4d07027281b80e4e073e7fecd1
Author: Rekha Joshi <re...@gmail.com>
Date: 2017-05-01T23:00:30Z
Merge pull request #9 from apache/master
Pull apache spark
commit 63d99b3ce5f222d7126133170a373591f0ac67dd
Author: Rekha Joshi <re...@gmail.com>
Date: 2017-09-30T22:26:44Z
Merge pull request #10 from apache/master
pull latest apache spark
commit 57e0e26474b66afd3bd54be061a5982836e28792
Author: rjoshi2 <re...@gmail.com>
Date: 2017-10-01T06:57:12Z
[SPARK-21667][Streaming] ConsoleSink should not fail streaming query with checkpointLocation option
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19407: [SPARK-21667][Streaming] ConsoleSink should not f...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/19407
---
[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19407
Merged build finished. Test PASSed.
---
[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19407
Merged build finished. Test FAILed.
---
[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:
https://github.com/apache/spark/pull/19407
Thanks! Merging to master and 2.2.
---
[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19407
**[Test build #82370 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82370/testReport)** for PR 19407 at commit [`57e0e26`](https://github.com/apache/spark/commit/57e0e26474b66afd3bd54be061a5982836e28792).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19407
**[Test build #83704 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83704/testReport)** for PR 19407 at commit [`788fbf3`](https://github.com/apache/spark/commit/788fbf309261f1b003d5047ad4c86039de2fe16e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #19407: [SPARK-21667][Streaming] ConsoleSink should not f...
Posted by rekhajoshm <gi...@git.apache.org>.
Github user rekhajoshm commented on a diff in the pull request:
https://github.com/apache/spark/pull/19407#discussion_r142023140
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala ---
@@ -269,7 +269,7 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
} else {
val (useTempCheckpointLocation, recoverFromCheckpointLocation) =
if (source == "console") {
- (true, false)
+ (true, true)
--- End diff --
Good point, @jaceklaskowski. Updated, thanks.
---
[GitHub] spark pull request #19407: [SPARK-21667][Streaming] ConsoleSink should not f...
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/19407#discussion_r150314230
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala ---
@@ -267,11 +267,12 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
useTempCheckpointLocation = true,
trigger = trigger)
} else {
- val (useTempCheckpointLocation, recoverFromCheckpointLocation) =
+ val recoverFromCheckpointLocation = true
+ val useTempCheckpointLocation =
--- End diff --
nit: `val useTempCheckpointLocation = source == "console"`
You could also just update the statement below to:
```
df.sparkSession.sessionState.streamingQueryManager.startQuery(
  extraOptions.get("queryName"),
  extraOptions.get("checkpointLocation"),
  df,
  dataSource.createSink(outputMode),
  outputMode,
  useTempCheckpointLocation = source == "console",
  recoverFromCheckpointLocation = true,
  trigger = trigger)
```
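As a quick sanity check of this suggestion, the sketch below compares the tuple-returning `if`/`else` from the diff with the simplified form for a few source names. `CheckpointFlags`, `before`, and `after` are hypothetical helpers, not part of `DataStreamWriter`, and the non-console source names are arbitrary examples.

```scala
object CheckpointFlags {
  // Shape of the code in the diff: a tuple selected by an if/else.
  // Returns (useTempCheckpointLocation, recoverFromCheckpointLocation).
  def before(source: String): (Boolean, Boolean) =
    if (source == "console") (true, true) else (false, true)

  // Shape suggested in the review: the if/else collapses to a Boolean
  // expression, and recovery is always enabled.
  def after(source: String): (Boolean, Boolean) = {
    val useTempCheckpointLocation = source == "console"
    val recoverFromCheckpointLocation = true
    (useTempCheckpointLocation, recoverFromCheckpointLocation)
  }

  def main(args: Array[String]): Unit = {
    for (source <- Seq("console", "parquet", "kafka")) {
      // Both forms must agree for every source name.
      assert(before(source) == after(source))
      println(s"$source -> ${after(source)}")
    }
  }
}
```

Since both forms agree for every input, the intermediate `val`s can be dropped and the expressions passed directly as named arguments, as in the snippet above.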
---
[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19407
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82370/
Test FAILed.
---
[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19407
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83704/
Test PASSed.
---
[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19407
**[Test build #82371 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82371/testReport)** for PR 19407 at commit [`44b6ce2`](https://github.com/apache/spark/commit/44b6ce23dfc32c19928533ab7d3f6916a4268562).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19407
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82371/
Test FAILed.
---
[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19407
**[Test build #82370 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82370/testReport)** for PR 19407 at commit [`57e0e26`](https://github.com/apache/spark/commit/57e0e26474b66afd3bd54be061a5982836e28792).
---
[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19407
**[Test build #83704 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83704/testReport)** for PR 19407 at commit [`788fbf3`](https://github.com/apache/spark/commit/788fbf309261f1b003d5047ad4c86039de2fe16e).
---
[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19407
**[Test build #82371 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82371/testReport)** for PR 19407 at commit [`44b6ce2`](https://github.com/apache/spark/commit/44b6ce23dfc32c19928533ab7d3f6916a4268562).
---
[GitHub] spark pull request #19407: [SPARK-21667][Streaming] ConsoleSink should not f...
Posted by rekhajoshm <gi...@git.apache.org>.
Github user rekhajoshm commented on a diff in the pull request:
https://github.com/apache/spark/pull/19407#discussion_r150330668
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala ---
@@ -267,11 +267,12 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
useTempCheckpointLocation = true,
trigger = trigger)
} else {
- val (useTempCheckpointLocation, recoverFromCheckpointLocation) =
+ val recoverFromCheckpointLocation = true
+ val useTempCheckpointLocation =
--- End diff --
Done. Thanks, @zsxwing.
---
[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:
https://github.com/apache/spark/pull/19407
LGTM pending tests
---
[GitHub] spark pull request #19407: [SPARK-21667][Streaming] ConsoleSink should not f...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/19407#discussion_r142022819
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala ---
@@ -269,7 +269,7 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
} else {
val (useTempCheckpointLocation, recoverFromCheckpointLocation) =
if (source == "console") {
- (true, false)
+ (true, true)
--- End diff --
Is there any source that disables `recoverFromCheckpointLocation`? What's the use case, if any?
Remove `recoverFromCheckpointLocation` here, since it's always `true`, and make that explicit.
The JIRA issue is about fixing the exception and then cleaning up code that was only needed in the past.
---
[GitHub] spark issue #19407: [SPARK-21667][Streaming] ConsoleSink should not fail str...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19407
Merged build finished. Test FAILed.
---
[GitHub] spark pull request #19407: [SPARK-21667][Streaming] ConsoleSink should not f...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/19407#discussion_r151838606
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala ---
@@ -267,11 +267,12 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
useTempCheckpointLocation = true,
trigger = trigger)
} else {
- val (useTempCheckpointLocation, recoverFromCheckpointLocation) =
+ val recoverFromCheckpointLocation = true
+ val useTempCheckpointLocation =
if (source == "console") {
- (true, true)
+ true
} else {
- (false, true)
+ false
--- End diff --
Do we really need it anymore since the `if` expression is just `source == "console"`?
---