Posted to reviews@spark.apache.org by tdas <gi...@git.apache.org> on 2018/05/07 03:36:19 UTC
[GitHub] spark pull request #21253: [SPARK-24158][SS] Enabled no-data batches for str...
GitHub user tdas opened a pull request:
https://github.com/apache/spark/pull/21253
[SPARK-24158][SS] Enabled no-data batches for streaming joins
## What changes were proposed in this pull request?
This is a continuation of the larger task of enabling zero-data batches for more eager state cleanup. This PR enables it for stream-stream joins.
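A minimal sketch of why a zero-data batch allows earlier state cleanup, assuming a simplified model in which the watermark computed at the end of one batch is only applied to state in the next batch. `EngineState`, `runBatch`, and `run` are hypothetical names for illustration, not Spark APIs.

```scala
object NoDataBatchSketch {
  // Hypothetical model: join state is a set of buffered event times (ms).
  final case class EngineState(watermarkMs: Long, buffered: Set[Long])

  // One micro-batch: evict using the watermark known at batch start, buffer
  // the new events, then advance the watermark that the next batch will use.
  def runBatch(s: EngineState, events: Seq[Long], delayMs: Long): EngineState = {
    val kept = (s.buffered ++ events).filter(_ >= s.watermarkMs)
    val nextWatermark = events.map(_ - delayMs).foldLeft(s.watermarkMs)((a, b) => math.max(a, b))
    EngineState(nextWatermark, kept)
  }

  def run(batches: Seq[Seq[Long]], delayMs: Long, noDataBatches: Boolean): EngineState = {
    val afterData = batches.foldLeft(EngineState(0L, Set.empty[Long])) {
      (s, events) => runBatch(s, events, delayMs)
    }
    // With no-data batches enabled, run one extra empty batch if the advanced
    // watermark would evict some of the buffered state.
    if (noDataBatches && afterData.buffered.exists(_ < afterData.watermarkMs))
      runBatch(afterData, Seq.empty, delayMs)
    else afterData
  }
}
```

In this model, with a 5000 ms delay and batches `Seq(1000L, 2000L)` then `Seq(20000L)`, the two old events stay buffered until more data arrives unless the extra empty batch runs.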
## How was this patch tested?
- Updated join tests. Additionally, updated them to not use `CheckLastBatch` anywhere, to set a good precedent for the future.
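A minimal sketch of why a last-batch check becomes brittle once empty batches can run. It assumes that `CheckLastBatch` inspects only the most recent batch and that `CheckNewAnswer` inspects every row produced since the previous check. `SinkModel` is a hypothetical stand-in, not the real test sink.

```scala
object CheckAnswerSketch {
  // Hypothetical stand-in for a test sink: one Seq of rows per committed batch.
  final class SinkModel {
    private var batches = Vector.empty[Seq[Int]]
    private var checkedBatches = 0

    def commit(rows: Seq[Int]): Unit = batches = batches :+ rows

    // Rows of the most recent batch only (the CheckLastBatch-style view).
    def lastBatch: Seq[Int] = batches.lastOption.getOrElse(Seq.empty)

    // All rows committed since the previous call (the CheckNewAnswer-style view).
    def newAnswer(): Seq[Int] = {
      val rows = batches.drop(checkedBatches).flatten
      checkedBatches = batches.size
      rows
    }
  }
}
```

If a data batch is followed by an empty cleanup batch, `lastBatch` in this model sees nothing, while `newAnswer()` still returns the joined rows.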
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tdas/spark SPARK-24158
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21253.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21253
----
commit cb5f55b4622fc8637950013a5f6a7005cecf9a07
Author: Tathagata Das <ta...@...>
Date: 2018-05-03T02:55:35Z
Enabled no-data batches for joins
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21253
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3269/
Test PASSed.
---
[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21253
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...
Posted by jose-torres <gi...@git.apache.org>.
Github user jose-torres commented on the issue:
https://github.com/apache/spark/pull/21253
lgtm
---
[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21253
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21253
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21253
Merged build finished. Test FAILed.
---
[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21253
**[Test build #90690 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90690/testReport)** for PR 21253 at commit [`e944069`](https://github.com/apache/spark/commit/e9440691b80c28d789fcb4abb52dae0adf6c7b5e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21253
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2973/
Test PASSed.
---
[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21253
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90342/
Test PASSed.
---
[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21253
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...
Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:
https://github.com/apache/spark/pull/21253
jenkins retest this please
---
[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...
Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:
https://github.com/apache/spark/pull/21253
Thanks @jose-torres. Merging this.
---
[GitHub] spark pull request #21253: [SPARK-24158][SS] Enable no-data batches for stre...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/21253
---
[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21253
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21253
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90291/
Test FAILed.
---
[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21253
**[Test build #90690 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90690/testReport)** for PR 21253 at commit [`e944069`](https://github.com/apache/spark/commit/e9440691b80c28d789fcb4abb52dae0adf6c7b5e).
---
[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21253
**[Test build #90291 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90291/testReport)** for PR 21253 at commit [`cb5f55b`](https://github.com/apache/spark/commit/cb5f55b4622fc8637950013a5f6a7005cecf9a07).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21253: [SPARK-24158][SS] Enabled no-data batches for streaming ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21253
**[Test build #90291 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90291/testReport)** for PR 21253 at commit [`cb5f55b`](https://github.com/apache/spark/commit/cb5f55b4622fc8637950013a5f6a7005cecf9a07).
---
[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21253
**[Test build #90342 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90342/testReport)** for PR 21253 at commit [`cb5f55b`](https://github.com/apache/spark/commit/cb5f55b4622fc8637950013a5f6a7005cecf9a07).
---
[GitHub] spark pull request #21253: [SPARK-24158][SS] Enable no-data batches for stre...
Posted by jose-torres <gi...@git.apache.org>.
Github user jose-torres commented on a diff in the pull request:
https://github.com/apache/spark/pull/21253#discussion_r188703325
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingJoinSuite.scala ---
@@ -568,14 +567,16 @@ class StreamingOuterJoinSuite extends StreamTest with StateStoreMetricsTest with
testStream(joined)(
// Test inner part of the join.
MultiAddData(leftInput, 1, 2, 3, 4, 5)(rightInput, 3, 4, 5, 6, 7),
- CheckLastBatch((3, 10, 6, 9), (4, 10, 8, 12), (5, 10, 10, 15)),
+ CheckNewAnswer((3, 10, 6, 9), (4, 10, 8, 12), (5, 10, 10, 15)),
+
// Old state doesn't get dropped until the batch *after* it gets introduced, so the
--- End diff --
(here and in other tests)
---
[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21253
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3017/
Test PASSed.
---
[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21253
**[Test build #90342 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90342/testReport)** for PR 21253 at commit [`cb5f55b`](https://github.com/apache/spark/commit/cb5f55b4622fc8637950013a5f6a7005cecf9a07).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21253
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90690/
Test PASSed.
---
[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...
Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:
https://github.com/apache/spark/pull/21253
@brkyvz Can you take a look?
---
[GitHub] spark pull request #21253: [SPARK-24158][SS] Enable no-data batches for stre...
Posted by jose-torres <gi...@git.apache.org>.
Github user jose-torres commented on a diff in the pull request:
https://github.com/apache/spark/pull/21253#discussion_r188703005
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala ---
@@ -187,6 +187,12 @@ case class StreamingSymmetricHashJoinExec(
s"${getClass.getSimpleName} should not take $x as the JoinType")
}
+ override def shouldRunAnotherBatch(newMetadata: OffsetSeqMetadata): Boolean = {
+ (stateWatermarkPredicates.left.nonEmpty || stateWatermarkPredicates.right.nonEmpty) &&
+ eventTimeWatermark.isDefined &&
--- End diff --
nit: we should clearly document that this is the watermark before the current batch
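A minimal sketch of the condition under review, with the watermark semantics spelled out in a comment. The quoted diff is cut off after `eventTimeWatermark.isDefined &&`, so the final comparison below is an assumption, and `NextBatchMetadata` and `JoinModel` are hypothetical stand-ins rather than the real classes.

```scala
object ShouldRunAnotherBatchSketch {
  // Hypothetical stand-in for the metadata of the batch that would run next.
  final case class NextBatchMetadata(batchWatermarkMs: Long)

  final case class JoinModel(
      leftHasStatePredicate: Boolean,
      rightHasStatePredicate: Boolean,
      // Watermark the operator currently holds, i.e. the one from before the
      // current batch, not the one computed from the current batch's data.
      eventTimeWatermarkMs: Option[Long]) {

    def shouldRunAnotherBatch(next: NextBatchMetadata): Boolean =
      (leftHasStatePredicate || rightHasStatePredicate) &&
        // Assumed final condition (truncated in the quoted diff): another
        // batch is useful only if the watermark has moved forward.
        eventTimeWatermarkMs.exists(next.batchWatermarkMs > _)
  }
}
```

Under this assumption, an operator holding watermark 10000 asks for another batch when the next batch's watermark is 15000, but not when it is still 10000.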
---
[GitHub] spark pull request #21253: [SPARK-24158][SS] Enable no-data batches for stre...
Posted by jose-torres <gi...@git.apache.org>.
Github user jose-torres commented on a diff in the pull request:
https://github.com/apache/spark/pull/21253#discussion_r188703161
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingJoinSuite.scala ---
@@ -568,14 +567,16 @@ class StreamingOuterJoinSuite extends StreamTest with StateStoreMetricsTest with
testStream(joined)(
// Test inner part of the join.
MultiAddData(leftInput, 1, 2, 3, 4, 5)(rightInput, 3, 4, 5, 6, 7),
- CheckLastBatch((3, 10, 6, 9), (4, 10, 8, 12), (5, 10, 10, 15)),
+ CheckNewAnswer((3, 10, 6, 9), (4, 10, 8, 12), (5, 10, 10, 15)),
+
// Old state doesn't get dropped until the batch *after* it gets introduced, so the
--- End diff --
This isn't true anymore, right?
---