Posted to reviews@spark.apache.org by tdas <gi...@git.apache.org> on 2018/05/07 03:36:19 UTC

[GitHub] spark pull request #21253: [SPARK-24158][SS] Enabled no-data batches for str...

GitHub user tdas opened a pull request:

    https://github.com/apache/spark/pull/21253

    [SPARK-24158][SS] Enabled no-data batches for streaming joins

    ## What changes were proposed in this pull request?
    
    This is a continuation of the larger task of enabling zero-data batches for more eager state cleanup. This PR enables it for stream-stream joins. 
    
    ## How was this patch tested?
    - Updated join tests. Additionally, updated them to not use `CheckLastBatch` anywhere, to set a good precedent for the future.
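
Editorial sketch (not part of the original message): the description above says zero-data batches allow more eager state cleanup. The toy model below illustrates that idea only. `Engine`, `addData`, `runBatch`, and the rule "evict rows whose event time is below the watermark" are hypothetical simplifications, not Spark's classes or its actual join-state logic. It assumes the watermark computed at the end of one batch is only applied by the following batch.

```scala
object EagerCleanupSketch {
  // Hypothetical model: each buffered state row is represented by its event time (ms).
  final case class Engine(
      delayMs: Long,
      noDataBatches: Boolean,
      state: Vector[Long] = Vector.empty,
      watermarkMs: Long = 0L,       // watermark the next batch will run with
      maxEventTimeMs: Long = 0L) {

    // One micro-batch: evict with the watermark known at batch start, buffer the
    // new rows, then compute the watermark that the following batch will use.
    private def runBatch(rows: Seq[Long]): Engine = {
      val kept = state.filter(_ >= watermarkMs) ++ rows.filter(_ >= watermarkMs)
      val maxSeen = (maxEventTimeMs +: rows).max
      copy(
        state = kept,
        maxEventTimeMs = maxSeen,
        watermarkMs = math.max(watermarkMs, maxSeen - delayMs))
    }

    // Process new data; if no-data batches are enabled and the watermark advanced,
    // immediately run one extra empty batch so that old state is dropped.
    def addData(rows: Long*): Engine = {
      val before = watermarkMs
      val afterData = runBatch(rows)
      if (noDataBatches && afterData.watermarkMs > before) afterData.runBatch(Nil)
      else afterData
    }
  }
}
```

In this model, without the extra empty batch the rows older than the new watermark stay in state until more data arrives; with it, they are dropped right after the batch that advanced the watermark.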

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-24158

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21253.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21253
    
----
commit cb5f55b4622fc8637950013a5f6a7005cecf9a07
Author: Tathagata Das <ta...@...>
Date:   2018-05-03T02:55:35Z

    Enabled no-data batches for joins

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21253
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3269/
    Test PASSed.


---



[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21253
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...

Posted by jose-torres <gi...@git.apache.org>.
Github user jose-torres commented on the issue:

    https://github.com/apache/spark/pull/21253
  
    lgtm


---



[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21253
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21253
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21253
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21253
  
    **[Test build #90690 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90690/testReport)** for PR 21253 at commit [`e944069`](https://github.com/apache/spark/commit/e9440691b80c28d789fcb4abb52dae0adf6c7b5e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21253
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2973/
    Test PASSed.


---



[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21253
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90342/
    Test PASSed.


---



[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21253
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/21253
  
    jenkins retest this please


---



[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/21253
  
    Thanks @jose-torres. merging this.


---



[GitHub] spark pull request #21253: [SPARK-24158][SS] Enable no-data batches for stre...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21253


---



[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21253
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21253
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90291/
    Test FAILed.


---



[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21253
  
    **[Test build #90690 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90690/testReport)** for PR 21253 at commit [`e944069`](https://github.com/apache/spark/commit/e9440691b80c28d789fcb4abb52dae0adf6c7b5e).


---



[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21253
  
    **[Test build #90291 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90291/testReport)** for PR 21253 at commit [`cb5f55b`](https://github.com/apache/spark/commit/cb5f55b4622fc8637950013a5f6a7005cecf9a07).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21253: [SPARK-24158][SS] Enabled no-data batches for streaming ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21253
  
    **[Test build #90291 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90291/testReport)** for PR 21253 at commit [`cb5f55b`](https://github.com/apache/spark/commit/cb5f55b4622fc8637950013a5f6a7005cecf9a07).


---



[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21253
  
    **[Test build #90342 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90342/testReport)** for PR 21253 at commit [`cb5f55b`](https://github.com/apache/spark/commit/cb5f55b4622fc8637950013a5f6a7005cecf9a07).


---



[GitHub] spark pull request #21253: [SPARK-24158][SS] Enable no-data batches for stre...

Posted by jose-torres <gi...@git.apache.org>.
Github user jose-torres commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21253#discussion_r188703325
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingJoinSuite.scala ---
    @@ -568,14 +567,16 @@ class StreamingOuterJoinSuite extends StreamTest with StateStoreMetricsTest with
         testStream(joined)(
           // Test inner part of the join.
           MultiAddData(leftInput, 1, 2, 3, 4, 5)(rightInput, 3, 4, 5, 6, 7),
    -      CheckLastBatch((3, 10, 6, 9), (4, 10, 8, 12), (5, 10, 10, 15)),
    +      CheckNewAnswer((3, 10, 6, 9), (4, 10, 8, 12), (5, 10, 10, 15)),
    +
           // Old state doesn't get dropped until the batch *after* it gets introduced, so the
    --- End diff --
    
    (here and in other tests)


---



[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21253
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3017/
    Test PASSed.


---



[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21253
  
    **[Test build #90342 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90342/testReport)** for PR 21253 at commit [`cb5f55b`](https://github.com/apache/spark/commit/cb5f55b4622fc8637950013a5f6a7005cecf9a07).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21253
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90690/
    Test PASSed.


---



[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/21253
  
    @brkyvz Can you take a look?


---



[GitHub] spark pull request #21253: [SPARK-24158][SS] Enable no-data batches for stre...

Posted by jose-torres <gi...@git.apache.org>.
Github user jose-torres commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21253#discussion_r188703005
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala ---
    @@ -187,6 +187,12 @@ case class StreamingSymmetricHashJoinExec(
             s"${getClass.getSimpleName} should not take $x as the JoinType")
       }
     
    +  override def shouldRunAnotherBatch(newMetadata: OffsetSeqMetadata): Boolean = {
    +    (stateWatermarkPredicates.left.nonEmpty || stateWatermarkPredicates.right.nonEmpty) &&
    +      eventTimeWatermark.isDefined &&
    --- End diff --
    
    nit: we should clearly document that this is the watermark before the current batch
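
Editorial sketch (not part of the original comment): one way to make the point of the nit explicit is to name and comment the watermark field as the value from before the current batch. The model below is hypothetical. `NextBatchMetadata`, `JoinModel`, and the two boolean flags are illustrative stand-ins, not Spark's types, and the final comparison against the next batch's watermark is an assumption, because the quoted diff is cut off before its last condition.

```scala
object ShouldRunAnotherBatchSketch {
  // Hypothetical stand-in for the metadata describing the *next* batch.
  final case class NextBatchMetadata(batchWatermarkMs: Long)

  final case class JoinModel(
      hasLeftStatePredicate: Boolean,
      hasRightStatePredicate: Boolean,
      // Watermark this batch was planned with, i.e. the value from *before*
      // the current batch's data was processed.
      eventTimeWatermarkMs: Option[Long]) {

    // Another (possibly empty) batch is worthwhile only if some state can be
    // evicted by watermark and the watermark has moved past the one in use.
    def shouldRunAnotherBatch(next: NextBatchMetadata): Boolean =
      (hasLeftStatePredicate || hasRightStatePredicate) &&
        // Assumed final condition; the diff excerpt above does not show it.
        eventTimeWatermarkMs.exists(next.batchWatermarkMs > _)
  }
}
```

Using `Option.exists` folds the "is defined" check and the comparison into one expression, which avoids calling `.get` on an `Option`.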


---



[GitHub] spark pull request #21253: [SPARK-24158][SS] Enable no-data batches for stre...

Posted by jose-torres <gi...@git.apache.org>.
Github user jose-torres commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21253#discussion_r188703161
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingJoinSuite.scala ---
    @@ -568,14 +567,16 @@ class StreamingOuterJoinSuite extends StreamTest with StateStoreMetricsTest with
         testStream(joined)(
           // Test inner part of the join.
           MultiAddData(leftInput, 1, 2, 3, 4, 5)(rightInput, 3, 4, 5, 6, 7),
    -      CheckLastBatch((3, 10, 6, 9), (4, 10, 8, 12), (5, 10, 10, 15)),
    +      CheckNewAnswer((3, 10, 6, 9), (4, 10, 8, 12), (5, 10, 10, 15)),
    +
           // Old state doesn't get dropped until the batch *after* it gets introduced, so the
    --- End diff --
    
    this isn't true anymore right?


---
