You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/11/17 08:16:43 UTC

[GitHub] [spark] c21 opened a new pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

c21 opened a new pull request #30395:
URL: https://github.com/apache/spark/pull/30395


   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   This PR is to add full outer stream-stream join, and the implementation of full outer join is:
   * For left side input row, check if there's a match on right side state store.
     * if there's a match, output the joined row, o.w. output nothing. Put the row in left side state store.
   * For right side input row, check if there's a match on left side state store.
     * if there's a match, output the joined row, o.w. output nothing. Put the row in right side state store.
   * State store eviction: evict rows from left/right side state store below watermark, and output rows never matched before (a combination of left outer and right outer join).
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   Enable more use cases for spark stream-stream join.
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   No.
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   Added unit tests in `UnsupportedOperationChecker.scala` and `StreamingJoinSuite.scala`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] c21 commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
c21 commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-735184558


   > FYI, during reviewing I found confusing method names: setupWindowedJoinWithRangeCondition / setupWindowedSelfJoin. We should remove Windowed there, but not from this PR as these methods exist.
   
   @HeartSaVioR - sounds good, I will submit a separate PR later tomorrow. thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-728854405


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35816/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30395:
URL: https://github.com/apache/spark/pull/30395#discussion_r533712944



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala
##########
@@ -409,13 +409,15 @@ class UnsupportedOperationsSuite extends SparkFunSuite with SQLHelper {
     streamStreamSupported = false,
     expectedMsg = "is not supported in Update output mode")
 
-  // Full outer joins: only batch-batch is allowed
+  // Full outer joins: stream-batch/batch-stream join are not allowed,
+  // and stream-stream join is allowed 'conditionally' - see below check
   testBinaryOperationInStreamingPlan(
-    "full outer join",
+    "FullOuter join",
     _.join(_, joinType = FullOuter),
     streamStreamSupported = false,

Review comment:
       streamStreamSupported should be true now?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] c21 closed pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
c21 closed pull request #30395:
URL: https://github.com/apache/spark/pull/30395


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-728766200


   **[Test build #131209 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131209/testReport)** for PR 30395 at commit [`9111532`](https://github.com/apache/spark/commit/9111532e2528ec8d017fb2e7dc382e0602e70d6c).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-736347573


   @c21 Jenkins is down for a couple of days, so we have to rely on Github Action. Closing and reopening PR would work for you.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] c21 commented on a change in pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
c21 commented on a change in pull request #30395:
URL: https://github.com/apache/spark/pull/30395#discussion_r532009689



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala
##########
@@ -411,11 +411,12 @@ class UnsupportedOperationsSuite extends SparkFunSuite with SQLHelper {
 
   // Full outer joins: only batch-batch is allowed

Review comment:
       I agree this is a bit confusing, and I was confused when I looked at the unit test earlier. The test (`testBinaryOperationInStreamingPlan `) is to check whether `_.join(_, joinType = _)` supports stream-stream or not. For left outer/right outer/left semi/full outer, stream-stream join is not supported in this case as the watermark condition is not defined. See original unit tests for left outer/right outer/left semi [here](https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala#L426-L435). 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] c21 commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
c21 commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-731360322


   Asked in dev mailing list as well in http://apache-spark-developers-list.1001551.n3.nabble.com/SS-full-outer-stream-stream-join-td30415.html to make sure we are not missing any voice.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-728923557






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] c21 commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
c21 commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-735494898


   @xuanyuanking , @HeartSaVioR - wondering do you have any more comment? Thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-735526166






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-736870625


   Jenkins doesn't seem to work, or struggling to do with queued builds. I'll trigger Github Action instead, and merge once it passes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-728768539


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-729403253


   Probably the general question would be this; "Is there a historical reason we didn't implement full outer join?"
   
   The concept is straightforward so the code change should be also straightforward. Just wanted to know if there's specific reason not to implement stream-stream full outer join at the time when we implemented stream-stream left/right outer join, so that we could avoid missing any major considerations on this feature.
   
   cc. to @tdas @jose-torres again here, as they've authored and reviewed the PR #19327


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] c21 commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
c21 commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-736318493


   I am seeing the unit test failure `org.apache.spark.storage.BlockManagerDecommissionIntegrationSuite` from https://github.com/apache/spark/pull/30395/checks?check_run_id=1478597516, I don't think the failed one is relevant. @HeartSaVioR - could you help re-trigger the tests? Thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-729400402


   cc. @tdas @zsxwing @jose-torres @gaborgsomogyi as well


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-728768539






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR closed pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
HeartSaVioR closed pull request #30395:
URL: https://github.com/apache/spark/pull/30395


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-736858028


   retest this, please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-728818055






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-735603975






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] c21 commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
c21 commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-729427520


   Just to add my thought, and would like to get opinions from previous authors and reviewers in this area for sure.
   
   * In terms of scalability, reliability, etc, full outer join stores same amount of data in state store, compared to left outer/right outer/inner join. The only difference is full outer join would output more rows (all rows from both stream sides). So I think from system perspective, we probably can support it, similar to other joins.
   
   * The motivation from my side is that, I am investigating the performance difference between spark structured streaming (micro-batch) vs other internal streaming systems, and plan to add some more stuff on top of it to support internal use cases. So it would be good to have a basic full outer join implementation here (merge to upstream instead of forking for each spark version upgrade for me).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-736830290


   > @viirya - wondering is my reply above addressing your comment? Thanks.
   
   Yeah, I didn't see your reply before I submitted the review comment.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] c21 commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
c21 commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-731856289


   Shall we start review on this? It seems that no one raises concern here or mailing list. Would like to get this in before 3.1 deadline if this PR makes sense, thanks. cc @HeartSaVioR .


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xuanyuanking commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-731923788


   Spent some time on "Is there a historical reason we didn't implement full outer join?" but also don't know the answer.
   Mind waiting for 1 more day since it's still the weekend for someone's side? +1 to get this in 3.1 if it makes sense.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] c21 commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
c21 commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-736925810


   Thanks @HeartSaVioR , @xuanyuanking and @viirya for review, and making it into spark 3.1 !


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-735603975






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-728923557






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xuanyuanking commented on a change in pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on a change in pull request #30395:
URL: https://github.com/apache/spark/pull/30395#discussion_r530517789



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
##########
@@ -315,40 +321,17 @@ object UnsupportedOperationChecker extends Logging {
                 throwError(s"$joinType join with a streaming DataFrame/Dataset " +
                   "on the right and a static DataFrame/Dataset on the left is not supported")
               } else if (left.isStreaming && right.isStreaming) {
-                val watermarkInJoinKeys = StreamingJoinHelper.isWatermarkInJoinKeys(subPlan)
-
-                val hasValidWatermarkRange =
-                  StreamingJoinHelper.getStateValueWatermark(
-                    left.outputSet, right.outputSet, condition, Some(1000000)).isDefined
-
-                if (!watermarkInJoinKeys && !hasValidWatermarkRange) {
-                  throwError(
-                    s"Stream-stream $joinType join between two streaming DataFrame/Datasets " +
-                    "is not supported without a watermark in the join keys, or a watermark on " +
-                    "the nullable side and an appropriate range condition")
-                }
+                checkForStreamStreamJoinWatermark(j)
               }
 
             // We support streaming right outer joins with static on the left always, and with
             // stream on both sides under the appropriate conditions.
             case RightOuter =>
               if (left.isStreaming && !right.isStreaming) {
-                throwError("Right outer join with a streaming DataFrame/Dataset on the left and " +
+                throwError(s"$RightOuter join with a streaming DataFrame/Dataset on the left and " +

Review comment:
       ditto

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
##########
@@ -298,8 +298,14 @@ object UnsupportedOperationChecker extends Logging {
               // no further validations needed
 
             case FullOuter =>
-              if (left.isStreaming || right.isStreaming) {
-                throwError("Full outer joins with streaming DataFrames/Datasets are not supported")
+              if (left.isStreaming && !right.isStreaming) {
+                throwError(s"$FullOuter joins with streaming DataFrames/Datasets on the left " +

Review comment:
       nit: do you mean `${FullOuter.sql}` here? I think we can use literal 'FullOuter' here.

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala
##########
@@ -333,13 +339,23 @@ case class StreamingSymmetricHashJoinExec(
           stateFormatVersion match {
             case 1 => matchesWithLeftSideState(new UnsafeRowPair(kv.key, kv.value))
             case 2 => kv.matched
-            case _ =>
-              throw new IllegalStateException("Unexpected state format version! " +
-                s"version $stateFormatVersion")
+            case _ => throwBadStateFormatVersionException()
           }
         }.map(pair => joinedRow.withLeft(nullLeft).withRight(pair.value))
 
         hashJoinOutputIter ++ outerOutputIter
+      case FullOuter =>
+        lazy val isKeyToValuePairMatched = (kv: KeyToValuePair) =>
+          stateFormatVersion match {
+            case 2 => kv.matched
+            case _ => throwBadStateFormatVersionException()
+          }
+        val leftSideOutputIter = leftSideJoiner.removeOldState().filterNot(isKeyToValuePairMatched)

Review comment:
       Super nit, let's start the new line in the same position for `left/rightSideOutputIter`
   ```
   val leftSideOutputIter = leftSideJoiner.removeOldState().filterNot(
     isKeyToValuePairMatched).map(pair => joinedRow.withLeft(pair.value).withRight(nullRight))
   ```

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
##########
@@ -438,4 +421,32 @@ object UnsupportedOperationChecker extends Logging {
     throw new AnalysisException(
       msg, operator.origin.line, operator.origin.startPosition, Some(operator))
   }
+
+  private def checkForStreamStreamJoinWatermark(join: Join): Unit = {
+    val watermarkInJoinKeys = StreamingJoinHelper.isWatermarkInJoinKeys(join)
+
+    val hasValidWatermarkRange = join.joinType match {

Review comment:
       Let's also keep the comment https://github.com/apache/spark/pull/30395/files#diff-7c879c08d2f379c139d5229a88857229ae69bb48f0a138a3d64e1b2dde3502feL341 in this new method.

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala
##########
@@ -382,6 +398,7 @@ case class StreamingSymmetricHashJoinExec(
             leftSideJoiner.removeOldState() ++ rightSideJoiner.removeOldState()
           case LeftOuter => rightSideJoiner.removeOldState()
           case RightOuter => leftSideJoiner.removeOldState()
+          case FullOuter => Iterator.empty

Review comment:
       Let's also add comments for FullOuter in: https://github.com/apache/spark/pull/30395/files#diff-6cd66da710d8d54025c1edf658bbec5230e8b4e748f9f2f884a60b1ba1efed42R395




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-728818035


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35812/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-728922274


   **[Test build #131213 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131213/testReport)** for PR 30395 at commit [`37fa4d3`](https://github.com/apache/spark/commit/37fa4d3b957515289d3ab5a89dff67be143140ae).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-728777361


   **[Test build #131213 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131213/testReport)** for PR 30395 at commit [`37fa4d3`](https://github.com/apache/spark/commit/37fa4d3b957515289d3ab5a89dff67be143140ae).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-736324485


   @viirya Do you have a plan to review this? Then I'll open this for a day more (as actually planned).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #30395:
URL: https://github.com/apache/spark/pull/30395#discussion_r531818134



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala
##########
@@ -411,11 +411,12 @@ class UnsupportedOperationsSuite extends SparkFunSuite with SQLHelper {
 
   // Full outer joins: only batch-batch is allowed

Review comment:
       This comment should be no longer valid - you may want to elaborate here for the reason of  `streamStreamSupported = false`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] c21 commented on a change in pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
c21 commented on a change in pull request #30395:
URL: https://github.com/apache/spark/pull/30395#discussion_r530795959



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
##########
@@ -298,8 +298,14 @@ object UnsupportedOperationChecker extends Logging {
               // no further validations needed
 
             case FullOuter =>
-              if (left.isStreaming || right.isStreaming) {
-                throwError("Full outer joins with streaming DataFrames/Datasets are not supported")
+              if (left.isStreaming && !right.isStreaming) {
+                throwError(s"$FullOuter joins with streaming DataFrames/Datasets on the left " +

Review comment:
       @xuanyuanking - both way works for me, changed to `FullOuter` here.

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
##########
@@ -315,40 +321,17 @@ object UnsupportedOperationChecker extends Logging {
                 throwError(s"$joinType join with a streaming DataFrame/Dataset " +
                   "on the right and a static DataFrame/Dataset on the left is not supported")
               } else if (left.isStreaming && right.isStreaming) {
-                val watermarkInJoinKeys = StreamingJoinHelper.isWatermarkInJoinKeys(subPlan)
-
-                val hasValidWatermarkRange =
-                  StreamingJoinHelper.getStateValueWatermark(
-                    left.outputSet, right.outputSet, condition, Some(1000000)).isDefined
-
-                if (!watermarkInJoinKeys && !hasValidWatermarkRange) {
-                  throwError(
-                    s"Stream-stream $joinType join between two streaming DataFrame/Datasets " +
-                    "is not supported without a watermark in the join keys, or a watermark on " +
-                    "the nullable side and an appropriate range condition")
-                }
+                checkForStreamStreamJoinWatermark(j)
               }
 
             // We support streaming right outer joins with static on the left always, and with
             // stream on both sides under the appropriate conditions.
             case RightOuter =>
               if (left.isStreaming && !right.isStreaming) {
-                throwError("Right outer join with a streaming DataFrame/Dataset on the left and " +
+                throwError(s"$RightOuter join with a streaming DataFrame/Dataset on the left and " +

Review comment:
       @xuanyuanking - sure, updated.

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
##########
@@ -438,4 +421,32 @@ object UnsupportedOperationChecker extends Logging {
     throw new AnalysisException(
       msg, operator.origin.line, operator.origin.startPosition, Some(operator))
   }
+
+  private def checkForStreamStreamJoinWatermark(join: Join): Unit = {
+    val watermarkInJoinKeys = StreamingJoinHelper.isWatermarkInJoinKeys(join)
+
+    val hasValidWatermarkRange = join.joinType match {

Review comment:
       @xuanyuanking - added back. sorry I was missing that in the first place.

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala
##########
@@ -333,13 +339,23 @@ case class StreamingSymmetricHashJoinExec(
           stateFormatVersion match {
             case 1 => matchesWithLeftSideState(new UnsafeRowPair(kv.key, kv.value))
             case 2 => kv.matched
-            case _ =>
-              throw new IllegalStateException("Unexpected state format version! " +
-                s"version $stateFormatVersion")
+            case _ => throwBadStateFormatVersionException()
           }
         }.map(pair => joinedRow.withLeft(nullLeft).withRight(pair.value))
 
         hashJoinOutputIter ++ outerOutputIter
+      case FullOuter =>
+        lazy val isKeyToValuePairMatched = (kv: KeyToValuePair) =>
+          stateFormatVersion match {
+            case 2 => kv.matched
+            case _ => throwBadStateFormatVersionException()
+          }
+        val leftSideOutputIter = leftSideJoiner.removeOldState().filterNot(isKeyToValuePairMatched)

Review comment:
       @xuanyuanking - yeah I agree this is too nit, updated anyway.

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala
##########
@@ -382,6 +398,7 @@ case class StreamingSymmetricHashJoinExec(
             leftSideJoiner.removeOldState() ++ rightSideJoiner.removeOldState()
           case LeftOuter => rightSideJoiner.removeOldState()
           case RightOuter => leftSideJoiner.removeOldState()
+          case FullOuter => Iterator.empty

Review comment:
       @xuanyuanking - agree, updated.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-735508760


   **[Test build #131944 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131944/testReport)** for PR 30395 at commit [`5957b6f`](https://github.com/apache/spark/commit/5957b6fb2c0d0ab6ff850bc1a482e1363ba51943).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-735508760


   **[Test build #131944 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131944/testReport)** for PR 30395 at commit [`5957b6f`](https://github.com/apache/spark/commit/5957b6fb2c0d0ab6ff850bc1a482e1363ba51943).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-736282615


   All current reviewers are OK with this, so I'm kicking the Github Action build again. Once the test passes I'll merge this.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] c21 commented on a change in pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
c21 commented on a change in pull request #30395:
URL: https://github.com/apache/spark/pull/30395#discussion_r533718035



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala
##########
@@ -409,13 +409,15 @@ class UnsupportedOperationsSuite extends SparkFunSuite with SQLHelper {
     streamStreamSupported = false,
     expectedMsg = "is not supported in Update output mode")
 
-  // Full outer joins: only batch-batch is allowed
+  // Full outer joins: stream-batch/batch-stream join are not allowed,
+  // and stream-stream join is allowed 'conditionally' - see below check
   testBinaryOperationInStreamingPlan(
-    "full outer join",
+    "FullOuter join",
     _.join(_, joinType = FullOuter),
     streamStreamSupported = false,

Review comment:
       Please see above discussion - https://github.com/apache/spark/pull/30395#discussion_r532009689 . thanks.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-728768550


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/131209/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-731983848


   I'm sorry "blocking this" wasn't the main rationalization of requesting. My bad.
   
   I'm OoO this week and don't have a time slot to review this change though - not sure I can handle new non-trivial reviews as I have another PRs to handle as well.
   
   @viirya @xuanyuanking @gaborgsomogyi It'd be nice if you could review this. Thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-735598844


   **[Test build #131944 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131944/testReport)** for PR 30395 at commit [`5957b6f`](https://github.com/apache/spark/commit/5957b6fb2c0d0ab6ff850bc1a482e1363ba51943).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-735526166






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-735028698


   FYI, during reviewing I found confusing method names: `setupWindowedJoinWithRangeCondition` / `setupWindowedSelfJoin`. We should remove `Windowed` there, but not from this PR as these methods exist. 
   
   @c21 Would you mind submitting a new MINOR PR for this? Otherwise I'll do it instead.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30395:
URL: https://github.com/apache/spark/pull/30395#discussion_r533723459



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala
##########
@@ -409,13 +409,15 @@ class UnsupportedOperationsSuite extends SparkFunSuite with SQLHelper {
     streamStreamSupported = false,
     expectedMsg = "is not supported in Update output mode")
 
-  // Full outer joins: only batch-batch is allowed
+  // Full outer joins: stream-batch/batch-stream join are not allowed,
+  // and stream-stream join is allowed 'conditionally' - see below check
   testBinaryOperationInStreamingPlan(
-    "full outer join",
+    "FullOuter join",
     _.join(_, joinType = FullOuter),
     streamStreamSupported = false,

Review comment:
       Ok.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-734220136


   **[Test build #131835 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131835/testReport)** for PR 30395 at commit [`85ec294`](https://github.com/apache/spark/commit/85ec29412f6d03b93f20fe1255ab2c196c04d654).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-728804789


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35812/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-728818055


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-734118917


   **[Test build #131835 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131835/testReport)** for PR 30395 at commit [`85ec294`](https://github.com/apache/spark/commit/85ec29412f6d03b93f20fe1255ab2c196c04d654).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-729058432


   cc @viirya 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] c21 commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
c21 commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-736822911


   @viirya - wondering is my reply above addressing your comment? Thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #30395:
URL: https://github.com/apache/spark/pull/30395#discussion_r532310610



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala
##########
@@ -411,11 +411,12 @@ class UnsupportedOperationsSuite extends SparkFunSuite with SQLHelper {
 
   // Full outer joins: only batch-batch is allowed

Review comment:
       The comment was explaining for "all cases" of full outer joins. I know the test is covered but the valid comment becomes no longer valid and we still leave the comment here.
   
   I meant like below:
   
   ```
   // Full outer joins: stream-batch/batch-stream join are not allowed
   // stream-stream full outer join is allowed 'conditionally' - see below check
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] c21 commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
c21 commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-729085703


   cc @HeartSaVioR and @xuanyuanking as well, thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-736916334


   OK test passed. Merging to master.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-728777361


   **[Test build #131213 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131213/testReport)** for PR 30395 at commit [`37fa4d3`](https://github.com/apache/spark/commit/37fa4d3b957515289d3ab5a89dff67be143140ae).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-728854429






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-734118917


   **[Test build #131835 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131835/testReport)** for PR 30395 at commit [`85ec294`](https://github.com/apache/spark/commit/85ec29412f6d03b93f20fe1255ab2c196c04d654).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-728843347


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35816/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-728768505


   **[Test build #131209 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131209/testReport)** for PR 30395 at commit [`9111532`](https://github.com/apache/spark/commit/9111532e2528ec8d017fb2e7dc382e0602e70d6c).
    * This patch **fails to build**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-728818062


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35812/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-734151818






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #30395:
URL: https://github.com/apache/spark/pull/30395#discussion_r532310610



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala
##########
@@ -411,11 +411,12 @@ class UnsupportedOperationsSuite extends SparkFunSuite with SQLHelper {
 
   // Full outer joins: only batch-batch is allowed

Review comment:
       The comment was explaining for "all cases" of full outer joins. I know the test is covered but the valid comment becomes no longer valid and we still leave the comment here without fixing the comment.
   
   I meant like below:
   
   ```
   // Full outer joins: stream-batch/batch-stream join are not allowed
   // stream-stream full outer join is allowed 'conditionally' - see below check
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-728854429






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-734151818






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-734234494






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] c21 commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
c21 commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-734105253


   @xuanyuanking - addressed all existing comments and the PR is ready for review again, thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-728766200


   **[Test build #131209 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131209/testReport)** for PR 30395 at commit [`9111532`](https://github.com/apache/spark/commit/9111532e2528ec8d017fb2e7dc382e0602e70d6c).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org