You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/07/02 06:02:01 UTC

[GitHub] [spark] HeartSaVioR opened a new pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

HeartSaVioR opened a new pull request #28975:
URL: https://github.com/apache/spark/pull/28975


   ### What changes were proposed in this pull request?
   
   This patch fixes the odd join result being occurred from stream-stream join for state store format V2.
   
   There're some spots on V2 path which leverage UnsafeProjection. As the result row is reused, the row should be copied to avoid changing value during reading (or make sure the caller doesn't affect by such behavior) but `SymmetricHashJoinStateManager.removeByValueCondition` violates the case.
   
   This patch makes `KeyWithIndexToValueRowConverterV2.convertValue` copy the row by itself so that callers don't need to take care about it. This patch doesn't change the behavior of `KeyWithIndexToValueRowConverterV2.convertToValueRow` to avoid double-copying, as the caller is expected to store the row which the implementation of state store will call `copy()`.
   
   This patch adds such behavior into each method doc in `KeyWithIndexToValueRowConverter`, so that further contributors can read through and make sure the change / new addition doesn't break the contract.
   
   ### Why are the changes needed?
   
   Stream-stream join with state store format V2 (newly added in Spark 3.0.0) has a serious correctness bug which brings indeterministic result.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, some of Spark 3.0.0 users using stream-stream join from the new checkpoint (as the bug exists to only v2 format path) may encounter wrong join result. This patch will fix it.
   
   ### How was this patch tested?
   
   Reported case is converted to the new UT, and confirmed UT passed. All UTs in StreamingInnerJoinSuite and StreamingOuterJoinSuite passed as well


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655792610






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-653995518






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-652830765


   **[Test build #124875 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124875/testReport)** for PR 28975 at commit [`82e5a76`](https://github.com/apache/spark/commit/82e5a7683de5293529f07c23f39004630175dc96).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655336024


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r450042481



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##########
@@ -259,6 +269,9 @@ class SymmetricHashJoinStateManager(
           return null
         }
 
+        // Make a copy on value row, as below cleanup logic may update the value row silently.
+        currentValue = currentValue.copy(value = currentValue.value.copy())

Review comment:
       Yes. That wasn't necessary for format V1 as the original row was stored into state store, and state store (strictly saying, the implementation of HDFS state store provider) makes sure these rows are copied version.
   
   For other places, it can propagate to the callers outside of state manager, and looks like these callers don't need to copy the row. (It's super tricky for me to determine whether the copy is necessary or not, if the code is not in a simple loop or stream.)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655146151


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655214843






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654501354


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125126/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-653998294






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r450016555



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##########
@@ -259,6 +269,9 @@ class SymmetricHashJoinStateManager(
           return null
         }
 
+        // Make a copy on value row, as below cleanup logic may update the value row silently.
+        currentValue = currentValue.copy(value = currentValue.value.copy())

Review comment:
       so this is the only place to do copy?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655669114


   **[Test build #125389 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125389/testReport)** for PR 28975 at commit [`1c011ab`](https://github.com/apache/spark/commit/1c011abf3c6e18ecdee608f76926e1b31162a5cc).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654093173


   **[Test build #125051 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125051/testReport)** for PR 28975 at commit [`be34258`](https://github.com/apache/spark/commit/be342583629147c537cda14ed1708c3653000b3f).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655137361


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655464967


   **[Test build #125352 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125352/testReport)** for PR 28975 at commit [`1c011ab`](https://github.com/apache/spark/commit/1c011abf3c6e18ecdee608f76926e1b31162a5cc).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655337862






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654447988






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-653998294






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xuanyuanking commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r450162731



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##########
@@ -259,6 +269,9 @@ class SymmetricHashJoinStateManager(
           return null
         }
 
+        // Make a copy on value row, as below cleanup logic may update the value row silently.
+        currentValue = currentValue.copy(value = currentValue.value.copy())

Review comment:
       Yep, also prefer the first approach personally. As the issue was in v2 format, the first version is a straightforward way. 
   @cloud-fan typo? `... unnecessary copies for v1 format`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-653998138


   **[Test build #125017 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125017/testReport)** for PR 28975 at commit [`be34258`](https://github.com/apache/spark/commit/be342583629147c537cda14ed1708c3653000b3f).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655364489






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r451332628



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##########
@@ -451,10 +451,25 @@ class SymmetricHashJoinStateManager(
   }
 
   private trait KeyWithIndexToValueRowConverter {
+    /** Defines the schema of the value row (the value side of K-V in state store). */
     def valueAttributes: Seq[Attribute]
 
+    /**
+     * Convert the value row to (actual value, match) pair.
+     *
+     * NOTE: implementations should ensure the result row is NOT reused during execution, as
+     * caller may use the value to store without copy().

Review comment:
       we need to update it. It's not because of storing, but the caller side updates it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654441334






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-653997034


   retest this, please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r451837421



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingJoinSuite.scala
##########
@@ -996,4 +997,47 @@ class StreamingOuterJoinSuite extends StreamTest with StateStoreMetricsTest with
       )
     }
   }
+
+  test("SPARK-32148 stream-stream join regression on Spark 3.0.0") {
+    val input1 = MemoryStream[(Timestamp, String, String)]
+    val df1 = input1.toDF
+      .selectExpr("_1 as eventTime", "_2 as id", "_3 as comment")
+      .withWatermark(s"eventTime", "2 minutes")
+
+    val input2 = MemoryStream[(Timestamp, String, String)]
+    val df2 = input2.toDF
+      .selectExpr("_1 as eventTime", "_2 as id", "_3 as name")

Review comment:
       Same here as well.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654440716


   **[Test build #125121 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125121/testReport)** for PR 28975 at commit [`be34258`](https://github.com/apache/spark/commit/be342583629147c537cda14ed1708c3653000b3f).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655337317


   **[Test build #125317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125317/testReport)** for PR 28975 at commit [`e2201ef`](https://github.com/apache/spark/commit/e2201efc4b1f1f32a7a869935d25d94ade55d2d4).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654018986


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125017/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655332055


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125271/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-652825259


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655334516






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654018682


   **[Test build #125017 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125017/testReport)** for PR 28975 at commit [`be34258`](https://github.com/apache/spark/commit/be342583629147c537cda14ed1708c3653000b3f).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655662643


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-652831426






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654452637


   retest this, please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655128492


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655486493






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-652802512






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655663226






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655856331


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655133391


   **[Test build #125258 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125258/testReport)** for PR 28975 at commit [`e2201ef`](https://github.com/apache/spark/commit/e2201efc4b1f1f32a7a869935d25d94ade55d2d4).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654485646


   https://github.com/apache/spark/pull/29017 is created based on @HeartSaVioR 's reporting.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655145969


   **[Test build #125258 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125258/testReport)** for PR 28975 at commit [`e2201ef`](https://github.com/apache/spark/commit/e2201efc4b1f1f32a7a869935d25d94ade55d2d4).
    * This patch **fails to generate documentation**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r451336655



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##########
@@ -451,10 +451,25 @@ class SymmetricHashJoinStateManager(
   }
 
   private trait KeyWithIndexToValueRowConverter {
+    /** Defines the schema of the value row (the value side of K-V in state store). */
     def valueAttributes: Seq[Attribute]
 
+    /**
+     * Convert the value row to (actual value, match) pair.
+     *
+     * NOTE: implementations should ensure the result row is NOT reused during execution, as
+     * caller may use the value to store without copy().

Review comment:
       I'm not sure I get it. The problem occurs when caller reads the value lately and there's "another" interaction with the method in the middle of. I agree the sentence in the source code comment is not clear as well though.
   
   Would it be better if we can rephrase as "... during execution, so that caller can safely read the value in any time" ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655363826


   **[Test build #125332 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125332/testReport)** for PR 28975 at commit [`1c011ab`](https://github.com/apache/spark/commit/1c011abf3c6e18ecdee608f76926e1b31162a5cc).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-653931875


   **[Test build #124932 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124932/testReport)** for PR 28975 at commit [`be34258`](https://github.com/apache/spark/commit/be342583629147c537cda14ed1708c3653000b3f).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655133391


   **[Test build #125258 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125258/testReport)** for PR 28975 at commit [`e2201ef`](https://github.com/apache/spark/commit/e2201efc4b1f1f32a7a869935d25d94ade55d2d4).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-652825262


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124869/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-652802351


   It's reported as left-outer join specific, but given the possibility of impact I took the representation of "stream-stream join" instead of "left/right outer stream-stream join".


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-652825071


   **[Test build #124869 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124869/testReport)** for PR 28975 at commit [`82e5a76`](https://github.com/apache/spark/commit/82e5a7683de5293529f07c23f39004630175dc96).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r449729609



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##########
@@ -451,10 +451,25 @@ class SymmetricHashJoinStateManager(
   }
 
   private trait KeyWithIndexToValueRowConverter {
+    /** Defines the schema of the value row (the value side of K-V in state store). */
     def valueAttributes: Seq[Attribute]
 
+    /**
+     * Convert the value row to (actual value, match) pair.
+     *
+     * NOTE: implementations should ensure the result row is NOT reused during execution, as
+     * caller may use the value to store without copy().

Review comment:
       The disadvantage for the approach is, given the contract is applied on the interface level, row from format V1 now would get unnecessary copy whereas it doesn't need to. It only applies to the evicted rows and ideally end users would need to go with format V2 (to address correctness on left/right outer join), hope it is acceptable.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654454203






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655213867


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-656052456


   Thanks all for reviewing and merging!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655461884


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-653998138


   **[Test build #125017 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125017/testReport)** for PR 28975 at commit [`be34258`](https://github.com/apache/spark/commit/be342583629147c537cda14ed1708c3653000b3f).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654531441


   **[Test build #125151 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125151/testReport)** for PR 28975 at commit [`be34258`](https://github.com/apache/spark/commit/be342583629147c537cda14ed1708c3653000b3f).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654093722






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r449076851



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala
##########
@@ -295,6 +295,10 @@ case class StreamingSymmetricHashJoinExec(
             postJoinFilter(joinedRow.withLeft(leftKeyValue.value).withRight(rightValue))
           }
         }
+
+        // NOTE: we need to make sure `outerOutputIter` is evaluated "after" exhausting all of
+        // elements in `innerOutputIter`, because evaluation of `innerOutputIter` may update
+        // the match flag which the logic for outer join is relying on.

Review comment:
       Just to clarify: this comment is not related to the bug and just to document an existing assumption?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gaborgsomogyi commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
gaborgsomogyi commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655581970


   I'm also suffering from flaky executions, hope this round will pass.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r451332628



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##########
@@ -451,10 +451,25 @@ class SymmetricHashJoinStateManager(
   }
 
   private trait KeyWithIndexToValueRowConverter {
+    /** Defines the schema of the value row (the value side of K-V in state store). */
     def valueAttributes: Seq[Attribute]
 
+    /**
+     * Convert the value row to (actual value, match) pair.
+     *
+     * NOTE: implementations should ensure the result row is NOT reused during execution, as
+     * caller may use the value to store without copy().

Review comment:
       we need to update the comment. It's not because of storing, but the caller side updates the row.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655334516






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-652830765


   **[Test build #124875 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124875/testReport)** for PR 28975 at commit [`82e5a76`](https://github.com/apache/spark/commit/82e5a7683de5293529f07c23f39004630175dc96).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r451357986



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##########
@@ -451,10 +451,25 @@ class SymmetricHashJoinStateManager(
   }
 
   private trait KeyWithIndexToValueRowConverter {
+    /** Defines the schema of the value row (the value side of K-V in state store). */
     def valueAttributes: Seq[Attribute]
 
+    /**
+     * Convert the value row to (actual value, match) pair.
+     *
+     * NOTE: implementations should ensure the result row is NOT reused during execution, as
+     * caller may use the value to store without copy().

Review comment:
       SGTM. I was referring to https://github.com/apache/spark/pull/28975#discussion_r450016555




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gaborgsomogyi commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
gaborgsomogyi commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r451617910



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingJoinSuite.scala
##########
@@ -996,4 +997,47 @@ class StreamingOuterJoinSuite extends StreamTest with StateStoreMetricsTest with
       )
     }
   }
+
+  test("SPARK-32148 stream-stream join regression on Spark 3.0.0") {
+    val input1 = MemoryStream[(Timestamp, String, String)]
+    val df1 = input1.toDF
+      .selectExpr("_1 as eventTime", "_2 as id", "_3 as comment")

Review comment:
       Any specific reason why not use `select`? I don't see any expression here.

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingJoinSuite.scala
##########
@@ -996,4 +997,47 @@ class StreamingOuterJoinSuite extends StreamTest with StateStoreMetricsTest with
       )
     }
   }
+
+  test("SPARK-32148 stream-stream join regression on Spark 3.0.0") {
+    val input1 = MemoryStream[(Timestamp, String, String)]
+    val df1 = input1.toDF
+      .selectExpr("_1 as eventTime", "_2 as id", "_3 as comment")
+      .withWatermark(s"eventTime", "2 minutes")
+
+    val input2 = MemoryStream[(Timestamp, String, String)]
+    val df2 = input2.toDF
+      .selectExpr("_1 as eventTime", "_2 as id", "_3 as name")

Review comment:
       Same here.

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingJoinSuite.scala
##########
@@ -996,4 +997,47 @@ class StreamingOuterJoinSuite extends StreamTest with StateStoreMetricsTest with
       )
     }
   }
+
+  test("SPARK-32148 stream-stream join regression on Spark 3.0.0") {
+    val input1 = MemoryStream[(Timestamp, String, String)]
+    val df1 = input1.toDF
+      .selectExpr("_1 as eventTime", "_2 as id", "_3 as comment")
+      .withWatermark(s"eventTime", "2 minutes")
+
+    val input2 = MemoryStream[(Timestamp, String, String)]
+    val df2 = input2.toDF
+      .selectExpr("_1 as eventTime", "_2 as id", "_3 as name")
+      .withWatermark(s"eventTime", "4 minutes")
+
+    val joined = df1.as("left")
+      .join(df2.as("right"),
+        expr(s"""

Review comment:
       Why string interpolation needed?

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingJoinSuite.scala
##########
@@ -996,4 +997,47 @@ class StreamingOuterJoinSuite extends StreamTest with StateStoreMetricsTest with
       )
     }
   }
+
+  test("SPARK-32148 stream-stream join regression on Spark 3.0.0") {
+    val input1 = MemoryStream[(Timestamp, String, String)]
+    val df1 = input1.toDF
+      .selectExpr("_1 as eventTime", "_2 as id", "_3 as comment")
+      .withWatermark(s"eventTime", "2 minutes")
+
+    val input2 = MemoryStream[(Timestamp, String, String)]
+    val df2 = input2.toDF
+      .selectExpr("_1 as eventTime", "_2 as id", "_3 as name")
+      .withWatermark(s"eventTime", "4 minutes")
+
+    val joined = df1.as("left")
+      .join(df2.as("right"),
+        expr(s"""
+                |left.id = right.id AND left.eventTime BETWEEN

Review comment:
       Nit: indent




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654440716


   **[Test build #125121 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125121/testReport)** for PR 28975 at commit [`be34258`](https://github.com/apache/spark/commit/be342583629147c537cda14ed1708c3653000b3f).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654599081


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655137361






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654018983






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655461896


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125332/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-653994433


   **[Test build #124932 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124932/testReport)** for PR 28975 at commit [`be34258`](https://github.com/apache/spark/commit/be342583629147c537cda14ed1708c3653000b3f).
    * This patch **fails PySpark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r449729309



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##########
@@ -451,10 +451,25 @@ class SymmetricHashJoinStateManager(
   }
 
   private trait KeyWithIndexToValueRowConverter {
+    /** Defines the schema of the value row (the value side of K-V in state store). */
     def valueAttributes: Seq[Attribute]
 
+    /**
+     * Convert the value row to (actual value, match) pair.
+     *
+     * NOTE: implementations should ensure the result row is NOT reused during execution, as
+     * caller may use the value to store without copy().

Review comment:
       Looks like we'd prefer to open maximizing performance for all callers. I changed to not copy whenever possible, with adding NOTE in method comments on all impacting methods. be34258




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654599087


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125151/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655461884






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654318196


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655332044


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-653007940






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654501352






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-653005805


   **[Test build #124875 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124875/testReport)** for PR 28975 at commit [`82e5a76`](https://github.com/apache/spark/commit/82e5a7683de5293529f07c23f39004630175dc96).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655337862






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-652802913


   cc. @tdas @zsxwing @jose-torres @gaborgsomogyi @xuanyuanking @uncleGen 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655918148






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655331028


   retest this, please
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655133900






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r449297581



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##########
@@ -451,10 +451,25 @@ class SymmetricHashJoinStateManager(
   }
 
   private trait KeyWithIndexToValueRowConverter {
+    /** Defines the schema of the value row (the value side of K-V in state store). */
     def valueAttributes: Seq[Attribute]
 
+    /**
+     * Convert the value row to (actual value, match) pair.
+     *
+     * NOTE: implementations should ensure the result row is NOT reused during execution, as
+     * caller may use the value to store without copy().

Review comment:
       So the problem occurred here:
   
   https://github.com/apache/spark/blob/42f01e314b4874236544cc8b94bef766269385ee/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala#L252-L281
   
   Here we temporarily store `currentValue` via calling `convertValue` which is candidate to remove (and return). After that we adjust the array a bit to not having a hole in between, where it also triggers `convertValue` and "changes" the value in `currentValue` silently if we don't copy.
   
   It's also possible for caller to call copy() but the row to copy is inside of ValueAndMatchPair so I feel it's pretty much easier to miss even with method doc, so I took safer approach rather than performance-wise.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-653932192






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654485646


   https://github.com/apache/spark/pull/29017 is created based on @HeartSaVioR 's reporting for https://github.com/apache/spark/pull/28975#issuecomment-654447955 .


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r449295416



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala
##########
@@ -295,6 +295,10 @@ case class StreamingSymmetricHashJoinExec(
             postJoinFilter(joinedRow.withLeft(leftKeyValue.value).withRight(rightValue))
           }
         }
+
+        // NOTE: we need to make sure `outerOutputIter` is evaluated "after" exhausting all of
+        // elements in `innerOutputIter`, because evaluation of `innerOutputIter` may update
+        // the match flag which the logic for outer join is relying on.

Review comment:
       Yes right.
   
   TBH I suspected this first and crafted a patch including the part with new iterator explicitly runs the logic after evaluating innerOutputIter, and later realized current logic already dealt with this properly, because removeOldState() doesn't materialize the candidates and evaluate lazily. This patch contains minimal change.
   
   Worth to mention how it works for someone who may need to touch here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xuanyuanking commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r449638973



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##########
@@ -451,10 +451,25 @@ class SymmetricHashJoinStateManager(
   }
 
   private trait KeyWithIndexToValueRowConverter {
+    /** Defines the schema of the value row (the value side of K-V in state store). */
     def valueAttributes: Seq[Attribute]
 
+    /**
+     * Convert the value row to (actual value, match) pair.
+     *
+     * NOTE: implementations should ensure the result row is NOT reused during execution, as
+     * caller may use the value to store without copy().

Review comment:
       For `KeyWithIndexToValueRowConverterFormatV1`, we don't have this issue because the `covertToValueRow` didn't change the `currentValue`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654500385


   **[Test build #125126 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125126/testReport)** for PR 28975 at commit [`be34258`](https://github.com/apache/spark/commit/be342583629147c537cda14ed1708c3653000b3f).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655331028


   retest this, please
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xuanyuanking commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654659405


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655129657


   **[Test build #125257 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125257/testReport)** for PR 28975 at commit [`be34258`](https://github.com/apache/spark/commit/be342583629147c537cda14ed1708c3653000b3f).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655792610






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655791996


   **[Test build #125403 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125403/testReport)** for PR 28975 at commit [`fb63d7e`](https://github.com/apache/spark/commit/fb63d7ed604e9fc31a960e23e9b00298d5aa8760).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r449729609



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##########
@@ -451,10 +451,25 @@ class SymmetricHashJoinStateManager(
   }
 
   private trait KeyWithIndexToValueRowConverter {
+    /** Defines the schema of the value row (the value side of K-V in state store). */
     def valueAttributes: Seq[Attribute]
 
+    /**
+     * Convert the value row to (actual value, match) pair.
+     *
+     * NOTE: implementations should ensure the result row is NOT reused during execution, as
+     * caller may use the value to store without copy().

Review comment:
       The disadvantage for the approach is, given the contract is applied on the interface level, row from format V1 now would get unnecessary copy whereas it doesn't need to. It only applies to the evicted rows and ideally end users would need to go with format V2, hope it is acceptable.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r451857092



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingJoinSuite.scala
##########
@@ -996,4 +997,47 @@ class StreamingOuterJoinSuite extends StreamTest with StateStoreMetricsTest with
       )
     }
   }
+
+  test("SPARK-32148 stream-stream join regression on Spark 3.0.0") {
+    val input1 = MemoryStream[(Timestamp, String, String)]
+    val df1 = input1.toDF
+      .selectExpr("_1 as eventTime", "_2 as id", "_3 as comment")
+      .withWatermark(s"eventTime", "2 minutes")
+
+    val input2 = MemoryStream[(Timestamp, String, String)]
+    val df2 = input2.toDF
+      .selectExpr("_1 as eventTime", "_2 as id", "_3 as name")
+      .withWatermark(s"eventTime", "4 minutes")
+
+    val joined = df1.as("left")
+      .join(df2.as("right"),
+        expr(s"""

Review comment:
       Ah that's not necessary. Will remove.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654315736


   **[Test build #125051 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125051/testReport)** for PR 28975 at commit [`be34258`](https://github.com/apache/spark/commit/be342583629147c537cda14ed1708c3653000b3f).
    * This patch **fails PySpark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654454203






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654531704






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-653995518


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655458686


   **[Test build #125332 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125332/testReport)** for PR 28975 at commit [`1c011ab`](https://github.com/apache/spark/commit/1c011abf3c6e18ecdee608f76926e1b31162a5cc).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654440135


   retest this, please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r450086968



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##########
@@ -259,6 +269,9 @@ class SymmetricHashJoinStateManager(
           return null
         }
 
+        // Make a copy on value row, as below cleanup logic may update the value row silently.
+        currentValue = currentValue.copy(value = currentValue.value.copy())

Review comment:
       After seeing the new changes, I think the first version looks better. The caller sides is nested and we still have unnecessary copies for v1 format. What do you think? @viirya




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655214539


   **[Test build #125271 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125271/testReport)** for PR 28975 at commit [`e2201ef`](https://github.com/apache/spark/commit/e2201efc4b1f1f32a7a869935d25d94ade55d2d4).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654093173


   **[Test build #125051 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125051/testReport)** for PR 28975 at commit [`be34258`](https://github.com/apache/spark/commit/be342583629147c537cda14ed1708c3653000b3f).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-653931875


   **[Test build #124932 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124932/testReport)** for PR 28975 at commit [`be34258`](https://github.com/apache/spark/commit/be342583629147c537cda14ed1708c3653000b3f).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654501352


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654598049


   **[Test build #125151 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125151/testReport)** for PR 28975 at commit [`be34258`](https://github.com/apache/spark/commit/be342583629147c537cda14ed1708c3653000b3f).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-653995524


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124932/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655330669


   **[Test build #125271 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125271/testReport)** for PR 28975 at commit [`e2201ef`](https://github.com/apache/spark/commit/e2201efc4b1f1f32a7a869935d25d94ade55d2d4).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655332044






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r451142047



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##########
@@ -259,6 +269,9 @@ class SymmetricHashJoinStateManager(
           return null
         }
 
+        // Make a copy on value row, as below cleanup logic may update the value row silently.
+        currentValue = currentValue.copy(value = currentValue.value.copy())

Review comment:
       Just reverted the latest commit to leave the history and pick the commit selectively according to the decision.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan closed pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #28975:
URL: https://github.com/apache/spark/pull/28975


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655214843






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654318196






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654091602


   retest this, please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655652599


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125352/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r450086968



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##########
@@ -259,6 +269,9 @@ class SymmetricHashJoinStateManager(
           return null
         }
 
+        // Make a copy on value row, as below cleanup logic may update the value row silently.
+        currentValue = currentValue.copy(value = currentValue.value.copy())

Review comment:
       After seeing the new changes, I think the first version looks better. The caller sides is nested and we still have necessary copies for v1 format. What do you think? @viirya




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654441334






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r451140973



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##########
@@ -259,6 +269,9 @@ class SymmetricHashJoinStateManager(
           return null
         }
 
+        // Make a copy on value row, as below cleanup logic may update the value row silently.
+        currentValue = currentValue.copy(value = currentValue.value.copy())

Review comment:
       OK I'll roll back the change. I'll also leave a commit sha so we can do back and forth depending on the decision.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654018983


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655334279


   retest this, please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655146156


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125258/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655464967


   **[Test build #125352 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125352/testReport)** for PR 28975 at commit [`1c011ab`](https://github.com/apache/spark/commit/1c011abf3c6e18ecdee608f76926e1b31162a5cc).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655918148






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654093722






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655129657


   **[Test build #125257 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125257/testReport)** for PR 28975 at commit [`be34258`](https://github.com/apache/spark/commit/be342583629147c537cda14ed1708c3653000b3f).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655486493






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r449685443



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##########
@@ -451,10 +451,25 @@ class SymmetricHashJoinStateManager(
   }
 
   private trait KeyWithIndexToValueRowConverter {
+    /** Defines the schema of the value row (the value side of K-V in state store). */
     def valueAttributes: Seq[Attribute]
 
+    /**
+     * Convert the value row to (actual value, match) pair.
+     *
+     * NOTE: implementations should ensure the result row is NOT reused during execution, as
+     * caller may use the value to store without copy().

Review comment:
       Yes. KeyWithIndexToValueRowConverterFormatV1 simply uses the row state store will provide, which is expected to be unique instance (as we copy it when putting to state store).
   
   I've had time to go through callers, and I'm still hesitated to let callers deal with copy, because the row to take care is inside of the ValueAndMatchPair, and some of current callers are even re-packaging it to other class, e.g. KeyWithIndexAndValue for getAll(), so not only direct callers should take care of it but caller of the caller may have to take care of it. It would bring too much complication.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-652804302


   **[Test build #124869 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124869/testReport)** for PR 28975 at commit [`82e5a76`](https://github.com/apache/spark/commit/82e5a7683de5293529f07c23f39004630175dc96).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655485251


   **[Test build #125317 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125317/testReport)** for PR 28975 at commit [`e2201ef`](https://github.com/apache/spark/commit/e2201efc4b1f1f32a7a869935d25d94ade55d2d4).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655363826


   **[Test build #125332 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125332/testReport)** for PR 28975 at commit [`1c011ab`](https://github.com/apache/spark/commit/1c011abf3c6e18ecdee608f76926e1b31162a5cc).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655214539


   **[Test build #125271 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125271/testReport)** for PR 28975 at commit [`e2201ef`](https://github.com/apache/spark/commit/e2201efc4b1f1f32a7a869935d25d94ade55d2d4).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654529896


   retest this, please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655130454






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r451859757



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingJoinSuite.scala
##########
@@ -996,4 +997,47 @@ class StreamingOuterJoinSuite extends StreamTest with StateStoreMetricsTest with
       )
     }
   }
+
+  test("SPARK-32148 stream-stream join regression on Spark 3.0.0") {
+    val input1 = MemoryStream[(Timestamp, String, String)]
+    val df1 = input1.toDF
+      .selectExpr("_1 as eventTime", "_2 as id", "_3 as comment")
+      .withWatermark(s"eventTime", "2 minutes")
+
+    val input2 = MemoryStream[(Timestamp, String, String)]
+    val df2 = input2.toDF
+      .selectExpr("_1 as eventTime", "_2 as id", "_3 as name")
+      .withWatermark(s"eventTime", "4 minutes")
+
+    val joined = df1.as("left")
+      .join(df2.as("right"),
+        expr(s"""
+                |left.id = right.id AND left.eventTime BETWEEN

Review comment:
       The indentation of """ looks vary on the codebase, and I can find same indentation on the codebase.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r449297581



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##########
@@ -451,10 +451,25 @@ class SymmetricHashJoinStateManager(
   }
 
   private trait KeyWithIndexToValueRowConverter {
+    /** Defines the schema of the value row (the value side of K-V in state store). */
     def valueAttributes: Seq[Attribute]
 
+    /**
+     * Convert the value row to (actual value, match) pair.
+     *
+     * NOTE: implementations should ensure the result row is NOT reused during execution, as
+     * caller may use the value to store without copy().

Review comment:
       So the problem occurred here:
   
   https://github.com/apache/spark/blob/42f01e314b4874236544cc8b94bef766269385ee/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala#L252-L281
   
   Here we temporarily store `currentValue` via calling `convertValue` which is candidate to remove. After that we adjust the array a bit to not having a hole in between, where it also triggers `convertValue` and "changes" the value in `currentValue` silently if we don't copy.
   
   It's also possible for caller to call copy() but the row to copy is inside of ValueAndMatchPair so I feel it's pretty much easier to miss even with method doc, so I took safer approach rather than performance-wise.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655133900






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r451836861



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingJoinSuite.scala
##########
@@ -996,4 +997,47 @@ class StreamingOuterJoinSuite extends StreamTest with StateStoreMetricsTest with
       )
     }
   }
+
+  test("SPARK-32148 stream-stream join regression on Spark 3.0.0") {
+    val input1 = MemoryStream[(Timestamp, String, String)]
+    val df1 = input1.toDF
+      .selectExpr("_1 as eventTime", "_2 as id", "_3 as comment")

Review comment:
       I guess it's pretty much simpler and more readable than `select('_1.as("eventTime"), '_2.as("id"), '_3.as("comment"))` (or even with col(...) if ' notation doesn't work for _1, _2, _3).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r449458290



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##########
@@ -451,10 +451,25 @@ class SymmetricHashJoinStateManager(
   }
 
   private trait KeyWithIndexToValueRowConverter {
+    /** Defines the schema of the value row (the value side of K-V in state store). */
     def valueAttributes: Seq[Attribute]
 
+    /**
+     * Convert the value row to (actual value, match) pair.
+     *
+     * NOTE: implementations should ensure the result row is NOT reused during execution, as
+     * caller may use the value to store without copy().

Review comment:
       I don't have a strong opinion here. Just for reference: The SMJ copies the input row on-demand: only copy the rows when we need to buffer them. See `SortMergeJoinExec.bufferMatchingRows`. It's only a few places we need to buffer the rows, so it's OK to do it in SMJ. I'm not sure about streaming join though.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r449689477



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##########
@@ -451,10 +451,25 @@ class SymmetricHashJoinStateManager(
   }
 
   private trait KeyWithIndexToValueRowConverter {
+    /** Defines the schema of the value row (the value side of K-V in state store). */
     def valueAttributes: Seq[Attribute]
 
+    /**
+     * Convert the value row to (actual value, match) pair.
+     *
+     * NOTE: implementations should ensure the result row is NOT reused during execution, as
+     * caller may use the value to store without copy().

Review comment:
       I recall in Spark SQL, there are some places similar and usually we make it as the caller's responsibility to call copy. Is this a code path critical for performance?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r449123103



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##########
@@ -451,10 +451,25 @@ class SymmetricHashJoinStateManager(
   }
 
   private trait KeyWithIndexToValueRowConverter {
+    /** Defines the schema of the value row (the value side of K-V in state store). */
     def valueAttributes: Seq[Attribute]
 
+    /**
+     * Convert the value row to (actual value, match) pair.
+     *
+     * NOTE: implementations should ensure the result row is NOT reused during execution, as
+     * caller may use the value to store without copy().

Review comment:
       I don't know the details, but how likely the caller side will store the values?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655652588






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r451336655



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##########
@@ -451,10 +451,25 @@ class SymmetricHashJoinStateManager(
   }
 
   private trait KeyWithIndexToValueRowConverter {
+    /** Defines the schema of the value row (the value side of K-V in state store). */
     def valueAttributes: Seq[Attribute]
 
+    /**
+     * Convert the value row to (actual value, match) pair.
+     *
+     * NOTE: implementations should ensure the result row is NOT reused during execution, as
+     * caller may use the value to store without copy().

Review comment:
       I'm not sure I get it. The problem occurs when caller reads the value lately and there's "another" interaction with the method in the middle of. I agree the sentence is not clear as well though.
   
   Would it be better if we can rephrase as "... during execution, so that caller can safely read the value in any time" ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654447955


   **[Test build #125121 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125121/testReport)** for PR 28975 at commit [`be34258`](https://github.com/apache/spark/commit/be342583629147c537cda14ed1708c3653000b3f).
    * This patch **fails to generate documentation**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654447999


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125121/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654531704






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-652804302


   **[Test build #124869 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124869/testReport)** for PR 28975 at commit [`82e5a76`](https://github.com/apache/spark/commit/82e5a7683de5293529f07c23f39004630175dc96).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655856334


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125389/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655364489






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r449729609



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##########
@@ -451,10 +451,25 @@ class SymmetricHashJoinStateManager(
   }
 
   private trait KeyWithIndexToValueRowConverter {
+    /** Defines the schema of the value row (the value side of K-V in state store). */
     def valueAttributes: Seq[Attribute]
 
+    /**
+     * Convert the value row to (actual value, match) pair.
+     *
+     * NOTE: implementations should ensure the result row is NOT reused during execution, as
+     * caller may use the value to store without copy().

Review comment:
       The disadvantage for the approach is, given the contract is applied on the interface level, row from format V1 now would get unnecessary copy whereas it doesn't need to. It only applies to the evicted rows and ideally end users would need to go with format V2, hope it is OK.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655137371


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125257/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654531441


   **[Test build #125151 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125151/testReport)** for PR 28975 at commit [`be34258`](https://github.com/apache/spark/commit/be342583629147c537cda14ed1708c3653000b3f).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654599081






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654453678


   **[Test build #125126 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125126/testReport)** for PR 28975 at commit [`be34258`](https://github.com/apache/spark/commit/be342583629147c537cda14ed1708c3653000b3f).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655466745






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655652588


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655466745






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-652825259






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-652829396


   retest this, please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655959230


   thanks, merging to master/3.0!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655130454






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655917157


   **[Test build #125403 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125403/testReport)** for PR 28975 at commit [`fb63d7e`](https://github.com/apache/spark/commit/fb63d7ed604e9fc31a960e23e9b00298d5aa8760).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654453678


   **[Test build #125126 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125126/testReport)** for PR 28975 at commit [`be34258`](https://github.com/apache/spark/commit/be342583629147c537cda14ed1708c3653000b3f).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654318205


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125051/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655137323


   **[Test build #125257 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125257/testReport)** for PR 28975 at commit [`be34258`](https://github.com/apache/spark/commit/be342583629147c537cda14ed1708c3653000b3f).
    * This patch **fails to generate documentation**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655791996


   **[Test build #125403 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125403/testReport)** for PR 28975 at commit [`fb63d7e`](https://github.com/apache/spark/commit/fb63d7ed604e9fc31a960e23e9b00298d5aa8760).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-652802512






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-653007940






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-654447988


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xuanyuanking commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655464076


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655856331






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655663226






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-652831426






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655146151






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-655650724


   **[Test build #125352 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125352/testReport)** for PR 28975 at commit [`1c011ab`](https://github.com/apache/spark/commit/1c011abf3c6e18ecdee608f76926e1b31162a5cc).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28975:
URL: https://github.com/apache/spark/pull/28975#issuecomment-653932192






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org