You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/02/29 14:38:01 UTC

[GitHub] [spark] HeartSaVioR opened a new pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

HeartSaVioR opened a new pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747
 
 
   ### What changes were proposed in this pull request?
   
   This patch fixes the bug of UnsafeRow which misses to handle the UDT specifically, in `isFixedLength` and `isMutable`. These methods don't check its SQL type for UDT, treating UDT as variable-length, and non-mutable.
   
   It doesn't bring any issue if UDT is used to represent complicated (non-primitive) type, but when UDT is used to represent some type which is matched with primitive SQL type, it exposes the chance of correctness issues, as these informations represent how the value should be handled.
   
   We got report from user mailing list which suspected as mapGroupsWithState looks like handling UDT incorrectly, but after some investigation it was from GenerateUnsafeRowJoiner in shuffle phase.
   
   https://github.com/apache/spark/blob/0e2ca11d80c3921387d7b077cb64c3a0c06b08d7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeRowJoiner.scala#L32-L43
   
   Here updating position should not happen on fixed-length column, but due to this bug, the value of all UDT would be modified, which actually corrupts the value.
   
   ### Why are the changes needed?
   
   Misclassifying of the type of length for UDT can corrupt the value when the row is presented to the input of GenerateUnsafeRowJoiner, which brings correctness issue.
   
   ### Does this PR introduce any user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New UT added.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593271778
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#discussion_r386153349
 
 

 ##########
 File path: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
 ##########
 @@ -95,6 +95,10 @@ public static int calculateBitSetWidthInBytes(int numFields) {
   }
 
   public static boolean isFixedLength(DataType dt) {
+    if (dt instanceof UserDefinedType) {
+      return isFixedLength(((UserDefinedType) dt).sqlType());
+    }
+
 
 Review comment:
   +1 for the merged form. BTW, can we keep the `DecimalType` as the first `if`? I guess `UserDefinedType` can be `else if` statement in the middle.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#discussion_r386150466
 
 

 ##########
 File path: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
 ##########
 @@ -103,6 +107,10 @@ public static boolean isFixedLength(DataType dt) {
   }
 
   public static boolean isMutable(DataType dt) {
+    if (dt instanceof UserDefinedType) {
+      return isMutable(((UserDefinedType) dt).sqlType());
+    }
 
 Review comment:
   Same here. Will change.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-592980980
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119127/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-592980783
 
 
   **[Test build #119127 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119127/testReport)** for PR 27747 at commit [`0421c6b`](https://github.com/apache/spark/commit/0421c6b30f8cb8e652dac06eb5713e3c3ae1d210).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593421020
 
 
   > Do we have this bug in 2.4?
   
   Yes I confirmed the bug exists in branch-2.3 & branch-2.4, and given both UnsafeRow and GenerateUnsafeRowJoiner are quite old, I guess the bug would exist in old minor versions as well.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593416396
 
 
   LGTM. Do we have this bug in 2.4?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-592952213
 
 
   **[Test build #119127 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119127/testReport)** for PR 27747 at commit [`0421c6b`](https://github.com/apache/spark/commit/0421c6b30f8cb8e652dac06eb5713e3c3ae1d210).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593444484
 
 
   It conflicts with 2.4, @HeartSaVioR can you send a new PR for 2.4?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-592952309
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23869/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#discussion_r386154074
 
 

 ##########
 File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeRowJoinerSuite.scala
 ##########
 @@ -99,6 +101,23 @@ class GenerateUnsafeRowJoinerSuite extends SparkFunSuite {
     testConcatOnce(N, N, variable)
   }
 
+  test("SPARK-30993: UserDefinedType matched to fixed length SQL type shouldn't be corrupted") {
 
 Review comment:
   Actually I missed the existence of SPARK-30986 and created SPARK-30993. I left my issue as it is, because SPARK-30986 is describing only one of issue SPARK-30993 could incur - SPARK-30993 has a broader coverage.
   
   I can maybe add E2E test for SPARK-30986 here (though not sure we want to use two JIRA issues here), but wondering is it preferable to add SS test if it can be reproduced in batch query test. From my side it's easier to add SS test, because SPARK-30986 has it - just need to refine a bit.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#discussion_r386157157
 
 

 ##########
 File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeRowJoinerSuite.scala
 ##########
 @@ -99,6 +101,23 @@ class GenerateUnsafeRowJoinerSuite extends SparkFunSuite {
     testConcatOnce(N, N, variable)
   }
 
+  test("SPARK-30993: UserDefinedType matched to fixed length SQL type shouldn't be corrupted") {
+    val schema1 = new StructType(Array(
+      StructField("date", new WrappedDateTimeUDT),
+      StructField("s", StringType),
+      StructField("i", IntegerType)))
+    val proj1 = UnsafeProjection.create(schema1.fields.map(_.dataType))
+    val intRow1 = new GenericInternalRow(Array[Any](
+      LocalDateTime.now().toEpochSecond(ZoneOffset.UTC),
+      UTF8String.fromString("hello"), 1))
+
+    val schema2 = new StructType(Array(StructField("i", IntegerType)))
+    val proj2 = UnsafeProjection.create(schema2.fields.map(_.dataType))
+    val intRow2 = new GenericInternalRow(Array[Any](2))
+
+    testConcat(schema1, proj1.apply(intRow1), schema2, proj2.apply(intRow2))
 
 Review comment:
   I've left comment regarding this. https://github.com/apache/spark/pull/27747#discussion_r386154074
   
   It's no-brainer to have E2E test with SS mapWithState as we already got report from actual code (and I confirmed it fails), but if we would like to have simpler batch query to reproduce I may need to try out.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593245582
 
 
   Just added E2E UT. I'm still hearing voices on the change of UnsafeRow before making changes there, though it seems to be minor.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
maropu commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593060735
 
 
   Can you add some end-2-end tests for that?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593283035
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593399610
 
 
   **[Test build #119158 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119158/testReport)** for PR 27747 at commit [`d724fa4`](https://github.com/apache/spark/commit/d724fa48dc8d9b419e46efd2a9d7858003908245).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593246643
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23895/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593166413
 
 
   Btw I just indicated UserDefinedType is intentionally marked as `private[spark]` as community will bring revised version of UDT in 2.x. Even now we are preparing Spark 3.0 it doesn't look like so.
   
   Given this is reported in user@ mailing list and I can easily google and see SO posts doing this, end users still use this with package hack, because there's no alternative. Even in Spark 1.x it was developer API so we may be able to say we haven't made it public and stable API, but we can't block end users using the API and complain about bugs if we don't provide a good alternative.
   
   Is there any plan to revise UDT, or do we have a good alternative to guide not to use UDT?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan edited a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
cloud-fan edited a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593431277
 
 
   thanks, merging to master/3.0!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593400543
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#discussion_r386080439
 
 

 ##########
 File path: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
 ##########
 @@ -95,6 +95,10 @@ public static int calculateBitSetWidthInBytes(int numFields) {
   }
 
   public static boolean isFixedLength(DataType dt) {
+    if (dt instanceof UserDefinedType) {
+      return isFixedLength(((UserDefinedType) dt).sqlType());
+    }
+
 
 Review comment:
   nit: how about this format?
   ```
       if (dt instanceof UserDefinedType) {
         return isFixedLength(((UserDefinedType) dt).sqlType());
       } else if (dt instanceof DecimalType) {
         return ((DecimalType) dt).precision() <= Decimal.MAX_LONG_DIGITS();
       } else {
         return mutableFieldTypes.contains(dt);
       }
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593400543
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593283045
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23900/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593246229
 
 
   **[Test build #119153 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119153/testReport)** for PR 27747 at commit [`d724fa4`](https://github.com/apache/spark/commit/d724fa48dc8d9b419e46efd2a9d7858003908245).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#discussion_r386157157
 
 

 ##########
 File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeRowJoinerSuite.scala
 ##########
 @@ -99,6 +101,23 @@ class GenerateUnsafeRowJoinerSuite extends SparkFunSuite {
     testConcatOnce(N, N, variable)
   }
 
+  test("SPARK-30993: UserDefinedType matched to fixed length SQL type shouldn't be corrupted") {
+    val schema1 = new StructType(Array(
+      StructField("date", new WrappedDateTimeUDT),
+      StructField("s", StringType),
+      StructField("i", IntegerType)))
+    val proj1 = UnsafeProjection.create(schema1.fields.map(_.dataType))
+    val intRow1 = new GenericInternalRow(Array[Any](
+      LocalDateTime.now().toEpochSecond(ZoneOffset.UTC),
+      UTF8String.fromString("hello"), 1))
+
+    val schema2 = new StructType(Array(StructField("i", IntegerType)))
+    val proj2 = UnsafeProjection.create(schema2.fields.map(_.dataType))
+    val intRow2 = new GenericInternalRow(Array[Any](2))
+
+    testConcat(schema1, proj1.apply(intRow1), schema2, proj2.apply(intRow2))
 
 Review comment:
   I've left comment regarding this. https://github.com/apache/spark/pull/27747#discussion_r386154074
   
   It's no-brainer to have E2E test with SS mapWithState as we already got report from actual code (and I confirmed it failed without this patch and it passed with this patch), but if we would like to have simpler batch query to reproduce I may need to try out.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593167469
 
 
   Yes. There were several partial attempts to improve UDTs, but it's not achieved yet. I don't see any roadmap on it. For this part, we can't block end users using the API and filing a bug report. Mostly, the bug report might be closed because it's not officially supported by the community. However, thanks to the bug report and patch contribution like this, we can enhance the situation by fixing some bugs like this.
   > but we can't block end users using the API and complain about bugs if we don't provide a good alternative.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#discussion_r386408647
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala
 ##########
 @@ -287,4 +306,22 @@ class UserDefinedTypeSuite extends QueryTest with SharedSparkSession with Parque
     checkAnswer(spark.createDataFrame(data, schema).selectExpr("typeof(a)"),
       Seq(Row("array<double>")))
   }
+
+  test("SPARK-30993: UserDefinedType matched to fixed length SQL type shouldn't be corrupted") {
+    def concatFoo(a: FooWithDate, b: FooWithDate): FooWithDate = {
+      FooWithDate(b.date, a.s + b.s, a.i)
+    }
+
+    UDTRegistration.register(classOf[LocalDateTime].getName, classOf[LocalDateTimeUDT].getName)
+
+    // remove sub-millisecond part as we only use millis based timestamp while serde
+    val date = LocalDateTime.ofEpochSecond(LocalDateTime.now().toEpochSecond(ZoneOffset.UTC),
+      0, ZoneOffset.UTC)
+    val inputDS = List(FooWithDate(date, "Foo", 1), FooWithDate(date, "Foo", 3),
+      FooWithDate(date, "Foo", 3)).toDS()
+    val agg = inputDS.groupByKey(x => x.i).mapGroups((_, iter) => iter.reduce(concatFoo))
+    val result = agg.collect()
+
+    assert(result.toSet === Set(FooWithDate(date, "FooFoo", 3), FooWithDate(date, "Foo", 1)))
 
 Review comment:
   what was the result before?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#discussion_r386155382
 
 

 ##########
 File path: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
 ##########
 @@ -95,6 +95,10 @@ public static int calculateBitSetWidthInBytes(int numFields) {
   }
 
   public static boolean isFixedLength(DataType dt) {
+    if (dt instanceof UserDefinedType) {
+      return isFixedLength(((UserDefinedType) dt).sqlType());
+    }
+
 
 Review comment:
   Ordering doesn't matter by syntax, but it affects readability.
   
   Assume we are reading the code line by line when UDT parameter comes in here, we can simply find that it does recursive call with sqlType - then we can just go through with its sqlType without going back. If it is placed in the middle, once we read how UDT is handled like this we have to go back on first to follow how the sqlType is handled. That's why I previously put it separately, to represent "early-return".
   
   One alternative approach: maybe it is more verbose, but adding while loop to extract the data type till the data type is UDT would be showing intention cleaner. WDYT?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#discussion_r386399596
 
 

 ##########
 File path: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
 ##########
 @@ -95,6 +95,10 @@ public static int calculateBitSetWidthInBytes(int numFields) {
   }
 
   public static boolean isFixedLength(DataType dt) {
+    if (dt instanceof UserDefinedType) {
+      return isFixedLength(((UserDefinedType) dt).sqlType());
+    }
+
 
 Review comment:
   I'm fine with the current code. It's easier to read code when we return earlier for some special cases.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#discussion_r386080469
 
 

 ##########
 File path: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
 ##########
 @@ -103,6 +107,10 @@ public static boolean isFixedLength(DataType dt) {
   }
 
   public static boolean isMutable(DataType dt) {
+    if (dt instanceof UserDefinedType) {
+      return isMutable(((UserDefinedType) dt).sqlType());
+    }
 
 Review comment:
   ditto; how about the format, `if ... else ...`?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593167885
 
 
   cc @viirya because we was the author of #16478 .

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#discussion_r386155382
 
 

 ##########
 File path: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
 ##########
 @@ -95,6 +95,10 @@ public static int calculateBitSetWidthInBytes(int numFields) {
   }
 
   public static boolean isFixedLength(DataType dt) {
+    if (dt instanceof UserDefinedType) {
+      return isFixedLength(((UserDefinedType) dt).sqlType());
+    }
+
 
 Review comment:
   Ordering doesn't matter by syntax, but it affects readability.
   
   Assume we are reading the code line by line when UDT parameter comes in here, we can simply find that it does recursive call with sqlType - then we can just go through with its sqlType without going back. If it is placed in the middle, once we read how UDT is handled like this we have to go back on first to follow how the sqlType is handled.
   
   One alternative approach: maybe it is more verbose, but adding while loop to extract the data type till the data type is UDT would be showing intention cleaner. WDYT?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-592980979
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-592952309
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23869/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593431277
 
 
   thanks, merging to master/3.0/2.4!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593271658
 
 
   **[Test build #119153 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119153/testReport)** for PR 27747 at commit [`d724fa4`](https://github.com/apache/spark/commit/d724fa48dc8d9b419e46efd2a9d7858003908245).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593271785
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119153/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593460924
 
 
   Thanks all for reviewing and merging! #27761 is ready for branch-2.4.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593167943
 
 
   Oops. Sorry, I clicked a wrong button mistakenly, @HeartSaVioR .

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#discussion_r386156665
 
 

 ##########
 File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeRowJoinerSuite.scala
 ##########
 @@ -99,6 +101,23 @@ class GenerateUnsafeRowJoinerSuite extends SparkFunSuite {
     testConcatOnce(N, N, variable)
   }
 
+  test("SPARK-30993: UserDefinedType matched to fixed length SQL type shouldn't be corrupted") {
+    val schema1 = new StructType(Array(
+      StructField("date", new WrappedDateTimeUDT),
+      StructField("s", StringType),
+      StructField("i", IntegerType)))
+    val proj1 = UnsafeProjection.create(schema1.fields.map(_.dataType))
+    val intRow1 = new GenericInternalRow(Array[Any](
+      LocalDateTime.now().toEpochSecond(ZoneOffset.UTC),
+      UTF8String.fromString("hello"), 1))
+
+    val schema2 = new StructType(Array(StructField("i", IntegerType)))
+    val proj2 = UnsafeProjection.create(schema2.fields.map(_.dataType))
+    val intRow2 = new GenericInternalRow(Array[Any](2))
+
+    testConcat(schema1, proj1.apply(intRow1), schema2, proj2.apply(intRow2))
 
 Review comment:
   Can we have end-to-end test for this issue? Is it easily to reproduce it?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593573423
 
 
   Thank you all!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593283035
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593246643
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23895/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-592952213
 
 
   **[Test build #119127 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119127/testReport)** for PR 27747 at commit [`0421c6b`](https://github.com/apache/spark/commit/0421c6b30f8cb8e652dac06eb5713e3c3ae1d210).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593271785
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119153/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593246636
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-592980979
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#discussion_r386413634
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala
 ##########
 @@ -287,4 +306,22 @@ class UserDefinedTypeSuite extends QueryTest with SharedSparkSession with Parque
     checkAnswer(spark.createDataFrame(data, schema).selectExpr("typeof(a)"),
       Seq(Row("array<double>")))
   }
+
+  test("SPARK-30993: UserDefinedType matched to fixed length SQL type shouldn't be corrupted") {
+    def concatFoo(a: FooWithDate, b: FooWithDate): FooWithDate = {
+      FooWithDate(b.date, a.s + b.s, a.i)
+    }
+
+    UDTRegistration.register(classOf[LocalDateTime].getName, classOf[LocalDateTimeUDT].getName)
+
+    // remove sub-millisecond part as we only use millis based timestamp while serde
+    val date = LocalDateTime.ofEpochSecond(LocalDateTime.now().toEpochSecond(ZoneOffset.UTC),
+      0, ZoneOffset.UTC)
+    val inputDS = List(FooWithDate(date, "Foo", 1), FooWithDate(date, "Foo", 3),
+      FooWithDate(date, "Foo", 3)).toDS()
+    val agg = inputDS.groupByKey(x => x.i).mapGroups((_, iter) => iter.reduce(concatFoo))
+    val result = agg.collect()
+
+    assert(result.toSet === Set(FooWithDate(date, "FooFoo", 3), FooWithDate(date, "Foo", 1)))
 
 Review comment:
   The output without the patch is follow:
   
   ```
   Expected :Set(FooWithDate(2020-03-02T06:05:40,FooFoo,3), FooWithDate(2020-03-02T06:05:40,Foo,1))
   Actual   :Set(FooWithDate(3108-12-26T09:51:48,FooFoo,3), FooWithDate(3108-12-26T09:51:48,Foo,1))
   ```
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-592952307
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-592980980
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119127/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR opened a new pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
HeartSaVioR opened a new pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747
 
 
   ### What changes were proposed in this pull request?
   
   This patch fixes the bug of UnsafeRow which misses to handle the UDT specifically, in `isFixedLength` and `isMutable`. These methods don't check its SQL type for UDT, always treating UDT as variable-length, and non-mutable.
   
   It doesn't bring any issue if UDT is used to represent complicated type, but when UDT is used to represent some type which is matched with fixed length of SQL type, it exposes the chance of correctness issues, as these informations sometimes decide how the value should be handled.
   
   We got report from user mailing list which suspected as mapGroupsWithState looks like handling UDT incorrectly, but after some investigation it was from GenerateUnsafeRowJoiner in shuffle phase.
   
   https://github.com/apache/spark/blob/0e2ca11d80c3921387d7b077cb64c3a0c06b08d7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeRowJoiner.scala#L32-L43
   
   Here updating position should not happen on fixed-length column, but due to this bug, the value of UDT having fixed-length as sql type would be modified, which actually corrupts the value.
   
   ### Why are the changes needed?
   
   Misclassifying of the type of length for UDT can corrupt the value when the row is presented to the input of GenerateUnsafeRowJoiner, which brings correctness issue.
   
   ### Does this PR introduce any user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New UT added.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#discussion_r386155965
 
 

 ##########
 File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeRowJoinerSuite.scala
 ##########
 @@ -99,6 +101,23 @@ class GenerateUnsafeRowJoinerSuite extends SparkFunSuite {
     testConcatOnce(N, N, variable)
   }
 
+  test("SPARK-30993: UserDefinedType matched to fixed length SQL type shouldn't be corrupted") {
 
 Review comment:
   Please let me know if we would like to use SPARK-30986 instead - I can simply mark SPARK-30993 as duplicated and change the PR title as well as the code. Thanks!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593048277
 
 
   As the code looks to be ancient I'm not sure who to cc.
   cc. to @cloud-fan @kiszk @maropu @dongjoon-hyun @HyukjinKwon for broader attention.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun closed pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593282468
 
 
   **[Test build #119158 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119158/testReport)** for PR 27747 at commit [`d724fa4`](https://github.com/apache/spark/commit/d724fa48dc8d9b419e46efd2a9d7858003908245).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593282468
 
 
   **[Test build #119158 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119158/testReport)** for PR 27747 at commit [`d724fa4`](https://github.com/apache/spark/commit/d724fa48dc8d9b419e46efd2a9d7858003908245).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593400553
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119158/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593283045
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23900/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-592952307
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593168716
 
 
   > Oops. Sorry, I clicked a wrong button mistakenly,
   
   No worries :)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593246229
 
 
   **[Test build #119153 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119153/testReport)** for PR 27747 at commit [`d724fa4`](https://github.com/apache/spark/commit/d724fa48dc8d9b419e46efd2a9d7858003908245).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593271778
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593246636
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#discussion_r386153722
 
 

 ##########
 File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeRowJoinerSuite.scala
 ##########
 @@ -99,6 +101,23 @@ class GenerateUnsafeRowJoinerSuite extends SparkFunSuite {
     testConcatOnce(N, N, variable)
   }
 
+  test("SPARK-30993: UserDefinedType matched to fixed length SQL type shouldn't be corrupted") {
 
 Review comment:
   BTW, @HeartSaVioR . How do you want to handle SPARK-30986 after SPARK-30993?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan closed pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#discussion_r386157157
 
 

 ##########
 File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeRowJoinerSuite.scala
 ##########
 @@ -99,6 +101,23 @@ class GenerateUnsafeRowJoinerSuite extends SparkFunSuite {
     testConcatOnce(N, N, variable)
   }
 
+  test("SPARK-30993: UserDefinedType matched to fixed length SQL type shouldn't be corrupted") {
+    val schema1 = new StructType(Array(
+      StructField("date", new WrappedDateTimeUDT),
+      StructField("s", StringType),
+      StructField("i", IntegerType)))
+    val proj1 = UnsafeProjection.create(schema1.fields.map(_.dataType))
+    val intRow1 = new GenericInternalRow(Array[Any](
+      LocalDateTime.now().toEpochSecond(ZoneOffset.UTC),
+      UTF8String.fromString("hello"), 1))
+
+    val schema2 = new StructType(Array(StructField("i", IntegerType)))
+    val proj2 = UnsafeProjection.create(schema2.fields.map(_.dataType))
+    val intRow2 = new GenericInternalRow(Array[Any](2))
+
+    testConcat(schema1, proj1.apply(intRow1), schema2, proj2.apply(intRow2))
 
 Review comment:
   I've left comment regarding this. https://github.com/apache/spark/pull/27747#discussion_r386154074
   
   It's no-brainer to have E2E test with SS mapWithState as we already got report from actual code, but if we would like to have simpler batch query to reproduce I may need to try out.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593400553
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119158/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593279836
 
 
   retest this, please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR edited a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#issuecomment-593245582
 
 
   Just added E2E UT. I'm still hearing voices on the change of UnsafeRow before making changes there, though it seems to be minor.
   
   EDIT: Also which JIRA issue we use as well.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27747: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27747#discussion_r386150444
 
 

 ##########
 File path: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
 ##########
 @@ -95,6 +95,10 @@ public static int calculateBitSetWidthInBytes(int numFields) {
   }
 
   public static boolean isFixedLength(DataType dt) {
+    if (dt instanceof UserDefinedType) {
+      return isFixedLength(((UserDefinedType) dt).sqlType());
+    }
+
 
 Review comment:
   I used this to emphasize UDT is handled specially (UDT type vs other types), but it's semantically same and I have no strong voice. Will change.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org