You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/08 00:15:03 UTC

[GitHub] [spark] bersprockets opened a new pull request #35430: [SPARK-38133][SQL] UnsafeRow should treat TIMESTAMP_NTZ as mutable and fixed width

bersprockets opened a new pull request #35430:
URL: https://github.com/apache/spark/pull/35430


   ### What changes were proposed in this pull request?
   
   Add TimestampNTZType to UnsafeRow's list of mutable fields.
   
   ### Why are the changes needed?
   
   Assume this data:
   ```
   create or replace temp view v1 as
   select * from values
     (1, timestamp_ntz'2012-01-01 00:00:00', 10000),
     (2, timestamp_ntz'2012-01-01 00:00:00', 20000),
     (1, timestamp_ntz'2012-01-01 00:00:00', 5000),
     (1, timestamp_ntz'2013-01-01 00:00:00', 48000),
     (2, timestamp_ntz'2013-01-01 00:00:00', 30000)
     as data(a, b, c);
   ```
   The following query produces incorrect results:
   ```
   select *
   from v1
   pivot (
     sum(c)
     for a in (1, 2)
   );
   ```
   The timestamp_ntz values are corrupted:
   ```
   2012-01-01 19:05:19.476736	15000	20000
   2013-01-01 19:05:19.476736	48000	30000
   ```
   Because `UnsafeRow.isFixedLength` returns `false` for data type `TimestampNTZType`, `GenerateUnsafeRowJoiner` generates code for the `TIMESTAMP_NTZ` field as though it was a variable length field (it adds an offset to the Long value, thus corrupting the timestamp value).
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New unit test.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #35430: [SPARK-38133][SQL] UnsafeRow should treat TIMESTAMP_NTZ as mutable and fixed width

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #35430:
URL: https://github.com/apache/spark/pull/35430#issuecomment-1032092575


   cc @sunchao , @viirya , @huaxingao too


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] bersprockets commented on a change in pull request #35430: [SPARK-38133][SQL] UnsafeRow should treat TIMESTAMP_NTZ as mutable and fixed width

Posted by GitBox <gi...@apache.org>.
bersprockets commented on a change in pull request #35430:
URL: https://github.com/apache/spark/pull/35430#discussion_r801168017



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/udaf.scala
##########
@@ -187,7 +187,7 @@ sealed trait BufferSetterGetterUtils {
               row.setNullAt(ordinal)
             }
 
-        case TimestampType =>
+        case TimestampType | TimestampNTZType =>

Review comment:
       This change is unrelated to the bug. However, once I added `TimestampNTZType` to `UnsafeRow.mutableFieldTypes`, the several UDAF-related unit tests in `HashAggregationQuerySuite` pick up this data type and try to test it. However, BufferSetterGetterUtils could not handle this data type (and actually, now that I look, I made the change only for the setter, not the getter).
   
   Alternatively, I could filter out TimestampNTZType in the two HashAggregationQuerySuite unit tests.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #35430: [SPARK-38133][SQL] UnsafeRow should treat TIMESTAMP_NTZ as mutable and fixed width

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #35430:
URL: https://github.com/apache/spark/pull/35430#issuecomment-1032101918


   Thanks for cc'ing me. @gengliangwang too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] bersprockets commented on a change in pull request #35430: [SPARK-38133][SQL] UnsafeRow should treat TIMESTAMP_NTZ as mutable and fixed width

Posted by GitBox <gi...@apache.org>.
bersprockets commented on a change in pull request #35430:
URL: https://github.com/apache/spark/pull/35430#discussion_r801168017



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/udaf.scala
##########
@@ -187,7 +187,7 @@ sealed trait BufferSetterGetterUtils {
               row.setNullAt(ordinal)
             }
 
-        case TimestampType =>
+        case TimestampType | TimestampNTZType =>

Review comment:
       This change is unrelated to the bug. However, since I added `TimestampNTZType` to `UnsafeRow.mutableFieldTypes`, two UDAF-related unit tests in `HashAggregationQuerySuite` pick up this data type from that list and try to test it. However, BufferSetterGetterUtils could not handle this data type (and actually, now that I look, I made the change only for the setter, not the getter).
   
   Alternatively, I could filter out TimestampNTZType in the two HashAggregationQuerySuite unit tests.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun closed pull request #35430: [SPARK-38133][SQL] UnsafeRow should treat TIMESTAMP_NTZ as mutable and fixed width

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #35430:
URL: https://github.com/apache/spark/pull/35430


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org