Posted to reviews@spark.apache.org by "gengliangwang (via GitHub)" <gi...@apache.org> on 2024/03/18 18:57:37 UTC

[PR] [SPARK-47447][SQL] Allow reading Parquet TimestampLTZ as TimestampNTZ [spark]

gengliangwang opened a new pull request, #45571:
URL: https://github.com/apache/spark/pull/45571

### What changes were proposed in this pull request?

Currently, Parquet TimestampNTZ columns can be read as TimestampLTZ, but reading TimestampLTZ as TimestampNTZ fails with an error. As a result, Parquet files containing both TimestampLTZ and TimestampNTZ columns cannot be read as TimestampNTZ.

To make the data type system on Parquet simpler, this PR allows reading TimestampLTZ as TimestampNTZ in the Parquet data source.
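
The rule change described above can be sketched as a small toy model. This is only an illustration under stated assumptions, not Spark's implementation: `TsKind`, `can_read`, `can_read_all`, and the `allow_ltz_as_ntz` flag are hypothetical names invented for this sketch, and the model covers only which reads are accepted, not how values are converted.

```python
from enum import Enum


class TsKind(Enum):
    """Timestamp flavor of a column, as stored in a file or requested by a reader."""
    LTZ = "TimestampLTZ"  # e.g. Parquet TIMESTAMP annotated isAdjustedToUTC = true
    NTZ = "TimestampNTZ"  # e.g. Parquet TIMESTAMP annotated isAdjustedToUTC = false


def can_read(file_kind: TsKind, requested: TsKind, allow_ltz_as_ntz: bool) -> bool:
    """Toy compatibility rule (hypothetical helper, not Spark code)."""
    if file_kind is requested:
        return True  # same flavor on both sides
    if file_kind is TsKind.NTZ and requested is TsKind.LTZ:
        return True  # NTZ read as LTZ: already permitted before this PR
    # Remaining case: LTZ in the file, NTZ requested.
    # Rejected before this PR, accepted after it.
    return allow_ltz_as_ntz


def can_read_all(file_kinds, requested: TsKind, allow_ltz_as_ntz: bool) -> bool:
    """A scan succeeds only if every file's column can be read as the requested type."""
    return all(can_read(kind, requested, allow_ltz_as_ntz) for kind in file_kinds)


# A directory whose files disagree on the timestamp flavor of one column.
mixed = [TsKind.LTZ, TsKind.NTZ]

before = can_read_all(mixed, TsKind.NTZ, allow_ltz_as_ntz=False)
after = can_read_all(mixed, TsKind.NTZ, allow_ltz_as_ntz=True)
print(before, after)  # prints "False True"
```

In this model, the mixed set of files could already be read as TimestampLTZ; the PR makes the same set readable as TimestampNTZ as well.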

### Why are the changes needed?

* Make it possible to read Parquet files containing both TimestampLTZ and TimestampNTZ columns as TimestampNTZ
* Make the data type system on Parquet simpler

### Does this PR introduce _any_ user-facing change?

Yes. Parquet TimestampLTZ columns can now be read as TimestampNTZ.

### How was this patch tested?

Unit tests (UT)

### Was this patch authored or co-authored using generative AI tooling?

No

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
