Posted to dev@hive.apache.org by "Steve Carlin (Jira)" <ji...@apache.org> on 2022/10/07 23:04:00 UTC
[jira] [Created] (HIVE-26612) Hive cannot read parquet files with int64 (TIMESTAMP_MILLIS)

Steve Carlin created HIVE-26612:
-----------------------------------

             Summary: Hive cannot read parquet files with int64 (TIMESTAMP_MILLIS)
                 Key: HIVE-26612
                 URL: https://issues.apache.org/jira/browse/HIVE-26612
             Project: Hive
          Issue Type: Bug
          Components: Database/Schema
            Reporter: Steve Carlin


If a Parquet file has a column of type "int64 eventtime (TIMESTAMP(MILLIS,true))", Hive fails to read it with the following error:

exec.Task: Failed with exception java.io.IOException:org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file file:/home/steve/upstream/hive/itests/qtest/target/tmp/parquet_format_ts_as_bigint/part-00000/timestamp_as_bigint.parquet
java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file file:/home/steve/upstream/hive/itests/qtest/target/tmp/parquet_format_ts_as_bigint/part-00000/timestamp_as_bigint.parquet
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:624)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:531)
        at org.apache.hadoop.hive.ql.exec.FetchTask.executeInner(FetchTask.java:197)
        at org.apache.hadoop.hive.ql.exec.FetchTask.execute(FetchTask.java:98)

The Parquet file can be created in Spark (Scala, spark-shell) with the following steps:

spark.conf.set("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MILLIS")
spark.conf.set("spark.sql.legacy.parquet.int96RebaseModeInWrite", "LEGACY")
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "LEGACY")
spark.conf.set("spark.sql.legacy.parquet.int96RebaseModeInRead", "LEGACY")
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "LEGACY")

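[1]
import java.sql.Timestamp

// spark-shell imports spark.implicits._ automatically (needed for toDF)
val df = Seq(
  (1, Timestamp.valueOf("2014-01-01 23:00:01")),
  (1, Timestamp.valueOf("2014-11-30 12:40:32")),
  (2, Timestamp.valueOf("2016-12-29 09:54:00")),
  (2, Timestamp.valueOf("2016-05-09 10:12:43"))
).toDF("typeid", "eventtime")

// Hypothetical output path: the write step is not shown in the original report.
df.write.parquet("test_parquet2")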

[2] Schema of the resulting file:
[root@c4839-node3 test_parquet2]# parquet-tools schema part-00001-6c90b794-90b9-4cc0-afc5-2e49a4e96bad-c000.snappy.parquet
message spark_schema {
required int32 typeid;
optional int64 eventtime (TIMESTAMP(MILLIS,true));
}
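For reference, a minimal Scala sketch of how the int64 values in [2] are meant to be interpreted, assuming the Parquet logical-type semantics for TIMESTAMP(MILLIS, isAdjustedToUTC=true): the raw long counts milliseconds since 1970-01-01T00:00:00Z. The object and method names are hypothetical, and the concrete value Spark writes for a given row also depends on the Spark session time zone (UTC is assumed here).

```scala
import java.time.{Instant, LocalDateTime, ZoneOffset}

// Hypothetical helper illustrating INT64 TIMESTAMP(MILLIS, isAdjustedToUTC=true).
// Assumption: the stored long is milliseconds since the Unix epoch, in UTC.
object TimestampMillisDemo {
  // Interpret a raw int64 column value as an instant.
  def decodeMillis(raw: Long): Instant = Instant.ofEpochMilli(raw)

  // Produce the raw int64 value for a wall-clock time taken to be in UTC.
  def encodeMillis(utc: LocalDateTime): Long =
    utc.toInstant(ZoneOffset.UTC).toEpochMilli

  def main(args: Array[String]): Unit = {
    val raw = encodeMillis(LocalDateTime.parse("2014-01-01T23:00:01"))
    println(raw)               // 1388617201000
    println(decodeMillis(raw)) // 2014-01-01T23:00:01Z
  }
}
```

Note that this encoding keeps only millisecond precision; any sub-millisecond part of a timestamp is lost when it is written this way.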

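[3] For comparison, the schema of a file whose eventtime column is stored as int96 (Spark's default spark.sql.parquet.outputTimestampType):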
[root@c4839-node3 test_parquet1]# parquet-tools schema part-00001-cb1aeebb-ec87-4273-82ec-911c4fb605b6-c000.snappy.parquet
message spark_schema {
required int32 typeid;
optional int96 eventtime;
}
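For contrast, a sketch of the int96 layout in [3], assuming the conventional Impala/Hive/Spark encoding: 8 little-endian bytes of nanoseconds within the day, followed by 4 little-endian bytes holding the Julian day number (2440588 is the Julian day of 1970-01-01). The object and method names are hypothetical, and any time-zone adjustment that Hive or Spark may apply on read is deliberately ignored.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import java.time.Instant

// Hypothetical helper illustrating the legacy 12-byte INT96 timestamp layout.
object Int96Demo {
  // Julian day number of 1970-01-01 (the Unix epoch).
  val JulianDayOfEpoch = 2440588L

  // Decode: 8 bytes nanos-of-day, then 4 bytes Julian day, both little-endian.
  def decodeInt96(bytes: Array[Byte]): Instant = {
    require(bytes.length == 12, "INT96 values are 12 bytes")
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    val nanosOfDay = buf.getLong()
    val julianDay = buf.getInt().toLong
    val epochDay = julianDay - JulianDayOfEpoch
    Instant.ofEpochSecond(epochDay * 86400L, nanosOfDay)
  }

  // Encode an instant into the same 12-byte layout.
  def encodeInt96(instant: Instant): Array[Byte] = {
    val epochDay = Math.floorDiv(instant.getEpochSecond, 86400L)
    val secOfDay = Math.floorMod(instant.getEpochSecond, 86400L)
    val nanosOfDay = secOfDay * 1000000000L + instant.getNano
    val buf = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
    buf.putLong(nanosOfDay)
    buf.putInt((epochDay + JulianDayOfEpoch).toInt)
    buf.array()
  }
}
```

Unlike the int64 millis form, this layout carries nanosecond precision and splits the value into a day number and a time of day, so a reader has to handle the two physical types with separate conversion logic.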



--
This message was sent by Atlassian Jira
(v8.20.10#820010)