Posted to dev@flink.apache.org by "Jeffrey Charles (Jira)" <ji...@apache.org> on 2021/02/10 14:55:00 UTC

[jira] [Created] (FLINK-21350) ParquetInputFormat incorrectly interprets timestamps encoded in microseconds as timestamps encoded in milliseconds

Jeffrey Charles created FLINK-21350:
---------------------------------------

             Summary: ParquetInputFormat incorrectly interprets timestamps encoded in microseconds as timestamps encoded in milliseconds
                 Key: FLINK-21350
                 URL: https://issues.apache.org/jira/browse/FLINK-21350
             Project: Flink
          Issue Type: Bug
          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
    Affects Versions: 1.12.1, 1.12.0
            Reporter: Jeffrey Charles


Given a Parquet file whose schema has a field with the physical type INT64 and the logical type TIMESTAMP_MICROS, all of the ParquetInputFormat sub-classes deserialize the timestamp as a value tens of thousands of years in the future.
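
For reference, a schema of the affected shape can be expressed with parquet-mr's MessageTypeParser; the field name event_time here is illustrative, not taken from any particular file:

{code:java}
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

public class SchemaDemo {
    public static void main(String[] args) {
        // An INT64 field annotated with the TIMESTAMP_MICROS logical type,
        // i.e. the shape of schema that triggers this bug.
        MessageType schema = MessageTypeParser.parseMessageType(
                "message record {\n"
                + "  required int64 event_time (TIMESTAMP_MICROS);\n"
                + "}");
        System.out.println(schema);
    }
}
{code}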

Looking at the code in [https://github.com/apache/flink/blob/release-1.12.1/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/utils/RowConverter.java#L326], it looks to me like the row converter interprets the field value as if it contained milliseconds rather than microseconds. Specifically, the millisecond and microsecond cases share the same code path, which instantiates a java.sql.Timestamp whose constructor takes a millisecond value, yet the microsecond case statement passes it a value in microseconds. I tested a change locally that divides the value by 1000 in the microsecond case statement, and that produces a timestamp with the expected value.
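
To illustrate the arithmetic (a minimal standalone sketch, not the RowConverter code itself): a microsecond epoch value passed to java.sql.Timestamp's millisecond constructor lands tens of thousands of years out, while dividing by 1000 first yields the expected instant:

{code:java}
import java.sql.Timestamp;

public class TimestampMicrosDemo {
    public static void main(String[] args) {
        // 2021-02-10T14:55:00Z expressed in microseconds since the epoch,
        // as a TIMESTAMP_MICROS column would store it.
        long micros = 1_612_968_900_000_000L;

        // Current behavior: the microsecond value is handed straight to the
        // Timestamp constructor, which expects milliseconds, so the result
        // is roughly 51,000 years in the future.
        Timestamp wrong = new Timestamp(micros);
        System.out.println(wrong);

        // The fix tested locally: divide by 1000 to convert microseconds to
        // milliseconds before constructing the Timestamp.
        Timestamp right = new Timestamp(micros / 1000);
        System.out.println(right); // 2021-02-10 14:55:00.0 in the JVM's local zone
    }
}
{code}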



--
This message was sent by Atlassian Jira
(v8.3.4#803005)