Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/06/08 22:36:00 UTC

[jira] [Work logged] (HIVE-25219) Backward incompatible timestamp serialization in Avro for certain timezones

     [ https://issues.apache.org/jira/browse/HIVE-25219?focusedWorklogId=608808&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608808 ]

ASF GitHub Bot logged work on HIVE-25219:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Jun/21 22:35
            Start Date: 08/Jun/21 22:35
    Worklog Time Spent: 10m 
      Work Description: zabetak opened a new pull request #2370:
URL: https://github.com/apache/hive/pull/2370


   ### What changes were proposed in this pull request?
   1. Add new read/write config properties to control legacy zone conversions in Avro.
   2. Use file metadata and the config property to choose between the new and old conversion rules.
   
   ### Why are the changes needed?
   Give end users the option to write backward-compatible timestamps in Avro files so that the files can be read correctly by older versions.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   1. New qtests for writing Avro timestamps (`avro_write_legacy_timestamp.q`, `avro_write_new_timestamp.q`)
   2. Manual tests
   * Export Avro table with current Hive version setting `hive.avro.timestamp.write.legacy.conversion.enabled=true`
   * Read from external Avro table with Hive 2 (commit 324f9fa)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 608808)
    Remaining Estimate: 0h
            Time Spent: 10m

> Backward incompatible timestamp serialization in Avro for certain timezones
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-25219
>                 URL: https://issues.apache.org/jira/browse/HIVE-25219
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 3.1.0
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>             Fix For: 4.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-12192 and HIVE-20007 changed the way that timestamp computations are performed and, to some extent, how timestamps are serialized and deserialized in files (Parquet, Avro).
> In versions that include HIVE-12192 or HIVE-20007, the serialization in Avro files is not backward compatible. In other words, writing timestamps with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them with a version that includes neither may lead to different results, depending on the default timezone of the system.
> Consider the following scenario where the default system timezone is set to US/Pacific.
> At apache/master commit eedcd82bc2d61861a27205f925ba0ffab9b6bca8
> {code:sql}
> CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS AVRO
>  LOCATION '/tmp/hiveexttbl/employee';
> INSERT INTO employee VALUES (1, '1880-01-01 00:00:00');
> INSERT INTO employee VALUES (2, '1884-01-01 00:00:00');
> INSERT INTO employee VALUES (3, '1990-01-01 00:00:00');
> SELECT * FROM employee;
> {code}
> |1|1880-01-01 00:00:00|
> |2|1884-01-01 00:00:00|
> |3|1990-01-01 00:00:00|
> At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356
> {code:sql}
> CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS AVRO
>  LOCATION '/tmp/hiveexttbl/employee';
> SELECT * FROM employee;
> {code}
> |1|1879-12-31 23:52:58|
> |2|1884-01-01 00:00:00|
> |3|1990-01-01 00:00:00|
> The timestamp for {{eid=1}} in branch-2.3 is different from the one in master.
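
The 7 minute 2 second shift for {{eid=1}} matches the gap between the historical local mean time offset of the Pacific zone and its standard -08:00 offset. The following is a minimal sketch of that arithmetic using only `java.time`, not the actual Hive code path. It assumes that the writer converts the wall-clock value to UTC with the full historical zone rules and that the older reader applies a fixed -08:00 offset; both are modeling assumptions, and the class and method names (`LmtOffsetDemo`, `historicalOffset`, `writeThenLegacyRead`) are hypothetical. `America/Los_Angeles` is used as the canonical ID behind US/Pacific.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class LmtOffsetDemo {
    // Canonical tz database ID that US/Pacific links to.
    static final ZoneId PACIFIC = ZoneId.of("America/Los_Angeles");

    // ASSUMPTION: the older reader is modeled as applying the fixed
    // standard offset (-08:00) to every stored instant.
    static final ZoneOffset ASSUMED_LEGACY_OFFSET = ZoneOffset.ofHours(-8);

    // Offset that java.time's zone rules give for a wall-clock value.
    static ZoneOffset historicalOffset(LocalDateTime wallClock) {
        return PACIFIC.getRules().getOffset(wallClock);
    }

    // Convert to UTC with the historical rules (modeled writer), then
    // render the stored instant with the fixed offset (modeled reader).
    static LocalDateTime writeThenLegacyRead(LocalDateTime wallClock) {
        Instant stored = wallClock.atZone(PACIFIC).toInstant();
        return LocalDateTime.ofInstant(stored, ASSUMED_LEGACY_OFFSET);
    }

    public static void main(String[] args) {
        // The three birth years from the employee table in the issue.
        for (int year : new int[] {1880, 1884, 1990}) {
            LocalDateTime ts = LocalDateTime.of(year, 1, 1, 0, 0, 0);
            System.out.println(ts
                + " offset=" + historicalOffset(ts)
                + " legacyRead=" + writeThenLegacyRead(ts));
        }
    }
}
```

Under these assumptions only the 1880 value moves: its offset is -07:52:58 rather than -08:00, so it reads back as 1879-12-31 23:52:58, while the 1884 and 1990 values round-trip unchanged, which is consistent with the branch-2.3 output above.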



--
This message was sent by Atlassian Jira
(v8.3.4#803005)