Posted to dev@avro.apache.org by "Werner Daehn (Jira)" <ji...@apache.org> on 2020/12/16 15:25:00 UTC

[jira] [Resolved] (AVRO-2950) LocalDateTime-millis and -micros is bound to lead to wrong data

     [ https://issues.apache.org/jira/browse/AVRO-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Werner Daehn resolved AVRO-2950.
--------------------------------
    Resolution: Won't Fix

Two qualified comments render my fear obsolete. 

> LocalDateTime-millis and -micros is bound to lead to wrong data
> ---------------------------------------------------------------
>
>                 Key: AVRO-2950
>                 URL: https://issues.apache.org/jira/browse/AVRO-2950
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: logical types
>    Affects Versions: 1.10.0
>            Reporter: Werner Daehn
>            Priority: Critical
>
> I find the recent addition of the LocalDateTime logical types extremely dangerous. It will lead to wrong data for many users without them noticing.
> While I understand the idea and the reasoning behind it, the oversight in my opinion is the difference between Hadoop files and Avro messages: Hadoop is for data storage, Avro is for data exchange. Hadoop runs as a single cluster with a well-defined time zone, so a LocalDateTime does have a meaning there. Avro, however, is used to exchange data between systems. Serializing data on a system in time zone 1 and loading it into a Hadoop cluster located in time zone 2 will, with high likelihood, lead to wrong data.
> Example: a Kafka Connect producer is running in the US (PST) and Hadoop in the UK (GMT).
> User 1 expectation: in Hadoop, a LocalDateTime means GMT. The Java data types Date, java.sql.Timestamp and LocalDateTime are used, all of which are data types without time zone information. They therefore return correct data only if the loaded values are meant as UK time. But the Kafka producer does not know the time zone of the Hadoop cluster.
> User 2 expectation: in Hadoop, the data belongs to an office and therefore has an implicit time zone, namely that of the office location. In that case a LocalDateTime is meant as the time shown on the office clock.
> As these two cases cannot be distinguished from each other and people tend to think locally, we are inviting people to produce wrong data.
>  
> A better logical type would have been one for Java's ZonedDateTime. Then producer and consumer are in sync: the producer loads data in the PST time zone and the consumer can read it as GMT times. If the consumer wants the local office times, they have to apply the office time zone offset.
> LocalDateTime was introduced here: https://issues.apache.org/jira/browse/AVRO-2328
>  
> Can you please open a discussion on this item to make sure you are fully aware of the implications and still want to go ahead with it?
>  

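The PST/GMT example in the description can be made concrete with a small sketch that uses only java.time. It assumes, following the Avro 1.10 specification's description of local-timestamp-micros, that such a field stores microseconds from 1970-01-01T00:00:00 with no time zone attached; the class and the helpers toLocalMicros and fromLocalMicros are hypothetical and not part of the Avro API. The encoded number round-trips to the same wall-clock time, but the two readings described above (US Pacific vs. UK time) point at instants eight hours apart in December.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.temporal.ChronoUnit;

class LocalTimestampAmbiguity {
    // Hypothetical helper (not Avro's API): wall-clock time as microseconds
    // from 1970-01-01T00:00:00, with no zone attached.
    static final LocalDateTime LOCAL_EPOCH = LocalDateTime.of(1970, 1, 1, 0, 0);

    static long toLocalMicros(LocalDateTime wallClock) {
        return ChronoUnit.MICROS.between(LOCAL_EPOCH, wallClock);
    }

    static LocalDateTime fromLocalMicros(long micros) {
        return LOCAL_EPOCH.plus(micros, ChronoUnit.MICROS);
    }

    public static void main(String[] args) {
        // Producer in California writes the time shown on its own clock.
        LocalDateTime producerClock = LocalDateTime.of(2020, 12, 16, 7, 25);
        long encoded = toLocalMicros(producerClock);

        // Consumer in the UK decodes exactly the same wall-clock value...
        LocalDateTime decoded = fromLocalMicros(encoded);

        // ...but nothing in the data says which zone it belongs to.
        Instant ifPacific = decoded.atZone(ZoneId.of("America/Los_Angeles")).toInstant();
        Instant ifLondon = decoded.atZone(ZoneId.of("Europe/London")).toInstant();

        System.out.println("Encoded value:             " + encoded);
        System.out.println("Decoded wall clock:        " + decoded);
        System.out.println("Read as US Pacific time:   " + ifPacific);
        System.out.println("Read as UK time:           " + ifLondon);
        System.out.println("Disagreement in hours:     "
                + Duration.between(ifLondon, ifPacific).toHours());
    }
}
```

Both readings are internally consistent, which is exactly why the error can go unnoticed: only knowledge outside the message decides which one is right.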

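The reporter's alternative can be sketched in the same way. This is an approximation under stated assumptions, not a proposed Avro type: the producer converts its ZonedDateTime to an instant and stores microseconds since 1970-01-01T00:00:00Z (the convention Avro's existing timestamp-micros type uses for UTC instants), and the consumer chooses the zone in which to view it. The class and the helpers toEpochMicros and fromEpochMicros are hypothetical. Only the instant travels, not the zone ID, so, as the description says, a consumer who wants office times has to apply the office time zone itself.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

class InstantRoundTrip {
    // Hypothetical helper: microseconds since 1970-01-01T00:00:00Z, which
    // identifies the same instant no matter where it is written or read.
    static long toEpochMicros(ZonedDateTime zoned) {
        return ChronoUnit.MICROS.between(Instant.EPOCH, zoned.toInstant());
    }

    static Instant fromEpochMicros(long micros) {
        return Instant.EPOCH.plus(micros, ChronoUnit.MICROS);
    }

    public static void main(String[] args) {
        ZoneId pacific = ZoneId.of("America/Los_Angeles");
        ZoneId london = ZoneId.of("Europe/London");

        // Producer knows its own zone and records a zoned time.
        ZonedDateTime producerTime = ZonedDateTime.of(2020, 12, 16, 7, 25, 0, 0, pacific);
        long encoded = toEpochMicros(producerTime);

        // Consumer decodes an unambiguous instant and picks the view it needs.
        Instant decoded = fromEpochMicros(encoded);
        System.out.println("Consumer view in London:   " + decoded.atZone(london));
        System.out.println("Office view (US Pacific):  " + decoded.atZone(pacific));
    }
}
```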

--
This message was sent by Atlassian Jira
(v8.3.4#803005)