Posted to dev@avro.apache.org by "Vladimir Kralik (Jira)" <ji...@apache.org> on 2020/12/11 15:28:00 UTC

[jira] [Commented] (AVRO-2950) LocalDateTime-millis and -micros is bound to lead to wrong data

    [ https://issues.apache.org/jira/browse/AVRO-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247967#comment-17247967 ] 

Vladimir Kralik commented on AVRO-2950:
---------------------------------------

Hello [~wdaehn],

 

My experience comes from the other side. I live in a small European country, so we have only one time zone. Avro is used as an exchange format for data between many small computers that are outside my control. The owners of those computers are (less than) inexperienced users, so they don't care about setting the time zone correctly. It's easier for them to change the local clock than to set the correct time zone, so sometimes it's GMT, UTC, GMT+2, CET, CEST, ... If there is a time part, that's fine, because it doesn't matter whether the event occurred at 08:00 or at 10:00 local time. But when there is no time part, the code automatically adds 00:00. That can cause a significant problem, because an incorrect time zone setting can change the date part, and a changed date has an impact on money (the computation rules depend on the date part only).
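The date shift can be reproduced with plain `java.time`. This is only a minimal sketch under assumptions: the class and method names are made up for illustration, `Europe/Prague` stands in for any CET zone, and the conversion is a generic "epoch milliseconds" round trip rather than Avro's own conversion code.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

// Hypothetical helper, not part of Avro.
final class DateShiftSketch {

    // Producer side: a date-only value gets 00:00 appended and is converted to
    // epoch milliseconds using whatever zone the machine is configured with.
    static long toEpochMillis(LocalDate date, ZoneId producerZone) {
        return date.atStartOfDay(producerZone).toInstant().toEpochMilli();
    }

    // Consumer side: the number is interpreted in the consumer's configured zone.
    static LocalDate toLocalDate(long epochMillis, ZoneId consumerZone) {
        return Instant.ofEpochMilli(epochMillis).atZone(consumerZone).toLocalDate();
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2020, 12, 11);
        ZoneId cet = ZoneId.of("Europe/Prague"); // assumed example of a CET zone
        ZoneId utc = ZoneId.of("UTC");           // a machine with the "wrong" zone

        long millis = toEpochMillis(date, cet);
        System.out.println(toLocalDate(millis, cet)); // 2020-12-11
        System.out.println(toLocalDate(millis, utc)); // 2020-12-10, the date part changed
    }
}
```

Midnight in CET is 23:00 of the previous day in UTC, so a one-hour zone difference is enough to move the value to another date.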

Since I know that all of this data is always produced and consumed only in my country, setting any other time zone is a mistake.

With the previous `timestamp-millis`, which is stored internally as a `long`, it's not possible to tell which (possibly incorrect) time zone was used during serialisation; I get only a number. With `local-timestamp-millis` it doesn't matter which time zone was used, because that `long` is always computed against the local clock (maybe with an incorrect time zone) during serialisation and read back against the local clock (with the correct time zone) during deserialisation.
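A rough way to picture this: the wall-clock value is turned into a `long` without consulting the machine's time zone at all. The sketch below assumes that the number is simply the milliseconds between 1970-01-01T00:00 and the local date-time on the same zone-less timeline. The class and method names are hypothetical, and this is not claimed to be Avro's actual conversion class.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical illustration of a zone-independent local timestamp encoding.
final class LocalTimestampSketch {

    // Encode the wall-clock reading; ZoneOffset.UTC is used only as a fixed
    // reference, so the machine's default time zone never enters the result.
    static long encode(LocalDateTime wallClock) {
        return wallClock.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    // Decode with the same fixed reference, giving back the same wall-clock reading.
    static LocalDateTime decode(long millis) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(millis), ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        LocalDateTime written = LocalDateTime.of(2020, 12, 11, 0, 0);
        LocalDateTime read = decode(encode(written));
        System.out.println(read.toLocalDate()); // 2020-12-11 on any machine
    }
}
```

Because neither direction reads the default zone, a misconfigured producer or consumer cannot move the date part.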

 

So I really appreciate the new `local-timestamp-millis` logical type, mapped to Java `LocalDateTime`.

 

 

> LocalDateTime-millis and -micros is bound to lead to wrong data
> ---------------------------------------------------------------
>
>                 Key: AVRO-2950
>                 URL: https://issues.apache.org/jira/browse/AVRO-2950
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: logical types
>    Affects Versions: 1.10.0
>            Reporter: Werner Daehn
>            Priority: Critical
>
> I find the recent addition of the LocalDateTime logical types extremely dangerous. It will lead to wrong data for many users without them noticing.
> While I understand the idea and the reasoning, the oversight in my opinion is the difference between Hadoop files and Avro messages: Hadoop is for data storage, Avro is for data exchange. Hadoop runs in a single cluster with a well-defined time zone, so LocalDateTime does have a meaning there. Avro is used to exchange data between systems. Serializing data on a system in time zone 1 and loading it into a Hadoop cluster located in time zone 2 will lead to wrong data with a high likelihood.
> Example: Kafka Connect Producer is running in US (PST) and Hadoop in UK (GMT).
> User 1 expectation: In Hadoop the data is in LocalDateTime, meaning in GMT. The Java data types Date, java.sql.Timestamp and LocalDateTime are used, all of which are data types without time zone information. Thus they return correct data if the loaded data has the meaning of UK time. The Kafka Producer does not know the time zone of Hadoop.
> User 2 expectation: In Hadoop the data belongs to an office and therefore has an implicit time zone: that of the office location. In that case a LocalDateTime is meant as the time as seen on the office clock.
> As these two cases cannot be distinguished from each other and people tend to think locally, we are inviting people to produce wrong data.
>  
> The better logical type would have been one for the Java ZonedDateTime. Then the producer and consumer are in sync: the producer loads data in the PST time zone and the consumer can read the data as GMT times. If the consumer wants the local office times, they have to add the office time zone offsets.
> LocalDateTime was introduced here: https://issues.apache.org/jira/browse/AVRO-2328
>  
> Can you please open the discussion on this item to make sure you are fully aware of the implications and still want to go with it?
>  


