You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by Kurt Young <yk...@gmail.com> on 2021/01/21 04:11:42 UTC

Re: [DISCUSS] Correct time-related function behavior in Flink SQL

cc this to user & user-zh mailing list because this will affect lots of
users, and also quite a lot of users
were asking questions around this topic.

Let me try to understand this from user's perspective.

Your proposal will affect five functions, which are:

   - PROCTIME()
   - NOW()
   - CURRENT_DATE
   - CURRENT_TIME
   - CURRENT_TIMESTAMP

Before the changes, as I am writing this reply, the local time here is
*2021-01-21
12:03:35 (Beijing time, UTC+8)*.
And I tried these 5 functions in sql client, and got:

*Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE,
CURRENT_TIME;*

*+-------------------------+-------------------------+-------------------------+--------------+--------------+*

*|                  EXPR$0 |                  EXPR$1 |
CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |*

*+-------------------------+-------------------------+-------------------------+--------------+--------------+*

*| 2021-01-21T04:03:35.228 | 2021-01-21T04:03:35.228 |
2021-01-21T04:03:35.228 |   2021-01-21 | 04:03:35.228 |*

*+-------------------------+-------------------------+-------------------------+--------------+--------------+*
After the changes, the expected behavior will change to:

*Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE,
CURRENT_TIME;*

*+-------------------------+-------------------------+-------------------------+--------------+--------------+*

*|                  EXPR$0 |                  EXPR$1 |
CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |*

*+-------------------------+-------------------------+-------------------------+--------------+--------------+*

*| 2021-01-21T12:03:35.228 | 2021-01-21T12:03:35.228 |
2021-01-21T12:03:35.228 |   2021-01-21 | 12:03:35.228 |*

*+-------------------------+-------------------------+-------------------------+--------------+--------------+*
The return type of now(), proctime() and CURRENT_TIMESTAMP still be
TIMESTAMP;

Best,
Kurt


On Tue, Jan 19, 2021 at 6:42 PM Leonard Xu <xb...@gmail.com> wrote:

> I found above example format may mess up in different mail client, I post
> a picture here[1].
>
> Best,
> Leonard
>
> [1]
> https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png
> <
> https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png>
>
>
> > 在 2021年1月19日,16:22,Leonard Xu <xb...@gmail.com> 写道:
> >
> > Hi, all
> >
> > I want to start the discussion about correcting time-related function
> behavior in Flink SQL, this is a tricky topic but I think it’s time to
> address it.
> >
> > Currently some temporal function behaviors are wired to users.
> > 1.  When users use a PROCTIME() in SQL, the value of PROCTIME() has a
> timezone offset with the wall-clock time in users' local time zone, users
> need to add their local time zone offset manually to get expected local
> timestamp(e.g: Users in Germany need to +1h to get expected local
> timestamp).
> >
> > 2. Users can not use CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP  to get
> wall-clock timestamp in local time zone, and thus they need write UDF in
> their SQL just for implementing a simple filter like WHERE date_col =
> CURRENT_DATE.
> >
> > 3. Another common case  is the time window  with day interval based on
> PROCTIME(), user plan to put all data from one day into the same window,
> but the window is assigned using timestamp in UTC+0 timezone rather than
> the session timezone which leads to the window starts with an offset(e.g:
> Users in China need to add -8h in their business sql start and then +8h
> when output the result, the conversion like a magic for users).
> >
> > These problems come from that lots of time-related functions like
> PROCTIME(), NOW(), CURRENT_DATE, CURRENT_TIME and CURRENT_TIMESTAMP are
> returning time values based on UTC+0 time zone.
> >
> > This topic will lead to a comparison of the three types, i.e.
> TIMESTAMP/TIMESTAMP WITHOUT TIME ZONE, TIMESTAMP WITH LOCAL TIME ZONE and
> TIMESTAMP WITH TIME ZONE. In order to better understand the three types, I
> wrote a document[1] to help understand them better. You can also know the
> tree timestamp types behavior in Hadoop ecosystem from the reference link
> int the doc.
> >
> >
> > I Invested all Flink time-related functions current behavior and
> compared with other DB vendors like Pg,Presto, Hive, Spark, Snowflake,  I
> made an excel [2] to organize them well, we can use it for the next
> discussion. Please let me know if I missed something.
> > From my investigation, I think we need to correct the behavior of
> function NOW()/PROCTIME()/CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP, to
> correct them, we can change the function return type or function return
> value or change return type and return value both. All of those way are
> valid because SQL:2011 does not specify the function return type and every
> SQL engine vendor has its own implementation. For example the
> CURRENT_TIMESTAMP function,
> >
> > FLINK current behavior        existed problem other vendors' behavior
> proposed change
> > CURRENT_TIMESTAMP     CURRENT_TIMESTAMP
> > TIMESTAMP(0) NOT NULL
> >
> > #session timezone: UTC
> > 2020-12-28T23:52:52
> >
> > #session timezone: UTC+8
> > 2020-12-28T23:52:52
> >
> > wall clock:
> > UTC+8: 2020-12-29 07:52:52    Wrong value:returns UTC timestamp, but
> user expects current timestamp in session time zone      In MySQL, Spark,
> the function NOW() and CURRENT_TIMESTAMP return current timestamp value in
> session time zone,the return type is TIMESTAMP
> >
> > In Pg, Presto, the function NOW() and LOCALTIMESTAMP return current
> timestamp in session time zone,the return type is TIMESTAMP WITH TIME ZONE
> >
> > In Snowflake, the function CURRENT_TIMESTAMP / LOCALTIMESTAMP return
> current timestamp in session time zone,the return type is TIMESTAMP WITH
> LOCAL TIME ZONE Flink should return current timestamp in session time zone,
> the return type should be TIMESTAMP
> >
> >
> > I tend to only change the return value for these problematic functions
> and introduce an option for compatibility consideration, what do you think?
> >
> >
> > Looking forward to your feedback.
> >
> > Best,
> > Leonard
> >
> > [1]
> https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing
> <
> https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing>
>
> > [2]
> https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing
> <
> https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing>
>
>
>

Re: [DISCUSS] Correct time-related function behavior in Flink SQL

Posted by Leonard Xu <xb...@gmail.com>.
Thanks @Timo for the detailed reply.

Let’s continue the technical discussion on the former FLIP discussion thread[1]:  [DISCUSS] FLIP-162: Consistent Flink SQL time function behavior. 

We can share our decision to user and user-zh on this thread once we have a final proposal.

Best,
Leonard

[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-162-Consistent-Flink-SQL-time-function-behavior-tc48116.html

> 在 2021年1月22日,00:53,Timo Walther <tw...@apache.org> 写道:
> 
> Hi Leonard,
> 
> thanks for working on this topic. I agree that time handling is not easy in Flink at the moment. We added new time data types (and some are still not supported which even further complicates things like TIME(9)). We should definitely improve this situation for users.
> 
> This is a pretty opinionated topic and it seems that the SQL standard is not really deciding this but is at least supporting. So let me express my opinion for the most important functions:
> 
> LOCALDATE / LOCALTIME / LOCALTIMESTAMP
> 
> --> uses session time zone, returns DATE/TIME/TIMESTAMP
> 
> I think those are the most obvious ones because the LOCAL indicates that the locality should be materialized into the result and any time zone information (coming from session config or data) is not important afterwards.
> 
> CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP
> 
> --> uses session time zone, returns DATE/TIME/TIMESTAMP
> 
> I'm very sceptical about this behavior. Almost all mature systems (Oracle, Postgres) and new high quality systems (Presto, Snowflake) use a data type with some degree of time zone information encoded. In a globalized world with businesses spanning different regions, I think we should do this as well. There should be a difference between CURRENT_TIMESTAMP and LOCALTIMESTAMP. And users should be able to choose which behavior they prefer for their pipeline.
> 
> If we would design this from scatch, I would suggest the following:
> 
> - drop CURRENT_DATE / CURRENT_TIME and let users pick LOCALDATE / LOCALTIME for materialized timestamp parts
> 
> - CURRENT_TIMESTAMP should return a TIMESTAMP WITH TIME ZONE to materialize all session time information into every record. It it the most generic data type and allows to cast to all other timestamp data types. This generic ability can be used for filter predicates as well either through implicit or explicit casting.
> 
> PROCTIME/ROWTIME should be time functions based on a long value. Both System.currentMillis() and our watermark system work on long values. Those should return TIMESTAMP WITH LOCAL TIME ZONE because the main calculation should always happen based on UTC. We discussed it in a different thread, but we should allow PROCTIME globally. People need a way to create instances of TIMESTAMP WITH LOCAL TIME ZONE. This is not considered in the current design doc. Many pipelines contain UTC timestamps and thus it should be easy to create one. Also, both CURRENT_TIMESTAMP and LOCALTIMESTAMP can work with this type because we should remember that TIMESTAMP WITH LOCAL TIME ZONE accepts all timestamp data types as casting target [1]. We could allow TIMESTAMP WITH TIME ZONE in the future for ROWTIME.
> 
> In any case, windows should simply adapt their behavior to the passed timestamp type. And with TIMESTAMP WITH LOCAL TIME ZONE a day is defined by considering the current session time zone.
> 
> If we would like to design this with less effort required, we could think about returning TIMESTAMP WITH LOCAL TIME ZONE also for CURRENT_TIMESTAMP.
> 
> 
> I will try to involve more people into this discussion.
> 
> Thanks,
> Timo
> 
> [1] https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/Data-Types.html#GUID-E7CA339A-2093-4FE4-A36E-1D09593591D3 <https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/Data-Types.html#GUID-E7CA339A-2093-4FE4-A36E-1D09593591D3>
> 
> On 21.01.21 15:32, Leonard Xu wrote:
>>> Before the changes, as I am writing this reply, the local time here is 2021-01-21 12:03:35 (Beijing time, UTC+8).
>>> And I tried these 5 functions in sql client, and got:
>>> 
>>> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, CURRENT_TIME;
>>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>>> |                  EXPR$0 |                  EXPR$1 |       CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |
>>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>>> | 2021-01-21T04:03:35.228 | 2021-01-21T04:03:35.228 | 2021-01-21T04:03:35.228 |   2021-01-21 | 04:03:35.228 |
>>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>>> After the changes, the expected behavior will change to:
>>> 
>>> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, CURRENT_TIME;
>>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>>> |                  EXPR$0 |                  EXPR$1 |       CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |
>>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>>> | 2021-01-21T12:03:35.228 | 2021-01-21T12:03:35.228 | 2021-01-21T12:03:35.228 |   2021-01-21 | 12:03:35.228 |
>>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>>> The return type of now(), proctime() and CURRENT_TIMESTAMP still be TIMESTAMP;
>> To Kurt, thanks  for the intuitive case, it really clear, you’re wright that I want to propose to change the return value of these functions. It’s the most important part of the topic from user's perspective.
>>> I think this definitely deserves a FLIP.
>> To Jark,  nice suggestion, I prepared a FLIP for this topic, and will start the FLIP discussion soon.
>>>> If use the default Flink SQL,&nbsp; the window time range of the
>>>> statistics is incorrect, then the statistical results will naturally be
>>>> incorrect.
>> To zhisheng, sorry to hear that this problem influenced your production jobs,  Could you share your SQL pattern?  we can have more inputs and try to resolve them.
>> Best,
>> Leonard
>>> On Tue, Jan 19, 2021 at 6:42 PM Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com> <mailto:xbjtdcq@gmail.com <ma...@gmail.com>>> wrote:
>>> I found above example format may mess up in different mail client, I post a picture here[1].
>>> 
>>> Best,
>>> Leonard
>>> 
>>> [1] https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png> <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png>> <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png> <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png>>>
>>> 
>>>> 在 2021年1月19日,16:22,Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com> <mailto:xbjtdcq@gmail.com <ma...@gmail.com>>> 写道:
>>>> 
>>>> Hi, all
>>>> 
>>>> I want to start the discussion about correcting time-related function behavior in Flink SQL, this is a tricky topic but I think it’s time to address it.
>>>> 
>>>> Currently some temporal function behaviors are wired to users.
>>>> 1.  When users use a PROCTIME() in SQL, the value of PROCTIME() has a timezone offset with the wall-clock time in users' local time zone, users need to add their local time zone offset manually to get expected local timestamp(e.g: Users in Germany need to +1h to get expected local timestamp).
>>>> 
>>>> 2. Users can not use CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP  to get wall-clock timestamp in local time zone, and thus they need write UDF in their SQL just for implementing a simple filter like WHERE date_col =  CURRENT_DATE.
>>>> 
>>>> 3. Another common case  is the time window  with day interval based on PROCTIME(), user plan to put all data from one day into the same window, but the window is assigned using timestamp in UTC+0 timezone rather than the session timezone which leads to the window starts with an offset(e.g: Users in China need to add -8h in their business sql start and then +8h when output the result, the conversion like a magic for users).
>>>> 
>>>> These problems come from that lots of time-related functions like PROCTIME(), NOW(), CURRENT_DATE, CURRENT_TIME and CURRENT_TIMESTAMP are returning time values based on UTC+0 time zone.
>>>> 
>>>> This topic will lead to a comparison of the three types, i.e. TIMESTAMP/TIMESTAMP WITHOUT TIME ZONE, TIMESTAMP WITH LOCAL TIME ZONE and TIMESTAMP WITH TIME ZONE. In order to better understand the three types, I wrote a document[1] to help understand them better. You can also know the tree timestamp types behavior in Hadoop ecosystem from the reference link int the doc.
>>>> 
>>>> 
>>>> I Invested all Flink time-related functions current behavior and compared with other DB vendors like Pg,Presto, Hive, Spark, Snowflake,  I made an excel [2] to organize them well, we can use it for the next discussion. Please let me know if I missed something.
>>>> From my investigation, I think we need to correct the behavior of function NOW()/PROCTIME()/CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP, to correct them, we can change the function return type or function return value or change return type and return value both. All of those way are valid because SQL:2011 does not specify the function return type and every SQL engine vendor has its own implementation. For example the CURRENT_TIMESTAMP function,
>>>> 
>>>> FLINK current behavior        existed problem other vendors' behavior proposed change
>>>> CURRENT_TIMESTAMP     CURRENT_TIMESTAMP
>>>> TIMESTAMP(0) NOT NULL
>>>> 
>>>> #session timezone: UTC
>>>> 2020-12-28T23:52:52
>>>> 
>>>> #session timezone: UTC+8
>>>> 2020-12-28T23:52:52
>>>> 
>>>> wall clock:
>>>> UTC+8: 2020-12-29 07:52:52    Wrong value:returns UTC timestamp, but user expects current timestamp in session time zone      In MySQL, Spark, the function NOW() and CURRENT_TIMESTAMP return current timestamp value in session time zone,the return type is TIMESTAMP
>>>> 
>>>> In Pg, Presto, the function NOW() and LOCALTIMESTAMP return current timestamp in session time zone,the return type is TIMESTAMP WITH TIME ZONE
>>>> 
>>>> In Snowflake, the function CURRENT_TIMESTAMP / LOCALTIMESTAMP return current timestamp in session time zone,the return type is TIMESTAMP WITH LOCAL TIME ZONE Flink should return current timestamp in session time zone, the return type should be TIMESTAMP
>>>> 
>>>> 
>>>> I tend to only change the return value for these problematic functions and introduce an option for compatibility consideration, what do you think?
>>>> 
>>>> 
>>>> Looking forward to your feedback.
>>>> 
>>>> Best,
>>>> Leonard
>>>> 
>>>> [1] https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing <https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing><https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing <https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing>> <https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing <https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing><https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing <https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing>>>
>>>> [2] https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing> <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing>> <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing><https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing>>>


Re: [DISCUSS] Correct time-related function behavior in Flink SQL

Posted by Timo Walther <tw...@apache.org>.
Hi Leonard,

thanks for working on this topic. I agree that time handling is not easy 
in Flink at the moment. We added new time data types (and some are still 
not supported which even further complicates things like TIME(9)). We 
should definitely improve this situation for users.

This is a pretty opinionated topic and it seems that the SQL standard is 
not really deciding this but is at least supporting. So let me express 
my opinion for the most important functions:

LOCALDATE / LOCALTIME / LOCALTIMESTAMP

--> uses session time zone, returns DATE/TIME/TIMESTAMP

I think those are the most obvious ones because the LOCAL indicates that 
the locality should be materialized into the result and any time zone 
information (coming from session config or data) is not important 
afterwards.

CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP

--> uses session time zone, returns DATE/TIME/TIMESTAMP

I'm very sceptical about this behavior. Almost all mature systems 
(Oracle, Postgres) and new high quality systems (Presto, Snowflake) use 
a data type with some degree of time zone information encoded. In a 
globalized world with businesses spanning different regions, I think we 
should do this as well. There should be a difference between 
CURRENT_TIMESTAMP and LOCALTIMESTAMP. And users should be able to choose 
which behavior they prefer for their pipeline.

If we would design this from scatch, I would suggest the following:

- drop CURRENT_DATE / CURRENT_TIME and let users pick LOCALDATE / 
LOCALTIME for materialized timestamp parts

- CURRENT_TIMESTAMP should return a TIMESTAMP WITH TIME ZONE to 
materialize all session time information into every record. It it the 
most generic data type and allows to cast to all other timestamp data 
types. This generic ability can be used for filter predicates as well 
either through implicit or explicit casting.

PROCTIME/ROWTIME should be time functions based on a long value. Both 
System.currentMillis() and our watermark system work on long values. 
Those should return TIMESTAMP WITH LOCAL TIME ZONE because the main 
calculation should always happen based on UTC. We discussed it in a 
different thread, but we should allow PROCTIME globally. People need a 
way to create instances of TIMESTAMP WITH LOCAL TIME ZONE. This is not 
considered in the current design doc. Many pipelines contain UTC 
timestamps and thus it should be easy to create one. Also, both 
CURRENT_TIMESTAMP and LOCALTIMESTAMP can work with this type because we 
should remember that TIMESTAMP WITH LOCAL TIME ZONE accepts all 
timestamp data types as casting target [1]. We could allow TIMESTAMP 
WITH TIME ZONE in the future for ROWTIME.

In any case, windows should simply adapt their behavior to the passed 
timestamp type. And with TIMESTAMP WITH LOCAL TIME ZONE a day is defined 
by considering the current session time zone.

If we would like to design this with less effort required, we could 
think about returning TIMESTAMP WITH LOCAL TIME ZONE also for 
CURRENT_TIMESTAMP.


I will try to involve more people into this discussion.

Thanks,
Timo

[1] 
https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/Data-Types.html#GUID-E7CA339A-2093-4FE4-A36E-1D09593591D3

On 21.01.21 15:32, Leonard Xu wrote:
> 
> 
>> Before the changes, as I am writing this reply, the local time here is 2021-01-21 12:03:35 (Beijing time, UTC+8).
>> And I tried these 5 functions in sql client, and got:
>>
>> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, CURRENT_TIME;
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>> |                  EXPR$0 |                  EXPR$1 |       CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>> | 2021-01-21T04:03:35.228 | 2021-01-21T04:03:35.228 | 2021-01-21T04:03:35.228 |   2021-01-21 | 04:03:35.228 |
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>> After the changes, the expected behavior will change to:
>>
>> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, CURRENT_TIME;
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>> |                  EXPR$0 |                  EXPR$1 |       CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>> | 2021-01-21T12:03:35.228 | 2021-01-21T12:03:35.228 | 2021-01-21T12:03:35.228 |   2021-01-21 | 12:03:35.228 |
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>> The return type of now(), proctime() and CURRENT_TIMESTAMP still be TIMESTAMP;
> 
> To Kurt, thanks  for the intuitive case, it really clear, you’re wright that I want to propose to change the return value of these functions. It’s the most important part of the topic from user's perspective.
> 
>> I think this definitely deserves a FLIP.
> To Jark,  nice suggestion, I prepared a FLIP for this topic, and will start the FLIP discussion soon.
> 
>>> If use the default Flink SQL,&nbsp; the window time range of the
>>> statistics is incorrect, then the statistical results will naturally be
>>> incorrect.
> To zhisheng, sorry to hear that this problem influenced your production jobs,  Could you share your SQL pattern?  we can have more inputs and try to resolve them.
> 
> 
> Best,
> Leonard
> 
> 
> 
>> On Tue, Jan 19, 2021 at 6:42 PM Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com>> wrote:
>> I found above example format may mess up in different mail client, I post a picture here[1].
>>
>> Best,
>> Leonard
>>
>> [1] https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png> <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png>>
>>
>>> 在 2021年1月19日,16:22,Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com>> 写道:
>>>
>>> Hi, all
>>>
>>> I want to start the discussion about correcting time-related function behavior in Flink SQL, this is a tricky topic but I think it’s time to address it.
>>>
>>> Currently some temporal function behaviors are wired to users.
>>> 1.  When users use a PROCTIME() in SQL, the value of PROCTIME() has a timezone offset with the wall-clock time in users' local time zone, users need to add their local time zone offset manually to get expected local timestamp(e.g: Users in Germany need to +1h to get expected local timestamp).
>>>
>>> 2. Users can not use CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP  to get wall-clock timestamp in local time zone, and thus they need write UDF in their SQL just for implementing a simple filter like WHERE date_col =  CURRENT_DATE.
>>>
>>> 3. Another common case  is the time window  with day interval based on PROCTIME(), user plan to put all data from one day into the same window, but the window is assigned using timestamp in UTC+0 timezone rather than the session timezone which leads to the window starts with an offset(e.g: Users in China need to add -8h in their business sql start and then +8h when output the result, the conversion like a magic for users).
>>>
>>> These problems come from that lots of time-related functions like PROCTIME(), NOW(), CURRENT_DATE, CURRENT_TIME and CURRENT_TIMESTAMP are returning time values based on UTC+0 time zone.
>>>
>>> This topic will lead to a comparison of the three types, i.e. TIMESTAMP/TIMESTAMP WITHOUT TIME ZONE, TIMESTAMP WITH LOCAL TIME ZONE and TIMESTAMP WITH TIME ZONE. In order to better understand the three types, I wrote a document[1] to help understand them better. You can also know the tree timestamp types behavior in Hadoop ecosystem from the reference link int the doc.
>>>
>>>
>>> I Invested all Flink time-related functions current behavior and compared with other DB vendors like Pg,Presto, Hive, Spark, Snowflake,  I made an excel [2] to organize them well, we can use it for the next discussion. Please let me know if I missed something.
>>>  From my investigation, I think we need to correct the behavior of function NOW()/PROCTIME()/CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP, to correct them, we can change the function return type or function return value or change return type and return value both. All of those way are valid because SQL:2011 does not specify the function return type and every SQL engine vendor has its own implementation. For example the CURRENT_TIMESTAMP function,
>>>
>>> FLINK current behavior        existed problem other vendors' behavior proposed change
>>> CURRENT_TIMESTAMP     CURRENT_TIMESTAMP
>>> TIMESTAMP(0) NOT NULL
>>>
>>> #session timezone: UTC
>>> 2020-12-28T23:52:52
>>>
>>> #session timezone: UTC+8
>>> 2020-12-28T23:52:52
>>>
>>> wall clock:
>>> UTC+8: 2020-12-29 07:52:52    Wrong value:returns UTC timestamp, but user expects current timestamp in session time zone      In MySQL, Spark, the function NOW() and CURRENT_TIMESTAMP return current timestamp value in session time zone,the return type is TIMESTAMP
>>>
>>> In Pg, Presto, the function NOW() and LOCALTIMESTAMP return current timestamp in session time zone,the return type is TIMESTAMP WITH TIME ZONE
>>>
>>> In Snowflake, the function CURRENT_TIMESTAMP / LOCALTIMESTAMP return current timestamp in session time zone,the return type is TIMESTAMP WITH LOCAL TIME ZONE Flink should return current timestamp in session time zone, the return type should be TIMESTAMP
>>>
>>>
>>> I tend to only change the return value for these problematic functions and introduce an option for compatibility consideration, what do you think?
>>>
>>>
>>> Looking forward to your feedback.
>>>
>>> Best,
>>> Leonard
>>>
>>> [1] https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing <https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing> <https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing <https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing>>
>>> [2] https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing> <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing>>
>>
> 
> 


Re: [DISCUSS] Correct time-related function behavior in Flink SQL

Posted by Leonard Xu <xb...@gmail.com>.

> Before the changes, as I am writing this reply, the local time here is 2021-01-21 12:03:35 (Beijing time, UTC+8).
> And I tried these 5 functions in sql client, and got:
> 
> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, CURRENT_TIME;
> +-------------------------+-------------------------+-------------------------+--------------+--------------+
> |                  EXPR$0 |                  EXPR$1 |       CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |
> +-------------------------+-------------------------+-------------------------+--------------+--------------+
> | 2021-01-21T04:03:35.228 | 2021-01-21T04:03:35.228 | 2021-01-21T04:03:35.228 |   2021-01-21 | 04:03:35.228 |
> +-------------------------+-------------------------+-------------------------+--------------+--------------+
> After the changes, the expected behavior will change to:
> 
> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, CURRENT_TIME;
> +-------------------------+-------------------------+-------------------------+--------------+--------------+
> |                  EXPR$0 |                  EXPR$1 |       CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |
> +-------------------------+-------------------------+-------------------------+--------------+--------------+
> | 2021-01-21T12:03:35.228 | 2021-01-21T12:03:35.228 | 2021-01-21T12:03:35.228 |   2021-01-21 | 12:03:35.228 |
> +-------------------------+-------------------------+-------------------------+--------------+--------------+
> The return type of now(), proctime() and CURRENT_TIMESTAMP still be TIMESTAMP;

To Kurt, thanks  for the intuitive case, it really clear, you’re wright that I want to propose to change the return value of these functions. It’s the most important part of the topic from user's perspective.

> I think this definitely deserves a FLIP.
To Jark,  nice suggestion, I prepared a FLIP for this topic, and will start the FLIP discussion soon.

>> If use the default Flink SQL,&nbsp; the window time range of the
>> statistics is incorrect, then the statistical results will naturally be
>> incorrect.
To zhisheng, sorry to hear that this problem influenced your production jobs,  Could you share your SQL pattern?  we can have more inputs and try to resolve them.


Best,
Leonard



> On Tue, Jan 19, 2021 at 6:42 PM Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com>> wrote:
> I found above example format may mess up in different mail client, I post a picture here[1].
> 
> Best,
> Leonard
> 
> [1] https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png> <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png>> 
> 
> > 在 2021年1月19日,16:22,Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com>> 写道:
> > 
> > Hi, all
> > 
> > I want to start the discussion about correcting time-related function behavior in Flink SQL, this is a tricky topic but I think it’s time to address it. 
> > 
> > Currently some temporal function behaviors are wired to users.
> > 1.  When users use a PROCTIME() in SQL, the value of PROCTIME() has a timezone offset with the wall-clock time in users' local time zone, users need to add their local time zone offset manually to get expected local timestamp(e.g: Users in Germany need to +1h to get expected local timestamp). 
> > 
> > 2. Users can not use CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP  to get wall-clock timestamp in local time zone, and thus they need write UDF in their SQL just for implementing a simple filter like WHERE date_col =  CURRENT_DATE. 
> > 
> > 3. Another common case  is the time window  with day interval based on PROCTIME(), user plan to put all data from one day into the same window, but the window is assigned using timestamp in UTC+0 timezone rather than the session timezone which leads to the window starts with an offset(e.g: Users in China need to add -8h in their business sql start and then +8h when output the result, the conversion like a magic for users). 
> > 
> > These problems come from that lots of time-related functions like PROCTIME(), NOW(), CURRENT_DATE, CURRENT_TIME and CURRENT_TIMESTAMP are returning time values based on UTC+0 time zone.
> > 
> > This topic will lead to a comparison of the three types, i.e. TIMESTAMP/TIMESTAMP WITHOUT TIME ZONE, TIMESTAMP WITH LOCAL TIME ZONE and TIMESTAMP WITH TIME ZONE. In order to better understand the three types, I wrote a document[1] to help understand them better. You can also know the tree timestamp types behavior in Hadoop ecosystem from the reference link int the doc.
> > 
> > 
> > I Invested all Flink time-related functions current behavior and compared with other DB vendors like Pg,Presto, Hive, Spark, Snowflake,  I made an excel [2] to organize them well, we can use it for the next discussion. Please let me know if I missed something.
> > From my investigation, I think we need to correct the behavior of function NOW()/PROCTIME()/CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP, to correct them, we can change the function return type or function return value or change return type and return value both. All of those way are valid because SQL:2011 does not specify the function return type and every SQL engine vendor has its own implementation. For example the CURRENT_TIMESTAMP function,
> > 
> > FLINK current behavior        existed problem other vendors' behavior proposed change
> > CURRENT_TIMESTAMP     CURRENT_TIMESTAMP
> > TIMESTAMP(0) NOT NULL
> > 
> > #session timezone: UTC
> > 2020-12-28T23:52:52
> > 
> > #session timezone: UTC+8
> > 2020-12-28T23:52:52
> > 
> > wall clock:
> > UTC+8: 2020-12-29 07:52:52    Wrong value:returns UTC timestamp, but user expects current timestamp in session time zone      In MySQL, Spark, the function NOW() and CURRENT_TIMESTAMP return current timestamp value in session time zone,the return type is TIMESTAMP
> > 
> > In Pg, Presto, the function NOW() and LOCALTIMESTAMP return current timestamp in session time zone,the return type is TIMESTAMP WITH TIME ZONE
> > 
> > In Snowflake, the function CURRENT_TIMESTAMP / LOCALTIMESTAMP return current timestamp in session time zone,the return type is TIMESTAMP WITH LOCAL TIME ZONE Flink should return current timestamp in session time zone, the return type should be TIMESTAMP
> > 
> > 
> > I tend to only change the return value for these problematic functions and introduce an option for compatibility consideration, what do you think?
> > 
> > 
> > Looking forward to your feedback.
> > 
> > Best,
> > Leonard
> > 
> > [1] https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing <https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing> <https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing <https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing>> 
> > [2] https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing> <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing>> 
> 


Re: [DISCUSS] Correct time-related function behavior in Flink SQL

Posted by Leonard Xu <xb...@gmail.com>.

> Before the changes, as I am writing this reply, the local time here is 2021-01-21 12:03:35 (Beijing time, UTC+8).
> And I tried these 5 functions in sql client, and got:
> 
> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, CURRENT_TIME;
> +-------------------------+-------------------------+-------------------------+--------------+--------------+
> |                  EXPR$0 |                  EXPR$1 |       CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |
> +-------------------------+-------------------------+-------------------------+--------------+--------------+
> | 2021-01-21T04:03:35.228 | 2021-01-21T04:03:35.228 | 2021-01-21T04:03:35.228 |   2021-01-21 | 04:03:35.228 |
> +-------------------------+-------------------------+-------------------------+--------------+--------------+
> After the changes, the expected behavior will change to:
> 
> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, CURRENT_TIME;
> +-------------------------+-------------------------+-------------------------+--------------+--------------+
> |                  EXPR$0 |                  EXPR$1 |       CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |
> +-------------------------+-------------------------+-------------------------+--------------+--------------+
> | 2021-01-21T12:03:35.228 | 2021-01-21T12:03:35.228 | 2021-01-21T12:03:35.228 |   2021-01-21 | 12:03:35.228 |
> +-------------------------+-------------------------+-------------------------+--------------+--------------+
> The return type of now(), proctime() and CURRENT_TIMESTAMP still be TIMESTAMP;

To Kurt, thanks  for the intuitive case, it really clear, you’re wright that I want to propose to change the return value of these functions. It’s the most important part of the topic from user's perspective.

> I think this definitely deserves a FLIP.
To Jark,  nice suggestion, I prepared a FLIP for this topic, and will start the FLIP discussion soon.

>> If use the default Flink SQL,&nbsp; the window time range of the
>> statistics is incorrect, then the statistical results will naturally be
>> incorrect.
To zhisheng, sorry to hear that this problem influenced your production jobs,  Could you share your SQL pattern?  we can have more inputs and try to resolve them.


Best,
Leonard



> On Tue, Jan 19, 2021 at 6:42 PM Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com>> wrote:
> I found above example format may mess up in different mail client, I post a picture here[1].
> 
> Best,
> Leonard
> 
> [1] https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png> <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png>> 
> 
> > 在 2021年1月19日,16:22,Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com>> 写道:
> > 
> > Hi, all
> > 
> > I want to start the discussion about correcting time-related function behavior in Flink SQL, this is a tricky topic but I think it’s time to address it. 
> > 
> > Currently some temporal function behaviors are wired to users.
> > 1.  When users use a PROCTIME() in SQL, the value of PROCTIME() has a timezone offset with the wall-clock time in users' local time zone, users need to add their local time zone offset manually to get expected local timestamp(e.g: Users in Germany need to +1h to get expected local timestamp). 
> > 
> > 2. Users can not use CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP  to get wall-clock timestamp in local time zone, and thus they need write UDF in their SQL just for implementing a simple filter like WHERE date_col =  CURRENT_DATE. 
> > 
> > 3. Another common case  is the time window  with day interval based on PROCTIME(), user plan to put all data from one day into the same window, but the window is assigned using timestamp in UTC+0 timezone rather than the session timezone which leads to the window starts with an offset(e.g: Users in China need to add -8h in their business sql start and then +8h when output the result, the conversion like a magic for users). 
> > 
> > These problems come from that lots of time-related functions like PROCTIME(), NOW(), CURRENT_DATE, CURRENT_TIME and CURRENT_TIMESTAMP are returning time values based on UTC+0 time zone.
> > 
> > This topic will lead to a comparison of the three types, i.e. TIMESTAMP/TIMESTAMP WITHOUT TIME ZONE, TIMESTAMP WITH LOCAL TIME ZONE and TIMESTAMP WITH TIME ZONE. In order to better understand the three types, I wrote a document[1] to help understand them better. You can also know the tree timestamp types behavior in Hadoop ecosystem from the reference link int the doc.
> > 
> > 
> > I Invested all Flink time-related functions current behavior and compared with other DB vendors like Pg,Presto, Hive, Spark, Snowflake,  I made an excel [2] to organize them well, we can use it for the next discussion. Please let me know if I missed something.
> > From my investigation, I think we need to correct the behavior of function NOW()/PROCTIME()/CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP, to correct them, we can change the function return type or function return value or change return type and return value both. All of those way are valid because SQL:2011 does not specify the function return type and every SQL engine vendor has its own implementation. For example the CURRENT_TIMESTAMP function,
> > 
> > FLINK current behavior        existed problem other vendors' behavior proposed change
> > CURRENT_TIMESTAMP     CURRENT_TIMESTAMP
> > TIMESTAMP(0) NOT NULL
> > 
> > #session timezone: UTC
> > 2020-12-28T23:52:52
> > 
> > #session timezone: UTC+8
> > 2020-12-28T23:52:52
> > 
> > wall clock:
> > UTC+8: 2020-12-29 07:52:52    Wrong value:returns UTC timestamp, but user expects current timestamp in session time zone      In MySQL, Spark, the function NOW() and CURRENT_TIMESTAMP return current timestamp value in session time zone,the return type is TIMESTAMP
> > 
> > In Pg, Presto, the function NOW() and LOCALTIMESTAMP return current timestamp in session time zone,the return type is TIMESTAMP WITH TIME ZONE
> > 
> > In Snowflake, the function CURRENT_TIMESTAMP / LOCALTIMESTAMP return current timestamp in session time zone,the return type is TIMESTAMP WITH LOCAL TIME ZONE Flink should return current timestamp in session time zone, the return type should be TIMESTAMP
> > 
> > 
> > I tend to only change the return value for these problematic functions and introduce an option for compatibility consideration, what do you think?
> > 
> > 
> > Looking forward to your feedback.
> > 
> > Best,
> > Leonard
> > 
> > [1] https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing <https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing> <https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing <https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing>> 
> > [2] https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing> <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing>> 
> 


Re: [DISCUSS] Correct time-related function behavior in Flink SQL

Posted by Leonard Xu <xb...@gmail.com>.

> Before the changes, as I am writing this reply, the local time here is 2021-01-21 12:03:35 (Beijing time, UTC+8).
> And I tried these 5 functions in sql client, and got:
> 
> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, CURRENT_TIME;
> +-------------------------+-------------------------+-------------------------+--------------+--------------+
> |                  EXPR$0 |                  EXPR$1 |       CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |
> +-------------------------+-------------------------+-------------------------+--------------+--------------+
> | 2021-01-21T04:03:35.228 | 2021-01-21T04:03:35.228 | 2021-01-21T04:03:35.228 |   2021-01-21 | 04:03:35.228 |
> +-------------------------+-------------------------+-------------------------+--------------+--------------+
> After the changes, the expected behavior will change to:
> 
> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, CURRENT_TIME;
> +-------------------------+-------------------------+-------------------------+--------------+--------------+
> |                  EXPR$0 |                  EXPR$1 |       CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |
> +-------------------------+-------------------------+-------------------------+--------------+--------------+
> | 2021-01-21T12:03:35.228 | 2021-01-21T12:03:35.228 | 2021-01-21T12:03:35.228 |   2021-01-21 | 12:03:35.228 |
> +-------------------------+-------------------------+-------------------------+--------------+--------------+
> The return type of now(), proctime() and CURRENT_TIMESTAMP still be TIMESTAMP;

To Kurt, thanks  for the intuitive case, it really clear, you’re wright that I want to propose to change the return value of these functions. It’s the most important part of the topic from user's perspective.

> I think this definitely deserves a FLIP.
To Jark,  nice suggestion, I prepared a FLIP for this topic, and will start the FLIP discussion soon.

>> If use the default Flink SQL,&nbsp; the window time range of the
>> statistics is incorrect, then the statistical results will naturally be
>> incorrect.
To zhisheng, sorry to hear that this problem influenced your production jobs,  Could you share your SQL pattern?  we can have more inputs and try to resolve them.


Best,
Leonard



> On Tue, Jan 19, 2021 at 6:42 PM Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com>> wrote:
> I found above example format may mess up in different mail client, I post a picture here[1].
> 
> Best,
> Leonard
> 
> [1] https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png> <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png <https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png>> 
> 
> > 在 2021年1月19日,16:22,Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com>> 写道:
> > 
> > Hi, all
> > 
> > I want to start the discussion about correcting time-related function behavior in Flink SQL, this is a tricky topic but I think it’s time to address it. 
> > 
> > Currently some temporal function behaviors are wired to users.
> > 1.  When users use a PROCTIME() in SQL, the value of PROCTIME() has a timezone offset with the wall-clock time in users' local time zone, users need to add their local time zone offset manually to get expected local timestamp(e.g: Users in Germany need to +1h to get expected local timestamp). 
> > 
> > 2. Users can not use CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP  to get wall-clock timestamp in local time zone, and thus they need write UDF in their SQL just for implementing a simple filter like WHERE date_col =  CURRENT_DATE. 
> > 
> > 3. Another common case  is the time window  with day interval based on PROCTIME(), user plan to put all data from one day into the same window, but the window is assigned using timestamp in UTC+0 timezone rather than the session timezone which leads to the window starts with an offset(e.g: Users in China need to add -8h in their business sql start and then +8h when output the result, the conversion like a magic for users). 
> > 
> > These problems come from that lots of time-related functions like PROCTIME(), NOW(), CURRENT_DATE, CURRENT_TIME and CURRENT_TIMESTAMP are returning time values based on UTC+0 time zone.
> > 
> > This topic will lead to a comparison of the three types, i.e. TIMESTAMP/TIMESTAMP WITHOUT TIME ZONE, TIMESTAMP WITH LOCAL TIME ZONE and TIMESTAMP WITH TIME ZONE. In order to better understand the three types, I wrote a document[1] to help understand them better. You can also know the tree timestamp types behavior in Hadoop ecosystem from the reference link int the doc.
> > 
> > 
> > I Invested all Flink time-related functions current behavior and compared with other DB vendors like Pg,Presto, Hive, Spark, Snowflake,  I made an excel [2] to organize them well, we can use it for the next discussion. Please let me know if I missed something.
> > From my investigation, I think we need to correct the behavior of function NOW()/PROCTIME()/CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP, to correct them, we can change the function return type or function return value or change return type and return value both. All of those way are valid because SQL:2011 does not specify the function return type and every SQL engine vendor has its own implementation. For example the CURRENT_TIMESTAMP function,
> > 
> > FLINK current behavior        existed problem other vendors' behavior proposed change
> > CURRENT_TIMESTAMP     CURRENT_TIMESTAMP
> > TIMESTAMP(0) NOT NULL
> > 
> > #session timezone: UTC
> > 2020-12-28T23:52:52
> > 
> > #session timezone: UTC+8
> > 2020-12-28T23:52:52
> > 
> > wall clock:
> > UTC+8: 2020-12-29 07:52:52    Wrong value:returns UTC timestamp, but user expects current timestamp in session time zone      In MySQL, Spark, the function NOW() and CURRENT_TIMESTAMP return current timestamp value in session time zone,the return type is TIMESTAMP
> > 
> > In Pg, Presto, the function NOW() and LOCALTIMESTAMP return current timestamp in session time zone,the return type is TIMESTAMP WITH TIME ZONE
> > 
> > In Snowflake, the function CURRENT_TIMESTAMP / LOCALTIMESTAMP return current timestamp in session time zone,the return type is TIMESTAMP WITH LOCAL TIME ZONE Flink should return current timestamp in session time zone, the return type should be TIMESTAMP
> > 
> > 
> > I tend to only change the return value for these problematic functions and introduce an option for compatibility consideration, what do you think?
> > 
> > 
> > Looking forward to your feedback.
> > 
> > Best,
> > Leonard
> > 
> > [1] https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing <https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing> <https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing <https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing>> 
> > [2] https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing> <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing>> 
>