You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by 林挺滨 <li...@gmail.com> on 2021/10/14 05:33:12 UTC

[DISCUSS] Over Aggregation add supports CURRENT_TIMESTAMP as the upper boundary in RANGE intervals .

In our scenario, it is often necessary to calculate the user's aggregated
indicators in the most recent period of time.
For example, if I need to calculate the user's recharge amount in the most
recent day, I can do it through the following SQL code.
---------------------------------------------------------------------
CREATE TEMPORARY VIEW purchase as
select user_id, purchase_price, __ts__
from
raw_purchase;

CREATE TEMPORARY VIEW purchase_expire as
select user_id, 0 as purchase_price,
SESSION_ROWTIME (__ts__, INTERVAL '1' DAY + INTERVAL '1' SECOND) as __ts__
from
purchase as T
GROUP BY SESSION (T.__ts__, INTERVAL '1' DAY + INTERVAL '1' SECOND), user_id,
__ts__;

CREATE TEMPORARY VIEW total_purchase as
select * from purchase
union all
select * from purchase_expire;

select user_id, SUM(purchase_price) OVER (
PARTITION BY user_id
ORDER BY __ts__
RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND CURRENT ROW
)
from total_purchase;
-----------------------------------------

If the "Over Aggregation CURRENT_TIMESTAMP" is supported, the above code
can be replaced by the following simple code,and the simple code is easier
to understand.
-----------------------------------------
select user_id, SUM(purchase_price) OVER (
PARTITION BY user_id
ORDER BY __ts__
RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND CURRENT_TIMESTAMP
)
from raw_purchase;
-----------------------------------------

I have seen the implementation of the RowTimeRangeBoundedPrecedingFunction
class. It is very simple to add support for CURRENT_TIMESTAMP.

Re: [DISCUSS] Over Aggregation add supports CURRENT_TIMESTAMP as the upper boundary in RANGE intervals .

Posted by 林挺滨 <li...@gmail.com>.
Microsoft SQL Server mainly processes offline data, so it is normal that
there is no CURRENT_TIMESTAMP. Just like watermark is also a concept that
exists only in real-time processing engines. I think CURRENT_TIMESTAMP is
essentially the same as watermark, and both are used to trigger the window.

Timo Walther <tw...@apache.org> 于2021年11月1日周一 下午7:14写道:

> Hi,
>
> this is an interesting idea. But as far as I can see, by looking at
> other SQL engines like Microsoft SQL Server:
>
>
> https://docs.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql?view=sql-server-ver15
>
> The range is a well-defined set of keywords and CURRENT_TIMESTAMP is not
> listed there.
>
> Regards,
> Timo
>
>
> On 01.11.21 08:22, 林挺滨 wrote:
> > Can anyone give me some advice or information about this feature?
> >
> > 林挺滨 <li...@gmail.com> 于2021年10月14日周四 下午1:33写道:
> >
> >> In our scenario, it is often necessary to calculate the user's
> aggregated
> >> indicators in the most recent period of time.
> >> For example, if I need to calculate the user's recharge amount in the
> most
> >> recent day, I can do it through the following SQL code.
> >> ---------------------------------------------------------------------
> >> CREATE TEMPORARY VIEW purchase as
> >> select user_id, purchase_price, __ts__
> >> from
> >> raw_purchase;
> >>
> >> CREATE TEMPORARY VIEW purchase_expire as
> >> select user_id, 0 as purchase_price,
> >> SESSION_ROWTIME (__ts__, INTERVAL '1' DAY + INTERVAL '1' SECOND) as
> __ts__
> >> from
> >> purchase as T
> >> GROUP BY SESSION (T.__ts__, INTERVAL '1' DAY + INTERVAL '1' SECOND),
> >> user_id, __ts__;
> >>
> >> CREATE TEMPORARY VIEW total_purchase as
> >> select * from purchase
> >> union all
> >> select * from purchase_expire;
> >>
> >> select user_id, SUM(purchase_price) OVER (
> >> PARTITION BY user_id
> >> ORDER BY __ts__
> >> RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND CURRENT ROW
> >> )
> >> from total_purchase;
> >> -----------------------------------------
> >>
> >> If the "Over Aggregation CURRENT_TIMESTAMP" is supported, the above code
> >> can be replaced by the following simple code,and the simple code is
> easier
> >> to understand.
> >> -----------------------------------------
> >> select user_id, SUM(purchase_price) OVER (
> >> PARTITION BY user_id
> >> ORDER BY __ts__
> >> RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND CURRENT_TIMESTAMP
> >> )
> >> from raw_purchase;
> >> -----------------------------------------
> >>
> >> I have seen the implementation of the
> RowTimeRangeBoundedPrecedingFunction
> >> class. It is very simple to add support for CURRENT_TIMESTAMP.
> >>
> >
> >
>
>

-- 
祝您  身体健康
         工作愉快!

林挺滨

Re: [DISCUSS] Over Aggregation add supports CURRENT_TIMESTAMP as the upper boundary in RANGE intervals .

Posted by Timo Walther <tw...@apache.org>.
Hi,

this is an interesting idea. But as far as I can see, by looking at 
other SQL engines like Microsoft SQL Server:

https://docs.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql?view=sql-server-ver15

The range is a well-defined set of keywords and CURRENT_TIMESTAMP is not 
listed there.

Regards,
Timo


On 01.11.21 08:22, 林挺滨 wrote:
> Can anyone give me some advice or information about this feature?
> 
> 林挺滨 <li...@gmail.com> 于2021年10月14日周四 下午1:33写道:
> 
>> In our scenario, it is often necessary to calculate the user's aggregated
>> indicators in the most recent period of time.
>> For example, if I need to calculate the user's recharge amount in the most
>> recent day, I can do it through the following SQL code.
>> ---------------------------------------------------------------------
>> CREATE TEMPORARY VIEW purchase as
>> select user_id, purchase_price, __ts__
>> from
>> raw_purchase;
>>
>> CREATE TEMPORARY VIEW purchase_expire as
>> select user_id, 0 as purchase_price,
>> SESSION_ROWTIME (__ts__, INTERVAL '1' DAY + INTERVAL '1' SECOND) as __ts__
>> from
>> purchase as T
>> GROUP BY SESSION (T.__ts__, INTERVAL '1' DAY + INTERVAL '1' SECOND),
>> user_id, __ts__;
>>
>> CREATE TEMPORARY VIEW total_purchase as
>> select * from purchase
>> union all
>> select * from purchase_expire;
>>
>> select user_id, SUM(purchase_price) OVER (
>> PARTITION BY user_id
>> ORDER BY __ts__
>> RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND CURRENT ROW
>> )
>> from total_purchase;
>> -----------------------------------------
>>
>> If the "Over Aggregation CURRENT_TIMESTAMP" is supported, the above code
>> can be replaced by the following simple code,and the simple code is easier
>> to understand.
>> -----------------------------------------
>> select user_id, SUM(purchase_price) OVER (
>> PARTITION BY user_id
>> ORDER BY __ts__
>> RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND CURRENT_TIMESTAMP
>> )
>> from raw_purchase;
>> -----------------------------------------
>>
>> I have seen the implementation of the RowTimeRangeBoundedPrecedingFunction
>> class. It is very simple to add support for CURRENT_TIMESTAMP.
>>
> 
> 


Re: [DISCUSS] Over Aggregation add supports CURRENT_TIMESTAMP as the upper boundary in RANGE intervals .

Posted by 林挺滨 <li...@gmail.com>.
Can anyone give me some advice or information about this feature?

林挺滨 <li...@gmail.com> 于2021年10月14日周四 下午1:33写道:

> In our scenario, it is often necessary to calculate the user's aggregated
> indicators in the most recent period of time.
> For example, if I need to calculate the user's recharge amount in the most
> recent day, I can do it through the following SQL code.
> ---------------------------------------------------------------------
> CREATE TEMPORARY VIEW purchase as
> select user_id, purchase_price, __ts__
> from
> raw_purchase;
>
> CREATE TEMPORARY VIEW purchase_expire as
> select user_id, 0 as purchase_price,
> SESSION_ROWTIME (__ts__, INTERVAL '1' DAY + INTERVAL '1' SECOND) as __ts__
> from
> purchase as T
> GROUP BY SESSION (T.__ts__, INTERVAL '1' DAY + INTERVAL '1' SECOND),
> user_id, __ts__;
>
> CREATE TEMPORARY VIEW total_purchase as
> select * from purchase
> union all
> select * from purchase_expire;
>
> select user_id, SUM(purchase_price) OVER (
> PARTITION BY user_id
> ORDER BY __ts__
> RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND CURRENT ROW
> )
> from total_purchase;
> -----------------------------------------
>
> If the "Over Aggregation CURRENT_TIMESTAMP" is supported, the above code
> can be replaced by the following simple code,and the simple code is easier
> to understand.
> -----------------------------------------
> select user_id, SUM(purchase_price) OVER (
> PARTITION BY user_id
> ORDER BY __ts__
> RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND CURRENT_TIMESTAMP
> )
> from raw_purchase;
> -----------------------------------------
>
> I have seen the implementation of the RowTimeRangeBoundedPrecedingFunction
> class. It is very simple to add support for CURRENT_TIMESTAMP.
>


-- 
祝您  身体健康
         工作愉快!

林挺滨