You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by 953396112 <13...@qq.com.INVALID> on 2021/10/07 14:06:16 UTC

回复:[DISCUSS] Apply materialized recognition to streaming compute of time-window aggregate.

Hi Xurenhe
&nbsp; &nbsp; &nbsp;Browse the design documents roughly. It seems that your materialized view scene is specialized, not universal and not applicable to Calcite. Now the materialized view recognition framework is extensible. You can customize the materialized view recognition rules and use the `RelOptMaterializations#useMaterializedViews` in your project.


Best,
Zhaohui Xu


------------------&nbsp;原始邮件&nbsp;------------------
发件人:                                                                                                                        "dev"                                                                                    <xurenhe19910131@gmail.com&gt;;
发送时间:&nbsp;2021年10月7日(星期四) 晚上9:30
收件人:&nbsp;"dev"<dev@calcite.apache.org&gt;;

主题:&nbsp;[DISCUSS] Apply materialized recognition to streaming compute of time-window aggregate.



Hello, calcite developers!

Framework of Flink is widely used in our company for data cleaning and
simple processing. And we found that some tasks can be rewritten and
re-arranged by the materialized recognition, which reduces the overall
computing resources.
Calcite has the SQL semantics of streaming compute. We hope to discuss the
integration of streaming compute and materialized recognition with the
Calcite community.


Let me briefly describe it through the following example:
-- basic streaming table.
-- pay_time is marked for event_time

&gt; create table bill_tbl(
&gt; bill_id bigint not null,
&gt; product_id bigint not null,
&gt; consumer_id bigint not null,
&gt; consumer_age int not null,
&gt; payment double,
&gt; pay_time bigint not null)



Task1:

&gt; create view t1(product_id, consumer_age, pay_sum, cnt) as
&gt; select product_id, consumer_age, sum(payment) as pay_sum, count(1) as cnt,
&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; tumble_rowtime(pay_time, interval '1' hour) as row_time
&gt; from bill_tbl
&gt; group by tumble(pay_time, interval '1' hour),
&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; product_id, consumer_age


Task2:

&gt; create view t2(consumer_age, pay_sum) as
&gt; select consumer_age, sum(payment)
&gt; from bill_tbl
&gt; group by tumble(pay_time, interval '1' hour),
&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; consumer_age


Task3:

&gt; create view t3(consumer_age, pay_sum) as
&gt; select consumer_age, sum(payment)
&gt; from bill_tbl
&gt; group by hop(pay_time, interval '1' hour, interval '2' hour),
&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; consumer_age


---------------------------------------------------------------------------
*After rewriting by mv-matching, task2 and task3 could be based on task1*

Task2:

&gt; create view t2 (consumer_age, pay_sum) as
&gt; select consumer_age, sum(pay_sum)
&gt; from t1
&gt; group by tumble(row_time, interval '1' hour),
&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; consumer_age


Task3:

&gt; create view t3 (consumer_age, pay_sum) as
&gt; select consumer_age, sum(pay_sum)
&gt; from t1
&gt; group by hop(row_time, interval '1' hour, interval '2' hour),
&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; consumer_age


---------------------------------------------------------------------------

Here, I have a simple document. Welcome to give me some advice
DOC:
https://docs.google.com/document/d/1LtSgxhwvnpk2uAXFQiMSX-aQjnkJ6KLrXwXJX-VbsbE/edit


Would love to hear your thoughts!
Xurenhe