Posted to issues@calcite.apache.org by "liupengcheng (Jira)" <ji...@apache.org> on 2020/08/04 03:03:00 UTC

[jira] [Comment Edited] (CALCITE-4146) Implement EMIT Syntax

    [ https://issues.apache.org/jira/browse/CALCITE-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170512#comment-17170512 ] 

liupengcheng edited comment on CALCITE-4146 at 8/4/20, 3:02 AM:
----------------------------------------------------------------

>hmmm the EMIT will be at end of SQL query and it will propagate through every relational operator. However, technically, I feel like only TableFunctionScanRel that works with a stream will be affected. You can think of that a macro batch streaming system implements it. In that system, EMIT controls the size and frequency of each macro batch (from TableFunctionScanRel), and all other join, sort, aggregate are applied on each macro batch.

Hi, [~amaliujia],
Yes, I know this works like a micro batch, but in a real streaming implementation, some operators on top of a TableFunctionScanRel need to know where the micro batch ends and when they should perform the calculation and EMIT data to their upstream (e.g. a two-stream join over windowTableFunctionScanRel), while others may not (e.g. map-like operations). So my question is: will the EMIT syntax affect both the TableFunctionScanRel and all its downstream operators?
For example:
```
select * from windowTableFunctionScanRel1 t1 join windowTableFunctionScanRel2 t2 on t1.id = t2.id EMIT after watermark;
```
There are probably two possible implementations:
1. The EMIT is bound only to the window join, and the windowTableFunctionScanRel just works like a map (appending some extra window attributes).
2. The EMIT is bound to the windowTableFunctionScanRel and all of its downstream operators. So the windowTableFunctionScanRel will buffer data and EMIT according to the specified strategy (e.g. every 1 minute), and so will the window join.

Which one is preferred?
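
To make the difference concrete, here is a minimal toy sketch in Java of the buffering behavior that option 2 would require from the scan (and that the window join would need under either option). None of these names are Calcite classes: `WindowBuffer`, `add`, and `advanceWatermark` are hypothetical, the windows are assumed to be tumbling with non-negative event times, and a window is assumed to close once the watermark reaches its exclusive end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model only; none of these names are Calcite classes. */
public class EmitAfterWatermarkSketch {

  /** Hypothetical buffer for tumbling windows, keyed by (exclusive) window end. */
  static final class WindowBuffer {
    private final long windowSizeMillis;
    private final TreeMap<Long, List<String>> rowsByWindowEnd = new TreeMap<>();

    WindowBuffer(long windowSizeMillis) {
      this.windowSizeMillis = windowSizeMillis;
    }

    /** Assigns the row to its tumbling window and buffers it; nothing is emitted yet. */
    void add(long eventTimeMillis, String row) {
      long windowEnd = (eventTimeMillis / windowSizeMillis + 1) * windowSizeMillis;
      rowsByWindowEnd.computeIfAbsent(windowEnd, k -> new ArrayList<>()).add(row);
    }

    /** Emits and drops every window whose end is at or before the watermark. */
    List<String> advanceWatermark(long watermarkMillis) {
      List<String> emitted = new ArrayList<>();
      // headMap returns a live view, so clearing it removes the closed windows.
      Map<Long, List<String>> closed = rowsByWindowEnd.headMap(watermarkMillis, true);
      for (List<String> rows : closed.values()) {
        emitted.addAll(rows);
      }
      closed.clear();
      return emitted;
    }
  }

  public static void main(String[] args) {
    WindowBuffer buffer = new WindowBuffer(60_000L); // 1-minute windows
    buffer.add(5_000L, "a");
    buffer.add(59_000L, "b");
    buffer.add(61_000L, "c");
    System.out.println(buffer.advanceWatermark(30_000L));  // prints []
    System.out.println(buffer.advanceWatermark(60_000L));  // prints [a, b]
    System.out.println(buffer.advanceWatermark(120_000L)); // prints [c]
  }
}
```

Under option 1, the scan would instead forward each row immediately with its window attributes attached, and only the join would hold a buffer like this one.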




was (Author: liupengcheng):
>hmmm the EMIT will be at end of SQL query and it will propagate through every relational operator. However, technically, I feel like only TableFunctionScanRel that works with a stream will be affected. You can think of that a macro batch streaming system implements it. In that system, EMIT controls the size and frequency of each macro batch (from TableFunctionScanRel), and all other join, sort, aggregate are applied on each macro batch.

Hi, [~amaliujia],
Yes, I know this works like a micro batch, but in a real streaming implementation, operators on top of a TableFunctionScanRel need to know where the micro batch ends and when they should perform the calculation and EMIT data to their upstream (e.g. a two-stream join over windowTableFunctionScanRel). So my question is: will the EMIT syntax affect both the TableFunctionScanRel and its downstream operators?
For example:
```
select * from windowTableFunctionScanRel1 t1 join windowTableFunctionScanRel2 t2 where t1.id = t2.id EMIT after watermark;
```
There are probably two possible implementations:
1. The EMIT is bound only to the window join, and the windowTableFunctionScanRel just works like a map (appending some extra window attributes).
2. The EMIT is bound to the windowTableFunctionScanRel and all of its downstream operators. So the windowTableFunctionScanRel will buffer data and EMIT according to the specified strategy (e.g. every 1 minute), and so will the window join.

Which one is preferred?



> Implement EMIT Syntax
> ---------------------
>
>                 Key: CALCITE-4146
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4146
>             Project: Calcite
>          Issue Type: New Feature
>            Reporter: Rui Wang
>            Assignee: Rui Wang
>            Priority: Major
>
> The goal is to support the following syntax:
> {code:sql}
> SELECT clause
> FROM TUMBLE/HOP/SESSION
> EMIT AFTER WATERMARK
> {code}
> Note that "EMIT AFTER WATERMARK" is the new part.
> "EMIT AFTER WATERMARK" is proposed in [One SQL to Rule Them All|https://arxiv.org/pdf/1905.12133.pdf]. The idea is to let streaming SQL queries control materialization latency. More specifically, it means emitting the elements in a window once the watermark passes the end of that window.
> More context is discussed in [CALCITE-3272|https://issues.apache.org/jira/browse/CALCITE-3272?focusedCommentId=17166580&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17166580] and the [EMIT syntax proposal for event-timestamp semantic windowing|https://lists.apache.org/thread.html/r5bd9a6f7af2c0cd81aecd4de512fd889fbf15f112cc3704f188b1d4f%40%3Cdev.calcite.apache.org%3E] email thread.


