Posted to issues@flink.apache.org by "Shaoxuan Wang (JIRA)" <ji...@apache.org> on 2017/01/17 03:38:26 UTC

[jira] [Comment Edited] (FLINK-5386) Refactoring Window Clause

    [ https://issues.apache.org/jira/browse/FLINK-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824844#comment-15824844 ] 

Shaoxuan Wang edited comment on FLINK-5386 at 1/17/17 3:37 AM:
---------------------------------------------------------------

[~sunjincheng121], thanks for the updates.

Hi [~fhueske],
The major reason we propose this change is that, from a semantic point of view, a window is a clause rather than an operator.
This change still gives users the flexibility to keep the window clause and groupby close together (just move the window definition before groupby) if they want.
I think you have raised a good question on the "scope of window" for a batch window on a certain column (which could be removed by some operators). We should make sure this will still work. We will check the design and add test cases for this.



was (Author: shaoxuanwang):
[~sunjincheng121], thanks for the updates.

Hi [~fhueske],
The major reason we propose this change is the row window. For a row window, there may be no groupby keys. Under the current proposal in FLIP11, the tableAPI is as follows:
{code}
  .window(RowWindow as 'x)
  .select('b.count over 'x as 'xcnt, 'x.start, 'x.end)
{code}
If we want to partition the data and trigger the result using a window function, we have to translate the .window operator into a kind of group-by query plan, which is a little weird. With this proposal, the groupby operator will be able to group not only by keys but also by a window clause. I think this is the correct semantics. The above example would be written this way:
{code}
  .window(RowWindow as 'x)
  .groupby('x)
  .select('b.count over 'x as 'xcnt, 'x.start, 'x.end)
{code}
What do you think?

This change gives users more flexibility: they can still keep the window clause and groupby close together (just move the window definition before groupby) if they want.
I think you have raised a good question on the "scope of window" for a batch window on a certain column (which could be removed by some operators). We should make sure this will still work. We will check the design and add test cases for this.


> Refactoring Window Clause
> -------------------------
>
>                 Key: FLINK-5386
>                 URL: https://issues.apache.org/jira/browse/FLINK-5386
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table API & SQL
>            Reporter: sunjincheng
>            Assignee: sunjincheng
>
> Similar to SQL, a window clause is defined "as" a symbol that is explicitly used in groupby/over. We propose to refactor the way groupby+window is written in the tableAPI as follows:
> {code}
> val windowedTable = table
>  .window(Slide over 10.milli every 5.milli as 'w1)
>  .window(Tumble over 5.milli as 'w2)
>  .groupBy('w1, 'key)
>  .select('string, 'int.count as 'count, 'w1.start)
>  .groupBy('w2, 'key)
>  .select('string, 'count.sum as 'sum2)
>  .window(Tumble over 5.milli as 'w3)
>  .groupBy('w3) // windowAll
>  .select('sum2, 'w3.start, 'w3.end)
> {code}
> In this way, we can remove both GroupWindowedTable and the window() method in GroupedTable, which makes the API a bit cleaner. In addition, for a row-window we need to define the window clause as a symbol anyway. This change will make the API of window and row-window consistent. An example for row-window:
> {code}
>   .window(RowXXXWindow as 'x, RowYYYWindow as 'y)
>   .select('a, 'b.count over 'x as 'xcnt, 'c.count over 'y as 'ycnt, 'x.start, 'x.end)
> {code}
> What do you think? [~fhueske] [~twalthr]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)