Posted to issues@flink.apache.org by "sunjincheng (JIRA)" <ji...@apache.org> on 2017/05/03 02:23:04 UTC
[jira] [Comment Edited] (FLINK-6428) Add support DISTINCT in dataStream SQL

    [ https://issues.apache.org/jira/browse/FLINK-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15994172#comment-15994172 ] 

sunjincheng edited comment on FLINK-6428 at 5/3/17 2:22 AM:
------------------------------------------------------------

Hi [~rtudoran], thanks for paying attention to this JIRA.

In a standard database, the `DISTINCT` keyword can be used in two situations:
*  in the `SELECT Clause`, e.g.: `SELECT DISTINCT name FROM table`
*  in the `AGG Clause`, e.g.: `COUNT([ALL|DISTINCT] expression)`, `SUM([ALL|DISTINCT] expression)`, etc.
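
To make the difference between the two situations concrete, here is a minimal sketch using Python's built-in `sqlite3` module on the sample data from the issue description. It only illustrates standard batch SQL semantics; it is not Flink code, and SQLite is an assumption chosen purely because it runs without any setup.

```python
import sqlite3

# In-memory SQLite database, used only to illustrate standard SQL semantics.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [("kevin", 28), ("sunny", 6), ("jack", 6)],
)

# DISTINCT in the SELECT clause: duplicate result rows are removed.
distinct_ages = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT age FROM MyTable")
)

# DISTINCT inside an aggregate: duplicates are removed before aggregating.
count_all, count_distinct, sum_distinct = conn.execute(
    "SELECT COUNT(age), COUNT(DISTINCT age), SUM(DISTINCT age) FROM MyTable"
).fetchone()

print(distinct_ages)                            # distinct values of age
print(count_all, count_distinct, sum_distinct)  # aggregate results
```

The first query returns one row per distinct value, while the second collapses duplicates only inside each aggregate's input.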

First, [FLINK-6249 | https://issues.apache.org/jira/browse/FLINK-6249] covers the `AGG Clause` case, while this JIRA covers the `SELECT Clause` case.

Next, regarding the growing number of elements: the limitation tends to be the back-end storage (Flink state). In theory, external storage can be made arbitrarily large (the user controls it and can plan for the expected size), so from this point of view `DISTINCT` on an infinite stream can be supported. In addition, with external storage such as RocksDB, the user can set a TTL according to the actual volume of business data to ensure that the external storage keeps working properly.
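
The trade-off described above can be sketched in plain Python. This is only an illustration of the idea, not Flink's implementation: the function name `distinct_stream`, the `(timestamp, value)` event shape, and the in-memory dictionary standing in for the state backend are all assumptions, and timestamps are assumed to be non-decreasing.

```python
from typing import Dict, Hashable, Iterable, Iterator, Tuple


def distinct_stream(
    events: Iterable[Tuple[int, Hashable]], ttl: int
) -> Iterator[Hashable]:
    """Emit a value the first time it is seen; forget values idle for `ttl` or longer.

    `events` yields (timestamp, value) pairs with non-decreasing timestamps.
    The dictionary below stands in for keyed state in a state backend.
    """
    last_seen: Dict[Hashable, int] = {}
    for timestamp, value in events:
        # Drop state entries that have not been touched for at least `ttl`.
        # A full scan per event is fine for a sketch, not for production.
        expired = [k for k, t in last_seen.items() if timestamp - t >= ttl]
        for k in expired:
            del last_seen[k]

        # Emit only values that are not currently held in state.
        if value not in last_seen:
            yield value

        # Record (or refresh) the time this value was last seen.
        last_seen[value] = timestamp


events = [(0, 28), (1, 6), (2, 6), (20, 6)]
long_ttl = list(distinct_stream(events, ttl=100))
short_ttl = list(distinct_stream(events, ttl=10))
print(long_ttl)
print(short_ttl)
```

With a TTL longer than the stream, each value is emitted once. With a short TTL, the state stays bounded, but a value whose entry has expired is emitted again when it reappears, which is why the TTL has to be chosen according to the actual business data.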

So, IMO, we can support the `DISTINCT` feature in the `SELECT Clause` and remind the user to pay attention to controlling the external storage. What do you think?

Thanks,
SunJincheng


> Add support DISTINCT in dataStream SQL
> --------------------------------------
>
>                 Key: FLINK-6428
>                 URL: https://issues.apache.org/jira/browse/FLINK-6428
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: sunjincheng
>            Assignee: sunjincheng
>
> Add support for DISTINCT in DataStream SQL, as follows:
> DATA:
> {code}
> (name, age)
> (kevin, 28),
> (sunny, 6),
> (jack, 6)
> {code}
> SQL:
> {code}
> SELECT DISTINCT age FROM MyTable
> {code}
> RESULTS:
> {code}
> 28, 6
> {code}
> [~fhueske] do we need this feature?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)