You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Kurt Young (Jira)" <ji...@apache.org> on 2020/01/07 06:48:00 UTC

[jira] [Commented] (FLINK-15497) Streaming TopN operator doesn't reduce outputs when rank number is not required

    [ https://issues.apache.org/jira/browse/FLINK-15497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009400#comment-17009400 ] 

Kurt Young commented on FLINK-15497:
------------------------------------

cc [~jark]  [~jinyu.zj]

> Streaming TopN operator doesn't reduce outputs when rank number is not required 
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-15497
>                 URL: https://issues.apache.org/jira/browse/FLINK-15497
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Runtime
>    Affects Versions: 1.9.1
>            Reporter: Kurt Young
>            Priority: Major
>             Fix For: 1.9.2, 1.10.0
>
>
> As we described in the doc: [https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sql.html#top-n]
> when rank number is not required, we can reduce some output, like unnecessary retract messages. 
> Here is an example which can re-produce:
> {code:java}
> val data = List(
>   ("aaa", 97.0, 200.0),
>   ("bbb", 67.0, 200.0),
>   ("bbb", 162.0, 200.0)
> )
> val ds = failingDataSource(data).toTable(tEnv, 'guid, 'a, 'b)
> tEnv.registerTable("T", ds)
> val aggreagtedTable = tEnv.sqlQuery(
>   """
>     |select guid,
>     |    sum(a) as reached_score,
>     |    sum(b) as max_score,
>     |    sum(a) / sum(b) as score
>     |from T group by guid
>     |""".stripMargin
> )
> tEnv.registerTable("T2", aggreagtedTable)
> val sql =
>   """
>     |SELECT guid, reached_score, max_score, score
>     |FROM (
>     |  SELECT *,
>     |      ROW_NUMBER() OVER (ORDER BY score DESC) as rank_num
>     |  FROM T2)
>     |WHERE rank_num <= 5
>   """.stripMargin
> val sink = new TestingRetractSink
> tEnv.sqlQuery(sql).toRetractStream[Row].addSink(sink).setParallelism(1)
> env.execute()
> {code}
> In this case, the output is:
> {code:java}
> (true,aaa,97.0,200.0,0.485)
> (true,bbb,67.0,200.0,0.335) 
> (false,bbb,67.0,200.0,0.335) 
> (true,bbb,229.0,400.0,0.5725) 
> (false,aaa,97.0,200.0,0.485) 
> (true,aaa,97.0,200.0,0.485)
> {code}
> But the last 2 messages are unnecessary. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)