You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Jark Wu (Jira)" <ji...@apache.org> on 2020/01/08 10:25:00 UTC
[jira] [Assigned] (FLINK-15497) Streaming TopN operator doesn't
reduce outputs when rank number is not required
[ https://issues.apache.org/jira/browse/FLINK-15497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jark Wu reassigned FLINK-15497:
-------------------------------
Assignee: Jing Zhang
> Streaming TopN operator doesn't reduce outputs when rank number is not required
> --------------------------------------------------------------------------------
>
> Key: FLINK-15497
> URL: https://issues.apache.org/jira/browse/FLINK-15497
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Runtime
> Affects Versions: 1.9.1
> Reporter: Kurt Young
> Assignee: Jing Zhang
> Priority: Major
> Fix For: 1.9.2, 1.10.0
>
>
> As we described in the doc: [https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sql.html#top-n]
> when rank number is not required, we can reduce some output, like unnecessary retract messages.
> Here is an example which can re-produce:
> {code:java}
> val data = List(
> ("aaa", 97.0, 200.0),
> ("bbb", 67.0, 200.0),
> ("bbb", 162.0, 200.0)
> )
> val ds = failingDataSource(data).toTable(tEnv, 'guid, 'a, 'b)
> tEnv.registerTable("T", ds)
> val aggreagtedTable = tEnv.sqlQuery(
> """
> |select guid,
> | sum(a) as reached_score,
> | sum(b) as max_score,
> | sum(a) / sum(b) as score
> |from T group by guid
> |""".stripMargin
> )
> tEnv.registerTable("T2", aggreagtedTable)
> val sql =
> """
> |SELECT guid, reached_score, max_score, score
> |FROM (
> | SELECT *,
> | ROW_NUMBER() OVER (ORDER BY score DESC) as rank_num
> | FROM T2)
> |WHERE rank_num <= 5
> """.stripMargin
> val sink = new TestingRetractSink
> tEnv.sqlQuery(sql).toRetractStream[Row].addSink(sink).setParallelism(1)
> env.execute()
> {code}
> In this case, the output is:
> {code:java}
> (true,aaa,97.0,200.0,0.485)
> (true,bbb,67.0,200.0,0.335)
> (false,bbb,67.0,200.0,0.335)
> (true,bbb,229.0,400.0,0.5725)
> (false,aaa,97.0,200.0,0.485)
> (true,aaa,97.0,200.0,0.485)
> {code}
> But the last 2 messages are unnecessary.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)