Posted to issues@flink.apache.org by "Jark Wu (Jira)" <ji...@apache.org> on 2020/01/21 03:06:00 UTC

[jira] [Resolved] (FLINK-15497) Streaming TopN operator doesn't reduce outputs when rank number is not required

     [ https://issues.apache.org/jira/browse/FLINK-15497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jark Wu resolved FLINK-15497.
-----------------------------
    Fix Version/s:     (was: 1.9.2)
       Resolution: Fixed

1.11.0: bf0945769278079f0e004080b78e5dfe87f7c320
1.11.0: 9c7fc0d4eb91fc486f8497ab07e55aa7e15222de

> Streaming TopN operator doesn't reduce outputs when rank number is not required 
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-15497
>                 URL: https://issues.apache.org/jira/browse/FLINK-15497
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Runtime
>    Affects Versions: 1.9.1
>            Reporter: Kurt Young
>            Assignee: Jing Zhang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.10.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> As described in the documentation ([https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sql.html#top-n]),
> when the rank number is not required in the result, the Top-N operator can reduce its output, for example by omitting unnecessary retract messages.
> Here is an example that reproduces the problem:
> {code:java}
> val data = List(
>   ("aaa", 97.0, 200.0),
>   ("bbb", 67.0, 200.0),
>   ("bbb", 162.0, 200.0)
> )
> val ds = failingDataSource(data).toTable(tEnv, 'guid, 'a, 'b)
> tEnv.registerTable("T", ds)
> val aggregatedTable = tEnv.sqlQuery(
>   """
>     |select guid,
>     |    sum(a) as reached_score,
>     |    sum(b) as max_score,
>     |    sum(a) / sum(b) as score
>     |from T group by guid
>     |""".stripMargin
> )
> tEnv.registerTable("T2", aggregatedTable)
> val sql =
>   """
>     |SELECT guid, reached_score, max_score, score
>     |FROM (
>     |  SELECT *,
>     |      ROW_NUMBER() OVER (ORDER BY score DESC) as rank_num
>     |  FROM T2)
>     |WHERE rank_num <= 5
>   """.stripMargin
> val sink = new TestingRetractSink
> tEnv.sqlQuery(sql).toRetractStream[Row].addSink(sink).setParallelism(1)
> env.execute()
> {code}
> In this case, the output is:
> {code:java}
> (true,aaa,97.0,200.0,0.485)
> (true,bbb,67.0,200.0,0.335) 
> (false,bbb,67.0,200.0,0.335) 
> (true,bbb,229.0,400.0,0.5725) 
> (false,aaa,97.0,200.0,0.485) 
> (true,aaa,97.0,200.0,0.485)
> {code}
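> The last two messages are unnecessary: they retract and then re-emit the identical row for aaa. Only its rank changed (from 1 to 2), and rank_num is not part of the selected columns.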
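
To make the redundancy concrete, here is a minimal Scala sketch that post-processes the six messages listed above and drops any retract that is immediately followed by an insert of the same row. It is illustrative only and is not how the fix is implemented in Flink's Top-N operator; the names RedundantRetractSketch, Msg, dropRedundantPairs and observed are hypothetical, and rows are modelled as plain strings.

```scala
// Illustrative sketch only: NOT Flink's implementation of the FLINK-15497 fix.
object RedundantRetractSketch {
  // A retract-stream message: the flag is true for an insert, false for a retract.
  // The row is kept as a plain string for simplicity (an assumption of this sketch).
  type Msg = (Boolean, String)

  // Drop every retract that is immediately followed by an insert of the same row.
  // Such a pair leaves the downstream result unchanged.
  def dropRedundantPairs(msgs: List[Msg]): List[Msg] = msgs match {
    case (false, r1) :: (true, r2) :: rest if r1 == r2 => dropRedundantPairs(rest)
    case head :: rest                                   => head :: dropRedundantPairs(rest)
    case Nil                                            => Nil
  }

  // The six messages from the issue description.
  val observed: List[Msg] = List(
    (true,  "aaa,97.0,200.0,0.485"),
    (true,  "bbb,67.0,200.0,0.335"),
    (false, "bbb,67.0,200.0,0.335"),
    (true,  "bbb,229.0,400.0,0.5725"),
    (false, "aaa,97.0,200.0,0.485"),
    (true,  "aaa,97.0,200.0,0.485")
  )

  def main(args: Array[String]): Unit =
    dropRedundantPairs(observed).foreach(println)
}
```

Applied to the output above, only the first four messages remain. The retract and insert for bbb are kept because the row content actually changed (67.0 to 229.0), while the pair for aaa is dropped because both messages carry the same row.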



--
This message was sent by Atlassian Jira
(v8.3.4#803005)