Posted to issues@flink.apache.org by "Chengxiang Li (JIRA)" <ji...@apache.org> on 2015/08/20 05:40:45 UTC

[jira] [Commented] (FLINK-2549) Add topK operator for DataSet

    [ https://issues.apache.org/jira/browse/FLINK-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704244#comment-14704244 ] 

Chengxiang Li commented on FLINK-2549:
--------------------------------------

The basic idea of the implementation is as follows:
# In the map stage, sort each partition and pick its top K elements.
# A single reduce task handles all map output, then sorts it and picks the top K elements as the final result.
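
The two stages above can be sketched in plain Java, independent of Flink. This is only an illustration under stated assumptions: the class name TopKSketch and the methods topKOfPartition and topK are hypothetical and not part of any Flink API, partitions are modeled as in-memory lists, and a bounded java.util.PriorityQueue stands in for the per-partition sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not Flink code: partitions are plain in-memory lists.
final class TopKSketch {

    // "Map" stage: keep the k largest elements of one partition.
    // A min-heap of at most k elements holds the current candidates;
    // its head is the smallest element retained so far.
    static <T> List<T> topKOfPartition(Iterable<T> partition, int k, Comparator<T> cmp) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        PriorityQueue<T> heap = new PriorityQueue<>(k, cmp);
        for (T element : partition) {
            if (heap.size() < k) {
                heap.add(element);
            } else if (cmp.compare(element, heap.peek()) > 0) {
                heap.poll();        // drop the smallest retained element
                heap.add(element);  // keep the larger newcomer
            }
        }
        List<T> result = new ArrayList<>(heap);
        result.sort(cmp.reversed()); // largest first
        return result;
    }

    // "Reduce" stage: one task sees every partition's candidates
    // (at most k per partition) and picks the global top k.
    static <T> List<T> topK(List<List<T>> partitions, int k, Comparator<T> cmp) {
        List<T> candidates = new ArrayList<>();
        for (List<T> partition : partitions) {
            candidates.addAll(topKOfPartition(partition, k, cmp));
        }
        return topKOfPartition(candidates, k, cmp);
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = Arrays.asList(
                Arrays.asList(5, 1, 9, 3),
                Arrays.asList(7, 8, 2),
                Arrays.asList(4, 6));
        // prints [9, 8, 7]
        System.out.println(topK(partitions, 3, Comparator.<Integer>naturalOrder()));
    }
}
```

The result is correct because any element in the global top K must also be in the top K of its own partition, so the single reduce task only has to look at K elements per partition at most. The heap here is bounded by element count, not by bytes, so it does not address the memory question for elements of unpredictable size.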

To fully manage the memory used by this operator, we may need a customized PriorityQueue built on Flink's MemoryManager, so that elements of unpredictable size can be sorted within a fixed amount of memory, as discussed [here|https://github.com/apache/flink/pull/949#issuecomment-132692640].

> Add topK operator for DataSet
> -----------------------------
>
>                 Key: FLINK-2549
>                 URL: https://issues.apache.org/jira/browse/FLINK-2549
>             Project: Flink
>          Issue Type: New Feature
>          Components: Core, Java API, Scala API
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>            Priority: Minor
>
> topK is a common operation for users; it would be great to have it in Flink.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)