Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:34:45 UTC

[jira] [Resolved] (SPARK-17691) Add aggregate function to collect list with maximum number of elements

     [ https://issues.apache.org/jira/browse/SPARK-17691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-17691.
----------------------------------
    Resolution: Incomplete

> Add aggregate function to collect list with maximum number of elements
> ----------------------------------------------------------------------
>
>                 Key: SPARK-17691
>                 URL: https://issues.apache.org/jira/browse/SPARK-17691
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Assaf Mendelson
>            Priority: Minor
>              Labels: bulk-closed
>
> One of the aggregate functions we have today is collect_list. It is a useful tool for a "catch-all" aggregation that doesn't fit anywhere else.
> The problem with collect_list is that it is unbounded. I would like a way to do a collect_list that limits the maximum number of elements collected.
> I would expect the inputs to be the maximum number of elements to keep and the selection method (pick any N, pick the top N, or pick the bottom N).
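A minimal sketch of the requested behaviour follows. It is written as plain Python rather than against Spark's internal aggregate-function interfaces. The class name `BoundedCollectList`, its `update`/`merge`/`result` methods, the helper `bounded_collect`, and the mode names "any", "top" and "bottom" are all hypothetical. They are chosen to mirror the per-partition update step and the cross-partition merge step that any aggregate goes through. Unlike collect_list, the buffer never holds more than `max_elements` values at any point.

```python
from bisect import insort
from typing import Any, Iterable, List


class BoundedCollectList:
    """Hypothetical bounded collect_list aggregation buffer (not a Spark API).

    Modes (names are illustrative assumptions):
      "any"    - keep the first max_elements values seen
      "top"    - keep the max_elements largest values
      "bottom" - keep the max_elements smallest values
    """

    def __init__(self, max_elements: int, mode: str = "any") -> None:
        if max_elements < 0:
            raise ValueError("max_elements must be non-negative")
        if mode not in ("any", "top", "bottom"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.max_elements = max_elements
        self.mode = mode
        self.buffer: List[Any] = []  # never longer than max_elements after update()

    def update(self, value: Any) -> None:
        """Fold one input value into the buffer."""
        if self.mode == "any":
            if len(self.buffer) < self.max_elements:
                self.buffer.append(value)
            return
        # "top"/"bottom": keep the buffer sorted ascending, then trim one end.
        insort(self.buffer, value)
        if len(self.buffer) > self.max_elements:
            if self.mode == "top":
                self.buffer.pop(0)  # drop the smallest value
            else:
                self.buffer.pop()   # drop the largest value

    def merge(self, other: "BoundedCollectList") -> None:
        """Combine a partial buffer (e.g. from another partition) into this one."""
        for value in other.buffer:
            self.update(value)

    def result(self) -> List[Any]:
        """Final list: descending for "top", ascending for "bottom", arrival order for "any"."""
        if self.mode == "top":
            return list(reversed(self.buffer))
        return list(self.buffer)


def bounded_collect(values: Iterable[Any], max_elements: int, mode: str = "any") -> List[Any]:
    """Convenience helper: aggregate an iterable with a single buffer."""
    agg = BoundedCollectList(max_elements, mode)
    for v in values:
        agg.update(v)
    return agg.result()


if __name__ == "__main__":
    # Simulate two partitions aggregated separately, then merged.
    left = BoundedCollectList(3, "top")
    right = BoundedCollectList(3, "top")
    for v in [5, 1, 9, 7]:
        left.update(v)
    for v in [8, 2, 6]:
        right.update(v)
    left.merge(right)
    print(left.result())  # [9, 8, 7]
```

The sorted-list buffer keeps the sketch short and works for any orderable type. Each insert is O(N) in the limit, so a heap would be the more efficient choice for large limits. The "any" mode keeps whatever arrives first, so in a distributed setting its output would depend on row and partition order, much as the element order of collect_list already does.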



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
