Posted to issues@spark.apache.org by "Assaf Mendelson (JIRA)" <ji...@apache.org> on 2016/09/27 13:34:20 UTC

[jira] [Created] (SPARK-17691) Add aggregate function to collect list with maximum number of elements

Assaf Mendelson created SPARK-17691:
---------------------------------------

             Summary: Add aggregate function to collect list with maximum number of elements
                 Key: SPARK-17691
                 URL: https://issues.apache.org/jira/browse/SPARK-17691
             Project: Spark
          Issue Type: New Feature
            Reporter: Assaf Mendelson
            Priority: Minor


One of the aggregate functions available today is collect_list. It is useful as a "catch-all" aggregation for cases that don't fit any other aggregate function.

The problem with collect_list is that it is unbounded: the resulting list grows with the number of rows in each group. I would like a variant of collect_list that limits the maximum number of elements collected.

I envision the inputs being the maximum number of elements to keep and the method of choosing them (pick any, pick the top N, or pick the bottom N).
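
To make the requested semantics concrete, here is a minimal sketch in plain Python (not Spark code). The function name collect_list_limited, its parameters, and the method names "any", "top" and "bottom" are hypothetical and only illustrate the idea; none of them exist in Spark. The sketch assumes the values within a group are mutually comparable and keeps at most max_elements values per key while it scans the rows.

```python
import heapq
from collections import defaultdict


def collect_list_limited(rows, max_elements, method="any"):
    """Group (key, value) pairs, keeping at most max_elements values per key.

    Hypothetical illustration only; this is not a Spark API.
    method: "any"    keeps the first max_elements values seen,
            "top"    keeps the max_elements largest values,
            "bottom" keeps the max_elements smallest values.
    """
    if max_elements < 0:
        raise ValueError("max_elements must be non-negative")
    if method not in ("any", "top", "bottom"):
        raise ValueError("unknown method: %r" % (method,))

    buffers = defaultdict(list)
    for key, value in rows:
        buf = buffers[key]
        if method == "any":
            # Stop adding once the per-key buffer is full.
            if len(buf) < max_elements:
                buf.append(value)
        elif method == "top":
            # Re-select the largest values; the buffer never exceeds the limit.
            buffers[key] = heapq.nlargest(max_elements, buf + [value])
        else:
            # Re-select the smallest values; the buffer never exceeds the limit.
            buffers[key] = heapq.nsmallest(max_elements, buf + [value])
    return dict(buffers)


rows = [("a", 3), ("a", 1), ("a", 5), ("a", 2), ("b", 7)]
any_two = collect_list_limited(rows, 2, "any")
top_two = collect_list_limited(rows, 2, "top")
bottom_two = collect_list_limited(rows, 2, "bottom")
```

The point of the sketch is that the per-group state stays bounded by max_elements regardless of group size, which is the property the unbounded collect_list lacks.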


