Posted to issues@hive.apache.org by "Ruslan Dautkhanov (JIRA)" <ji...@apache.org> on 2016/02/06 21:50:40 UTC

[jira] [Commented] (HIVE-11022) Support collecting lists in user defined order

    [ https://issues.apache.org/jira/browse/HIVE-11022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135999#comment-15135999 ] 

Ruslan Dautkhanov commented on HIVE-11022:
------------------------------------------

1) Would it be possible to make COLLECT_LIST_SORTED() more generic so that it also works with STRUCTs?
  For example, it could produce a list of structs sorted by one of their fields. We often have nested collections of structs in which one of the attributes is a date.
2) It would be great to have the same for COLLECT_SET_SORTED(), which would work exactly like COLLECT_LIST_SORTED(), except that elements are deduplicated first.
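
To make the two requests concrete, here is a minimal Python sketch of the intended semantics. It is not Hive code. The names collect_list_sorted, collect_set_sorted and Event are hypothetical, a namedtuple stands in for a Hive STRUCT, and keeping the first occurrence of a duplicated value is an assumption, since the request does not say which copy should survive.

```python
from collections import namedtuple

# Hypothetical struct with a date attribute, standing in for a Hive STRUCT.
Event = namedtuple("Event", ["name", "event_date"])


def collect_list_sorted(pairs, limit=None):
    """Return the values of (value, sort_key) pairs, ordered by sort_key."""
    ordered = [value for value, _ in sorted(pairs, key=lambda p: p[1])]
    return ordered[:limit]  # limit=None keeps the whole list


def collect_set_sorted(pairs, limit=None):
    """Like collect_list_sorted, but duplicate values are dropped first.

    Assumption: the first occurrence of each value (and its sort key) wins.
    """
    first_seen = {}
    for value, sort_key in pairs:
        first_seen.setdefault(value, sort_key)
    return collect_list_sorted(first_seen.items(), limit)


events = [
    Event("b", "2016-02-03"),
    Event("a", "2016-01-15"),
    Event("b", "2016-02-03"),  # exact duplicate of the first struct
]
# Sort whole structs by one of their own attributes (the date).
pairs = [(e, e.event_date) for e in events]

as_list = collect_list_sorted(pairs)
as_set = collect_set_sorted(pairs)
```

Here as_list keeps both copies of the duplicated struct, while as_set contains each struct once; both are ordered by event_date.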

> Support collecting lists in user defined order
> ----------------------------------------------
>
>                 Key: HIVE-11022
>                 URL: https://issues.apache.org/jira/browse/HIVE-11022
>             Project: Hive
>          Issue Type: New Feature
>          Components: UDF
>            Reporter: Michael Haeusler
>
> Hive currently supports aggregation of lists "in order of input rows" with the UDF collect_list. Unfortunately, the order is not well defined when map-side aggregations are used.
> Hive could support collecting lists in user-defined order by providing a UDF
> COLLECT_LIST_SORTED(valueColumn, sortColumn[, limit]), that would return a list of values sorted in a user-defined order. An optional limit parameter can restrict this to the first n values within that order.
> Especially in the limit case, this can be efficiently pre-aggregated and reduces the amount of data transferred to reducers.
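
The pre-aggregation argument for the limit case can be sketched in Python as follows. This is only an illustration of the idea, not Hive's aggregation API: partial_top_n and merge_top_n are hypothetical names for the map-side and reduce-side steps, and ascending order on the sort key is assumed.

```python
import heapq


def partial_top_n(pairs, limit):
    """Map-side step: keep only the `limit` pairs with the smallest sort keys."""
    return heapq.nsmallest(limit, pairs, key=lambda p: p[1])


def merge_top_n(partials, limit):
    """Reduce-side step: merge the partial results and return the sorted values."""
    all_pairs = (pair for partial in partials for pair in partial)
    merged = heapq.nsmallest(limit, all_pairs, key=lambda p: p[1])
    return [value for value, _ in merged]


# Two input splits of (value, sort_key) pairs, as two mappers might see them.
split_1 = [("c", 3), ("a", 1), ("e", 5)]
split_2 = [("b", 2), ("d", 4)]

limit = 2
partials = [partial_top_n(split_1, limit), partial_top_n(split_2, limit)]
result = merge_top_n(partials, limit)
```

Each mapper forwards at most `limit` pairs, so the data sent to the reducer is bounded by the limit times the number of mappers rather than by the number of input rows.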


