Posted to issues@trafodion.apache.org by "Hans Zeller (JIRA)" <ji...@apache.org> on 2016/12/23 19:24:58 UTC

[jira] [Resolved] (TRAFODION-2392) Avoid a costly sort for highly reducing TMUDFs

     [ https://issues.apache.org/jira/browse/TRAFODION-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hans Zeller resolved TRAFODION-2392.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.1-incubating

Fix checked in on 12/23/2016 with https://github.com/apache/incubator-trafodion/pull/882

> Avoid a costly sort for highly reducing TMUDFs
> ----------------------------------------------
>
>                 Key: TRAFODION-2392
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2392
>             Project: Apache Trafodion
>          Issue Type: Improvement
>          Components: sql-cmp
>    Affects Versions: 2.0-incubating
>         Environment: Any
>            Reporter: Hans Zeller
>            Assignee: Hans Zeller
>             Fix For: 2.1-incubating
>
>
> When a TMUDF is given an input table with a PARTITION BY clause, the Trafodion optimizer ensures that the input rows are sorted on (a permutation of) the PARTITION BY columns, so that each parallel TMUDF instance sees the rows of each logical partition contiguously. This way the TMUDF can process each group separately.
> This is usually a good way to process the data, except when we are dealing with a large input table and a TMUDF that greatly reduces the input data. In that case it may be better to maintain a hash table of groups in the TMUDF and to avoid the costly sort of the input table.
> My proposal is to add a new function type to UDRInvocationInfo.FunctionType, called REDUCER_NC (for Non-Contiguous). Setting the function type to this new type would indicate to the optimizer not to request a sort order on the partitioning columns.
> The table below shows how the function type and PARTITION BY and ORDER BY clauses would determine the effective sort order produced by the optimizer:
> ||Function type||PARTITION BY||ORDER BY||Data is sorted by||
> |REDUCER (existing)|a,b|c,d|a,b,c,d|
> |REDUCER (existing)|a,b|<empty>|a,b|
> |REDUCER_NC (proposed)|a,b|c,d|c,d|
> |REDUCER_NC (proposed)|a,b|<empty>|<no sort>|
> In all other respects, REDUCER and REDUCER_NC function types would behave the same.
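
The rule in the table above can be sketched as a small function. This is only an illustration of the table, not code from the fix: the names FunctionTypeSketch, SortOrderSketch and effectiveSortOrder are hypothetical and are not part of the Trafodion UDRInvocationInfo API. The description also allows the optimizer to pick a permutation of the PARTITION BY columns; the sketch simply uses them in the order given.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for illustration only; not the Trafodion API.
enum FunctionTypeSketch { REDUCER, REDUCER_NC }

final class SortOrderSketch {
    // Returns the columns the TMUDF input would be sorted by.
    // An empty list means that no sort is requested.
    static List<String> effectiveSortOrder(FunctionTypeSketch type,
                                           List<String> partitionBy,
                                           List<String> orderBy) {
        List<String> sortKey = new ArrayList<>();
        // Only REDUCER asks for the PARTITION BY columns as a sort prefix.
        // (The real optimizer may use any permutation of them.)
        if (type == FunctionTypeSketch.REDUCER) {
            sortKey.addAll(partitionBy);
        }
        // ORDER BY columns are part of the sort key for both function types.
        sortKey.addAll(orderBy);
        return sortKey;
    }

    public static void main(String[] args) {
        List<String> ab = Arrays.asList("a", "b");
        List<String> cd = Arrays.asList("c", "d");
        List<String> none = Collections.emptyList();
        // One line per row of the table in the issue description.
        System.out.println(effectiveSortOrder(FunctionTypeSketch.REDUCER, ab, cd));
        System.out.println(effectiveSortOrder(FunctionTypeSketch.REDUCER, ab, none));
        System.out.println(effectiveSortOrder(FunctionTypeSketch.REDUCER_NC, ab, cd));
        System.out.println(effectiveSortOrder(FunctionTypeSketch.REDUCER_NC, ab, none));
    }
}
```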

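The "hash table of groups" idea from the description can be illustrated without any Trafodion code. The sketch below is an assumption about what such a TMUDF might do internally: it counts rows per partition key while the rows of a group arrive in arbitrary, non-contiguous order. The class and method names are hypothetical, and a plain list of keys stands in for the TMUDF input.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

final class HashGroupingSketch {
    // Aggregates one count per partition key. Rows of the same group
    // do not need to be adjacent, so no sort of the input is required.
    static Map<String, Long> countPerGroup(Iterable<String> partitionKeys) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String key : partitionKeys) {
            // Insert 1 for a new group, otherwise add 1 to its running count.
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Keys arrive interleaved, as they might without a sort on PARTITION BY.
        Map<String, Long> counts =
            countPerGroup(Arrays.asList("x", "y", "x", "z", "y", "x"));
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

The trade-off is memory: the table holds one entry per group, so this only pays off when the number of groups is small relative to the number of input rows, which is the highly reducing case the issue describes.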


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)