Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2022/09/19 17:20:00 UTC

[jira] [Comment Edited] (ARROW-17484) [C++] Substrait to Arrow Aggregate doesn't take the provided Output Type for aggregates

    [ https://issues.apache.org/jira/browse/ARROW-17484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606681#comment-17606681 ] 

Weston Pace edited comment on ARROW-17484 at 9/19/22 5:19 PM:
--------------------------------------------------------------

Aggregate functions typically produce outputs that are far smaller than their inputs (e.g. the sum of 1 million rows is a single value), so it often makes sense for the output type to be wider than the input type.

One could argue that you can simply cast beforehand.  However, you would have to cast the entire array of inputs (e.g. the 1 million rows) and this could be rather costly.
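
To make the cost difference concrete, here is a minimal sketch in plain C++. It is not Arrow's implementation; the function names `SumWidened` and `SumAfterCast` are hypothetical and exist only for illustration. The first function widens each value as it is read into a 64-bit accumulator, while the second mimics "cast beforehand" by materializing a widened copy of the whole input first.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not an Arrow API): sum 32-bit inputs into a 64-bit
// accumulator. Each element is widened as it is read, so no widened copy of
// the input is ever materialized.
std::int64_t SumWidened(const std::vector<std::int32_t>& values) {
  std::int64_t total = 0;
  for (std::int32_t v : values) {
    total += static_cast<std::int64_t>(v);
  }
  return total;
}

// Hypothetical helper (not an Arrow API): the "cast beforehand" alternative.
// It builds a full 64-bit copy of the input and then sums that copy. The
// result is the same, but it allocates and writes 8 bytes per input row.
std::int64_t SumAfterCast(const std::vector<std::int32_t>& values) {
  std::vector<std::int64_t> widened(values.begin(), values.end());
  std::int64_t total = 0;
  for (std::int64_t v : widened) {
    total += v;
  }
  return total;
}
```

Both functions return the same total; the only difference is the temporary array that `SumAfterCast` has to allocate and fill, which grows with the number of input rows.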

Finally, we are mirroring SQL here (which is not, by itself, necessarily a good thing, but it is worth noting).  From the [PostgreSQL docs|https://www.postgresql.org/docs/8.2/functions-aggregate.html], the return type of sum is:

{quote}
bigint for smallint or int arguments, numeric for bigint arguments, double precision for floating-point arguments, otherwise the same as the argument data type
{quote}
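
The quoted rule can be read as a small lookup from argument type to return type. The sketch below is only an illustration of that reading: `SumReturnType` is a hypothetical name, the types are represented as plain strings, and treating "integer" as an alias for "int" and "real" / "double precision" as the floating-point types is an assumption on my part rather than something stated in the quote.

```cpp
#include <string>

// Hypothetical helper: return type of sum() for a given argument type name,
// following the quoted PostgreSQL 8.2 rule.
std::string SumReturnType(const std::string& arg_type) {
  // smallint or int arguments widen to bigint ("integer" alias is assumed).
  if (arg_type == "smallint" || arg_type == "int" || arg_type == "integer") {
    return "bigint";
  }
  // bigint arguments widen to numeric.
  if (arg_type == "bigint") {
    return "numeric";
  }
  // Floating-point arguments (assumed here to be real and double precision)
  // produce double precision.
  if (arg_type == "real" || arg_type == "double precision") {
    return "double precision";
  }
  // Otherwise the return type is the same as the argument type.
  return arg_type;
}
```

For example, `SumReturnType("smallint")` yields `"bigint"`, while a type not named in the rule, such as `"numeric"`, is returned unchanged.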





was (Author: westonpace):
Aggregate functions typically have very small outputs compared to the input (e.g. the sum of 1 million rows is a single value) and so it very often makes sense for the output type to be larger than the input type.

One could argue that you can simply cast beforehand.  However, you would have to cast the entire array of inputs (e.g. the 1 million rows) and this could be rather costly.

Finally, we are mirroring SQL here (which is not, necessarily a good thing, but worth noting).  From the [postgres docs|https://www.postgresql.org/docs/8.2/functions-aggregate.html] for sum the return type is:

{quote}
bigint for smallint or int arguments, numeric for bigint arguments, double precision for floating-point arguments, otherwise the same as the argument data type
{quote}




> [C++] Substrait to Arrow Aggregate doesn't take the provided Output Type for aggregates
> ---------------------------------------------------------------------------------------
>
>                 Key: ARROW-17484
>                 URL: https://issues.apache.org/jira/browse/ARROW-17484
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Vibhatha Lakmal Abeykoon
>            Assignee: Vibhatha Lakmal Abeykoon
>            Priority: Major
>
> The current Substrait to Arrow aggregate deserializer doesn't use the plan-provided output type as the output type of the execution plan.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)