Posted to issues@arrow.apache.org by "Richard Tia (Jira)" <ji...@apache.org> on 2022/07/12 20:40:00 UTC
[jira] [Created] (ARROW-17061) [Python] Acero consumer is unable to consume count function from substrait query plan
Richard Tia created ARROW-17061:
-----------------------------------
Summary: [Python] Acero consumer is unable to consume count function from substrait query plan
Key: ARROW-17061
URL: https://issues.apache.org/jira/browse/ARROW-17061
Project: Apache Arrow
Issue Type: Bug
Components: Python
Reporter: Richard Tia
SQL query:
{code:sql}
select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    '{}'
where
    l_shipdate <= date '1998-12-01' - interval '120' day (3)
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus
{code}
The Substrait plan was generated from the SQL above using Isthmus.
Substrait definition of count:
[https://github.com/substrait-io/substrait/blob/main/extensions/functions_aggregate_generic.yaml]
Running the Substrait plan with Acero returns this error:
{code:java}
E pyarrow.lib.ArrowInvalid: JsonToBinaryStream returned INVALID_ARGUMENT:(relations[0].root.input.sort.input.aggregate.measures[7].measure) arguments: Cannot find field.
{code}
The node of the Substrait query plan that the error points to, at
relations[0].root.input.sort.input.aggregate.measures[7].measure:
{code:json}
"measure": {
  "functionReference": 7,
  "args": [],
  "sorts": [],
  "phase": "AGGREGATION_PHASE_INITIAL_TO_RESULT",
  "outputType": {
    "i64": {
      "typeVariationReference": 0,
      "nullability": "NULLABILITY_REQUIRED"
    }
  },
  "invocation": "AGGREGATION_INVOCATION_ALL",
  "arguments": []
}
{code}
The extension function declaration whose functionAnchor matches that functionReference (7):
{code:json}
"extensionFunction": {
  "extensionUriReference": 3,
  "functionAnchor": 7,
  "name": "count:opt"
}
{code}
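For anyone reproducing this, the path in the error message can be followed through the JSON plan mechanically. The following is a minimal sketch, not part of the original report: `resolve_path` and `function_name` are hypothetical helpers, and the `plan` dict is a trimmed stand-in that keeps only the fields quoted above, with empty placeholder measures in positions 0 to 6. It assumes the function declarations sit in a top-level "extensions" list, as in a Substrait JSON plan.

```python
import re

def resolve_path(doc, path):
    """Follow a path such as 'a.b[2].c' through nested dicts and lists."""
    node = doc
    # Each match is either a field name or a bracketed list index.
    for name, index in re.findall(r"([^.\[\]]+)|\[(\d+)\]", path):
        node = node[name] if name else node[int(index)]
    return node

def function_name(plan, reference):
    """Return the name of the extension function declared with this anchor."""
    for ext in plan.get("extensions", []):
        fn = ext.get("extensionFunction")
        if fn and fn.get("functionAnchor") == reference:
            return fn["name"]
    return None

# Trimmed stand-in for the real plan (assumption: only the quoted fields kept).
count_measure = {
    "functionReference": 7,
    "args": [],
    "phase": "AGGREGATION_PHASE_INITIAL_TO_RESULT",
    "invocation": "AGGREGATION_INVOCATION_ALL",
    "arguments": [],
}
plan = {
    "extensions": [
        {"extensionFunction": {"extensionUriReference": 3,
                               "functionAnchor": 7,
                               "name": "count:opt"}},
    ],
    "relations": [
        {"root": {"input": {"sort": {"input": {"aggregate": {
            # Placeholders for measures 0..6, then the count measure at index 7.
            "measures": [{"measure": {}} for _ in range(7)]
                        + [{"measure": count_measure}],
        }}}}}},
    ],
}

path = "relations[0].root.input.sort.input.aggregate.measures[7].measure"
measure = resolve_path(plan, path)
name = function_name(plan, measure["functionReference"])
print(name, measure["arguments"])
```

Run against the full plan, the same two helpers would show which function a failing measure refers to and what arguments it carries.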
Count is a unary function and should be consumable by Acero, but it is not in this case.