Posted to jira@arrow.apache.org by "Richard Tia (Jira)" <ji...@apache.org> on 2022/07/12 20:40:00 UTC

[jira] [Created] (ARROW-17061) [Python] Acero consumer is unable to consume count function from substrait query plan

Richard Tia created ARROW-17061:
-----------------------------------

             Summary: [Python] Acero consumer is unable to consume count function from substrait query plan
                 Key: ARROW-17061
                 URL: https://issues.apache.org/jira/browse/ARROW-17061
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
            Reporter: Richard Tia


SQL
{code:sql}
select
  l_returnflag,
  l_linestatus,
  sum(l_quantity) as sum_qty,
  sum(l_extendedprice) as sum_base_price,
  sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
  sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
  avg(l_quantity) as avg_qty,
  avg(l_extendedprice) as avg_price,
  avg(l_discount) as avg_disc,
  count(*) as count_order
from
  '{}'
where
  l_shipdate <= date '1998-12-01' - interval '120' day (3)
group by
  l_returnflag,
  l_linestatus
order by
  l_returnflag,
  l_linestatus {code}
The Substrait plan was generated from the SQL above using Isthmus.

Substrait definition of count:

[https://github.com/substrait-io/substrait/blob/main/extensions/functions_aggregate_generic.yaml]

Running the Substrait plan with Acero returns this error:
{code:java}
E   pyarrow.lib.ArrowInvalid: JsonToBinaryStream returned INVALID_ARGUMENT:(relations[0].root.input.sort.input.aggregate.measures[7].measure) arguments: Cannot find field.
{code}

From the Substrait query plan:

relations[0].root.input.sort.input.aggregate.measures[7].measure
{code:json}
"measure": {
  "functionReference": 7,
  "args": [],
  "sorts": [],
  "phase": "AGGREGATION_PHASE_INITIAL_TO_RESULT",
  "outputType": {
    "i64": {
      "typeVariationReference": 0,
      "nullability": "NULLABILITY_REQUIRED"
    }
  },
  "invocation": "AGGREGATION_INVOCATION_ALL",
  "arguments": []
} {code}
{code:json}
"extensionFunction": {
  "extensionUriReference": 3,
  "functionAnchor": 7,
  "name": "count:opt"
} {code}
Count is a unary function and should be consumable, but the consumer fails in this case, where count(*) is encoded as a measure with empty "args" and "arguments" lists.
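
To help locate similar measures in other plans, here is a minimal sketch in plain Python (standard library only, not pyarrow) that walks a Substrait plan in JSON form and reports every aggregate measure with no arguments, together with its declared extension function name. The helper name find_empty_measures and the trimmed sample plan are assumptions made for illustration; the seven placeholder measures and their functionReference value of 1 are hypothetical, and only the last measure and the extension declaration are taken from the plan above.

```python
def find_empty_measures(plan):
    """Return (path, function_name) for each measure that has no arguments."""
    # Map functionAnchor -> declared extension function name.
    names = {}
    for ext in plan.get("extensions", []):
        fn = ext.get("extensionFunction")
        if fn:
            names[fn.get("functionAnchor")] = fn.get("name")

    results = []

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                child = f"{path}.{key}" if path else key
                if key == "measure" and isinstance(value, dict):
                    # Both the deprecated "args" and the newer "arguments" are empty.
                    if not value.get("args") and not value.get("arguments"):
                        name = names.get(value.get("functionReference"))
                        results.append((child, name))
                walk(value, child)
        elif isinstance(node, list):
            for i, item in enumerate(node):
                walk(item, f"{path}[{i}]")

    walk(plan, "")
    return results


# Hypothetical placeholder standing in for the seven sum/avg measures.
placeholder = {"measure": {"functionReference": 1, "arguments": [{"value": {}}]}}

# The count measure as it appears in the plan from this report (trimmed).
count_measure = {
    "measure": {
        "functionReference": 7,
        "args": [],
        "phase": "AGGREGATION_PHASE_INITIAL_TO_RESULT",
        "invocation": "AGGREGATION_INVOCATION_ALL",
        "arguments": [],
    }
}

plan = {
    "extensions": [
        {
            "extensionFunction": {
                "extensionUriReference": 3,
                "functionAnchor": 7,
                "name": "count:opt",
            }
        }
    ],
    "relations": [
        {
            "root": {
                "input": {
                    "sort": {
                        "input": {
                            "aggregate": {"measures": [placeholder] * 7 + [count_measure]}
                        }
                    }
                }
            }
        }
    ],
}

for path, name in find_empty_measures(plan):
    print(path, name)
```

On the sample plan this reports the same path that appears in the error message, paired with the name "count:opt".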


