Posted to issues@impala.apache.org by "Alexander Behm (JIRA)" <ji...@apache.org> on 2017/11/10 06:26:00 UTC
[jira] [Created] (IMPALA-6179) Constant argument to UDAF not accessible in merge phase of distributed execution

Alexander Behm created IMPALA-6179:
--------------------------------------

             Summary: Constant argument to UDAF not accessible in merge phase of distributed execution
                 Key: IMPALA-6179
                 URL: https://issues.apache.org/jira/browse/IMPALA-6179
             Project: IMPALA
          Issue Type: Bug
          Components: Backend, Frontend
    Affects Versions: Impala 2.10.0
            Reporter: Alexander Behm
         Attachments: 0001-Add-ConstTest-agg-fn.patch

While adding a new built-in aggregation function, I noticed that constant arguments are not accessible in the merge phase of the aggregation, i.e., in Init(), Merge() or Finalize() of the merge phase.

In this example, the second argument (the constant 0.1) is accessible only in the pre-aggregation phase, not in the merge aggregation phase.
{code}
[localhost:21000] > explain select const_test(bigint_col, 0.1) from functional.alltypes;
Query: explain select const_test(bigint_col, 0.1) from functional.alltypes
+----------------------------------------------+
| Explain String                               |
+----------------------------------------------+
| Max Per-Host Resource Reservation: Memory=0B |
| Per-Host Resource Estimates: Memory=148.00MB |
| Codegen disabled by planner                  |
|                                              |
| PLAN-ROOT SINK                               |
| |                                            |
| 03:AGGREGATE [FINALIZE]                      |
| |  output: const_test:merge(bigint_col, 0.1) |
| |                                            |
| 02:EXCHANGE [UNPARTITIONED]                  |
| |                                            |
| 01:AGGREGATE                                 |
| |  output: const_test(bigint_col, 0.1)       |
| |                                            |
| 00:SCAN HDFS [functional.alltypes]           |
|    partitions=24/24 files=24 size=478.45KB   |
+----------------------------------------------+
{code}

With num_nodes=1 the constant argument is accessible in the Finalize() of the single aggregation phase:
{code}
[localhost:21000] > explain select const_test(bigint_col, 0.1) from functional.alltypes;
Query: explain select const_test(bigint_col, 0.1) from functional.alltypes
+----------------------------------------------+
| Explain String                               |
+----------------------------------------------+
| Max Per-Host Resource Reservation: Memory=0B |
| Per-Host Resource Estimates: Memory=138.00MB |
| Codegen disabled by planner                  |
|                                              |
| PLAN-ROOT SINK                               |
| |                                            |
| 01:AGGREGATE [FINALIZE]                      |
| |  output: const_test(bigint_col, 0.1)       |
| |                                            |
| 00:SCAN HDFS [functional.alltypes]           |
|    partitions=24/24 files=24 size=478.45KB   |
+----------------------------------------------+
{code}

I've attached a patch, produced with git format-patch, to reproduce the above.

It's not clear whether this behavior is a bug or intended. I can see arguments both ways:
* An aggregation function can take any number of arguments.
* The Update() phase produces a single output slot.
* The Merge() phase consumes the single slot produced by the Update() phase and produces another output slot.
* It would be quite convenient to have access to constant arguments of the original SQL invocation in all phases of the aggregation.
* However, this seems semantically at odds with non-constant arguments. For non-constant arguments one would expect the Update() to aggregate/store whatever state is needed for Merge() in the single output slot. So why should that be different for constant arguments?
* How would the planner decide which arguments to forward to the Merge() phase? What would the BE aggregation function signatures look like? Today, the Merge() phase always takes a single SlotRef as input.
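
To make the "store it in the single output slot" argument concrete, here is a minimal standalone sketch of a function that copies its constant argument into the intermediate state during Update(), so that Merge() and Finalize() can read it back. This is not Impala's UDF API and not the code in the attached patch: ConstTestState, its fields, the function signatures and the "sum times constant" semantics are all hypothetical, chosen only for illustration.

```cpp
#include <cstdint>

// Hypothetical intermediate state: stands in for the single slot that is
// passed from the pre-aggregation phase to the merge phase.
struct ConstTestState {
  int64_t sum = 0;          // aggregate of the non-constant argument
  double factor = 0.0;      // copy of the constant argument
  bool has_factor = false;  // false until some Update() has captured it
};

// Pre-aggregation phase: both SQL arguments are visible here, so the
// constant is stashed in the state alongside the running sum.
void Update(int64_t value, double const_arg, ConstTestState* dst) {
  dst->sum += value;
  dst->factor = const_arg;
  dst->has_factor = true;
}

// Merge phase: only the intermediate state is visible. The constant is
// recovered from any source state that captured it.
void Merge(const ConstTestState& src, ConstTestState* dst) {
  dst->sum += src.sum;
  if (src.has_factor) {
    dst->factor = src.factor;
    dst->has_factor = true;
  }
}

// Finalize: uses the constant carried in the state. Returning 0.0 when no
// row was ever seen is an arbitrary choice made for this sketch.
double Finalize(const ConstTestState& state) {
  return state.has_factor ? static_cast<double>(state.sum) * state.factor : 0.0;
}
```

The sketch also shows a cost of this approach: a state that never saw a row carries no constant, so Merge() and Finalize() have to handle the missing value explicitly, and every intermediate row grows by the size of the stashed argument.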




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)