Posted to issues-all@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2018/10/23 22:09:00 UTC

[jira] [Updated] (IMPALA-2875) Optimize subplans when the following plan nodes do not require parent rows.

     [ https://issues.apache.org/jira/browse/IMPALA-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong updated IMPALA-2875:
----------------------------------
    Priority: Major  (was: Critical)

> Optimize subplans when the following plan nodes do not require parent rows.
> ---------------------------------------------------------------------------
>
>                 Key: IMPALA-2875
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2875
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 2.3.0
>            Reporter: Alexander Behm
>            Priority: Major
>              Labels: nested_types, performance, planner
>
> Consider the following query that references nested collections and its plan:
> Query:
> {code}
> select count(*) from tpch_nested_parquet.customer c, c.c_orders.o_lineitems l
> where c.c_mktsegment = "AUTOMOBILE"
> group by l.l_returnflag
> {code}
> Plan:
> {code}
> +------------------------------------------------------------------------------------+
> | Explain String                                                                     |
> +------------------------------------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=304.00MB VCores=2                          |
> | WARNING: The following tables are missing relevant table and/or column statistics. |
> | tpch_nested_parquet.customer                                                       |
> |                                                                                    |
> | 08:EXCHANGE [UNPARTITIONED]                                                        |
> | |                                                                                  |
> | 07:AGGREGATE [FINALIZE]                                                            |
> | |  output: count:merge(*)                                                          |
> | |  group by: l.l_returnflag                                                        |
> | |                                                                                  |
> | 06:EXCHANGE [HASH(l.l_returnflag)]                                                 |
> | |                                                                                  |
> | 05:AGGREGATE                                                                       |
> | |  output: count(*)                                                                |
> | |  group by: l.l_returnflag                                                        |
> | |                                                                                  |
> | 01:SUBPLAN                                                                         |
> | |                                                                                  |
> | |--04:NESTED LOOP JOIN [CROSS JOIN]                                                |
> | |  |                                                                               |
> | |  |--02:SINGULAR ROW SRC                                                          |
> | |  |                                                                               |
> | |  03:UNNEST [c.c_orders.o_lineitems l]                                            |
> | |                                                                                  |
> | 00:SCAN HDFS [tpch_nested_parquet.customer c]                                      |
> |    partitions=1/1 files=4 size=554.13MB                                            |
> |    predicates: c.c_mktsegment = 'AUTOMOBILE'                                       |
> +------------------------------------------------------------------------------------+
> {code}
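The following is a minimal Python model of what the plan above does for each customer row, which makes the cost concrete. It is a sketch, not Impala code. The toy rows, the function names, and the `stats` counters are hypothetical. The distributed two-phase aggregation (05 through 08) is collapsed into a single `Counter`. Treating 02 as the build side follows the plan's `|--` (right child) layout and is a simplification. The model shows that 01:SUBPLAN re-runs the nested-loop join once for every customer row that survives the scan, even though that join only ever pairs each lineitem with the same single parent row.

```python
from collections import Counter

# Hypothetical toy rows for tpch_nested_parquet.customer (not real TPC-H data).
CUSTOMERS = [
    {"c_mktsegment": "AUTOMOBILE",
     "c_orders": [{"o_lineitems": [{"l_returnflag": "A"}, {"l_returnflag": "R"}]},
                  {"o_lineitems": [{"l_returnflag": "A"}]}]},
    {"c_mktsegment": "BUILDING",
     "c_orders": [{"o_lineitems": [{"l_returnflag": "N"}]}]},
    {"c_mktsegment": "AUTOMOBILE", "c_orders": []},
]


def scan(customers):
    """00:SCAN HDFS with the predicate c_mktsegment = 'AUTOMOBILE'."""
    for c in customers:
        if c["c_mktsegment"] == "AUTOMOBILE":
            yield c


def singular_row_src(parent):
    """02:SINGULAR ROW SRC: emits exactly the current parent row."""
    yield parent


def unnest(parent):
    """03:UNNEST [c.c_orders.o_lineitems l]: flattens every order's lineitems."""
    for order in parent["c_orders"]:
        yield from order["o_lineitems"]


def nested_loop_cross_join(parent, stats):
    """04:NESTED LOOP JOIN [CROSS JOIN], reset and re-run for every parent row."""
    stats["join_resets"] += 1
    build = list(singular_row_src(parent))  # build side: always one row
    for l in unnest(parent):                # probe side: the parent's lineitems
        for c in build:
            stats["join_rows"] += 1
            yield c, l


def run(customers):
    counts, stats = Counter(), Counter()
    for parent in scan(customers):  # 01:SUBPLAN runs its subplan once per parent row
        for _c, l in nested_loop_cross_join(parent, stats):
            counts[l["l_returnflag"]] += 1  # count(*) group by l.l_returnflag
    return counts, stats


counts, stats = run(CUSTOMERS)
```

On this toy input, `counts` is `Counter({'A': 2, 'R': 1})` and `stats['join_resets']` is 2. That is one join reset per AUTOMOBILE customer, including the customer with no orders, and the parent row `_c` produced by the join is never read.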
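> In execution, we spend a lot of time evaluating and resetting the nested-loop join (04), because the subplan re-executes it for every customer row returned by the scan.
> However, for this query no plan node above the subplan needs the parent rows: the aggregation only references l.l_returnflag, and the predicate on c.c_mktsegment is already evaluated in the scan. We could therefore improve this query by having only the unnest node (03) inside the subplan, dropping 04:NESTED LOOP JOIN and 02:SINGULAR ROW SRC.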
> This optimization is a special case of projection trimming.
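The following is a minimal sketch of the proposed rewrite, again in Python with hypothetical toy rows and names. `needs_parent_row` is a simplified stand-in for the slot analysis that a real projection-trimming pass in the Impala frontend would perform, and it does not reproduce Impala's actual logic. It only checks whether the consumer of the subplan output (05:AGGREGATE here) reads any slot of the parent alias `c`. When it does not, the subplan can emit the unnested lineitem rows directly. A cross join with a single-row input does not change cardinality, so `count(*)` per `l.l_returnflag` stays the same.

```python
from collections import Counter

# Hypothetical toy rows for tpch_nested_parquet.customer (not real TPC-H data).
CUSTOMERS = [
    {"c_mktsegment": "AUTOMOBILE",
     "c_orders": [{"o_lineitems": [{"l_returnflag": "A"}, {"l_returnflag": "R"}]},
                  {"o_lineitems": [{"l_returnflag": "A"}]}]},
    {"c_mktsegment": "BUILDING",
     "c_orders": [{"o_lineitems": [{"l_returnflag": "N"}]}]},
]

# Hypothetical model: the (alias, column) slots read by 05:AGGREGATE, the only
# consumer of the subplan output. count(*) itself reads no slots.
CONSUMER_SLOT_REFS = {("l", "l_returnflag")}


def needs_parent_row(slot_refs, parent_alias):
    """Simplified projection-trimming check: is any parent slot read above the subplan?"""
    return any(alias == parent_alias for alias, _column in slot_refs)


def unnest(parent):
    """03:UNNEST [c.c_orders.o_lineitems l]: flattens every order's lineitems."""
    for order in parent["c_orders"]:
        yield from order["o_lineitems"]


def subplan_rows(parent, keep_parent):
    """Rows produced by 01:SUBPLAN for one parent row."""
    if keep_parent:
        # Original shape: each lineitem paired with the single parent row
        # (models 04:NESTED LOOP JOIN over UNNEST and 02:SINGULAR ROW SRC).
        for l in unnest(parent):
            yield {"c": parent, "l": l}
    else:
        # Trimmed shape: the subplan is just the UNNEST node.
        for l in unnest(parent):
            yield {"l": l}


def count_by_returnflag(customers, keep_parent):
    counts = Counter()
    for parent in customers:
        if parent["c_mktsegment"] != "AUTOMOBILE":  # predicate evaluated in 00:SCAN
            continue
        for row in subplan_rows(parent, keep_parent):
            counts[row["l"]["l_returnflag"]] += 1  # count(*) group by l.l_returnflag
    return counts


keep_parent = needs_parent_row(CONSUMER_SLOT_REFS, parent_alias="c")
trimmed = count_by_returnflag(CUSTOMERS, keep_parent)
original = count_by_returnflag(CUSTOMERS, keep_parent=True)
```

Here `keep_parent` is `False`, and the trimmed and original subplan shapes both yield `Counter({'A': 2, 'R': 1})`. Suppose the query also grouped by a parent column, for example `c.c_name`. In that case `needs_parent_row` would return `True` and the join with the singular row source would have to stay.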



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)