Posted to issues@impala.apache.org by "Daniel Becker (Jira)" <ji...@apache.org> on 2022/05/02 07:38:00 UTC

[jira] [Resolved] (IMPALA-11067) Unify struct subexpressions in rows

     [ https://issues.apache.org/jira/browse/IMPALA-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Becker resolved IMPALA-11067.
------------------------------------
    Resolution: Fixed

> Unify struct subexpressions in rows
> -----------------------------------
>
>                 Key: IMPALA-11067
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11067
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Daniel Becker
>            Assignee: Daniel Becker
>            Priority: Major
>              Labels: complextype, nested_types
>
> If a column is given multiple times in the select list, it is not duplicated in the underlying row: we recognise that multiple result columns reference the same actual column, so the row size does not increase. The baseline query, without duplicates:
>  
> {code:java}
> explain select id, outer_struct from functional_orc_def.complextypes_nested_structs;
> Query: explain select id, outer_struct from functional_orc_def.complextypes_nested_structs
> +---------------------------------------------------------------+
> | Explain String                                                |
> +---------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=4.07MB Threads=2    |
> | Per-Host Resource Estimates: Memory=20MB                      |
> | Codegen disabled by planner                                   |
> |                                                               |
> | PLAN-ROOT SINK                                                |
> | |                                                             |
> | 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] |
> |    HDFS partitions=1/1 files=1 size=1.18KB                    |
> |    row-size=64B cardinality=5                                 |
> +---------------------------------------------------------------+
> {code}
> With the id column duplicated, the row size is still 64B:
>  
> {code:java}
> explain select id, id, outer_struct from functional_orc_def.complextypes_nested_structs;
> Query: explain select id, id, outer_struct from functional_orc_def.complextypes_nested_structs
> +---------------------------------------------------------------+
> | Explain String                                                |
> +---------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=4.07MB Threads=2    |
> | Per-Host Resource Estimates: Memory=20MB                      |
> | Codegen disabled by planner                                   |
> |                                                               |
> | PLAN-ROOT SINK                                                |
> | |                                                             |
> | 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] |
> |    HDFS partitions=1/1 files=1 size=1.18KB                    |
> |    row-size=64B cardinality=5                                 |
> +---------------------------------------------------------------+
> {code}
> However, if we query a struct and a subfield of the same struct, we do not reuse the existing slot in the row but duplicate the subexpression, which increases the row size (here from 64B to 80B):
>  
> {code:java}
> explain select id, outer_struct, outer_struct.inner_struct2 from functional_orc_def.complextypes_nested_structs;
> Query: explain select id, outer_struct, outer_struct.inner_struct2 from functional_orc_def.complextypes_nested_structs
> +---------------------------------------------------------------+
> | Explain String                                                |
> +---------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=4.09MB Threads=2    |
> | Per-Host Resource Estimates: Memory=20MB                      |
> | Codegen disabled by planner                                   |
> |                                                               |
> | PLAN-ROOT SINK                                                |
> | |                                                             |
> | 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] |
> |    HDFS partitions=1/1 files=1 size=1.18KB                    |
> |    row-size=80B cardinality=5                                 |
> +---------------------------------------------------------------+
> {code}
>  
>  
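The difference between the two behaviours described above can be illustrated with a small toy model. This is only a sketch, not Impala's planner code: the Struct class, the helper functions, the field names other than id, outer_struct and inner_struct2, and all byte sizes are hypothetical and chosen for illustration. It compares deduplicating only identical select-list entries with additionally reusing the slot of an enclosing struct for its subfields.

```python
from dataclasses import dataclass


@dataclass
class Struct:
    # Maps a field name to either a byte size (int) or a nested Struct.
    fields: dict


def size_of(t):
    """Byte size of a scalar (int) or the summed size of a struct's fields."""
    if isinstance(t, int):
        return t
    return sum(size_of(f) for f in t.fields.values())


def resolve(schema, path):
    """Follow a dotted path such as 'outer_struct.inner_struct2'."""
    t = schema
    for name in path.split("."):
        t = t.fields[name]
    return t


def row_size_naive(schema, select_list):
    """Deduplicate identical paths only; subfields get their own slot."""
    unique = dict.fromkeys(select_list)
    return sum(size_of(resolve(schema, p)) for p in unique)


def row_size_unified(schema, select_list):
    """Also skip any path already covered by a selected enclosing struct."""
    unique = list(dict.fromkeys(select_list))

    def covered(p):
        return any(p.startswith(q + ".") for q in unique if q != p)

    return sum(size_of(resolve(schema, p)) for p in unique if not covered(p))


# Hypothetical table layout; the sizes are NOT Impala's real slot sizes.
TABLE = Struct({
    "id": 8,
    "outer_struct": Struct({
        "str": 12,
        "inner_struct1": Struct({"a": 4}),
        "inner_struct2": Struct({"i": 4, "s": 12}),
    }),
})

query = ["id", "outer_struct", "outer_struct.inner_struct2"]
print(row_size_naive(TABLE, ["id", "id", "outer_struct"]))  # duplicate id costs nothing
print(row_size_naive(TABLE, query))    # subfield counted a second time
print(row_size_unified(TABLE, query))  # subfield reuses the struct's slot
```

In this model the naive variant grows by exactly the size of the duplicated subfield, while the unified variant gives the same row size as selecting only id and outer_struct.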



--
This message was sent by Atlassian Jira
(v8.20.7#820007)