Posted to issues-all@impala.apache.org by "Daniel Becker (Jira)" <ji...@apache.org> on 2022/01/04 10:51:00 UTC

[jira] [Created] (IMPALA-11067) Unify struct subexpressions in rows

Daniel Becker created IMPALA-11067:
--------------------------------------

             Summary: Unify struct subexpressions in rows
                 Key: IMPALA-11067
                 URL: https://issues.apache.org/jira/browse/IMPALA-11067
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
            Reporter: Daniel Becker


If a column is listed multiple times in the select list, it is not duplicated in the row under the hood, because we recognise that the result columns reference the same underlying column. The row size therefore does not increase. Baseline, with each column selected once:

 
{code:java}
explain select id, outer_struct from functional_orc_def.complextypes_nested_structs;
Query: explain select id, outer_struct from functional_orc_def.complextypes_nested_structs
+---------------------------------------------------------------+
| Explain String                                                |
+---------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=4.07MB Threads=2    |
| Per-Host Resource Estimates: Memory=20MB                      |
| Codegen disabled by planner                                   |
|                                                               |
| PLAN-ROOT SINK                                                |
| |                                                             |
| 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] |
|    HDFS partitions=1/1 files=1 size=1.18KB                    |
|    row-size=64B cardinality=5                                 |
+---------------------------------------------------------------+
{code}
With the id column duplicated, the row size is still 64B:

 
{code:java}
explain select id, id, outer_struct from functional_orc_def.complextypes_nested_structs;
Query: explain select id, id, outer_struct from functional_orc_def.complextypes_nested_structs
+---------------------------------------------------------------+
| Explain String                                                |
+---------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=4.07MB Threads=2    |
| Per-Host Resource Estimates: Memory=20MB                      |
| Codegen disabled by planner                                   |
|                                                               |
| PLAN-ROOT SINK                                                |
| |                                                             |
| 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] |
|    HDFS partitions=1/1 files=1 size=1.18KB                    |
|    row-size=64B cardinality=5                                 |
+---------------------------------------------------------------+
{code}

However, if we select a struct together with one of its own subfields, the subfield does not reuse the slot that is already present in the row as part of the struct. The subexpression is duplicated instead, and the row size grows from 64B to 80B:

 
{code:java}
explain select id, outer_struct, outer_struct.inner_struct2 from functional_orc_def.complextypes_nested_structs;
Query: explain select id, outer_struct, outer_struct.inner_struct2 from functional_orc_def.complextypes_nested_structs
+---------------------------------------------------------------+
| Explain String                                                |
+---------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=4.09MB Threads=2    |
| Per-Host Resource Estimates: Memory=20MB                      |
| Codegen disabled by planner                                   |
|                                                               |
| PLAN-ROOT SINK                                                |
| |                                                             |
| 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] |
|    HDFS partitions=1/1 files=1 size=1.18KB                    |
|    row-size=80B cardinality=5                                 |
+---------------------------------------------------------------+
{code}
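
The difference between the two behaviours can be illustrated with a toy model. This is only a sketch and not Impala's planner code: the function names, the representation of a column path as a tuple, and the unify_struct_subfields flag are all hypothetical. Each select-list item is a path, and a slot is allocated per distinct path. With unification enabled, a path that lies inside an already materialised struct gets no slot of its own.

```python
# Toy model only; this is NOT Impala's frontend code. All names are hypothetical.
# A column path is a tuple, e.g. ("outer_struct", "inner_struct2").

def is_strict_prefix(prefix, path):
    """True if `prefix` is an enclosing struct of `path` (and not `path` itself)."""
    return len(prefix) < len(path) and path[:len(prefix)] == prefix


def allocate_slots(select_paths, unify_struct_subfields):
    """Return the paths that receive their own slot in the row."""
    slots = []
    for path in select_paths:
        if path in slots:
            # Same column selected again: reuse the existing slot.
            continue
        if unify_struct_subfields:
            if any(is_strict_prefix(s, path) for s in slots):
                # The subfield already lives inside a materialised struct.
                continue
            # A newly added struct makes earlier slots for its subfields redundant.
            slots = [s for s in slots if not is_strict_prefix(path, s)]
        slots.append(path)
    return slots


query = [("id",), ("outer_struct",), ("outer_struct", "inner_struct2")]
print(len(allocate_slots(query, unify_struct_subfields=False)))  # separate slot for the subfield
print(len(allocate_slots(query, unify_struct_subfields=True)))   # subfield shares the struct's slot
```

In the plans above, the extra slot for outer_struct.inner_struct2 accounts for the 80B - 64B = 16B increase in row size.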
 

 


