Posted to reviews@impala.apache.org by "Gabor Kaszab (Code Review)" <ge...@cloudera.org> on 2021/07/26 07:54:30 UTC

[Impala-ASF-CR] WiP: IMPALA-9495: Support struct in select list for ORC tables

Gabor Kaszab has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17638 )

Change subject: WiP: IMPALA-9495: Support struct in select list for ORC tables
......................................................................

WiP: IMPALA-9495: Support struct in select list for ORC tables

This patch implements the functionality to allow structs in the select
list. When the value of a struct is displayed, it is formatted as a JSON
value and returned as a string. An example of such a value:

SELECT struct_col FROM some_table;
'{"int_struct_member":12,"string_struct_member":"string value"}'
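As a rough illustration of the output format only (the patch does its
formatting in the C++ backend, which is not reproduced here), the Python
sketch below renders a struct's members in the compact JSON form shown
above. The function name format_struct is hypothetical.

```python
import json


def format_struct(members):
    """Render struct members (name -> value) as compact JSON, keeping member order."""
    # separators=(",", ":") drops the spaces json.dumps inserts by default.
    return json.dumps(members, separators=(",", ":"))


row = {"int_struct_member": 12, "string_struct_member": "string value"}
print(format_struct(row))
```

The printed text matches the example value above without the surrounding
single quotes.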

-- Changes related to tuple and slot descriptors:
When a struct is provided in the select list, there is a SlotDescriptor
for the struct slot in the topmost TupleDescriptor. Additionally, another
TupleDescriptor is created to hold SlotDescriptors for each of the struct's
children. The struct SlotDescriptor points to the newly introduced
TupleDescriptor using 'itemTupleId'.
The offsets for the children of the struct are calculated from the beginning
of the topmost TupleDescriptor and not from the TupleDescriptor that
directly holds the struct's children. The null indicator bytes are also
stored at the level of the topmost TupleDescriptor.
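
The relationship between the two descriptors can be sketched as a small
Python model. This is not the patch's code: the class fields, the helper
struct_children, and all offset values are assumptions chosen for
illustration, and the struct slot's own offset is deliberately not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SlotDescriptor:
    path: str
    # Byte offset measured from the start of the topmost tuple; left as None
    # for the struct slot because this sketch does not model it.
    tuple_offset: Optional[int]
    # Offset of the null indicator byte, also within the topmost tuple.
    null_indicator_byte: int
    # Set only on the struct slot: id of the tuple holding its children.
    item_tuple_id: Optional[int] = None


@dataclass
class TupleDescriptor:
    id: int
    slots: List[SlotDescriptor] = field(default_factory=list)


def struct_children(slot: SlotDescriptor,
                    tuples: Dict[int, TupleDescriptor]) -> List[SlotDescriptor]:
    """Follow item_tuple_id from a struct slot to its child slot descriptors."""
    if slot.item_tuple_id is None:
        return []
    return tuples[slot.item_tuple_id].slots


# Descriptors for: SELECT id, s FROM tbl (s is struct<a:int,b:string>).
# Offsets are illustrative assumptions, not values taken from the patch.
NULL_BYTE = 20
item_tuple = TupleDescriptor(id=1, slots=[
    SlotDescriptor("s.a", tuple_offset=12, null_indicator_byte=NULL_BYTE),
    SlotDescriptor("s.b", tuple_offset=0, null_indicator_byte=NULL_BYTE),
])
top_tuple = TupleDescriptor(id=0, slots=[
    SlotDescriptor("id", tuple_offset=16, null_indicator_byte=NULL_BYTE),
    SlotDescriptor("s", tuple_offset=None, null_indicator_byte=NULL_BYTE,
                   item_tuple_id=item_tuple.id),
])
tuples = {t.id: t for t in (top_tuple, item_tuple)}

children = struct_children(top_tuple.slots[1], tuples)
print([(c.path, c.tuple_offset) for c in children])
```

Note that the children live in a separate descriptor, yet their offsets and
null indicator byte refer to positions in the topmost tuple.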

-- Changes related to scalar expressions:
A struct in the select list is translated into an expression tree where the
top of the tree is a SlotRef for the struct itself and its children in the
tree are SlotRefs for the members of the struct. When a struct SlotRef is
evaluated, the null checks are performed first and then the evaluation is
delegated to the child SlotRefs.
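
A minimal Python sketch of this delegation follows. It only mirrors the
description above: the class is a stand-in for the real SlotRef, and the
evaluate signature plus the flat values / null_paths representation of a row
are assumptions made for the example.

```python
class SlotRef:
    """Toy expression node: a leaf reads one slot, a struct node delegates."""

    def __init__(self, path, children=()):
        self.path = path
        self.children = list(children)

    def evaluate(self, values, null_paths):
        # Null check first: a null slot short-circuits the evaluation.
        if self.path in null_paths:
            return None
        # A struct SlotRef has children and delegates to each of them.
        if self.children:
            return [child.evaluate(values, null_paths) for child in self.children]
        # A leaf SlotRef reads its own value.
        return values[self.path]


struct_ref = SlotRef("s", children=[SlotRef("s.a"), SlotRef("s.b")])
values = {"s.a": 12, "s.b": "string value"}

print(struct_ref.evaluate(values, null_paths=set()))
print(struct_ref.evaluate(values, null_paths={"s.b"}))
print(struct_ref.evaluate(values, null_paths={"s"}))
```

A null struct never touches its children, while a null member only nulls
out its own position in the result.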

-- Internal representation of a struct:
When scanning a struct, the row batch will hold the values of the struct's
children as if they were queried one by one directly in the select list.

E.g., taking the following table:
CREATE TABLE tbl (id int, s struct<a:int,b:string>) STORED AS ORC

And running the following query:
SELECT id, s FROM tbl;

After scanning, the row batch will hold the following values
(note that slots are ordered by size, largest first):
 1: The pointer for the string in s.b
 2: The length for the string in s.b
 3: The int value for s.a
 4: The int value of id
 5: A single null byte for all the slots: id, s, s.a, s.b
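
The ordering above can be reproduced with a short Python sketch. The slot
sizes are assumptions (a 12-byte string slot made of an 8-byte pointer and a
4-byte length, and 4-byte ints), as is the rule that slots of equal size
keep their input order. The function name compute_layout is hypothetical.

```python
def compute_layout(slots, num_nullable):
    """Place slots largest-first and append the null indicator bytes.

    slots: list of (name, byte_size); slots of equal size keep input order.
    Returns (offsets by name, offset of the first null byte, total tuple size).
    """
    ordered = sorted(slots, key=lambda s: s[1], reverse=True)  # stable sort
    offsets = {}
    offset = 0
    for name, size in ordered:
        offsets[name] = offset
        offset += size
    # One null bit per nullable slot, rounded up to whole bytes.
    null_bytes = (num_nullable + 7) // 8
    return offsets, offset, offset + null_bytes


# Materialized slots for: SELECT id, s FROM tbl. The struct slot 's' takes no
# bytes of its own in this model, but it still needs a null bit.
slots = [("s.a", 4), ("id", 4), ("s.b", 12)]
offsets, null_offset, total = compute_layout(slots, num_nullable=4)
print(offsets, null_offset, total)
```

Under these assumptions the string slot lands at offset 0, s.a at 12, id at
16, and the four null bits (id, s, s.a, s.b) share one byte at offset 20.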

When evaluating a struct as a SlotRef, a newly introduced StructVal is
used to refer to the actual values of the struct in the row batch. This
StructVal holds a vector of pointers where each pointer represents a member
of the struct. Following the above example, the StructVal would keep two
pointers: one pointing to an IntVal and one pointing to a StringVal.
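
To illustrate the idea of a value that refers to its members rather than
copying them, here is a small Python model. The classes are simplified
stand-ins named after the types mentioned above; their fields are
assumptions and do not reproduce the definitions in the patch.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class IntVal:
    is_null: bool = True
    val: int = 0


@dataclass
class StringVal:
    is_null: bool = True
    val: bytes = b""


@dataclass
class StructVal:
    is_null: bool
    # One reference per struct member, in member order (a stand-in for the
    # vector of pointers described above).
    children: List[Union[IntVal, StringVal]]


# Values for s = {a: 12, b: "string value"}.
a_val = IntVal(is_null=False, val=12)
b_val = StringVal(is_null=False, val=b"string value")
s_val = StructVal(is_null=False, children=[a_val, b_val])

# The struct refers to the member objects; it does not copy them.
a_val.val = 13
print(s_val.children[0].val, s_val.children[1].val)
```

Because the members are held by reference, a change to a member is visible
through the StructVal without rebuilding it.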

-- Restrictions:
  - Codegen support is not included in this patch.
  - Only ORC file format is supported by this patch.
  - Only the HS2 client supports returning structs. Beeswax support is not
    implemented as it is going to be deprecated anyway. Currently, querying
    a struct through Beeswax returns an error.

Change-Id: I0fbe56bdcd372b72e99c0195d87a818e7fa4bc3a
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-collection-column-reader.cc
M be/src/exprs/expr-value.h
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/scalar-expr.inline.h
M be/src/exprs/slot-ref.cc
M be/src/exprs/slot-ref.h
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M be/src/runtime/types.cc
M be/src/runtime/types.h
M be/src/service/hs2-util.cc
M be/src/service/impala-beeswax-server.cc
M be/src/service/query-result-set.cc
M be/src/udf/udf.cc
M be/src/udf/udf.h
M be/src/util/debug-util.cc
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/StructType.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeUpsertStmtTest.java
A testdata/ComplexTypesTbl/structs.orc
A testdata/ComplexTypesTbl/structs.parq
A testdata/ComplexTypesTbl/structs_nested.orc
A testdata/ComplexTypesTbl/structs_nested.parq
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test
A testdata/workloads/functional-query/queries/QueryTest/struct-in-select-list.test
M tests/query_test/test_nested_types.py
59 files changed, 1,565 insertions(+), 337 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/17638/3
-- 
To view, visit http://gerrit.cloudera.org:8080/17638

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0fbe56bdcd372b72e99c0195d87a818e7fa4bc3a
Gerrit-Change-Number: 17638
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>