You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Daniel Becker (Code Review)" <ge...@cloudera.org> on 2023/04/14 09:59:37 UTC
[Impala-ASF-CR] WIP - IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
Daniel Becker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19660
Change subject: WIP - IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................
WIP - IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
As a first stage of IMPALA-10939, this change implements support for
ORDER BY for top-level collections that only contain fixed length types.
For these types the implementation is almost the same as the existing
handling of strings.
Also refactored the RawValue::Write*() functions to have a clearer
interface.
TODO: Implement codegen.
Testing:
- Added a new test table that contains many rows with arrays. This is
queried in a new test added in test_sort.py, to ensure that we handle
spilling correctly.
- Added tests in test_nested_types.py that have arrays and maps in the
sorting tuple
- TODO: We should also test MAPs on a big table to test spilling but
with the data generator expects an AVRO schema and in AVRO all struct
keys are strings which are var-len.
Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
---
M be/src/exec/hash-table.cc
M be/src/runtime/collection-value.cc
M be/src/runtime/collection-value.h
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/runtime/sorter-internal.h
M be/src/runtime/sorter.cc
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
A testdata/ComplexTypesTbl/simple_arrays_big.parq
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/mixed-collections-and-structs.test
A testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list-no-codegen.test
M testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test
A testdata/workloads/functional-query/queries/QueryTest/nested-map-in-select-list-no-codegen.test
M testdata/workloads/functional-query/queries/QueryTest/nested-map-in-select-list.test
M tests/query_test/test_nested_types.py
M tests/query_test/test_sort.py
22 files changed, 868 insertions(+), 189 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/19660/3
--
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )
Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................
Patch Set 10:
Build Failed
https://jenkins.impala.io/job/gerrit-code-review-checks/13047/ : Initial code review checks failed. See linked job for details on the failure.
--
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 10
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 May 2023 16:27:27 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )
Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................
Patch Set 11:
Build Successful
https://jenkins.impala.io/job/gerrit-code-review-checks/13048/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.
--
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 11
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 May 2023 16:53:15 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )
Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................
Patch Set 13:
Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9323/ DRY_RUN=true
--
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 13
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 May 2023 14:28:42 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )
Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................
Patch Set 13:
Build Successful
https://jenkins.impala.io/job/gerrit-code-review-checks/13066/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.
--
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 13
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 May 2023 14:48:59 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/19660 )
Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................
IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
As a first stage of IMPALA-10939, this change implements support for
including in the sorting tuple top-level collections that only contain
fixed length types (including fixed length structs). For these types the
implementation is almost the same as the existing handling of strings.
Another limitation is that structs that contain any type of collection
are not yet allowed in the sorting tuple.
Also refactored the RawValue::Write*() functions to have a clearer
interface.
Testing:
- Added a new test table that contains many rows with arrays. This is
queried in a new test added in test_sort.py, to ensure that we handle
spilling correctly.
- Added tests that have arrays and/or maps in the sorting tuple in
test_queries.py::TestQueries::{test_sort,
test_top_n,test_partitioned_top_n}.
Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
---
M be/src/exec/hash-table.cc
M be/src/runtime/collection-value.cc
M be/src/runtime/collection-value.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/runtime/sorter-internal.h
M be/src/runtime/sorter.cc
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M be/src/runtime/types.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
A testdata/ComplexTypesTbl/simple_arrays_big.parq
M testdata/data/README
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/mixed-collections-and-structs.test
M testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test
M testdata/workloads/functional-query/queries/QueryTest/nested-map-in-select-list.test
A testdata/workloads/functional-query/queries/QueryTest/partitioned-top-n-complex.test
A testdata/workloads/functional-query/queries/QueryTest/sort-complex.test
A testdata/workloads/functional-query/queries/QueryTest/top-n-complex.test
M tests/query_test/test_nested_types.py
M tests/query_test/test_queries.py
M tests/query_test/test_sort.py
31 files changed, 1,099 insertions(+), 281 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/19660/13
--
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 13
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )
Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................
Patch Set 13: Code-Review+2
--
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 13
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 May 2023 14:27:31 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )
Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................
Patch Set 9:
(2 comments)
http://gerrit.cloudera.org:8080/#/c/19660/9/testdata/workloads/functional-query/queries/QueryTest/top-n-parquet-orc.test
File testdata/workloads/functional-query/queries/QueryTest/top-n-parquet-orc.test:
http://gerrit.cloudera.org:8080/#/c/19660/9/testdata/workloads/functional-query/queries/QueryTest/top-n-parquet-orc.test@4
PS9, Line 4: order by id limit 5
there is one more case that could be tested: partitioned top n
The examples I found:
https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/partitioned-top-n.test
https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/analytic-fns-tpcds-partitioned-topn.test
I don't know whether it uses code paths that are untested at the moment, but it would be good to add some basic coverage.
http://gerrit.cloudera.org:8080/#/c/19660/9/tests/query_test/test_queries.py
File tests/query_test/test_queries.py:
http://gerrit.cloudera.org:8080/#/c/19660/9/tests/query_test/test_queries.py@150
PS9, Line 150: self.run_test_case('QueryTest/sort-parquet-orc', vector)
: self.run_test_case('QueryTest/top-n-parquet-orc', vector)
nit: I think that complex or nested type is more descriptive than parquet-orc
note that there are plans to add array support in Kudu
--
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 9
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 May 2023 07:45:43 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )
Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................
Patch Set 8:
Build Successful
https://jenkins.impala.io/job/gerrit-code-review-checks/13002/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.
--
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 8
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 May 2023 16:19:22 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/19660 )
Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................
IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
As a first stage of IMPALA-10939, this change implements support for
including in the sorting tuple top-level collections that only contain
fixed length types (including fixed length structs). For these types the
implementation is almost the same as the existing handling of strings.
Another limitation is that structs that contain any type of collection
are not yet allowed in the sorting tuple.
Also refactored the RawValue::Write*() functions to have a clearer
interface.
Testing:
- Added a new test table that contains many rows with arrays. This is
queried in a new test added in test_sort.py, to ensure that we handle
spilling correctly.
- Added tests in test_nested_types.py that have arrays and maps in the
sorting tuple
Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
---
M be/src/exec/hash-table.cc
M be/src/runtime/collection-value.cc
M be/src/runtime/collection-value.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/runtime/sorter-internal.h
M be/src/runtime/sorter.cc
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
A testdata/ComplexTypesTbl/simple_arrays_big.parq
M testdata/data/README
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/mixed-collections-and-structs.test
M testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test
M testdata/workloads/functional-query/queries/QueryTest/nested-map-in-select-list.test
A testdata/workloads/functional-query/queries/QueryTest/sort-parquet-orc.test
A testdata/workloads/functional-query/queries/QueryTest/top-n-parquet-orc.test
M tests/query_test/test_nested_types.py
M tests/query_test/test_queries.py
M tests/query_test/test_sort.py
26 files changed, 1,001 insertions(+), 251 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/19660/9
--
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 9
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )
Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................
Patch Set 13: Verified+1
--
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 13
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 May 2023 19:46:05 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/19660 )
Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................
IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
As a first stage of IMPALA-10939, this change implements support for
including in the sorting tuple top-level collections that only contain
fixed length types (including fixed length structs). For these types the
implementation is almost the same as the existing handling of strings.
Another limitation is that structs that contain any type of collection
are not yet allowed in the sorting tuple.
Also refactored the RawValue::Write*() functions to have a clearer
interface.
Testing:
- Added a new test table that contains many rows with arrays. This is
queried in a new test added in test_sort.py, to ensure that we handle
spilling correctly.
- Added tests that have arrays and/or maps in the sorting tuple in
test_queries.py::TestQueries::{test_sort,
test_top_n,test_partitioned_top_n}.
Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
---
M be/src/exec/hash-table.cc
M be/src/runtime/collection-value.cc
M be/src/runtime/collection-value.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/runtime/sorter-internal.h
M be/src/runtime/sorter.cc
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
A testdata/ComplexTypesTbl/simple_arrays_big.parq
M testdata/data/README
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/mixed-collections-and-structs.test
M testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test
M testdata/workloads/functional-query/queries/QueryTest/nested-map-in-select-list.test
A testdata/workloads/functional-query/queries/QueryTest/partitioned-top-n-complex.test
A testdata/workloads/functional-query/queries/QueryTest/sort-complex.test
A testdata/workloads/functional-query/queries/QueryTest/top-n-complex.test
M tests/query_test/test_nested_types.py
M tests/query_test/test_queries.py
M tests/query_test/test_sort.py
29 files changed, 1,098 insertions(+), 277 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/19660/12
--
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 12
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
[Impala-ASF-CR] WIP - IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )
Change subject: WIP - IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................
Patch Set 3:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/19660/2//COMMIT_MSG
Commit Message:
http://gerrit.cloudera.org:8080/#/c/19660/2//COMMIT_MSG@16
PS2, Line 16:
> I don't think we have infrastructure to only disable codegen for the sorter
I think that is enough to simply return an error during codegen - codegen errors do not result in query failure just simply reverting to interpreted path and adding a message to the profile
https://github.com/apache/impala/blob/175dae33d9a224ef69decc2d3887e847755d4c9e/be/src/exec/sort-node.cc#L123
--
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Apr 2023 10:36:29 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP - IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )
Change subject: WIP - IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................
Patch Set 3:
Build Failed
https://jenkins.impala.io/job/gerrit-code-review-checks/12799/ : Initial code review checks failed. See linked job for details on the failure.
--
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Apr 2023 10:20:04 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19660 )
Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................
IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
As a first stage of IMPALA-10939, this change implements support for
including in the sorting tuple top-level collections that only contain
fixed length types (including fixed length structs). For these types the
implementation is almost the same as the existing handling of strings.
Another limitation is that structs that contain any type of collection
are not yet allowed in the sorting tuple.
Also refactored the RawValue::Write*() functions to have a clearer
interface.
Testing:
- Added a new test table that contains many rows with arrays. This is
queried in a new test added in test_sort.py, to ensure that we handle
spilling correctly.
- Added tests that have arrays and/or maps in the sorting tuple in
test_queries.py::TestQueries::{test_sort,
test_top_n,test_partitioned_top_n}.
Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Reviewed-on: http://gerrit.cloudera.org:8080/19660
Reviewed-by: Csaba Ringhofer <cs...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/hash-table.cc
M be/src/runtime/collection-value.cc
M be/src/runtime/collection-value.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/runtime/sorter-internal.h
M be/src/runtime/sorter.cc
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M be/src/runtime/types.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
A testdata/ComplexTypesTbl/simple_arrays_big.parq
M testdata/data/README
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/mixed-collections-and-structs.test
M testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test
M testdata/workloads/functional-query/queries/QueryTest/nested-map-in-select-list.test
A testdata/workloads/functional-query/queries/QueryTest/partitioned-top-n-complex.test
A testdata/workloads/functional-query/queries/QueryTest/sort-complex.test
A testdata/workloads/functional-query/queries/QueryTest/top-n-complex.test
M tests/query_test/test_nested_types.py
M tests/query_test/test_queries.py
M tests/query_test/test_sort.py
31 files changed, 1,099 insertions(+), 281 deletions(-)
Approvals:
Csaba Ringhofer: Looks good to me, approved
Impala Public Jenkins: Verified
--
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 14
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )
Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................
Patch Set 7:
(4 comments)
http://gerrit.cloudera.org:8080/#/c/19660/7/be/src/runtime/collection-value.h
File be/src/runtime/collection-value.h:
http://gerrit.cloudera.org:8080/#/c/19660/7/be/src/runtime/collection-value.h@60
PS7, Line 60: int64_t byte_size;
int32_t should be enough to store this, as the whole row should fit to that size AFAIK
I don't mind using int64_t, but some comment could be added about the expected byte sizes
http://gerrit.cloudera.org:8080/#/c/19660/7/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java:
http://gerrit.cloudera.org:8080/#/c/19660/7/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@407
PS7, Line 407: // TODO: Should we include datetime?
: return itemType.isBoolean() ||
: itemType.isScalarType(PrimitiveType.TINYINT) ||
: itemType.isScalarType(PrimitiveType.SMALLINT) ||
: itemType.isScalarType(PrimitiveType.INT) ||
: itemType.isScalarType(PrimitiveType.BIGINT) ||
: itemType.isScalarType(PrimitiveType.FLOAT) ||
: itemType.isScalarType(PrimitiveType.DOUBLE) ||
: itemType.isDecimal() ||
: itemType.isTimestamp() ||
: itemType.isDate();
could be clearer and shorter with !itemType.isStringType() && !itemType.isStringType().isComplexType()
http://gerrit.cloudera.org:8080/#/c/19660/7/testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test
File testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test:
http://gerrit.cloudera.org:8080/#/c/19660/7/testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test@38
PS7, Line 38: # Sort a collection.
About the location of the tests: adding them to sort.test could ensure that they are run with a wider set of sorting related options.
http://gerrit.cloudera.org:8080/#/c/19660/7/testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test@39
PS7, Line 39:
Does the patch also support ordey by limit N?
Can you also add some tests for it, even if it is not supported?
--
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 7
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Mon, 08 May 2023 11:00:44 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP - IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )
Change subject: WIP - IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................
Patch Set 3:
(7 comments)
http://gerrit.cloudera.org:8080/#/c/19660/2//COMMIT_MSG
Commit Message:
http://gerrit.cloudera.org:8080/#/c/19660/2//COMMIT_MSG@16
PS2, Line 16:
> I think that the queries should be able to run without disabling codegen wi
I don't think we have infrastructure to only disable codegen for the sorter, I don't know how difficult that would be... But I think we should try adding codegen in this change.
http://gerrit.cloudera.org:8080/#/c/19660/2//COMMIT_MSG@27
PS2, Line 27: keys are strings which are var-len.
> I think that it is ok to disable MAP sorting for now in the FE. My understa
Sorting MAPs actually works, I added a test for it, but only on a small table. The problem is that our data generator can't generate maps with only fixed len data because the key has to be a string.
http://gerrit.cloudera.org:8080/#/c/19660/2/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:
http://gerrit.cloudera.org:8080/#/c/19660/2/be/src/runtime/sorter.cc@232
PS2, Line 232:
> 'None' refers to the previous sentence, which changed. Could you rephrase t
Done
http://gerrit.cloudera.org:8080/#/c/19660/2/be/src/runtime/sorter.cc@234
PS2, Line 234: // is here.
: DCHECK_EQ(&var_len_pages_.back(), cur_var_len_page);
: uint8_t* var_data_ptr = cur_var_len_page->AllocateBytes(total_var_len);
: if (INITIAL_RUN) {
: CopyVarLenData(string_values, coll_values_and_sizes, var_data_ptr);
: } else {
: CopyVarLenDataConvertOffset(string_values, coll_values_and_sizes,
: var_len_pages_.size() - 1, cur_var_len_page->data(), var_data_ptr);
: }
: }
: ++num_tuples_;
: ++*num_processed;
: ++cur_input_index;
: }
:
: // If there are still rows left to process, get a new page for the fixed-length
: // tuples. If the run is already too long, return.
: if (cur_input_index < batch->num_rows()) {
: bool added;
: RETURN_IF_ERROR(TryAddPage(add_mode, &fixed_len_pages_, &added));
: if (!added) return Status::OK();
: cur_fixed_len_page = &fixed_len_pages_.back();
: }
: }
: return Status::OK();
> This check should be done outside this hot loop, e.g. during initialization
We used to allow "illegal" tuples if they are NULL (see L259 in PS2). For this we need to read the actual data that is not available at initialisation. On the other hand I don't think it's important that we allow these NULL tuples, especially as the FE will not allow them.
I moved this check to Run::Init() and removed the exception to allow NULL tuples.
Also, this is only a DCHECK in the end, so the compiler should optimise it out in release builds.
http://gerrit.cloudera.org:8080/#/c/19660/2/be/src/runtime/tuple.cc
File be/src/runtime/tuple.cc:
http://gerrit.cloudera.org:8080/#/c/19660/2/be/src/runtime/tuple.cc@236
PS2, Line 236:
RawValue::Write now handles NULL values, we don't have to handle it here. If the value is NULL then 'string_values' and 'collection_values' will remain empty and we don't enter the IF-THEN blocks.
http://gerrit.cloudera.org:8080/#/c/19660/2/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java:
http://gerrit.cloudera.org:8080/#/c/19660/2/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@369
PS2, Line 369: MapType mapType = (MapType) type;
> MAP sorting is not supported at the moment, right?
No, it works, we just don't have a large table to test it on. I added a small table.
http://gerrit.cloudera.org:8080/#/c/19660/2/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:
http://gerrit.cloudera.org:8080/#/c/19660/2/tests/query_test/test_sort.py@312
PS2, Line 312: query_result = self.execute_query(quer
> This is not clear to me - are we comparing only single a column? Shouldn't
You're right, I changed it to compare whole rows (and transposition is not needed).
--
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Apr 2023 10:01:47 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )
Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................
Patch Set 8:
(3 comments)
http://gerrit.cloudera.org:8080/#/c/19660/8/testdata/ComplexTypesTbl/simple_arrays_big.parq
File testdata/ComplexTypesTbl/simple_arrays_big.parq:
http://gerrit.cloudera.org:8080/#/c/19660/8/testdata/ComplexTypesTbl/simple_arrays_big.parq@1
PS8, Line 1: PAR1 â â ,À¸ % % ( Q R &