You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Daniel Becker (Code Review)" <ge...@cloudera.org> on 2023/04/14 09:59:37 UTC

[Impala-ASF-CR] WIP - IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

Daniel Becker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19660


Change subject: WIP - IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................

WIP - IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

As a first stage of IMPALA-10939, this change implements support for
ORDER BY for top-level collections that only contain fixed length types.
For these types the implementation is almost the same as the existing
handling of strings.

Also refactored the RawValue::Write*() functions to have a clearer
interface.

TODO: Implement codegen.

Testing:
 - Added a new test table that contains many rows with arrays. This is
   queried in a new test added in test_sort.py, to ensure that we handle
   spilling correctly.
 - Added tests in test_nested_types.py that have arrays and maps in the
   sorting tuple
 - TODO: We should also test MAPs on a big table to test spilling but
   with the data generator expects an AVRO schema and in AVRO all struct
   keys are strings which are var-len.

Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
---
M be/src/exec/hash-table.cc
M be/src/runtime/collection-value.cc
M be/src/runtime/collection-value.h
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/runtime/sorter-internal.h
M be/src/runtime/sorter.cc
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
A testdata/ComplexTypesTbl/simple_arrays_big.parq
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/mixed-collections-and-structs.test
A testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list-no-codegen.test
M testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test
A testdata/workloads/functional-query/queries/QueryTest/nested-map-in-select-list-no-codegen.test
M testdata/workloads/functional-query/queries/QueryTest/nested-map-in-select-list.test
M tests/query_test/test_nested_types.py
M tests/query_test/test_sort.py
22 files changed, 868 insertions(+), 189 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/19660/3
-- 
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )

Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................


Patch Set 10:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/13047/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 10
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 May 2023 16:27:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )

Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................


Patch Set 11:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13048/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 11
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 May 2023 16:53:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )

Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................


Patch Set 13:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9323/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 13
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 May 2023 14:28:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )

Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................


Patch Set 13:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13066/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 13
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 May 2023 14:48:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/19660 )

Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................

IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

As a first stage of IMPALA-10939, this change implements support for
including in the sorting tuple top-level collections that only contain
fixed length types (including fixed length structs). For these types the
implementation is almost the same as the existing handling of strings.

Another limitation is that structs that contain any type of collection
are not yet allowed in the sorting tuple.

Also refactored the RawValue::Write*() functions to have a clearer
interface.

Testing:
 - Added a new test table that contains many rows with arrays. This is
   queried in a new test added in test_sort.py, to ensure that we handle
   spilling correctly.
 - Added tests that have arrays and/or maps in the sorting tuple in
   test_queries.py::TestQueries::{test_sort,
       test_top_n,test_partitioned_top_n}.

Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
---
M be/src/exec/hash-table.cc
M be/src/runtime/collection-value.cc
M be/src/runtime/collection-value.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/runtime/sorter-internal.h
M be/src/runtime/sorter.cc
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M be/src/runtime/types.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
A testdata/ComplexTypesTbl/simple_arrays_big.parq
M testdata/data/README
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/mixed-collections-and-structs.test
M testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test
M testdata/workloads/functional-query/queries/QueryTest/nested-map-in-select-list.test
A testdata/workloads/functional-query/queries/QueryTest/partitioned-top-n-complex.test
A testdata/workloads/functional-query/queries/QueryTest/sort-complex.test
A testdata/workloads/functional-query/queries/QueryTest/top-n-complex.test
M tests/query_test/test_nested_types.py
M tests/query_test/test_queries.py
M tests/query_test/test_sort.py
31 files changed, 1,099 insertions(+), 281 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/19660/13
-- 
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 13
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )

Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................


Patch Set 13: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 13
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 May 2023 14:27:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )

Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19660/9/testdata/workloads/functional-query/queries/QueryTest/top-n-parquet-orc.test
File testdata/workloads/functional-query/queries/QueryTest/top-n-parquet-orc.test:

http://gerrit.cloudera.org:8080/#/c/19660/9/testdata/workloads/functional-query/queries/QueryTest/top-n-parquet-orc.test@4
PS9, Line 4:  order by id limit 5
there is one more case that could be tested: partitioned top n
The examples I found:
https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/partitioned-top-n.test

https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/analytic-fns-tpcds-partitioned-topn.test

I don't know whether it uses code paths that are untested at the moment, but it would be good to add some basic coverage.


http://gerrit.cloudera.org:8080/#/c/19660/9/tests/query_test/test_queries.py
File tests/query_test/test_queries.py:

http://gerrit.cloudera.org:8080/#/c/19660/9/tests/query_test/test_queries.py@150
PS9, Line 150:       self.run_test_case('QueryTest/sort-parquet-orc', vector)
             :       self.run_test_case('QueryTest/top-n-parquet-orc', vector)
nit: I think that complex or nested type is more descriptive than parquet-orc

note that there are plans to add array support in Kudu



-- 
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 9
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 May 2023 07:45:43 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )

Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13002/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 8
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 May 2023 16:19:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/19660 )

Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................

IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

As a first stage of IMPALA-10939, this change implements support for
including in the sorting tuple top-level collections that only contain
fixed length types (including fixed length structs). For these types the
implementation is almost the same as the existing handling of strings.

Another limitation is that structs that contain any type of collection
are not yet allowed in the sorting tuple.

Also refactored the RawValue::Write*() functions to have a clearer
interface.

Testing:
 - Added a new test table that contains many rows with arrays. This is
   queried in a new test added in test_sort.py, to ensure that we handle
   spilling correctly.
 - Added tests in test_nested_types.py that have arrays and maps in the
   sorting tuple

Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
---
M be/src/exec/hash-table.cc
M be/src/runtime/collection-value.cc
M be/src/runtime/collection-value.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/runtime/sorter-internal.h
M be/src/runtime/sorter.cc
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
A testdata/ComplexTypesTbl/simple_arrays_big.parq
M testdata/data/README
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/mixed-collections-and-structs.test
M testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test
M testdata/workloads/functional-query/queries/QueryTest/nested-map-in-select-list.test
A testdata/workloads/functional-query/queries/QueryTest/sort-parquet-orc.test
A testdata/workloads/functional-query/queries/QueryTest/top-n-parquet-orc.test
M tests/query_test/test_nested_types.py
M tests/query_test/test_queries.py
M tests/query_test/test_sort.py
26 files changed, 1,001 insertions(+), 251 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/19660/9
-- 
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 9
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )

Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................


Patch Set 13: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 13
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 May 2023 19:46:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/19660 )

Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................

IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

As a first stage of IMPALA-10939, this change implements support for
including in the sorting tuple top-level collections that only contain
fixed length types (including fixed length structs). For these types the
implementation is almost the same as the existing handling of strings.

Another limitation is that structs that contain any type of collection
are not yet allowed in the sorting tuple.

Also refactored the RawValue::Write*() functions to have a clearer
interface.

Testing:
 - Added a new test table that contains many rows with arrays. This is
   queried in a new test added in test_sort.py, to ensure that we handle
   spilling correctly.
 - Added tests that have arrays and/or maps in the sorting tuple in
   test_queries.py::TestQueries::{test_sort,
       test_top_n,test_partitioned_top_n}.

Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
---
M be/src/exec/hash-table.cc
M be/src/runtime/collection-value.cc
M be/src/runtime/collection-value.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/runtime/sorter-internal.h
M be/src/runtime/sorter.cc
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
A testdata/ComplexTypesTbl/simple_arrays_big.parq
M testdata/data/README
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/mixed-collections-and-structs.test
M testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test
M testdata/workloads/functional-query/queries/QueryTest/nested-map-in-select-list.test
A testdata/workloads/functional-query/queries/QueryTest/partitioned-top-n-complex.test
A testdata/workloads/functional-query/queries/QueryTest/sort-complex.test
A testdata/workloads/functional-query/queries/QueryTest/top-n-complex.test
M tests/query_test/test_nested_types.py
M tests/query_test/test_queries.py
M tests/query_test/test_sort.py
29 files changed, 1,098 insertions(+), 277 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/19660/12
-- 
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 12
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] WIP - IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )

Change subject: WIP - IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19660/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19660/2//COMMIT_MSG@16
PS2, Line 16: 
> I don't think we have infrastructure to only disable codegen for the sorter
I think that is enough to simply return an error during codegen - codegen errors do not result in query failure just simply reverting to interpreted path and adding a message to the profile
https://github.com/apache/impala/blob/175dae33d9a224ef69decc2d3887e847755d4c9e/be/src/exec/sort-node.cc#L123



-- 
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Apr 2023 10:36:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP - IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )

Change subject: WIP - IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................


Patch Set 3:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/12799/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Apr 2023 10:20:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19660 )

Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................

IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

As a first stage of IMPALA-10939, this change implements support for
including in the sorting tuple top-level collections that only contain
fixed length types (including fixed length structs). For these types the
implementation is almost the same as the existing handling of strings.

Another limitation is that structs that contain any type of collection
are not yet allowed in the sorting tuple.

Also refactored the RawValue::Write*() functions to have a clearer
interface.

Testing:
 - Added a new test table that contains many rows with arrays. This is
   queried in a new test added in test_sort.py, to ensure that we handle
   spilling correctly.
 - Added tests that have arrays and/or maps in the sorting tuple in
   test_queries.py::TestQueries::{test_sort,
       test_top_n,test_partitioned_top_n}.

Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Reviewed-on: http://gerrit.cloudera.org:8080/19660
Reviewed-by: Csaba Ringhofer <cs...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/hash-table.cc
M be/src/runtime/collection-value.cc
M be/src/runtime/collection-value.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/runtime/sorter-internal.h
M be/src/runtime/sorter.cc
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M be/src/runtime/types.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
A testdata/ComplexTypesTbl/simple_arrays_big.parq
M testdata/data/README
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/mixed-collections-and-structs.test
M testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test
M testdata/workloads/functional-query/queries/QueryTest/nested-map-in-select-list.test
A testdata/workloads/functional-query/queries/QueryTest/partitioned-top-n-complex.test
A testdata/workloads/functional-query/queries/QueryTest/sort-complex.test
A testdata/workloads/functional-query/queries/QueryTest/top-n-complex.test
M tests/query_test/test_nested_types.py
M tests/query_test/test_queries.py
M tests/query_test/test_sort.py
31 files changed, 1,099 insertions(+), 281 deletions(-)

Approvals:
  Csaba Ringhofer: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 14
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )

Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/19660/7/be/src/runtime/collection-value.h
File be/src/runtime/collection-value.h:

http://gerrit.cloudera.org:8080/#/c/19660/7/be/src/runtime/collection-value.h@60
PS7, Line 60:   int64_t byte_size;
int32_t should be enough to store this, as the whole row should fit to that size AFAIK

I don't mind using int64_t, but some comment could be added about the expected byte sizes


http://gerrit.cloudera.org:8080/#/c/19660/7/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java:

http://gerrit.cloudera.org:8080/#/c/19660/7/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@407
PS7, Line 407:     // TODO: Should we include datetime?
             :     return itemType.isBoolean() ||
             :         itemType.isScalarType(PrimitiveType.TINYINT) ||
             :         itemType.isScalarType(PrimitiveType.SMALLINT) ||
             :         itemType.isScalarType(PrimitiveType.INT) ||
             :         itemType.isScalarType(PrimitiveType.BIGINT) ||
             :         itemType.isScalarType(PrimitiveType.FLOAT) ||
             :         itemType.isScalarType(PrimitiveType.DOUBLE) ||
             :         itemType.isDecimal() ||
             :         itemType.isTimestamp() ||
             :         itemType.isDate();
could be clearer and shorter with !itemType.isStringType() && !itemType.isStringType().isComplexType()


http://gerrit.cloudera.org:8080/#/c/19660/7/testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test
File testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test:

http://gerrit.cloudera.org:8080/#/c/19660/7/testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test@38
PS7, Line 38: # Sort a collection.
About the location of the tests: adding them to sort.test could ensure that they are run with a wider set of sorting related options.


http://gerrit.cloudera.org:8080/#/c/19660/7/testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test@39
PS7, Line 39: 
Does the patch also support ordey by limit N? 
Can you also add some tests for it, even if it is not supported?



-- 
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 7
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Mon, 08 May 2023 11:00:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP - IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )

Change subject: WIP - IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/19660/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19660/2//COMMIT_MSG@16
PS2, Line 16: 
> I think that the queries should be able to run without disabling codegen wi
I don't think we have infrastructure to only disable codegen for the sorter, I don't know how difficult that would be... But I think we should try adding codegen in this change.


http://gerrit.cloudera.org:8080/#/c/19660/2//COMMIT_MSG@27
PS2, Line 27:    keys are strings which are var-len.
> I think that it is ok to disable MAP sorting for now in the FE. My understa
Sorting MAPs actually works, I added a test for it, but only on a small table. The problem is that our data generator can't generate maps with only fixed len data because the key has to be a string.


http://gerrit.cloudera.org:8080/#/c/19660/2/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/19660/2/be/src/runtime/sorter.cc@232
PS2, Line 232: 
> 'None' refers to the previous sentence, which changed. Could you rephrase t
Done


http://gerrit.cloudera.org:8080/#/c/19660/2/be/src/runtime/sorter.cc@234
PS2, Line 234:         // is here.
             :         DCHECK_EQ(&var_len_pages_.back(), cur_var_len_page);
             :         uint8_t* var_data_ptr = cur_var_len_page->AllocateBytes(total_var_len);
             :         if (INITIAL_RUN) {
             :           CopyVarLenData(string_values, coll_values_and_sizes, var_data_ptr);
             :         } else {
             :           CopyVarLenDataConvertOffset(string_values, coll_values_and_sizes,
             :               var_len_pages_.size() - 1, cur_var_len_page->data(), var_data_ptr);
             :         }
             :       }
             :       ++num_tuples_;
             :       ++*num_processed;
             :       ++cur_input_index;
             :     }
             : 
             :     // If there are still rows left to process, get a new page for the fixed-length
             :     // tuples. If the run is already too long, return.
             :     if (cur_input_index < batch->num_rows()) {
             :       bool added;
             :       RETURN_IF_ERROR(TryAddPage(add_mode, &fixed_len_pages_, &added));
             :       if (!added) return Status::OK();
             :       cur_fixed_len_page = &fixed_len_pages_.back();
             :     }
             :   }
             :   return Status::OK();
> This check should be done outside this hot loop, e.g. during initialization
We used to allow "illegal" tuples if they are NULL (see L259 in PS2). For this we need to read the actual data that is not available at initialisation. On the other hand I don't think it's important that we allow these NULL tuples, especially as the FE will not allow them.

I moved this check to Run::Init() and removed the exception to allow NULL tuples.

Also, this is only a DCHECK in the end, so the compiler should optimise it out in release builds.


http://gerrit.cloudera.org:8080/#/c/19660/2/be/src/runtime/tuple.cc
File be/src/runtime/tuple.cc:

http://gerrit.cloudera.org:8080/#/c/19660/2/be/src/runtime/tuple.cc@236
PS2, Line 236: 
RawValue::Write now handles NULL values, we don't have to handle it here. If the value is NULL then 'string_values' and 'collection_values' will remain empty and we don't enter the IF-THEN blocks.


http://gerrit.cloudera.org:8080/#/c/19660/2/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java:

http://gerrit.cloudera.org:8080/#/c/19660/2/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@369
PS2, Line 369:         MapType mapType = (MapType) type;
> MAP sorting is not supported at the moment, right?
No, it works, we just don't have a large table to test it on. I added a small table.


http://gerrit.cloudera.org:8080/#/c/19660/2/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

http://gerrit.cloudera.org:8080/#/c/19660/2/tests/query_test/test_sort.py@312
PS2, Line 312:     query_result = self.execute_query(quer
> This is not clear to me - are we comparing only single a column? Shouldn't 
You're right, I changed it to compare whole rows (and transposition is not needed).



-- 
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Apr 2023 10:01:47 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )

Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................


Patch Set 8:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/19660/8/testdata/ComplexTypesTbl/simple_arrays_big.parq
File testdata/ComplexTypesTbl/simple_arrays_big.parq:

http://gerrit.cloudera.org:8080/#/c/19660/8/testdata/ComplexTypesTbl/simple_arrays_big.parq@1
PS8, Line 1: PAR1€â	€â	,À¸%%(QR&vUQcE=BJ7=RÅ]­02"‡^+^7N=.*K%\0v~d	1K0W<C`=(__H
https://github.com/apache/impala/blob/master/testdata/data/README ?
also, would it be very hard to generate this during dataload? There are some files bigger than this in the repo, but avoiding adding it would be nice


http://gerrit.cloudera.org:8080/#/c/19660/7/testdata/workloads/functional-query/queries/QueryTest/nested-map-in-select-list.test
File testdata/workloads/functional-query/queries/QueryTest/nested-map-in-select-list.test:

http://gerrit.cloudera.org:8080/#/c/19660/7/testdata/workloads/functional-query/queries/QueryTest/nested-map-in-select-list.test@38
PS7, Line 38: # Same collection used twice in a select list.
> Moved to sort.test.
Oops, no I am worried that this will cause problems when sort.test is run with other table formats than Parquet/ORC
https://github.com/apache/impala/blob/63d13a35f35874822daf167d763ed683f1ec48ef/tests/query_test/test_queries.py#L46

Can you separate the sort related tests to another .test file and running it if the file format is not ORC/Parquet?


http://gerrit.cloudera.org:8080/#/c/19660/7/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

http://gerrit.cloudera.org:8080/#/c/19660/7/tests/query_test/test_sort.py@296
PS7, Line 296:     if cls.exploration_strategy() == 'core':
             :       cls.ImpalaTestMatrix.add_constraint(lambda v:
             :           v.get_value('table_format').file_format == 'parquet')
Having more file formats doesn't seem useful as functional_parquet.simple_arrays_big is baked in in the query



-- 
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 8
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 May 2023 06:56:37 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )

Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19660/12/fe/src/main/java/org/apache/impala/analysis/SortInfo.java
File fe/src/main/java/org/apache/impala/analysis/SortInfo.java:

http://gerrit.cloudera.org:8080/#/c/19660/12/fe/src/main/java/org/apache/impala/analysis/SortInfo.java@397
PS12, Line 397: && !itemType.isVarchar()
              :         && !itemType.isScalarType(PrimitiveType.STRING);
We should also not allow BINARY, right?
Something like isVarlenStringType could be added that checks for string + varchar + binary



-- 
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 12
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 May 2023 14:10:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )

Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................


Patch Set 12:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/13064/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 12
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 May 2023 13:50:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP - IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )

Change subject: WIP - IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/19660/3/be/src/runtime/raw-value.h
File be/src/runtime/raw-value.h:

http://gerrit.cloudera.org:8080/#/c/19660/3/be/src/runtime/raw-value.h@175
PS3, Line 175:       const SlotDescriptor* slot_desc, MemPool* pool, std::vector<StringValue*>* string_values,
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/19660/3/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/19660/3/be/src/runtime/sorter.cc@731
PS3, Line 731:                                  // This must be the first slot with var len data for the tuple. Var len data
line too long (109 > 90)


http://gerrit.cloudera.org:8080/#/c/19660/3/be/src/runtime/sorter.cc@776
PS3, Line 776:                                  // This must be the first slot with var len data for the tuple. Var len data
line too long (109 > 90)


http://gerrit.cloudera.org:8080/#/c/19660/3/be/src/runtime/tuple.cc
File be/src/runtime/tuple.cc:

http://gerrit.cloudera.org:8080/#/c/19660/3/be/src/runtime/tuple.cc@250
PS3, Line 250:       for (const pair<CollectionValue*, int64_t>& collection_val_pair : collection_values) {
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/19660/3/be/src/runtime/tuple.cc@370
PS3, Line 370:   // void MaterializeExprs(Tuple* opaque_tuple, TupleRow* row, const TupleDescriptor& desc,
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/19660/3/be/src/runtime/tuple.cc@372
PS3, Line 372:   //     StringValue** non_null_string_values, CollValueAndSize* non_null_collection_values,
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/19660/3/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

http://gerrit.cloudera.org:8080/#/c/19660/3/tests/query_test/test_sort.py@25
PS3, Line 25: def split_result_rows(result):
flake8: E302 expected 2 blank lines, found 1


http://gerrit.cloudera.org:8080/#/c/19660/3/tests/query_test/test_sort.py@296
PS3, Line 296: \
flake8: E502 the backslash is redundant between brackets



-- 
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Apr 2023 10:00:28 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )

Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................


Patch Set 7:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/12809/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 7
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Mon, 17 Apr 2023 12:45:45 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/19660 )

Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................

IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

As a first stage of IMPALA-10939, this change implements support for
including in the sorting tuple top-level collections that only contain
fixed length types (including fixed length structs). For these types the
implementation is almost the same as the existing handling of strings.

Another limitation is that structs that contain any type of collection
are not yet allowed in the sorting tuple.

Also refactored the RawValue::Write*() functions to have a clearer
interface.

Testing:
 - Added a new test table that contains many rows with arrays. This is
   queried in a new test added in test_sort.py, to ensure that we handle
   spilling correctly.
 - Added tests in test_nested_types.py that have arrays and maps in the
   sorting tuple
 - TODO: We should also test MAPs on a big table to test spilling but
   with the data generator expects an AVRO schema and in AVRO all struct
   keys are strings which are var-len.

Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
---
M be/src/exec/hash-table.cc
M be/src/runtime/collection-value.cc
M be/src/runtime/collection-value.h
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/runtime/sorter-internal.h
M be/src/runtime/sorter.cc
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
A testdata/ComplexTypesTbl/simple_arrays_big.parq
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/mixed-collections-and-structs.test
M testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test
M testdata/workloads/functional-query/queries/QueryTest/nested-map-in-select-list.test
M tests/query_test/test_nested_types.py
M tests/query_test/test_sort.py
20 files changed, 856 insertions(+), 221 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/19660/7
-- 
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 7
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19660 )

Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13019/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 9
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 May 2023 13:43:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/19660 )

Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................

IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

As a first stage of IMPALA-10939, this change implements support for
including in the sorting tuple top-level collections that only contain
fixed length types (including fixed length structs). For these types the
implementation is almost the same as the existing handling of strings.

Another limitation is that structs that contain any type of collection
are not yet allowed in the sorting tuple.

Also refactored the RawValue::Write*() functions to have a clearer
interface.

Testing:
 - Added a new test table that contains many rows with arrays. This is
   queried in a new test added in test_sort.py, to ensure that we handle
   spilling correctly.
 - Added tests in test_nested_types.py that have arrays and maps in the
   sorting tuple

Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
---
M be/src/exec/hash-table.cc
M be/src/runtime/collection-value.cc
M be/src/runtime/collection-value.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/runtime/sorter-internal.h
M be/src/runtime/sorter.cc
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
A testdata/ComplexTypesTbl/simple_arrays_big.parq
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/mixed-collections-and-structs.test
M testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test
M testdata/workloads/functional-query/queries/QueryTest/nested-map-in-select-list.test
M testdata/workloads/functional-query/queries/QueryTest/sort.test
M tests/query_test/test_nested_types.py
M tests/query_test/test_sort.py
23 files changed, 953 insertions(+), 249 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/19660/8
-- 
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 8
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/19660 )

Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................

IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

As a first stage of IMPALA-10939, this change implements support for
including in the sorting tuple top-level collections that only contain
fixed length types (including fixed length structs). For these types the
implementation is almost the same as the existing handling of strings.

Another limitation is that structs that contain any type of collection
are not yet allowed in the sorting tuple.

Also refactored the RawValue::Write*() functions to have a clearer
interface.

Testing:
 - Added a new test table that contains many rows with arrays. This is
   queried in a new test added in test_sort.py, to ensure that we handle
   spilling correctly.
 - Added tests that have arrays and/or maps in the sorting tuple in
   test_queries.py::TestQueries::{test_sort,
       test_top_n,test_partitioned_top_n}.

Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
---
M be/src/exec/hash-table.cc
M be/src/runtime/collection-value.cc
M be/src/runtime/collection-value.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/runtime/sorter-internal.h
M be/src/runtime/sorter.cc
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
A testdata/ComplexTypesTbl/simple_arrays_big.parq
M testdata/data/README
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/mixed-collections-and-structs.test
M testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test
M testdata/workloads/functional-query/queries/QueryTest/nested-map-in-select-list.test
A testdata/workloads/functional-query/queries/QueryTest/partitioned-top-n-complex.test
A testdata/workloads/functional-query/queries/QueryTest/sort-complex.test
A testdata/workloads/functional-query/queries/QueryTest/top-n-complex.test
M tests/query_test/test_nested_types.py
M tests/query_test/test_queries.py
M tests/query_test/test_sort.py
29 files changed, 1,045 insertions(+), 268 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/19660/11
-- 
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 11
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/19660 )

Change subject: IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list
......................................................................

IMPALA-12019: Support ORDER BY for arrays of fixed length types in select list

As a first stage of IMPALA-10939, this change implements support for
including in the sorting tuple top-level collections that only contain
fixed length types (including fixed length structs). For these types the
implementation is almost the same as the existing handling of strings.

Another limitation is that structs that contain any type of collection
are not yet allowed in the sorting tuple.

Also refactored the RawValue::Write*() functions to have a clearer
interface.

Testing:
 - Added a new test table that contains many rows with arrays. This is
   queried in a new test added in test_sort.py, to ensure that we handle
   spilling correctly.
 - Added tests in test_nested_types.py that have arrays and maps in the
   sorting tuple

Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
---
M be/src/exec/hash-table.cc
M be/src/runtime/collection-value.cc
M be/src/runtime/collection-value.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/runtime/sorter-internal.h
M be/src/runtime/sorter.cc
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
A testdata/ComplexTypesTbl/simple_arrays_big.parq
M testdata/data/README
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/mixed-collections-and-structs.test
M testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test
M testdata/workloads/functional-query/queries/QueryTest/nested-map-in-select-list.test
A testdata/workloads/functional-query/queries/QueryTest/sort-complex.test
A testdata/workloads/functional-query/queries/QueryTest/top-n-complex.test
M tests/query_test/test_nested_types.py
M tests/query_test/test_queries.py
M tests/query_test/test_sort.py
26 files changed, 1,001 insertions(+), 251 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/19660/10
-- 
To view, visit http://gerrit.cloudera.org:8080/19660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Gerrit-Change-Number: 19660
Gerrit-PatchSet: 10
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>