Posted to reviews@impala.apache.org by "Norbert Luksa (Code Review)" <ge...@cloudera.org> on 2019/09/12 13:45:21 UTC

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Norbert Luksa has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14080 )


Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. For details, see:
https://en.wikipedia.org/wiki/Z-order_curve

Instead of calculating the Z-values of the rows, the comparator
looks for the most significant dimension (column) and compares
the values of that column only. The most significant dimension
is the one where the compared values differ in the highest bit.
The algorithm requires values of the same binary representation,
but this can be relaxed.
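
The comparison described above can be sketched as follows. This is a
minimal illustration, not the code from the patch: the names LessMsb and
ZOrderCompare are hypothetical, every column is assumed to be an unsigned
32-bit value already, both rows are assumed to have the same number of
columns, and column 0 is assumed to contribute the most significant bit
of the interleaved Z-value.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// True if the highest set bit of `a` is lower than the highest set bit of `b`.
// Well-known bit trick: a < b together with a < (a ^ b) holds exactly when
// b has a set bit above every set bit of a.
inline bool LessMsb(uint32_t a, uint32_t b) {
  return a < b && a < (a ^ b);
}

// Hypothetical helper (not taken from the patch): compares two rows in
// Z-order without building their Z-values. Returns -1, 0 or 1.
inline int ZOrderCompare(const std::vector<uint32_t>& lhs,
                         const std::vector<uint32_t>& rhs) {
  size_t msd = 0;        // index of the most significant dimension so far
  uint32_t max_xor = 0;  // lhs ^ rhs in that dimension
  for (size_t i = 0; i < lhs.size(); ++i) {
    uint32_t x = lhs[i] ^ rhs[i];  // set bits mark where the values differ
    if (LessMsb(max_xor, x)) {     // strictly higher differing bit found
      msd = i;
      max_xor = x;
    }
  }
  if (max_xor == 0) return 0;  // every column is equal
  // Only the most significant dimension decides the order.
  return lhs[msd] < rhs[msd] ? -1 : 1;
}
```

When two columns differ in the same highest bit, the strict comparison in
LessMsb keeps the earlier column, which matches an interleaving where
earlier columns come first within each bit position.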

Currently, strings, varchars, floats and doubles are not
supported.

Testing:
 * Added unit tests.
 * Currently, some tests are missing.
 * Ran manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results.
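
The manual check described in the list above (comparing every pair of
rows and validating the result against the actual Z-values) could look
roughly like the sketch below. It is an assumption about the shape of
such a test, not the test from the patch; ZValue, ZOrderCompare, MakeRow
and CountMismatches are hypothetical names. The bit width is a parameter
because the full 4-column, 4-bit case has 65536 rows and therefore about
4.3 billion pairs, so a smaller width is handy for a quick run.

```cpp
#include <array>
#include <cstdint>

constexpr int kCols = 4;
using Row = std::array<uint32_t, kCols>;

// Reference: interleaves the low `bits` bits of every column into one
// Z-value. Column 0 contributes the most significant bit of each group.
inline uint64_t ZValue(const Row& row, int bits) {
  uint64_t z = 0;
  for (int b = bits - 1; b >= 0; --b) {
    for (int c = 0; c < kCols; ++c) {
      z = (z << 1) | ((row[c] >> b) & 1u);
    }
  }
  return z;
}

// Comparator under test (illustrative): picks the column whose values
// differ in the highest bit and compares that column only.
inline int ZOrderCompare(const Row& lhs, const Row& rhs) {
  int msd = 0;
  uint32_t max_xor = 0;
  for (int c = 0; c < kCols; ++c) {
    uint32_t x = lhs[c] ^ rhs[c];
    // True only if x has a strictly higher set bit than max_xor.
    if (max_xor < x && max_xor < (max_xor ^ x)) {
      msd = c;
      max_xor = x;
    }
  }
  if (max_xor == 0) return 0;
  return lhs[msd] < rhs[msd] ? -1 : 1;
}

// Decodes row number `n` into kCols values of `bits` bits each.
inline Row MakeRow(uint32_t n, int bits) {
  Row r{};
  for (int c = 0; c < kCols; ++c) {
    r[c] = (n >> (c * bits)) & ((1u << bits) - 1);
  }
  return r;
}

// Counts the row pairs where the comparator disagrees with the order of
// the Z-values. A result of 0 means full agreement.
inline uint64_t CountMismatches(int bits) {
  const uint32_t rows = 1u << (kCols * bits);
  uint64_t mismatches = 0;
  for (uint32_t i = 0; i < rows; ++i) {
    for (uint32_t j = 0; j < rows; ++j) {
      Row a = MakeRow(i, bits);
      Row b = MakeRow(j, bits);
      uint64_t za = ZValue(a, bits);
      uint64_t zb = ZValue(b, bits);
      int expected = za < zb ? -1 : (za > zb ? 1 : 0);
      if (ZOrderCompare(a, b) != expected) ++mismatches;
    }
  }
  return mismatches;
}
```

Calling CountMismatches(2) covers all 256 rows of 2-bit values;
CountMismatches(4) corresponds to the 4-bit case mentioned above but
takes considerably longer.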

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
16 files changed, 630 insertions(+), 58 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/7
-- 
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 7
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. For details, see:
https://en.wikipedia.org/wiki/Z-order_curve

Instead of calculating the Z-values of the rows, the comparator
looks for the most significant dimension (column) and compares
the values of that column only. The most significant dimension
is the one where the compared values differ in the highest bit.
The algorithm requires values of the same binary representation,
but this can be relaxed.

Currently, strings, varchars, floats and doubles are not
supported.

Testing:
 * Added unit tests.
 * Currently, some tests are missing.
 * Ran manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
16 files changed, 776 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/8

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 8
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 26:

The verification failed due to the flaky tests test_exchange_mem_usage_scaling and AuthorizationStmtTest.testSelect.
Ran an exhaustive test and it passed: https://master-02.jenkins.cloudera.com/job/impala-private-parameterized/6472/



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 26
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 18 Feb 2020 08:55:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. For details, see:
https://en.wikipedia.org/wiki/Z-order_curve

Instead of calculating the Z-values of the rows, the comparator
looks for the most significant dimension (column) and compares
the values of that column only. The most significant dimension
is the one where the compared values differ in the highest bit.
The algorithm requires values of the same binary representation,
but this can be relaxed.

Currently, strings, varchars, floats and doubles are not
supported.

Testing:
 * Added unit tests.
 * Ran manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
16 files changed, 784 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/9

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 9
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 30: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 30
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Feb 2020 12:29:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 25:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5222/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 25
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Feb 2020 12:36:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 18:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5479/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 18
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 Jan 2020 11:35:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 22: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 22
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 03 Feb 2020 13:29:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 23:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5490/ DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 23
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 03 Feb 2020 13:30:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4829/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 9
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 18 Oct 2019 16:18:11 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 19:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5480/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 19
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 Jan 2020 11:37:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 28: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5363/



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 28
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 19 Feb 2020 17:49:45 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 13:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5268/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 13
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 12 Dec 2019 11:14:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 24: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 24
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 05 Feb 2020 10:05:57 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 20:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@319
PS19, Line 319:   // The algorithm requires all values having a common type, without loss of data.
              :   // This means we have to find the biggest type.
              :   int max_size = ordering_exprs_[0]->type().GetByteSize();
              :   for (int i = 1; i < ordering_exprs_.size(); ++i) {
> nit: the mask could be calculated in GetSharedRepresentation() instead of p
Done


http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@423
PS19, Line 423: 
> Local variable U val shadows parameter void* val.
Done


http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@424
PS19, Line 424: 
> nit: please add comment about it, something like "we copy the bytes from th
Done


http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@434
PS19, Line 434: alue = *reinterpret_cast<const T*>(val);
> It will only have the value of the first char of the string.
Done


http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@435
PS19, Line 435: tmp, &floating_value, sizeof(T));
> replace with 'sizeof(U) - std::min(sizeof(U), type.len)'?
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 20
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 22 Jan 2020 17:06:06 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 8:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare-test.cc
File be/src/util/tuple-row-compare-test.cc:

http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare-test.cc@94
PS7, Line 94:     Tuple* tuple_mem = Tuple::Create(sizeof(char) + GetSize(args...), &expr_perm_pool_);
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/14080/5/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

http://gerrit.cloudera.org:8080/#/c/14080/5/be/src/util/tuple-row-compare.cc@314
PS5, Line 314: 
> nit: Can we come up with a better name? Maybe GetZDimensionValue() or somet
Done


http://gerrit.cloudera.org:8080/#/c/14080/5/be/src/util/tuple-row-compare.cc@334
PS5, Line 334: turn Comp
> Maybe you could add a DCHECK(false); as well, and maybe a TODO comment. If 
Done


http://gerrit.cloudera.org:8080/#/c/14080/5/be/src/util/tuple-row-compare.cc@383
PS5, Line 383: rn
> nit: since you use 'lhs' and 'rhs' at other places, maybe rename 'v1' and '
Done


http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare.cc@209
PS7, Line 209: Status TupleRowLexicalComparator::CodegenCompare(LlvmCodeGen* codegen,
> line too long (93 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare.cc@323
PS7, Line 323:   constexpr uint64_t mask64 = 0x8000000000000000;
> line too long (95 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare.cc@395
PS7, Line 395:     case TYPE_TIMESTAMP: {
> line too long (91 > 90)
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 8
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Sep 2019 12:58:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 27: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 27
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 18 Feb 2020 16:06:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 16:

Rebased.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 16
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 Jan 2020 09:24:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has uploaded a new patch set (#20). ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

Instead of calculating the Z-values of the rows, the comparator
looks for the most significant dimension (column) and compares
the values of that column only. The most significant dimension
is the one where the compared values differ in the highest bit.
The algorithm requires values of the same binary representation,
so the values are converted into uint32_t, uint64_t or
uint128_t, whichever is the smallest that fits all the data.
Comparing smaller types with bigger ones would make the bigger
type much more dominant, so the bits of the smaller types are
shifted up.
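
The conversion to a common representation might be sketched as below.
This is an assumption about one way to do it, not the code from the
patch: ToSharedRepresentation is a hypothetical name, only standard
32-bit and 64-bit targets are shown (uint128_t is not a standard C++
type), and flipping the sign bit of signed inputs is an assumed way of
keeping negative values below positive ones.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper: maps an integer of type T onto the unsigned type U
// (sizeof(U) >= sizeof(T)) so that unsigned comparison of the results
// preserves the order of the inputs, and the top bit of T lands on the
// top bit of U.
template <typename U, typename T>
U ToSharedRepresentation(T value) {
  static_assert(std::is_unsigned<U>::value, "U must be unsigned");
  static_assert(std::is_integral<T>::value, "T must be an integer type");
  static_assert(!std::is_same<T, bool>::value, "bool is not handled here");
  static_assert(sizeof(U) >= sizeof(T), "U must be at least as wide as T");
  using UT = typename std::make_unsigned<T>::type;

  UT bits = static_cast<UT>(value);
  if (std::is_signed<T>::value) {
    // Assumed handling of signed types: flipping the sign bit makes
    // negative values sort below non-negative ones as unsigned numbers.
    bits ^= static_cast<UT>(UT{1} << (sizeof(T) * 8 - 1));
  }
  // Shift the smaller type up so its bits line up with the top bits of U.
  // Without this, a wider column would almost always own the highest
  // differing bit and dominate the ordering.
  constexpr int shift = static_cast<int>((sizeof(U) - sizeof(T)) * 8);
  return static_cast<U>(static_cast<U>(bits) << shift);
}
```

For example, with U = uint32_t an 8-bit column is shifted up by 24 bits,
so its most significant bit competes with the most significant bit of a
32-bit column instead of with that column's bit 7.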

All primitive types (including string and floating point types)
are supported.
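
For floating point columns, a bit-wise comparison needs a key whose
unsigned order matches the numeric order. The sketch below shows a
commonly used mapping for 32-bit IEEE 754 floats. Whether the patch uses
exactly this mapping is an assumption; FloatToOrderedBits is a
hypothetical name, and NaN values are not considered.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: maps a float to a uint32_t so that comparing the
// results as unsigned integers gives the same order as comparing the
// floats (NaNs excluded).
inline uint32_t FloatToOrderedBits(float value) {
  static_assert(sizeof(float) == sizeof(uint32_t),
                "expects a 32-bit float");
  uint32_t bits;
  // Copy the bytes instead of casting the pointer, which avoids
  // strict-aliasing problems.
  std::memcpy(&bits, &value, sizeof(bits));
  constexpr uint32_t kSignMask = 0x80000000u;
  // Negative values: flip every bit, so a larger magnitude sorts lower.
  // Non-negative values: set the sign bit, so they sort above all
  // negative values.
  return (bits & kSignMask) ? ~bits : (bits | kSignMask);
}
```

The resulting key can then be treated like any other unsigned 32-bit
column when looking for the highest differing bit.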

Testing:
 * Added unit tests.
 * Ran manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results for
   selective queries. An example: used the TPCH dataset's
   lineitem table with scale 25, where the sorting columns are
   l_partkey and l_suppkey, in that order. Ran selective queries
   over the value range of the two columns, for both lexical and
   Z-ordering, and compared the percentage of filtered pages and
   row groups. While queries with filters on the first column
   showed almost no difference, queries on the second column
   are in favour of Z-ordering:
   Ordering | Column | Filtered pages % | Filtered row groups %
   Lex.     | 1st    | ~99%             | ~90%
   Z-ord.   | 1st    | ~99%             | ~89%
   Lex.     | 2nd    | ~25%             | 0%
   Z-ord.   | 2nd    | ~97%             | 0%
   The only drawback is the sorting itself, which takes ~4 times
   longer than lexical sorting (e.g. sorting the dataset above
   took 14m for lexical and 55m for Z-ordering).
   Note, however, that this is a one-time cost: sorting only
   happens once, when writing the data.
   Also, lexical ordering is supported by codegen, while codegen
   is not implemented for Z-ordering yet.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,109 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/20

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 20
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4556/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 8
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Sep 2019 13:39:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. For details, see:
https://en.wikipedia.org/wiki/Z-order_curve

Instead of calculating the Z-values of the rows, the comparator
looks for the column holding the most significant dimension and
compares the values of this column only. The most significant
dimension is the one in which the compared values differ in the
highest bit. The algorithm requires values of the same binary
representation, but this can be relaxed.
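
As a rough illustration of the comparison described above (a minimal
sketch, not the code in tuple-row-compare.cc): it assumes every column has
already been converted to uint32_t, and ZOrderCompare is a hypothetical
name. The less_msb helper follows the Wikipedia article linked above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// True if the most significant set bit of x is lower than that of y.
inline bool less_msb(uint32_t x, uint32_t y) {
  return x < y && x < (x ^ y);
}

// Hypothetical helper: compares two rows of equal length in Z-order.
// Returns -1, 0 or 1. Assumes all columns are already uint32_t.
inline int ZOrderCompare(const std::vector<uint32_t>& lhs,
                         const std::vector<uint32_t>& rhs) {
  // Values of the column whose XOR has the highest set bit seen so far.
  uint32_t msd_lhs = 0;
  uint32_t msd_rhs = 0;
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    if (less_msb(msd_lhs ^ msd_rhs, lhs[i] ^ rhs[i])) {
      msd_lhs = lhs[i];
      msd_rhs = rhs[i];
    }
  }
  if (msd_lhs < msd_rhs) return -1;
  return msd_lhs > msd_rhs ? 1 : 0;
}
```

When two columns differ in the same highest bit, the earlier column
decides, which matches a bit interleaving where the first column supplies
the more significant bit.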

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Ran manual tests comparing 4-column rows of 4-bit integers
   for all possible combinations. Checked each result by
   calculating the Z-values of the compared rows.
 * Tested performance on various data, getting great results.
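
The reference Z-value used in a check like the manual test above could be
computed by interleaving the bits of the four columns. The sketch below
assumes 4-bit values stored in uint8_t and that the first column supplies
the most significant bit at each bit level; Row and ZValue are hypothetical
names, not taken from the patch.

```cpp
#include <array>
#include <cstdint>

// One row: four columns, each holding a 4-bit value (0..15).
using Row = std::array<uint8_t, 4>;

// Interleaves the bits of the four columns into a 16-bit Z-value.
// Within each bit level, column 0 contributes the most significant bit.
inline uint16_t ZValue(const Row& row) {
  uint16_t z = 0;
  for (int bit = 3; bit >= 0; --bit) {
    for (int col = 0; col < 4; ++col) {
      z = static_cast<uint16_t>((z << 1) | ((row[col] >> bit) & 1));
    }
  }
  return z;
}
```

Two rows are then in Z-order exactly when their Z-values compare that way,
so the comparator's answer can be checked against ZValue(lhs) < ZValue(rhs).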

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,002 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/12
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 12
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 16:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5478/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 16
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 Jan 2020 10:08:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 14:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5267/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 14
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 12 Dec 2019 11:14:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 21: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14080/21/be/src/util/tuple-row-compare.h
File be/src/util/tuple-row-compare.h:

http://gerrit.cloudera.org:8080/#/c/14080/21/be/src/util/tuple-row-compare.h@190
PS21, Line 190:   /// INT_MAX would be 111..111.
nit: you could mention null values
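
For context, the quoted comment line suggests that signed integers are
mapped to unsigned values so that INT_MAX becomes all ones. One common way
to get such an order-preserving mapping is to flip the sign bit. The sketch
below assumes that approach (ToOrderedU32 is a hypothetical name) and does
not cover how null values are represented, which is what the nit asks to
have documented.

```cpp
#include <cstdint>
#include <limits>

// Maps int32_t to uint32_t so that unsigned comparison matches signed order:
// INT32_MIN -> 000..000, -1 -> 011..111, 0 -> 100..000, INT32_MAX -> 111..111.
constexpr uint32_t ToOrderedU32(int32_t v) {
  return static_cast<uint32_t>(v) ^ 0x80000000u;
}

static_assert(ToOrderedU32(std::numeric_limits<int32_t>::max()) == 0xFFFFFFFFu,
    "INT32_MAX maps to all ones");
static_assert(ToOrderedU32(std::numeric_limits<int32_t>::min()) == 0u,
    "INT32_MIN maps to all zeros");
```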



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 21
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 31 Jan 2020 14:59:47 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
Anonymous Coward (520) has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/util/tuple-row-compare.cc@360
PS9, Line 360:     if (less_msb(msd_lhs ^ msd_rhs, lhsi ^ rhsi)) {
             :       msd_lhs = lhsi;
             :       msd_rhs = rhsi;
             :     }
This means the column that uses the most bits will likely be the dominating column. E.g. if two columns are selected, one using 8 bits and the other 4 bits, then the column using 8 bits will likely determine the sorting order. Do you have a design doc covering the details for all data types?



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 9
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 29 Oct 2019 00:06:53 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 7:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/4553/ : Initial code review checks failed. See linked job for details on the failure.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 7
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 12 Sep 2019 14:25:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 12:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5019/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 12
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 14 Nov 2019 09:00:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 26: Code-Review+2


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 26
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Feb 2020 14:16:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 27: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5356/


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 27
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 18 Feb 2020 20:29:29 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has uploaded a new patch set (#19). ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. For details, see:
https://en.wikipedia.org/wiki/Z-order_curve

Instead of calculating the Z-values of the rows, the comparator
looks for the column holding the most significant dimension and
compares the values of this column only. The most significant
dimension is the one in which the compared values differ in the
highest bit. The algorithm requires values of the same binary
representation, therefore the values are converted into either
uint32_t, uint64_t or uint128_t, whichever is the smallest type
in which all data fits. Comparing smaller types with bigger ones
would make the bigger type much more dominant, therefore the
bits of the smaller types are shifted up.
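
To illustrate the shifting mentioned above, a minimal sketch (ShiftUp is a
hypothetical name, and the patch may implement this differently): a value
of a narrower unsigned type is widened and shifted left so that its most
significant bit lines up with the most significant bit of the common type.

```cpp
#include <cstdint>
#include <type_traits>

// Widens v from T to U and moves its bits to the top of U, so that the
// highest bit of T ends up in the highest bit of U.
template <typename U, typename T>
constexpr U ShiftUp(T v) {
  static_assert(std::is_unsigned<T>::value && std::is_unsigned<U>::value,
      "both types must be unsigned");
  static_assert(sizeof(T) <= sizeof(U), "target type must not be narrower");
  return static_cast<U>(static_cast<U>(v) << (8 * (sizeof(U) - sizeof(T))));
}
```

With this, a difference in the top bit of a uint8_t column weighs as much
as a difference in the top bit of a uint32_t column, instead of being
confined to the lowest 8 bits.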

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Ran manual tests comparing 4-column rows of 4-bit integers
   for all possible combinations. Checked each result by
   calculating the Z-values of the compared rows.
 * Tested performance on various data, getting great results for
   selective queries. An example: used the TPCH dataset's
   lineitem table with scale 25, where the sorting columns are
   l_partkey and l_suppkey, in that order. Ran selective queries
   over the value range of the two columns, for both lexical and
   Z-ordering, and compared the percentage of filtered pages and
   row groups. While queries with filters on the first column
   showed almost no difference, queries filtering on the second
   column clearly favour Z-ordering:
   Ordering | Column | Filtered pages % | Filtered row groups %
   Lex.     | 1st    | ~99%             | ~90%
   Z-ord.   | 1st    | ~99%             | ~89%
   Lex.     | 2nd    | ~25%             | 0%
   Z-ord.   | 2nd    | ~97%             | 0%
   The only drawback is the sorting itself, which takes ~4 times
   longer than lexical sorting (e.g. sorting the dataset above
   took 14m with lexical ordering and 55m with Z-ordering).
   Note, however, that this is a one-time cost: sorting happens
   only once, when the data is written.
   Also, lexical ordering is supported by codegen, while codegen
   is not yet implemented for Z-ordering.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,062 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/19
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 19
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 29: Code-Review+2


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 29
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Feb 2020 12:28:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. For details, see:
https://en.wikipedia.org/wiki/Z-order_curve

Instead of calculating the Z-values of the rows, the comparator
looks for the column holding the most significant dimension and
compares the values of this column only. The most significant
dimension is the one in which the compared values differ in the
highest bit. The algorithm requires values of the same binary
representation, therefore the values are converted into either
uint32_t, uint64_t or uint128_t, whichever is the smallest type
in which all data fits. Comparing smaller types with bigger ones
would make the bigger type much more dominant, therefore the
bits of the smaller types are shifted up.

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Ran manual tests comparing 4-column rows of 4-bit integers
   for all possible combinations. Checked each result by
   calculating the Z-values of the compared rows.
 * Tested performance on various data, getting great results for
   selective queries. One drawback is the sorting itself, which
   takes 4-7 times longer than lexical sorting.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,062 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/16
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 16
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 14:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14080/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14080/14//COMMIT_MSG@34
PS14, Line 34: getting great results
> Could you provide some basic statistics?
Done


http://gerrit.cloudera.org:8080/#/c/14080/14//COMMIT_MSG@35
PS14, Line 35: One negative is the sorting itself, taking
             :    4-7 more times than lexical sorting.
> You could emphasize that it only affects the writes.
Done



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 14
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 Jan 2020 10:52:51 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/util/tuple-row-compare.cc@360
PS9, Line 360:       msd_lhs = lhsi;
             :       msd_rhs = rhsi;
             :     }
             :   }
> This means the column that uses most bits will likely be the dominating col
Hi, sorry for the late reply.
I uploaded a patch set in which the smaller types are shifted up, so they are no longer dominated by the bigger columns.
(I do not have a design doc that covers this particular part.)



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 14
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 12 Dec 2019 10:45:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 22:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5587/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 22
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 03 Feb 2020 11:01:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 23: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5490/


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 23
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 03 Feb 2020 18:17:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/common/global-flags.cc@275
PS9, Line 275: DEFINE_bool(unlock_zorder_sort, false,
             :     "(Experimental) If true, enables using ZORDER option for SORT BY.");
I think we can enable it by default. Or maybe in a follow-up commit, since some tests also need to be moved from custom cluster tests to query tests.


http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/util/tuple-row-compare.cc@325
PS9, Line 325: ((uint128_t) -1) / 2 + 1
nit: how about (uint128_t)1 << 127? Or you could use SetBit from bit-util.h
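
Both spellings of the mask give the same value, which can be checked at
compile time. The sketch assumes uint128_t is an alias for the GCC/Clang
extension type unsigned __int128 (an assumption, since the typedef is not
shown in this thread); kMaskDiv and kMaskShift are illustrative names.

```cpp
// Assumption: a GCC/Clang build, where unsigned __int128 is available.
using uint128_t = unsigned __int128;

// The form quoted from the patch: (2^128 - 1) / 2 + 1 == 2^127.
constexpr uint128_t kMaskDiv = (static_cast<uint128_t>(-1)) / 2 + 1;
// The suggested form: only the top bit set.
constexpr uint128_t kMaskShift = static_cast<uint128_t>(1) << 127;

static_assert(kMaskDiv == kMaskShift, "both expressions set only bit 127");
```

The shift form states the intent (set the top bit) directly, which is
presumably why it was suggested.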



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 9
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 21 Oct 2019 16:23:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 25: Code-Review+2


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 25
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Feb 2020 14:15:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 20:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5494/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 20
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 22 Jan 2020 17:52:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare-test.cc
File be/src/util/tuple-row-compare-test.cc:

http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare-test.cc@94
PS7, Line 94:     uint8_t* tuple_row_mem = expr_perm_pool_.Allocate(sizeof(char*) + sizeof(int32_t*) * 2);
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare.cc@209
PS7, Line 209: Status TupleRowLexicalComparator::CodegenCompare(LlvmCodeGen* codegen, llvm::Function** fn) {
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare.cc@323
PS7, Line 323:   constexpr uint128_t mask128 = ((uint128_t) -1) / 2 + 1; //0x80000000000000000000000000000000;
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare.cc@395
PS7, Line 395:       const uint128_t nanoseconds = static_cast<uint128_t>(ts->time().total_nanoseconds());
line too long (91 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 7
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 12 Sep 2019 13:46:02 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 26:

The verification failed because AllTypeTest added too many columns, with a larger total slot size than is possible. This resulted in a bit-shift overflow when initialising a SlotRef. I added a comment and a DCHECK, and removed some less important columns from the test to prevent this issue from happening.


-- 
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 26
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Feb 2020 14:25:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 29:

Thanks Zoltan, included the header.


-- 
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 29
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Feb 2020 12:16:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

The comparator, instead of calculating the Z-values of the rows,
looks for the column with the most significant dimension, and
compares the values of this column only. The most significant
dimension is the one where the compared values differ in the
highest bit. The algorithm requires values of
the same binary representation, therefore the values are
converted into either uint32_t, uint64_t or uint128_t, the
smallest in which all data fits. Comparing smaller types with
bigger ones would make the bigger type much more dominant,
therefore the bits of these smaller types are shifted up.
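
The comparison described above can be sketched roughly as follows.
This is only an illustration, not the code from the patch: it
assumes every column has already been converted to uint32_t, that
the first column wins when two columns differ in the same bit
position, and the names LessMsb and ZOrderCompare are made up for
the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// True if the highest set bit of x is strictly below the highest set bit of y.
inline bool LessMsb(uint32_t x, uint32_t y) {
  return x < y && x < (x ^ y);
}

// Compares two rows in Z-order without building their Z-values.
// Returns -1, 0 or 1. When two columns differ in the same bit position the
// earlier column wins (assumption: the first column is the most significant).
inline int ZOrderCompare(const std::vector<uint32_t>& a,
    const std::vector<uint32_t>& b) {
  assert(a.size() == b.size());
  std::size_t msd = 0;   // index of the most significant dimension so far
  uint32_t max_xor = 0;  // XOR of the two values in that dimension
  for (std::size_t i = 0; i < a.size(); ++i) {
    const uint32_t diff = a[i] ^ b[i];
    if (LessMsb(max_xor, diff)) {
      msd = i;
      max_xor = diff;
    }
  }
  if (max_xor == 0) return 0;  // every column is equal
  return a[msd] < b[msd] ? -1 : 1;
}
```

The XOR of two values has its highest set bit exactly where the
values first differ, so comparing those XORs by highest set bit
finds the dimension that would decide the interleaved comparison.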

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Ran manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results for
   selective queries. One drawback is the sorting itself, which
   takes 4-7 times longer than lexical sorting.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,060 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/13
-- 
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 13
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 28: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 28
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 19 Feb 2020 11:35:29 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 19:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@319
PS19, Line 319:   // The masks are used for setting the sign bit correctly.
              :   constexpr uint32_t mask32 = (uint32_t)1 << 31;
              :   constexpr uint64_t mask64 = (uint64_t)1 << 63;
              :   constexpr uint128_t mask128 = (uint128_t)1 << 127;
nit: the mask could be calculated in GetSharedRepresentation() instead of passing it over
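
A rough sketch of what deriving the mask from the target type could look like. This is not the patch's GetSharedRepresentation(); SignBitMask and FlipSign are hypothetical names, and only the standard 32- and 64-bit types are covered here because 128-bit integers are a compiler extension.

```cpp
#include <cstdint>
#include <type_traits>

// Mask with only the top bit of U set, computed from the type itself.
template <typename U>
constexpr U SignBitMask() {
  static_assert(std::is_unsigned<U>::value, "U must be an unsigned type");
  return static_cast<U>(U{1} << (sizeof(U) * 8 - 1));
}

// Maps a signed integer to the unsigned type of the same width so that
// unsigned comparison of the results matches signed comparison of the inputs.
template <typename T>
constexpr typename std::make_unsigned<T>::type FlipSign(T value) {
  return static_cast<typename std::make_unsigned<T>::type>(
      static_cast<typename std::make_unsigned<T>::type>(value) ^
      SignBitMask<typename std::make_unsigned<T>::type>());
}
```

Flipping the sign bit moves negative values below non-negative ones in the unsigned ordering, which is why the masks are needed in the first place.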


http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@423
PS19, Line 423: val
Local variable U val shadows parameter void* val.


http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@424
PS19, Line 424:       BitUtil::ByteSwap(&val, string_value->ptr, len);
nit: please add a comment about it, something like "we copy the bytes from the string but swap the bytes because of integer endianness."
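
To illustrate why the byte order matters, here is a small sketch. It does not use BitUtil::ByteSwap; StringPrefixToUint64 is a hypothetical helper that builds the integer with shifts, which gives the same result as copying the bytes and swapping them on a little-endian machine.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Packs the first 8 bytes of s into a uint64_t with the first byte in the
// most significant position. Shorter strings are padded with zero bytes, so
// integer comparison of the results matches bytewise comparison of the
// prefixes.
inline uint64_t StringPrefixToUint64(const std::string& s) {
  uint64_t result = 0;
  for (std::size_t i = 0; i < sizeof(uint64_t); ++i) {
    const uint64_t byte = i < s.size() ? static_cast<unsigned char>(s[i]) : 0;
    result = (result << 8) | byte;
  }
  return result;
}
```

Without the swap, a plain memcpy on a little-endian host would put the first character in the least significant byte, so the last copied character would dominate the comparison.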


http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@434
PS19, Line 434: static_cast<U>(*reinterpret_cast<const char*>(val)
It will only have the value of the first char of the string.

I see there are tests for chars, but do we have tests for fixed size strings, e.g. CHAR(5)?


http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@435
PS19, Line 435: (sizeof(U) > 8 ? sizeof(U) * 8 - 64 : 0)
replace with 'sizeof(U) - std::min(sizeof(U), type.len)'?



-- 
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 19
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 Jan 2020 14:41:23 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 29:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5363/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 29
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Feb 2020 13:00:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 26:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5340/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 26
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Feb 2020 14:16:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14080/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14080/8//COMMIT_MSG@12
PS8, Line 12: The commit adds a Comperator based on Z-ordering. See in detail:
Nit: comparator. Also on line 15.


http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/exec/partial-sort-node.cc
File be/src/exec/partial-sort-node.cc:

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/exec/partial-sort-node.cc@54
PS9, Line 54:   sorting_order_ = (TSortingOrder::type)tnode.sort_node.sort_info.sorting_order;
I think we're trying to avoid C-style casts.


http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/exec/sort-node.cc
File be/src/exec/sort-node.cc:

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/exec/sort-node.cc@50
PS9, Line 50:   sorting_order_ = (TSortingOrder::type)tnode.sort_node.sort_info.sorting_order;
I think we're trying to avoid C-style casts.
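
For illustration, the same conversion written with a named cast might look like the sketch below. TSortingOrder here is a stand-in modelled on the shape of a Thrift-generated enum, and ToSortingOrder is a hypothetical helper, not something in the patch.

```cpp
// Stand-in for the Thrift-generated enum wrapper (assumed shape and values).
struct TSortingOrder {
  enum type { LEXICAL = 0, ZORDER = 1 };
};

// Converts a raw integer to the enum with static_cast instead of a C-style
// cast, so the conversion is explicit and easy to search for.
inline TSortingOrder::type ToSortingOrder(int raw) {
  return static_cast<TSortingOrder::type>(raw);
}
```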



-- 
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 9
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 21 Oct 2019 11:48:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has uploaded a new patch set (#29). ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

The comparator, instead of calculating the Z-values of the rows,
looks for the column with the most significant dimension, and
compares the values of this column only. The most significant
dimension is the one where the compared values differ in the
highest bit. The algorithm requires values of
the same binary representation, therefore the values are
converted into either uint32_t, uint64_t or uint128_t, the
smallest in which all data fits. Comparing smaller types with
bigger ones would make the bigger type much more dominant,
therefore the bits of these smaller types are shifted up.
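
The shifting step can be pictured with a short sketch. It is a
simplification that handles unsigned integers only, and ShiftUp is
a hypothetical name rather than a function from the patch.

```cpp
#include <cstdint>
#include <type_traits>

// Widens an unsigned value of type T to the shared type U (uint32_t or
// uint64_t in this sketch) and moves it into the top bytes of U, so its most
// significant bit lines up with that of values that are natively as wide as U.
template <typename U, typename T>
constexpr U ShiftUp(T value) {
  static_assert(std::is_unsigned<T>::value && std::is_unsigned<U>::value,
      "this sketch only handles unsigned integers");
  static_assert(sizeof(T) <= sizeof(U), "U must be at least as wide as T");
  return static_cast<U>(static_cast<U>(value) << (8 * (sizeof(U) - sizeof(T))));
}
```

Without the shift, a one-byte column could never differ from its
counterpart above bit 7, so any wider column would almost always
be picked as the most significant dimension.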

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Ran manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results for
   selective queries. An example: used the TPCH dataset's
   lineitem table with scale 25, where the sorting columns are
   l_partkey and l_suppkey, in that order. Ran selective queries
   for the value range of the two columns, for both lexical and
   Z-ordering, and compared the percentage of filtered pages and
   row groups. While queries with filters on the first column
   showed almost no difference, queries on the second column
   favour Z-ordering:
   Ordering | Column | Filtered pages % | Filtered row groups %
   Lex.     | 1st    | ~99%             | ~90%
   Z-ord.   | 1st    | ~99%             | ~89%
   Lex.     | 2nd    | ~25%             | 0%
   Z-ord.   | 2nd    | ~97%             | 0%
   The only drawback is the sorting itself, taking ~4 times longer
   than lexical sorting (e.g. sorting the dataset above took
   14m for lexical and 55m for Z-ordering).
   Note, however, that this is a one-time cost: sorting
   only happens once, when writing the data.
   Also, lexical ordering is supported by codegen, while it is
   not implemented for Z-ordering yet.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,119 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/29
-- 
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 29
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 21:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5549/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 21
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Jan 2020 16:37:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has uploaded a new patch set (#18). ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

The comparator, instead of calculating the Z-values of the rows,
looks for the column with the most significant dimension, and
compares the values of this column only. The most significant
dimension is the one where the compared values differ in the
highest bit. The algorithm requires values of
the same binary representation, therefore the values are
converted into either uint32_t, uint64_t or uint128_t, the
smallest in which all data fits. Comparing smaller types with
bigger ones would make the bigger type much more dominant,
therefore the bits of these smaller types are shifted up.

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Ran manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results for
   selective queries. An example: used the TPCH dataset's
   lineitem table with scale 25, where the sorting columns are
   l_partkey and l_suppkey, in that order. Ran selective queries
   for the value range of the two columns, for both lexical and
   Z-ordering, and compared the percentage of filtered pages and
   row groups. While queries with filters on the first column
   showed almost no difference, queries on the second column
   favour Z-ordering:
   Ordering | Column | Filtered pages % | Filtered row groups %
   Lex.     | 1st    | ~99%             | ~90%
   Z-ord.   | 1st    | ~99%             | ~89%
   Lex.     | 2nd    | ~25%             | 0%
   Z-ord.   | 2nd    | ~97%             | 0%
   The only drawback is the sorting itself, taking ~4 times longer
   than lexical sorting. Note, however, that this is a one-time
   cost: sorting only happens once, when writing the data.
   Also, lexical ordering is supported by codegen, while it is
   not implemented for Z-ordering yet.
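
A manual check of the kind listed under Testing could be sketched
like this. It is not the test that was actually run: to stay
short it uses two 4-bit columns instead of four, assumes the first
column takes the higher bit at each position, and ZValue,
CompareByMsd and CountMismatches are hypothetical names.

```cpp
#include <cstdint>

// Interleaves the low 4 bits of x and y into an 8-bit Z-value. At each bit
// position the bit of x (the first column) is placed above the bit of y.
inline uint8_t ZValue(uint8_t x, uint8_t y) {
  uint8_t z = 0;
  for (int bit = 3; bit >= 0; --bit) {
    z = static_cast<uint8_t>((z << 2) | (((x >> bit) & 1) << 1) | ((y >> bit) & 1));
  }
  return z;
}

// Compares (ax, ay) with (bx, by) by looking only at the dimension whose
// values differ in the highest bit. Returns -1, 0 or 1.
inline int CompareByMsd(uint8_t ax, uint8_t ay, uint8_t bx, uint8_t by) {
  const uint8_t dx = ax ^ bx;
  const uint8_t dy = ay ^ by;
  // y decides only if its highest differing bit is strictly above that of x.
  const bool y_decides = dx < dy && dx < (dx ^ dy);
  const uint8_t lhs = y_decides ? ay : ax;
  const uint8_t rhs = y_decides ? by : bx;
  return (lhs > rhs) - (lhs < rhs);
}

// Checks every pair of points against the order given by the Z-values and
// returns the number of pairs on which the two methods disagree.
inline int CountMismatches() {
  int mismatches = 0;
  for (int a = 0; a < 256; ++a) {
    for (int b = 0; b < 256; ++b) {
      const uint8_t ax = a >> 4, ay = a & 15, bx = b >> 4, by = b & 15;
      const uint8_t za = ZValue(ax, ay), zb = ZValue(bx, by);
      const int expected = (za > zb) - (za < zb);
      if (CompareByMsd(ax, ay, bx, by) != expected) ++mismatches;
    }
  }
  return mismatches;
}
```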

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,062 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/18
-- 
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 18
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 26: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5340/


-- 
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 26
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Feb 2020 19:08:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 21:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc
File be/src/util/tuple-row-compare-test.cc:

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc@42
PS20, Line 42: desc
> nit: add underscore suffix
Done


http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc@164
PS20, Line 164: 
> nit: double have
Done


http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc@167
PS20, Line 167: teComperator(ColumnType(TYPE_BOOLEAN
> nit: please add comment about the layout of tuple_row_mem.
Done


http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc@172
PS20, Line 172:  This function is responsible for only the char 
> Don't we need to set both slots as not nulls?
As discussed offline, we do not even have to set these, since by default the slots are not nullable. However, this pointed out that we do not test nulls, so I added a case for them to IntIntTest.


http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc@180
PS20, Line 180: memcpy
> nit: use DCHECK_EQ instead
Done


http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare.h
File be/src/util/tuple-row-compare.h:

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare.h@187
PS20, Line 187: We transform the original a and b values to their "sha
> nit: The shared representation has an important property that could be ment
Done, copied your description to the comment.



-- 
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 21
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Jan 2020 15:51:03 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 28:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5363/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 28
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 19 Feb 2020 11:35:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 28:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5366/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 28
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 19 Feb 2020 17:51:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 30:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5430/ DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 30
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Feb 2020 12:29:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 23: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 23
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 03 Feb 2020 13:30:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 27:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5356/ DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 27
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 18 Feb 2020 16:06:11 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 20: Code-Review+1

(5 comments)

Found some nits, but I think it's almost done :)

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc
File be/src/util/tuple-row-compare-test.cc:

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc@164
PS20, Line 164: have
nit: double have


http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc@167
PS20, Line 167: sizeof(char*) + sizeof(int32_t*) * 2
nit: please add comment about the layout of tuple_row_mem.


http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc@172
PS20, Line 172: tuple_mem->SetNotNull(NullIndicatorOffset(0,1));
Don't we need to set both slots as not null?


http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc@180
PS20, Line 180: DCHECK
nit: use DCHECK_EQ instead


http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare.h
File be/src/util/tuple-row-compare.h:

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare.h@187
PS20, Line 187: The basic concept of getting the shared representation
nit: The shared representation has an important property that could be mentioned. Namely, we transform the original a and b values to their "shared representation" a' and b' in such a way that if a < b then a' is lexically less than b' with respect to their bits. Thus for ints INT_MIN would be 0, INT_MIN+1 would be 1, and so on, and in the end INT_MAX would be 111..111.
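
The property described above can be illustrated with a minimal C++ sketch. It assumes 32-bit signed integers only; ToSharedRepresentation and CheckEndpoints are hypothetical names chosen for illustration and are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (illustration only, not the patch's code): flipping the
// sign bit maps signed order onto unsigned, bit-by-bit lexical order.
inline uint32_t ToSharedRepresentation(int32_t v) {
  return static_cast<uint32_t>(v) ^ (uint32_t{1} << 31);
}

// Spot-checks the endpoints mentioned in the review comment.
inline void CheckEndpoints() {
  assert(ToSharedRepresentation(INT32_MIN) == 0u);
  assert(ToSharedRepresentation(INT32_MIN + 1) == 1u);
  assert(ToSharedRepresentation(INT32_MAX) == 0xFFFFFFFFu);
}
```

Because the mapping is monotonic, any a < b in signed terms gives a' < b' in unsigned terms, which is what a bitwise comparison needs.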




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 20
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Jan 2020 14:44:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has uploaded a new patch set (#21). ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

Instead of calculating the Z-values of the rows, the comparator
looks for the most significant dimension and compares the values
of that column only. The most significant dimension is the one
where the compared values differ in the highest bit. The
algorithm requires values of the same binary representation,
therefore the values are converted into uint32_t, uint64_t or
uint128_t, whichever is the smallest type that fits all the data.
Comparing smaller types with bigger ones would make the bigger
type much more dominant, therefore the bits of the smaller types
are shifted up.
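
The approach in the paragraph above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from the patch: every column is assumed to be already converted to uint64_t, ties between dimensions go to the earlier column, and the names LessMsb, WidenInt32 and ZOrderCompare are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// True if the highest set bit of x is strictly lower than that of y.
inline bool LessMsb(uint64_t x, uint64_t y) { return x < y && x < (x ^ y); }

// Hypothetical widening step: flip the sign bit of an int32 column value and
// shift it into the top half of a uint64_t so that it is not dominated by
// genuinely 64-bit columns.
inline uint64_t WidenInt32(int32_t v) {
  uint32_t flipped = static_cast<uint32_t>(v) ^ (uint32_t{1} << 31);
  return static_cast<uint64_t>(flipped) << 32;
}

// Returns a negative value, 0 or a positive value if lhs sorts before, equal
// to or after rhs in Z-order. No Z-value is materialized: only the dimension
// whose values differ in the highest bit is compared.
int ZOrderCompare(const std::vector<uint64_t>& lhs,
                  const std::vector<uint64_t>& rhs) {
  assert(lhs.size() == rhs.size());
  size_t msd = 0;        // index of the most significant dimension so far
  uint64_t max_xor = 0;  // XOR of the two values in that dimension
  for (size_t i = 0; i < lhs.size(); ++i) {
    uint64_t x = lhs[i] ^ rhs[i];
    if (LessMsb(max_xor, x)) {
      msd = i;
      max_xor = x;
    }
  }
  if (max_xor == 0) return 0;  // all dimensions are equal
  return lhs[msd] < rhs[msd] ? -1 : 1;
}
```

For example, ZOrderCompare({3, 0}, {0, 4}) is negative although the first row is lexically larger, because the second column differs in a higher bit.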

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Ran manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results for
   selective queries. An example: used the TPCH dataset's
   lineitem table with scale 25, where the sorting columns are
   l_partkey and l_suppkey, in that order. Ran selective queries
   over the value range of the two columns, for both lexical and
   Z-ordering, and compared the percentage of filtered pages and
   row groups. While queries with filters on the first column
   showed almost no difference, queries on the second column
   favour Z-ordering:
   Ordering | Column | Filtered pages % | Filtered row groups %
   Lex.     | 1st    | ~99%             | ~90%
   Z-ord.   | 1st    | ~99%             | ~89%
   Lex.     | 2nd    | ~25%             | 0%
   Z-ord.   | 2nd    | ~97%             | 0%
   The only drawback is the sorting itself, which takes ~4 times
   longer than lexical sorting (e.g. sorting the dataset above
   took 14m for lexical and 55m for Z-ordering).
   Note, however, that this is a one-time cost: sorting only
   happens once, when writing the data.
   Also, lexical ordering is supported by codegen, while codegen
   is not implemented for Z-ordering yet.
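
The reference Z-value used in the manual check above could be computed along these lines. This is only a sketch: the function name ZValue4x4 is hypothetical, and the interleaving order (first column most significant within each bit level) is an assumption, not something the commit message states.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Interleaves the low 4 bits of four column values into a 16-bit Z-value.
// Bit k of column c lands at position 4*k + (3 - c), so within each bit level
// the first column is the most significant one (an assumed convention).
uint16_t ZValue4x4(const std::array<uint8_t, 4>& cols) {
  uint16_t z = 0;
  for (int bit = 3; bit >= 0; --bit) {
    for (int c = 0; c < 4; ++c) {
      assert(cols[c] < 16);  // only 4-bit inputs are meaningful here
      z = static_cast<uint16_t>((z << 1) | ((cols[c] >> bit) & 1));
    }
  }
  return z;
}
```

Comparing two rows by their Z-values with a plain < then gives the expected ordering against which a comparator can be checked.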

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,127 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/21

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 21
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 24:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5285/ DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 24
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 05 Feb 2020 10:05:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 28:

I checked the verify job failure. TL;DR: include runtime/timestamp-value.inline.h in tuple-row-compare-test.cc

I think it fails because the verify job also does a shared-object (SO) build, and when the linker creates the executable for tuple-row-compare-test it doesn't find the symbol 'impala::TimestampValue::FromDaysSinceUnixEpoch(long)' in the linked shared objects. When we do a static build, the test is linked against a much bigger static library that contains the symbol.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 28
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 20 Feb 2020 11:34:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has uploaded a new patch set (#22). ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

Instead of calculating the Z-values of the rows, the comparator
looks for the most significant dimension and compares the values
of that column only. The most significant dimension is the one
where the compared values differ in the highest bit. The
algorithm requires values of the same binary representation,
therefore the values are converted into uint32_t, uint64_t or
uint128_t, whichever is the smallest type that fits all the data.
Comparing smaller types with bigger ones would make the bigger
type much more dominant, therefore the bits of the smaller types
are shifted up.

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Ran manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results for
   selective queries. An example: used the TPCH dataset's
   lineitem table with scale 25, where the sorting columns are
   l_partkey and l_suppkey, in that order. Ran selective queries
   over the value range of the two columns, for both lexical and
   Z-ordering, and compared the percentage of filtered pages and
   row groups. While queries with filters on the first column
   showed almost no difference, queries on the second column
   favour Z-ordering:
   Ordering | Column | Filtered pages % | Filtered row groups %
   Lex.     | 1st    | ~99%             | ~90%
   Z-ord.   | 1st    | ~99%             | ~89%
   Lex.     | 2nd    | ~25%             | 0%
   Z-ord.   | 2nd    | ~97%             | 0%
   The only drawback is the sorting itself, which takes ~4 times
   longer than lexical sorting (e.g. sorting the dataset above
   took 14m for lexical and 55m for Z-ordering).
   Note, however, that this is a one-time cost: sorting only
   happens once, when writing the data.
   Also, lexical ordering is supported by codegen, while codegen
   is not implemented for Z-ordering yet.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,128 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/22

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 22
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 24: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5285/



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 24
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 05 Feb 2020 15:00:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

Instead of calculating the Z-values of the rows, the comparator
looks for the most significant dimension and compares the values
of that column only. The most significant dimension is the one
where the compared values differ in the highest bit. The
algorithm requires values of the same binary representation,
therefore the values are converted into uint32_t, uint64_t or
uint128_t, whichever is the smallest type that fits all the data.
Comparing smaller types with bigger ones would make the bigger
type much more dominant, therefore the bits of the smaller types
are shifted up.

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Ran manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results for
   selective queries. An example: used the TPCH dataset's
   lineitem table with scale 25, where the sorting columns are
   l_partkey and l_suppkey, in that order. Ran selective queries
   over the value range of the two columns, for both lexical and
   Z-ordering, and compared the percentage of filtered pages and
   row groups. While queries with filters on the first column
   showed almost no difference, queries on the second column
   favour Z-ordering:
   Ordering | Column | Filtered pages % | Filtered row groups %
   Lex.     | 1st    | ~99%             | ~90%
   Z-ord.   | 1st    | ~99%             | ~89%
   Lex.     | 2nd    | ~25%             | 0%
   Z-ord.   | 2nd    | ~97%             | 0%
   The only drawback is the sorting itself, which takes ~4 times
   longer than lexical sorting (e.g. sorting the dataset above
   took 14m for lexical and 55m for Z-ordering).
   Note, however, that this is a one-time cost: sorting only
   happens once, when writing the data.
   Also, lexical ordering is supported by codegen, while codegen
   is not implemented for Z-ordering yet.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Reviewed-on: http://gerrit.cloudera.org:8080/14080
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,119 insertions(+), 95 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 31
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 14: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14080/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14080/14//COMMIT_MSG@34
PS14, Line 34: getting great results
Could you provide some basic statistics?


http://gerrit.cloudera.org:8080/#/c/14080/14//COMMIT_MSG@35
PS14, Line 35: One negative is the sorting itself, taking
             :    4-7 more times than lexical sorting.
You could emphasize that it only affects the writes.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 14
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 07 Jan 2020 14:27:54 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 28:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5366/



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 28
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 19 Feb 2020 22:18:57 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 30: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 30
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Feb 2020 17:27:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

Instead of calculating the Z-values of the rows, the comparator
looks for the most significant dimension and compares the values
of that column only. The most significant dimension is the one
where the compared values differ in the highest bit. The
algorithm requires values of the same binary representation,
therefore the values are converted into uint32_t, uint64_t or
uint128_t, whichever is the smallest type that fits all the data.
Comparing smaller types with bigger ones would make the bigger
type much more dominant, therefore the bits of the smaller types
are shifted up.

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Ran manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results for
   selective queries. One drawback is the sorting itself, which
   takes 4-7 times longer than lexical sorting.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,059 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/14

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 14
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................


Patch Set 20:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc
File be/src/util/tuple-row-compare-test.cc:

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc@42
PS20, Line 42: desc
nit: add underscore suffix




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 20
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Jan 2020 14:58:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has uploaded a new patch set (#25). ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
......................................................................

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

Instead of calculating the Z-values of the rows, the comparator
looks for the most significant dimension and compares the values
of that column only. The most significant dimension is the one
where the compared values differ in the highest bit. The
algorithm requires values of the same binary representation,
therefore the values are converted into uint32_t, uint64_t or
uint128_t, whichever is the smallest type that fits all the data.
Comparing smaller types with bigger ones would make the bigger
type much more dominant, therefore the bits of the smaller types
are shifted up.

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Run manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results for
   selective queries. An example: used the TPCH dataset's
   lineitem table with scale 25, where the sorting columns are
   l_partkey and l_suppkey, in that order. Ran selective queries
   on the value range of the two columns, for both lexical and
   Z-ordering, and compared the percentage of filtered pages and
   row groups. While queries with filters on the first column
   showed almost no difference, queries on the second column
   favour Z-ordering:
   Ordering | Column | Filtered pages % | Filtered row groups %
   Lex.     | 1st    | ~99%             | ~90%
   Z-ord.   | 1st    | ~99%             | ~89%
   Lex.     | 2nd    | ~25%             | 0%
   Z-ord.   | 2nd    | ~97%             | 0%
   The only drawback is the sorting itself, which takes ~4 times
   longer than lexical sorting (e.g. sorting the dataset above took
   14m for lexical ordering and 55m for Z-ordering).
   Note, however, that this is a one-time cost: sorting
   only happens once, when writing the data.
   Also, lexical ordering is supported by codegen, while codegen
   is not implemented for Z-ordering yet.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,118 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/25
-- 
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 25
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>