Posted to reviews@impala.apache.org by "Andrew Sherman (Code Review)" <ge...@cloudera.org> on 2019/03/06 16:19:00 UTC

[Impala-ASF-CR] IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.

Andrew Sherman has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12680 )


Change subject: IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.
......................................................................

IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.

IMPALA-6658 changed RleEncoder so that it could use run lengths
other than 8. The idea was that a slightly more complex RleEncoder could
save a small amount of disk space by using longer run lengths, in
particular for a bit width of 1. We now see a performance regression on a
simple ETL query. Overall, the costs of IMPALA-6658 appear to exceed
the benefits, so this change reverts it.

The change to rle-encoding.h, which contains the RleEncoder code, was
undone using 'git revert'. In rle-test.cc I removed only the test
changes that rely on different encoding lengths. This allows us to
keep some useful new tests that were written as part of IMPALA-6658.

TESTING:

Ran all end-to-end tests.

Change-Id: If6bcbaf564fbbe6dc83ba3afc100b4e5ccc7af40
---
M be/src/exec/parquet/parquet-bool-decoder-test.cc
M be/src/util/rle-encoding.h
M be/src/util/rle-test.cc
3 files changed, 139 insertions(+), 383 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/12680/1
-- 
To view, visit http://gerrit.cloudera.org:8080/12680
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If6bcbaf564fbbe6dc83ba3afc100b4e5ccc7af40
Gerrit-Change-Number: 12680
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>

[Impala-ASF-CR] IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/12680 )

Change subject: IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.
......................................................................


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12680/1/be/src/util/rle-encoding.h
File be/src/util/rle-encoding.h:

http://gerrit.cloudera.org:8080/#/c/12680/1/be/src/util/rle-encoding.h@64
PS1, Line 64:  For 1 bit-width values, that point is 8 values.  They require 2 bytes
            : /// for both the repeated encoding or the literal encoding.  This value can always
            : /// be computed based on the bit-width.
Maybe the comment could also mention that the optimal run length can be 16 or 24 in some cases, but that we did not implement this because we are unsure of its benefits.
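
The byte counts in the quoted comment can be checked with a small sketch. It is not the code in rle-encoding.h. It assumes the Parquet RLE/bit-packed hybrid layout: a repeated run is a varint header (run_length << 1) followed by the value in ceil(bit_width / 8) bytes, and a literal run is a varint header ((num_groups << 1) | 1) followed by groups of 8 bit-packed values. All function names below are hypothetical. The last two helpers illustrate one possible reading of the 16/24 remark, namely that leaving a literal run for a repeated run costs an extra header when the literal run is reopened; that reading is an assumption and is not stated in this thread.

```cpp
#include <cstdint>

// Bytes needed to store v as a ULEB128 varint.
inline int VarintLen(uint32_t v) {
  int n = 1;
  while (v >= 0x80) {
    v >>= 7;
    ++n;
  }
  return n;
}

// Size of one repeated run: header (num_values << 1) plus the repeated
// value stored in ceil(bit_width / 8) bytes.
inline int RepeatedRunBytes(int bit_width, uint32_t num_values) {
  return VarintLen(num_values << 1) + (bit_width + 7) / 8;
}

// Size of one literal run: header ((groups << 1) | 1) plus the bit-packed
// values. Values are packed in groups of 8, so each group takes bit_width bytes.
inline int LiteralRunBytes(int bit_width, uint32_t num_values) {
  const uint32_t groups = (num_values + 7) / 8;
  return VarintLen((groups << 1) | 1) + static_cast<int>(groups) * bit_width;
}

// Assumption: extra bytes if an already open literal run simply absorbs
// num_values more values (no new header is written).
inline int ExtendLiteralBytes(int bit_width, uint32_t num_values) {
  const uint32_t groups = (num_values + 7) / 8;
  return static_cast<int>(groups) * bit_width;
}

// Assumption: extra bytes if the literal run is closed, the values are
// written as a repeated run, and a new literal run is opened afterwards
// (one more header byte, assuming that next run is short).
inline int InterruptLiteralBytes(int bit_width, uint32_t num_values) {
  return RepeatedRunBytes(bit_width, num_values) + 1;
}
```

Under these assumptions, RepeatedRunBytes(1, 8) and LiteralRunBytes(1, 8) both give 2, which matches the quoted comment. For 16 repeated values inside a literal run, ExtendLiteralBytes(1, 16) gives 2 against 3 for InterruptLiteralBytes(1, 16), and the two are equal at 24 values.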




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6bcbaf564fbbe6dc83ba3afc100b4e5ccc7af40
Gerrit-Change-Number: 12680
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Comment-Date: Wed, 06 Mar 2019 18:39:55 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.

Posted by "Andrew Sherman (Code Review)" <ge...@cloudera.org>.
Andrew Sherman has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/12680 )

Change subject: IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.
......................................................................

IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.

IMPALA-6658 changed RleEncoder so that it could use run lengths
other than 8. The idea was that a slightly more complex RleEncoder could
save a small amount of disk space by using longer run lengths, in
particular for a bit width of 1. We now see a performance regression on a
simple ETL query. Overall, the costs of IMPALA-6658 appear to exceed
the benefits, so this change reverts it.

The change to rle-encoding.h, which contains the RleEncoder code, was
undone using 'git revert'. In rle-test.cc I removed only the test
changes that rely on different encoding lengths. This allows us to
keep some useful new tests that were written as part of IMPALA-6658.

TESTING:

Ran all end-to-end tests.

Change-Id: If6bcbaf564fbbe6dc83ba3afc100b4e5ccc7af40
---
M be/src/exec/parquet/parquet-bool-decoder-test.cc
M be/src/util/rle-encoding.h
M be/src/util/rle-test.cc
3 files changed, 141 insertions(+), 383 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/12680/3

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If6bcbaf564fbbe6dc83ba3afc100b4e5ccc7af40
Gerrit-Change-Number: 12680
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>

[Impala-ASF-CR] IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12680 )

Change subject: IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2387/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6bcbaf564fbbe6dc83ba3afc100b4e5ccc7af40
Gerrit-Change-Number: 12680
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Comment-Date: Thu, 07 Mar 2019 20:07:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12680 )

Change subject: IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.
......................................................................


Patch Set 4: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6bcbaf564fbbe6dc83ba3afc100b4e5ccc7af40
Gerrit-Change-Number: 12680
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Comment-Date: Fri, 08 Mar 2019 21:18:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.

Posted by "Andrew Sherman (Code Review)" <ge...@cloudera.org>.
Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/12680 )

Change subject: IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.
......................................................................


Patch Set 1:

Thanks Csaba for the review



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6bcbaf564fbbe6dc83ba3afc100b4e5ccc7af40
Gerrit-Change-Number: 12680
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Comment-Date: Wed, 06 Mar 2019 21:41:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12680 )

Change subject: IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2372/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6bcbaf564fbbe6dc83ba3afc100b4e5ccc7af40
Gerrit-Change-Number: 12680
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Comment-Date: Wed, 06 Mar 2019 17:02:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12680 )

Change subject: IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.
......................................................................


Patch Set 4: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6bcbaf564fbbe6dc83ba3afc100b4e5ccc7af40
Gerrit-Change-Number: 12680
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Comment-Date: Fri, 08 Mar 2019 16:53:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/12680 )

Change subject: IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.
......................................................................


Patch Set 3: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6bcbaf564fbbe6dc83ba3afc100b4e5ccc7af40
Gerrit-Change-Number: 12680
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Comment-Date: Fri, 08 Mar 2019 16:52:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12680 )

Change subject: IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.
......................................................................


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3894/ DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6bcbaf564fbbe6dc83ba3afc100b4e5ccc7af40
Gerrit-Change-Number: 12680
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Comment-Date: Fri, 08 Mar 2019 16:53:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12680 )

Change subject: IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.
......................................................................

IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.

IMPALA-6658 changed RleEncoder so that it could use run lengths
other than 8. The idea was that a slightly more complex RleEncoder could
save a small amount of disk space by using longer run lengths, in
particular for a bit width of 1. We now see a performance regression on a
simple ETL query. Overall, the costs of IMPALA-6658 appear to exceed
the benefits, so this change reverts it.

The change to rle-encoding.h, which contains the RleEncoder code, was
undone using 'git revert'. In rle-test.cc I removed only the test
changes that rely on different encoding lengths. This allows us to
keep some useful new tests that were written as part of IMPALA-6658.

TESTING:

Ran all end-to-end tests.

Change-Id: If6bcbaf564fbbe6dc83ba3afc100b4e5ccc7af40
Reviewed-on: http://gerrit.cloudera.org:8080/12680
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/parquet/parquet-bool-decoder-test.cc
M be/src/util/rle-encoding.h
M be/src/util/rle-test.cc
3 files changed, 141 insertions(+), 383 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: If6bcbaf564fbbe6dc83ba3afc100b4e5ccc7af40
Gerrit-Change-Number: 12680
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>