Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2016/11/21 22:32:55 UTC

[kudu-CR] KUDU-1751 (part 1): Change default int encoding to BIT_SHUFFLE

Todd Lipcon has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5169

Change subject: KUDU-1751 (part 1): Change default int encoding to BIT_SHUFFLE
......................................................................

KUDU-1751 (part 1): Change default int encoding to BIT_SHUFFLE

BIT_SHUFFLE is a better default than PLAIN since it's much more compact
and generally performs better.

Change-Id: I32db89337026eb6be13333ff450a6cb2b2862f7a
---
M src/kudu/cfile/type_encodings.cc
1 file changed, 10 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/5169/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5169
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I32db89337026eb6be13333ff450a6cb2b2862f7a
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
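
As a rough illustration of why a bit-shuffled layout can be more compact than PLAIN for integers, the sketch below transposes the bits of a block of 32-bit values so that bit 0 of every value is stored first, then bit 1, and so on. For small-range integers the high-order bit planes become long runs of zeros, which a general-purpose compressor tends to handle well. This is a simplified model and not the code in src/kudu/cfile: the function names are hypothetical, and zlib is only a stand-in for whatever compressor the real encoder uses.

```python
import random
import struct
import zlib


def plain_encode(values):
    """PLAIN-style layout: each value as a 4-byte little-endian unsigned int."""
    return struct.pack("<%dI" % len(values), *values)


def bit_shuffle(values, width=32):
    """Transpose bits: emit bit 0 of every value, then bit 1, and so on."""
    out = bytearray((len(values) * width + 7) // 8)
    pos = 0
    for bit in range(width):
        for v in values:
            if (v >> bit) & 1:
                out[pos // 8] |= 1 << (pos % 8)
            pos += 1
    return bytes(out)


def bit_unshuffle(data, count, width=32):
    """Inverse of bit_shuffle: rebuild `count` values from the bit planes."""
    values = [0] * count
    pos = 0
    for bit in range(width):
        for i in range(count):
            if (data[pos // 8] >> (pos % 8)) & 1:
                values[i] |= 1 << bit
            pos += 1
    return values


# Small-range integers, similar to what a counter or enum-like column holds.
rng = random.Random(0)
values = [rng.randrange(1000, 1050) for _ in range(4096)]

plain = plain_encode(values)
shuffled = bit_shuffle(values)
assert bit_unshuffle(shuffled, len(values)) == values

# Compare compressed sizes; the outcome depends on the data distribution.
print("plain + zlib:   ", len(zlib.compress(plain)))
print("shuffled + zlib:", len(zlib.compress(shuffled)))
```

The shuffle itself does not shrink the data (both buffers are the same length), so any saving comes from the compressor applied afterwards.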

[kudu-CR] WIP: KUDU-1751 (part 1): Change default int encoding to BIT_SHUFFLE

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello David Ribeiro Alves, Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5169

to look at the new patch set (#5).

Change subject: WIP: KUDU-1751 (part 1): Change default int encoding to BIT_SHUFFLE
......................................................................

WIP: KUDU-1751 (part 1): Change default int encoding to BIT_SHUFFLE

BIT_SHUFFLE is a better default than PLAIN since it's much more compact
and generally performs better.

Should fix KUDU-1600 before committing this.

Change-Id: I32db89337026eb6be13333ff450a6cb2b2862f7a
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/type_encodings.cc
2 files changed, 11 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/5169/5
-- 

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32db89337026eb6be13333ff450a6cb2b2862f7a
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1751: Change default encodings

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-1751: Change default encodings
......................................................................


Patch Set 10: Code-Review+2

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I32db89337026eb6be13333ff450a6cb2b2862f7a
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] WIP: KUDU-1751 (part 1): Change default int encoding to BIT_SHUFFLE

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5169

to look at the new patch set (#2).

Change subject: WIP: KUDU-1751 (part 1): Change default int encoding to BIT_SHUFFLE
......................................................................

WIP: KUDU-1751 (part 1): Change default int encoding to BIT_SHUFFLE

BIT_SHUFFLE is a better default than PLAIN since it's much more compact
and generally performs better.

Should fix KUDU-1600 before committing this.

Change-Id: I32db89337026eb6be13333ff450a6cb2b2862f7a
---
M src/kudu/cfile/type_encodings.cc
1 file changed, 10 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/5169/2
-- 

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32db89337026eb6be13333ff450a6cb2b2862f7a
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1751: Change default encodings

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1751: Change default encodings
......................................................................


Patch Set 9: Code-Review+2

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I32db89337026eb6be13333ff450a6cb2b2862f7a
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1751 (part 1): Change default int encoding to BIT_SHUFFLE

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1751 (part 1): Change default int encoding to BIT_SHUFFLE
......................................................................


Patch Set 1: Code-Review-1

should fix KUDU-1600 before this is committed

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I32db89337026eb6be13333ff450a6cb2b2862f7a
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1751: Change default encodings

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has submitted this change and it was merged.

Change subject: KUDU-1751: Change default encodings
......................................................................


KUDU-1751: Change default encodings

* Change numeric (int/float/double) encodings to BIT_SHUFFLE

BIT_SHUFFLE is a better default than PLAIN since it's much more compact
and generally performs better.

* Change BINARY encodings to DICT_ENCODING

Dictionary encoding is the default in Parquet, and we've seen that not
using it is a common reason that Kudu performs poorly. The encoder
automatically falls back to non-dict-encoded for high-cardinality data.

Change-Id: I32db89337026eb6be13333ff450a6cb2b2862f7a
Reviewed-on: http://gerrit.cloudera.org:8080/5169
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/type_encodings.cc
M src/kudu/tablet/diskrowset-test.cc
3 files changed, 14 insertions(+), 14 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified



-- 

Gerrit-MessageType: merged
Gerrit-Change-Id: I32db89337026eb6be13333ff450a6cb2b2862f7a
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
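
The fallback behaviour described in the commit message above can be sketched roughly as follows. This is a toy model under stated assumptions and not Kudu's implementation: the names dict_encode and decode, the max_dict_entries threshold, and the choice to fall back for the whole input are all hypothetical. The source only states that the encoder falls back to non-dict-encoded data when cardinality is high.

```python
def dict_encode(values, max_dict_entries=4):
    """Dictionary-encode `values`, or fall back to plain storage.

    Returns ("dict", dictionary, codes) while the number of distinct
    values stays within max_dict_entries (a hypothetical limit), and
    ("plain", values) once that limit would be exceeded.
    """
    dictionary = []   # code -> value
    index = {}        # value -> code
    codes = []
    for v in values:
        code = index.get(v)
        if code is None:
            if len(dictionary) >= max_dict_entries:
                # Too many distinct values: the dictionary no longer pays off.
                return ("plain", list(values))
            code = len(dictionary)
            index[v] = code
            dictionary.append(v)
        codes.append(code)
    return ("dict", dictionary, codes)


def decode(encoded):
    """Recover the original values from either representation."""
    if encoded[0] == "dict":
        _, dictionary, codes = encoded
        return [dictionary[c] for c in codes]
    return list(encoded[1])


low_cardinality = [b"us", b"de", b"us", b"fr", b"de", b"us"]
high_cardinality = [("row-%d" % i).encode() for i in range(10)]

print(dict_encode(low_cardinality)[0])
print(dict_encode(high_cardinality)[0])
```

With a low-cardinality column each value is stored once and rows carry only small integer codes. With mostly unique values the dictionary would just duplicate the data, which is why a fallback is useful.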

[kudu-CR] KUDU-1751: Change default INT and BINARY encodings

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1751: Change default INT and BINARY encodings
......................................................................


Patch Set 7:

(3 comments)

Test failures look legit.

http://gerrit.cloudera.org:8080/#/c/5169/7//COMMIT_MSG
Commit Message:

Line 7: KUDU-1751: Change default INT and BINARY encodings
I think it's fair to say 'Change default encodings', since the default for every type is changing.


PS7, Line 9: i
encodings


PS7, Line 9: INT
The floating point types are also changing.


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I32db89337026eb6be13333ff450a6cb2b2862f7a
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1751 (part 1): Change default int encoding to BIT_SHUFFLE

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1751 (part 1): Change default int encoding to BIT_SHUFFLE
......................................................................


Patch Set 1:

Well, these test failures are concerning... various end-to-end tests failed with this change, meaning that the bitshuffle encoding probably has some correctness issues.

Seems like we have a substantial coverage gap for the non-plain encodings. We should add more randomization and/or parameterization of encodings and compression to the test code ASAP.

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I32db89337026eb6be13333ff450a6cb2b2862f7a
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No
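
One possible shape for the randomized, parameterized coverage suggested in the comment above is sketched below. It is not taken from the Kudu test suite: the codecs are toy stand-ins (BYTE_SHUFFLE here is a simple byte transposition, not Kudu's BIT_SHUFFLE), zlib stands in for the real compression codecs, and all names are hypothetical. The idea is only that every encoding is crossed with every compression setting and checked against seeded random data, so a failure can be reproduced from its seed.

```python
import itertools
import random
import struct
import zlib


def plain_encode(values):
    return struct.pack("<%dI" % len(values), *values)


def plain_decode(data):
    return list(struct.unpack("<%dI" % (len(data) // 4), data))


def byte_shuffle_encode(values):
    # Group byte 0 of every value, then byte 1, and so on.
    raw = plain_encode(values)
    return b"".join(raw[k::4] for k in range(4))


def byte_shuffle_decode(data):
    count = len(data) // 4
    raw = bytearray(len(data))
    for k in range(4):
        raw[k::4] = data[k * count:(k + 1) * count]
    return plain_decode(bytes(raw))


ENCODINGS = {
    "PLAIN": (plain_encode, plain_decode),
    "BYTE_SHUFFLE": (byte_shuffle_encode, byte_shuffle_decode),
}
COMPRESSION = {
    "NONE": (lambda b: b, lambda b: b),
    "ZLIB": (zlib.compress, zlib.decompress),
}


def run_round_trips(seed, rounds=20):
    """Round-trip random blocks through every encoding/compression pair."""
    rng = random.Random(seed)
    checked = 0
    for enc_name, comp_name in itertools.product(ENCODINGS, COMPRESSION):
        encode, decode = ENCODINGS[enc_name]
        compress, decompress = COMPRESSION[comp_name]
        for _ in range(rounds):
            count = rng.randrange(0, 200)
            values = [rng.randrange(0, 2 ** 32) for _ in range(count)]
            result = decode(decompress(compress(encode(values))))
            assert result == values, (
                "round trip failed: encoding=%s compression=%s seed=%d"
                % (enc_name, comp_name, seed))
            checked += 1
    return checked


print(run_round_trips(seed=1234))
```

Including the seed and the parameter names in the assertion message keeps a randomized failure reproducible.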

[kudu-CR] KUDU-1751: Change default INT and BINARY encodings

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5169

to look at the new patch set (#7).

Change subject: KUDU-1751: Change default INT and BINARY encodings
......................................................................

KUDU-1751: Change default INT and BINARY encodings

* Change INT incodings to BIT_SHUFFLE

BIT_SHUFFLE is a better default than PLAIN since it's much more compact
and generally performs better.

* Change BINARY encodings to DICT_ENCODING

Dictionary encoding is the default in Parquet, and we've seen that not
using it is a common reason that Kudu performs poorly. The encoder
automatically falls back to non-dict-encoded for high-cardinality data.

Change-Id: I32db89337026eb6be13333ff450a6cb2b2862f7a
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/type_encodings.cc
2 files changed, 12 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/5169/7
-- 

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32db89337026eb6be13333ff450a6cb2b2862f7a
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: KUDU-1751 (part 1): Change default int encoding to BIT_SHUFFLE

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5169

to look at the new patch set (#4).

Change subject: WIP: KUDU-1751 (part 1): Change default int encoding to BIT_SHUFFLE
......................................................................

WIP: KUDU-1751 (part 1): Change default int encoding to BIT_SHUFFLE

BIT_SHUFFLE is a better default than PLAIN since it's much more compact
and generally performs better.

Should fix KUDU-1600 before committing this.

Change-Id: I32db89337026eb6be13333ff450a6cb2b2862f7a
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/type_encodings.cc
2 files changed, 11 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/5169/4
-- 

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32db89337026eb6be13333ff450a6cb2b2862f7a
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1751: Change default encodings

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5169

to look at the new patch set (#8).

Change subject: KUDU-1751: Change default encodings
......................................................................

KUDU-1751: Change default encodings

* Change numeric (int/float/double) encodings to BIT_SHUFFLE

BIT_SHUFFLE is a better default than PLAIN since it's much more compact
and generally performs better.

* Change BINARY encodings to DICT_ENCODING

Dictionary encoding is the default in Parquet, and we've seen that not
using it is a common reason that Kudu performs poorly. The encoder
automatically falls back to non-dict-encoded for high-cardinality data.

Change-Id: I32db89337026eb6be13333ff450a6cb2b2862f7a
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/type_encodings.cc
M src/kudu/tablet/diskrowset-test.cc
3 files changed, 14 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/5169/8
-- 

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32db89337026eb6be13333ff450a6cb2b2862f7a
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1751: Change default encodings

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-1751: Change default encodings
......................................................................


Patch Set 8: Code-Review+2

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I32db89337026eb6be13333ff450a6cb2b2862f7a
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No