You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Haijie Hong (Code Review)" <ge...@cloudera.org> on 2016/11/22 17:05:47 UTC

[kudu-CR] KUDU-1643 Prune hash partitions based on IN-list predicates

Haijie Hong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5176

Change subject: KUDU-1643 Prune hash partitions based on IN-list predicates
......................................................................

KUDU-1643 Prune hash partitions based on IN-list predicates

I change optional<uint32_t> to vector<uint32_t> so that it can store different hash values. IN-list predicates can only be prune when hash component contains 1 column, since Kudu doesn't support multi-column IN-list predicate currently.

Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
2 files changed, 154 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/5176/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Haijie Hong <ho...@gmail.com>

[kudu-CR] KUDU-1643 Prune hash partitions based on IN-list predicates B

Posted by "Haijie Hong (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5176

to look at the new patch set (#3).

Change subject: KUDU-1643 Prune hash partitions based on IN-list predicates B
......................................................................

KUDU-1643 Prune hash partitions based on IN-list predicates
B

I change optional<uint32_t> to vector<uint32_t> so that it can store different
hash values.
It will search all combinations of in-list column value and calculate the hash values.

Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
2 files changed, 272 insertions(+), 32 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/5176/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1643 Prune hash partitions based on IN-list predicates B

Posted by "Haijie Hong (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5176

to look at the new patch set (#4).

Change subject: KUDU-1643 Prune hash partitions based on IN-list predicates B
......................................................................

KUDU-1643 Prune hash partitions based on IN-list predicates
B

I change optional<uint32_t> to vector<uint32_t> so that it can store different
hash values.
It will search all combinations of in-list column value and calculate the hash values.

Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
3 files changed, 281 insertions(+), 32 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/5176/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1643 Prune hash partitions based on IN-list predicates B

Posted by "Haijie Hong (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5176

to look at the new patch set (#5).

Change subject: KUDU-1643 Prune hash partitions based on IN-list predicates B
......................................................................

KUDU-1643 Prune hash partitions based on IN-list predicates
B

I change optional<uint32_t> to vector<uint32_t> so that it can store different
hash values.
It will search all combinations of in-list column value and calculate the hash values.

Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
3 files changed, 281 insertions(+), 32 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/5176/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1643 Prune hash partitions based on IN-list predicates

Posted by "Haijie Hong (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5176

to look at the new patch set (#11).

Change subject: KUDU-1643 Prune hash partitions based on IN-list predicates
......................................................................

KUDU-1643 Prune hash partitions based on IN-list predicates

I change optional<uint32_t> to vector<bool> so that it can store bitset of
hash values.
It will search all combinations of in-list column value and calculate the hash values.

Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
3 files changed, 281 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/5176/11
-- 
To view, visit http://gerrit.cloudera.org:8080/5176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1643 Prune hash partitions based on IN-list predicates

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1643 Prune hash partitions based on IN-list predicates
......................................................................


Patch Set 13: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: No

[kudu-CR] KUDU-1643 Prune hash partitions based on IN-list predicates

Posted by "Haijie Hong (Code Review)" <ge...@cloudera.org>.
Haijie Hong has posted comments on this change.

Change subject: KUDU-1643 Prune hash partitions based on IN-list predicates
......................................................................


Patch Set 8:

(7 comments)

> (7 comments)

http://gerrit.cloudera.org:8080/#/c/5176/7/src/kudu/common/partition_pruner.cc
File src/kudu/common/partition_pruner.cc:

Line 165:     uint32_t hash = partition_schema.BucketForEncodedColumns(encoded_string, hash_bucket_schema);
> add an early return here so that the logic below can be outside the else bl
Done


Line 181:                             predicate->raw_values().end());
> I think this can be hoisted outside of the for loop.
Done


Line 194:                                   new_encoded_string,
> This means the has component is unconstrained, right?  So does a full set a
Done


Line 295:   // Step 2: Create the hash bucket portion of the partition key.
> It could be more efficient to make this a vector<vector<bool>>, with the in
Done


Line 300:   for (int hash_idx = 0; hash_idx < partition_schema.hash_bucket_schemas_.size(); hash_idx++) {
> Given how much more expensive the hash partition search might be in terms o
Done


Line 309:         can_prune = false;
> this can be more efficient as
Done


Line 344:                                          }));
> given a `vector<bool>` representation, this if/else clause could be combine
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1643 Prune hash partitions based on IN-list predicates

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1643 Prune hash partitions based on IN-list predicates
......................................................................


Patch Set 8:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/5176/8/src/kudu/common/partition_pruner-test.cc
File src/kudu/common/partition_pruner-test.cc:

Line 525:   // DISTRIBUTE BY HASH(a) INTO 2 BUCKETS,
The SQL string needs to be updated.


Line 535:   pb.mutable_range_schema()->Clear();
Clearing a new PB isn't necessary.


Line 578:   Check({ ColumnPredicate::InList(schema.column(0), &a_values) }, 9);
We simplify 1-element IN list predicates to an equality, so this isn't actually testing the IN list pruning.


PS8, Line 617: 2
3


PS8, Line 618: 2
3


http://gerrit.cloudera.org:8080/#/c/5176/8/src/kudu/common/partition_pruner.cc
File src/kudu/common/partition_pruner.cc:

Line 156: void PartitionPruner::SearchHashOfColumnCombination(const PartitionSchema& partition_schema,
I think this method would be simpler if it had an additional inner loop and an accumulating vector<string> instead of being recursive.  The 'col_offset', 'hash_bucket_count', 'encoded_string' variables could be made into locals instead of parameters, and the hash_bucket_bitset could be returned instead of taken  as an out-param.


Line 174:   if (predicate == nullptr) return;
This shouldn't happen any more because we pre-check, right?


Line 303:     for (int col_offset = 0; col_offset < hash_bucket_schema.column_ids.size(); col_offset++) {
This for loop can be simplified to 

    for (ColumnId column_id : hash_bucket_schema.column_ids) {
        const ColumnSchema& column = schema.column_by_id(column_id);
        ...


-- 
To view, visit http://gerrit.cloudera.org:8080/5176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1643 Prune hash partitions based on IN-list predicates

Posted by "Haijie Hong (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5176

to look at the new patch set (#13).

Change subject: KUDU-1643 Prune hash partitions based on IN-list predicates
......................................................................

KUDU-1643 Prune hash partitions based on IN-list predicates

I change optional<uint32_t> to vector<bool> so that it can store bitset of
hash values.
It will search all combinations of in-list column value and calculate the hash values.

Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
3 files changed, 278 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/5176/13
-- 
To view, visit http://gerrit.cloudera.org:8080/5176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1643 Prune hash partitions based on IN-list predicates

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1643 Prune hash partitions based on IN-list predicates
......................................................................


Patch Set 7:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/5176/7/src/kudu/common/partition_pruner.cc
File src/kudu/common/partition_pruner.cc:

Line 165:     hash_values->insert(hash);
add an early return here so that the logic below can be outside the else block.


Line 181:       const KeyEncoder<string>& encoder = GetKeyEncoder<string>(column.type_info());
I think this can be hoisted outside of the for loop.


Line 194:         return;
This means the has component is unconstrained, right?  So does a full set and an empty set mean unconstrained?


Line 295:   vector<vector<uint32_t>> hash_buckets;
It could be more efficient to make this a vector<vector<bool>>, with the inner vector being a bitset.  That would eliminate the need for copying between an unordered_set and the vector.  the vector<bool> would need to be created with as many false values as hash buckets


Line 300:     SearchHashOfColumnCombination(partition_schema,
Given how much more expensive the hash partition search might be in terms of # of serialized key components, I think it would be prudent to check up front that all of the predicates are equality or in list before proceeding.


Line 309:     hash_buckets.push_back(hash_bucket);
this can be more efficient as

    hash_buckets.emplace_back(std::move(hash_bucket));


Line 344:     if (!hash_buckets[hash_idx].empty()) {
given a `vector<bool>` representation, this if/else clause could be combined if the unconstrained case is represented as a full set.


-- 
To view, visit http://gerrit.cloudera.org:8080/5176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1643 Prune hash partitions based on IN-list predicates

Posted by "Haijie Hong (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5176

to look at the new patch set (#9).

Change subject: KUDU-1643 Prune hash partitions based on IN-list predicates
......................................................................

KUDU-1643 Prune hash partitions based on IN-list predicates

I change optional<uint32_t> to vector<bool> so that it can store bitset of
hash values.
It will search all combinations of in-list column value and calculate the hash values.

Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
3 files changed, 276 insertions(+), 43 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/5176/9
-- 
To view, visit http://gerrit.cloudera.org:8080/5176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1643 Prune hash partitions based on IN-list predicates

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has submitted this change and it was merged.

Change subject: KUDU-1643 Prune hash partitions based on IN-list predicates
......................................................................


KUDU-1643 Prune hash partitions based on IN-list predicates

I change optional<uint32_t> to vector<bool> so that it can store bitset of
hash values.
It will search all combinations of in-list column value and calculate the hash values.

Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
Reviewed-on: http://gerrit.cloudera.org:8080/5176
Tested-by: Kudu Jenkins
Reviewed-by: Dan Burkert <da...@apache.org>
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
3 files changed, 278 insertions(+), 50 deletions(-)

Approvals:
  Dan Burkert: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1643 Prune hash partitions based on IN-list predicates B

Posted by "Haijie Hong (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5176

to look at the new patch set (#6).

Change subject: KUDU-1643 Prune hash partitions based on IN-list predicates B
......................................................................

KUDU-1643 Prune hash partitions based on IN-list predicates
B

I change optional<uint32_t> to vector<uint32_t> so that it can store different
hash values.
It will search all combinations of in-list column value and calculate the hash values.

Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
3 files changed, 281 insertions(+), 32 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/5176/6
-- 
To view, visit http://gerrit.cloudera.org:8080/5176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1643 Prune hash partitions based on IN-list predicates

Posted by "Haijie Hong (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5176

to look at the new patch set (#10).

Change subject: KUDU-1643 Prune hash partitions based on IN-list predicates
......................................................................

KUDU-1643 Prune hash partitions based on IN-list predicates

I change optional<uint32_t> to vector<bool> so that it can store bitset of
hash values.
It will search all combinations of in-list column value and calculate the hash values.

Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
3 files changed, 276 insertions(+), 43 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/5176/10
-- 
To view, visit http://gerrit.cloudera.org:8080/5176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1643 Prune hash partitions based on IN-list predicates

Posted by "Haijie Hong (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5176

to look at the new patch set (#12).

Change subject: KUDU-1643 Prune hash partitions based on IN-list predicates
......................................................................

KUDU-1643 Prune hash partitions based on IN-list predicates

I change optional<uint32_t> to vector<bool> so that it can store bitset of
hash values.
It will search all combinations of in-list column value and calculate the hash values.

Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
3 files changed, 282 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/5176/12
-- 
To view, visit http://gerrit.cloudera.org:8080/5176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1643 Prune hash partitions based on IN-list predicates

Posted by "Haijie Hong (Code Review)" <ge...@cloudera.org>.
Haijie Hong has posted comments on this change.

Change subject: KUDU-1643 Prune hash partitions based on IN-list predicates
......................................................................


Patch Set 13:

(7 comments)

> (9 comments)
 > 
 > Sorry this took so long, was out on holiday break for a stretch. 
 > Will try to be much more responsive going forward.  Thanks again
 > for taking this on!

http://gerrit.cloudera.org:8080/#/c/5176/12/src/kudu/common/partition_pruner-test.cc
File src/kudu/common/partition_pruner-test.cc:

Line 660:   // a in [0, 1];
> We automatically simplify single-valued IN list predicates to an equality p
Done


http://gerrit.cloudera.org:8080/#/c/5176/12/src/kudu/common/partition_pruner.cc
File src/kudu/common/partition_pruner.cc:

Line 154: vector<bool> PartitionPruner::PruneHashComponent(
> Move this comment to the header.
Done


PS12, Line 156: ema,
              :     const Schema& schema,
              :     const ScanSpec& scan_spec) {
              :   vector<bool> hash_bucket_bitset(hash_bucket_schema.num_buckets, false);
> To avoid the NOLINT you can add a line break before the first argument and 
Done


PS12, Line 165:  GetKeyEnc
> Use FindOrDie, since the predicate has already been checked, and the result
Done


http://gerrit.cloudera.org:8080/#/c/5176/12/src/kudu/common/partition_pruner.h
File src/kudu/common/partition_pruner.h:

PS12, Line 67:  combination of in-list and e
> I think a better name for this method is 'PruneHashComponent', since it's b
Done


PS12, Line 67: // Sea
> use the fully qualified std::vector in header files.
Done


Line 68:   // Return hash values bitset of these combination.
> See comment .cc about wrapping and NOLINT.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1643 Prune hash partitions based on IN-list predicates

Posted by "Haijie Hong (Code Review)" <ge...@cloudera.org>.
Haijie Hong has posted comments on this change.

Change subject: KUDU-1643 Prune hash partitions based on IN-list predicates
......................................................................


Patch Set 11:

(8 comments)

> (8 comments)

http://gerrit.cloudera.org:8080/#/c/5176/8/src/kudu/common/partition_pruner-test.cc
File src/kudu/common/partition_pruner-test.cc:

Line 525:   // DISTRIBUTE BY HASH(a) INTO 3 BUCKETS,
> The SQL string needs to be updated.
Done


Line 535:   auto pb = PartitionSchemaPB();
> Clearing a new PB isn't necessary.
Done


Line 578:   Check({ ColumnPredicate::InList(schema.column(0), &a_values) }, 18);
> We simplify 1-element IN list predicates to an equality, so this isn't actu
Done


PS8, Line 617: c
> 3
Done


PS8, Line 618: m
> 3
Done


http://gerrit.cloudera.org:8080/#/c/5176/8/src/kudu/common/partition_pruner.cc
File src/kudu/common/partition_pruner.cc:

Line 156: vector<bool> PartitionPruner::SearchHashOfColumnCombination(const PartitionSchema& partition_schema,
> I think this method would be simpler if it had an additional inner loop and
Done


Line 174:       } else if (predicate->predicate_type() == PredicateType::InList) {
> This shouldn't happen any more because we pre-check, right?
Done


Line 303:         break;
> This for loop can be simplified to 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1643 Prune hash partitions based on IN-list predicates

Posted by "Haijie Hong (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5176

to look at the new patch set (#8).

Change subject: KUDU-1643 Prune hash partitions based on IN-list predicates
......................................................................

KUDU-1643 Prune hash partitions based on IN-list predicates

I change optional<uint32_t> to vector<bool> so that it can store bitset of
hash values.
It will search all combinations of in-list column value and calculate the hash values.

Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
3 files changed, 298 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/5176/8
-- 
To view, visit http://gerrit.cloudera.org:8080/5176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1643 Prune hash partitions based on IN-list predicates

Posted by "Haijie Hong (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5176

to look at the new patch set (#2).

Change subject: KUDU-1643 Prune hash partitions based on IN-list predicates
......................................................................

KUDU-1643 Prune hash partitions based on IN-list predicates

I change optional<uint32_t> to vector<uint32_t> so that it can store different hash values.
IN-list predicates can only be prune when hash component contains 1 column,
since Kudu doesn't support multi-column IN-list predicate currently.

Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
2 files changed, 154 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/5176/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1643 Prune hash partitions based on IN-list predicates

Posted by "Haijie Hong (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5176

to look at the new patch set (#7).

Change subject: KUDU-1643 Prune hash partitions based on IN-list predicates
......................................................................

KUDU-1643 Prune hash partitions based on IN-list predicates

I change optional<uint32_t> to vector<uint32_t> so that it can store different
hash values.
It will search all combinations of in-list column value and calculate the hash values.

Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
3 files changed, 281 insertions(+), 32 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/5176/7
-- 
To view, visit http://gerrit.cloudera.org:8080/5176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1643 Prune hash partitions based on IN-list predicates

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1643 Prune hash partitions based on IN-list predicates
......................................................................


Patch Set 12:

(9 comments)

Sorry this took so long, was out on holiday break for a stretch.  Will try to be much more responsive going forward.  Thanks again for taking this on!

http://gerrit.cloudera.org:8080/#/c/5176/12/src/kudu/common/partition_pruner-test.cc
File src/kudu/common/partition_pruner-test.cc:

Line 660:   // a in [0];
We automatically simplify single-valued IN list predicates to an equality predicate, so there is no need to test this case.  Could you change the remaining cases which use single value IN lists to use equality instead, just to make it more clear?


http://gerrit.cloudera.org:8080/#/c/5176/12/src/kudu/common/partition_pruner.cc
File src/kudu/common/partition_pruner.cc:

Line 154: // Search all combination of in-list and equality predicates.
Move this comment to the header.


PS12, Line 156: const PartitionSchema& partition_schema,
              :                                                             const PartitionSchema::HashBucketSchema& hash_bucket_schema,  // NOLINT(*)
              :                                                             const Schema& schema,
              :                                                             const ScanSpec& scan_spec
To avoid the NOLINT you can add a line break before the first argument and indent to 4 spaces, per the google style guide.


PS12, Line 165: FindOrNull
Use FindOrDie, since the predicate has already been checked, and the result isn't checked against nullptr.


PS12, Line 183: static_cast<size_t>(col_offset)
this static cast can be avoided by making the col_offset a size_t to begin with.


PS12, Line 312: hash_bucket_bitset
I think you need to wrap hash_bucket_bitset in std::move in order to avoid a copy here.


http://gerrit.cloudera.org:8080/#/c/5176/12/src/kudu/common/partition_pruner.h
File src/kudu/common/partition_pruner.h:

PS12, Line 67: SearchHashOfColumnCombination
I think a better name for this method is 'PruneHashComponent', since it's being applied to just a single hash component.


PS12, Line 67: vector
use the fully qualified std::vector in header files.


Line 68:                                                               const PartitionSchema::HashBucketSchema& hash_bucket_schema,  // NOLINT(*)
See comment .cc about wrapping and NOLINT.


-- 
To view, visit http://gerrit.cloudera.org:8080/5176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I578ac3c5e6770b13b0bebe01127d7d8263aceb36
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Haijie Hong <ho...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes