You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "wangning (Code Review)" <ge...@cloudera.org> on 2020/10/29 06:42:04 UTC

[kudu-CR] KUDU-1644 hash-partition based partition optimization

wangning has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16674


Change subject: KUDU-1644 hash-partition based partition optimization
......................................................................

KUDU-1644 hash-partition based partition optimization

Hash prune for single hash-key based inList query. Reduce the prediated
values by hash-partition calculation.

Table has P partitions, N records, R rowsets. Inlist predicate has V
values.

Before:
Complexity to complete hash-key based in-list query is:
V * LOG(N/R) * N

After:
Complexity becomes:
V/P * LOG(N/R) * N

E.g.
Hash partition of table profile:
hash(id) profile by id partitions 3, simply use mod as hash function.
select * from list where id in (1,2,3,4,5,6,7,8,9,10)

Before:
Tablet 1: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 2: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 3: id in (1,2,3,4,5,6,7,8,9,10)

After:
Tablet 1: id in (0, 3, 6, 9)
Tablet 2: id in (1, 4, 7, 10)
Tablet 3: id in (2, 5, 8)

Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
---
M src/kudu/common/column_predicate.h
M src/kudu/common/partial_row.h
M src/kudu/common/partition.h
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/tserver/tablet_service.cc
6 files changed, 72 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/16674/1
-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 1
Gerrit-Owner: wangning <19...@gmail.com>

[kudu-CR] KUDU-1644 hash-partition based partition optimization

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16674

to look at the new patch set (#6).

Change subject: KUDU-1644 hash-partition based partition optimization
......................................................................

KUDU-1644 hash-partition based partition optimization

Hash prune for single hash-key based inList query. Reduce the values to predicate
by hash-partition match.

Table has P partitions, N records, R rowsets. Inlist predicate has V
values.

Before:
To each tablet, time complexity to complete hash-key based in-list query is:
V * LOG(N/R) * N

After:
Complexity becomes:
V/P * LOG(N/R) * N

E.g.
Hash partition of table profile:
hash(id) profile by id partitions 3, simply use mod as hash function.
select * from list where id in (1,2,3,4,5,6,7,8,9,10)

Before:
Tablet 1: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 2: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 3: id in (1,2,3,4,5,6,7,8,9,10)

After:
Tablet 1: id in (0, 3, 6, 9)
Tablet 2: id in (1, 4, 7, 10)
Tablet 3: id in (2, 5, 8)

Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
---
M src/kudu/common/column_predicate.h
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/scan_spec-test.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/tserver/tablet_service.cc
9 files changed, 339 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/16674/6
-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 6
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
Hello Mahesh Reddy, Tidy Bot, Kudu Jenkins, Andrew Wong, Bankim Bhavsar, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16674

to look at the new patch set (#15).

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................

KUDU-1644 hash-partition based in-list predicate optimization

Hash prune for single hash-key based inList query. Reduce the values to predicate
by hash-partition match.
This patch reduces the IN List predicated values to be pushed to tablet
without change the content to be returned.

Table has P partitions, N records, R rowsets. Inlist predicate has V values.

Before:
To each tablet, time complexity to complete hash-key based in-list query is:
V * LOG(N/R) * N

After:
Complexity becomes:
V/P * LOG(N/R) * N

E.g.
Hash partition of table 'profile':
hash(id) by id partitions 3, simply use mod as hash function.
select * from profile where id in (1,2,3,4,5,6,7,8,9,10)

Before:
Tablet 1: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 2: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 3: id in (1,2,3,4,5,6,7,8,9,10)

After:
Tablet 1: id in (0,3,6,9)
Tablet 2: id in (1,4,7,10)
Tablet 3: id in (2,5,8)

Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java
M src/kudu/common/column_predicate.h
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
M src/kudu/common/partition.cc
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/scan_spec-test.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/tserver/tablet_service.cc
11 files changed, 560 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/16674/15
-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 15
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
wangning has posted comments on this change. ( http://gerrit.cloudera.org:8080/16674 )

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/tserver/tablet_service.cc@2667
PS8, Line 2667: 
              :   spec.OptimizeScan(tablet_schema, scanner->arena(), true);
              :   VLOG(3) << "After
> If we pushed the emptiness check into CanShortCircuit(), would that avoid t
That seems bit different, it seems not good to detect can shortcircuit here. It will return wrong answer and result in the failure of TabletServerTest.TestNonPositiveLimitsShortCircuit
/
I will split them and re-patch



-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 14
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>
Gerrit-Comment-Date: Tue, 03 Nov 2020 10:24:25 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16674 )

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................


Patch Set 8:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/16674/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16674/7//COMMIT_MSG@2
PS7, Line 2:   ningw <19...@gmail.com>
           : AuthorDate: 
> this patch is target on 
nit: could you add a TODO comment in PruneHashForInlistIfPossible mentioning this? I agree this would be a good improvement to make, and it'd be nice to have a pointer in the code both explaining the limitations of the code today, and ideas for improvement.


http://gerrit.cloudera.org:8080/#/c/16674/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16674/8//COMMIT_MSG@31
PS8, Line 31: 
            : After:
            : Tablet 1: id in (0,3,6,9)
            : Tablet 2: id in (1,4,7,10)
            : Tablet 3: id in (2,5,8)
Making sure my understanding is correct here -- this doesn't reduce the number of rows sent back by the servers, but:
- it does reduce the CPU used when evaluating the in-list predicates since the predicates themselves are smaller
- it also allows us to short-circuit immediately if the in-list predicate only contains values that are not in a given tablet


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec-test.cc
File src/kudu/common/scan_spec-test.cc:

http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec-test.cc@473
PS8, Line 473: ASSERT_EQ(spec_p1.IsInListPredicateValuesEmpty(), false);
nit: you can use ASSERT_TRUE and ASSERT_FALSE for these


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec-test.cc@503
PS8, Line 503: columnPB
nit: We use snake_case for variable names in our C++ code. Same elsewhere


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec-test.cc@518
PS8, Line 518: parition
nit: "partition", same elsewhere


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec-test.cc@553
PS8, Line 553: / Test that hash(a, b), hash(c) InList predicates.
             : // Neither a or b InList can be pruned.
             : // c InList should be pruned.
Could you also add a case for hash(a), hash(b)?


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec-test.cc@563
PS8, Line 563: 
             :   PartitionSchema partition_schema;
             :   PartitionSchemaPB partition_schemaPB;
             :   auto* hashPB = partition_schemaPB.add_hash_bucket_schemas();
             :   hashPB->set_num_buckets(3);
             :   hashPB->set_seed(0);
             :   auto* columnPB_1_1 = hashPB->add_columns();
             :   int key_column_id_1_1 = schema.find_column("a");
             :   columnPB_1_1->set_id(key_column_id_1_1);
             :   columnPB_1_1->set_name("a");
             : 
             :   auto* columnPB_1_2 = hashPB->add_columns();
             :   int key_column_id_1_2 = schema.find_column("b");
             :   columnPB_1_2->set_id(key_column_id_1_2);
             :   columnPB_1_2->set_name("b");
             : 
             :   auto* hashPB_2_1 = partition_schemaPB.add_hash_bucket_schemas();
             :   hashPB_2_1->set_num_buckets(3);
             :   hashPB_2_1->set_seed(0);
             :   auto* columnPB_2_1 = hashPB_2_1->add_columns();
             :   int key_column_id_2_1 = schema.find_column("c");
             :   columnPB_2_1->set_id(key_column_id_2_1);
             :   columnPB_2_1->set_name("c");
             : 
             :   ASSERT_OK(PartitionSchema::FromPB(partition_schemaPB, schema, &partition_schema));
nit: seems like this code is reused a bunch in these tests. Maybe define a function in an anonymous namespace like:

namespace {

PartitionSchema GeneratePartitionSchema(const vector<pair<vector<string>, int>> hash_partitions, ...) {
  PartitionsSchemaPB partition_schema_pb;
  for (const auto& col_names_and_num_buckets : hash_partitions) {
    auto* hash_pb = partition_schema_pb.add_hash_bucket_schemas();
    hash_pb->set_num_buckets(col_names_and_num_buckets.second);
    for (const auto& col_name : col_names_and_num_buckets.first) {
      hash_pb->(set fields...)
    }
  }
  PartitionSchema partition_schema;
  CHECK_OK(FromPB(partition_schema_pb, &partition_schema));
  return partition_schema;
}

} // anonymous namespace

That way, these tests can just call

 const auto partition_schema = GeneratePartitionSchema({ make_pair({ "a", "b" }, 3), make_pair({ "c" }, 1) });


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc
File src/kudu/common/scan_spec.cc:

http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc@88
PS8, Line 88:                   return predicate.second.predicate_type() == PredicateType::None;
Why shouldn't we put the InList empty check here?


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc@191
PS8, Line 191:     if (predicate.predicate_type() == PredicateType::InList) {
We should be able to push this to the front of the loop and avoid a FindColumn() call, e.g.

for (auto& p : predicates_) {
  if (predicate.predicate_type() != PredicateType::InList) continue;
  ...
}


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc@193
PS8, Line 193: col_name,
nit: we've already found the column index -- could we pass 'idx' or the column ID here and avoid an extra FindColumn()?


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc@193
PS8, Line 193: IsSingleColumnHashParition
nit: typo in "Partition"


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc@205
PS8, Line 205: Status s = Status::OK();
             :           s = partial_row.Set(idx, reinterpret_cast<const uint8_t*>(value));
nit: can merge these lines


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/tserver/tablet_service.cc@2667
PS8, Line 2667: // However, it may crash in coming 'OptimizeScan' call to use empty
              :     // predicate value, here could protects special case with minor cost during
              :     // setup stage.
If we pushed the emptiness check into CanShortCircuit(), would that avoid the crash?


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/tserver/tablet_service.cc@2737
PS8, Line 2737:   if (spec.CanShortCircuit()) {
              :     VLOG(1) << "short-circuiting without creating a server-side scanner.";
              :     *has_more_results = false;
              :     return Status::OK();
              :   }
Hmm, I'm not sure, but could we move this up to right after OptimizeScan()? Seems like otherwise we are doing some work even though we are not going to return anything. Or do we need to short circuit _after_ validating the scan spec?



-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 8
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>
Gerrit-Comment-Date: Fri, 30 Oct 2020 19:55:14 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
wangning has posted comments on this change. ( http://gerrit.cloudera.org:8080/16674 )

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................


Patch Set 17:

So sorry for patching so much that the msg is so messy for reviewing. I was busy tackling the break things detected by tidy and something else. The correctness and performance were tested in my environment with hours testing. 
Would you mind me abandon it and re-commit for review convenient?


-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 17
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>
Gerrit-Comment-Date: Thu, 05 Nov 2020 08:26:51 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16674 )

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................

KUDU-1644 hash-partition based in-list predicate optimization

Hash prune for single hash-key based inList query. Reduce the values to predicate
by hash-partition match.
This patch reduces the IN List predicated values to be pushed to tablet
without change the content to be returned.

Table has P partitions, N records. Inlist predicate has V values.

Before:
To each tablet, time complexity to complete hash-key based in-list query is:
LOG(V) * N

After:
Complexity becomes:
LOG(V/P) * N

E.g.
Hash partition of table 'profile':
hash(id) by id partitions 3, simply use mod as hash function.
select * from profile where id in (1,2,3,4,5,6,7,8,9,10)

Before:
Tablet 1: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 2: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 3: id in (1,2,3,4,5,6,7,8,9,10)

After:
Tablet 1: id in (0,3,6,9)
Tablet 2: id in (1,4,7,10)
Tablet 3: id in (2,5,8)

Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Reviewed-on: http://gerrit.cloudera.org:8080/16674
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <aw...@cloudera.com>
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java
M src/kudu/common/column_predicate.h
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
M src/kudu/common/partition.cc
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/scan_spec-test.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/tserver/tablet_service.cc
11 files changed, 552 insertions(+), 26 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Andrew Wong: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 20
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16674 )

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................


Patch Set 8:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/partial_row.h
File src/kudu/common/partial_row.h:

http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/partial_row.h@642
PS8, Line 642: // Runtime version of the generic setter.
Sorry I don't understand this comment. Also could you write it as per doxygen format?


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/partition.h
File src/kudu/common/partition.h:

http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/partition.h@267
PS8, Line 267: Parition
Typo


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/partition.h@266
PS8, Line 266: // Set is_hash_partition to true if col_name is one of hash partition schema.
             :   void IsSingleColumnHashParition(const Schema& schema,
             :                                   const std::string& col_name,
             :                                   bool* is_hash_partition) const {
It'd be better to return bool instead of using as an out parameter.


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.h
File src/kudu/common/scan_spec.h:

http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.h@62
PS8, Line 62:  empty.
is empty


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.h@80
PS8, Line 80: Support hash(onekey).
I suppose the comment indicates, "supports hash schema with only one key"?



-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 8
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>
Gerrit-Comment-Date: Mon, 02 Nov 2020 23:50:17 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1644 hash-partition based partition optimization

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16674

to look at the new patch set (#4).

Change subject: KUDU-1644 hash-partition based partition optimization
......................................................................

KUDU-1644 hash-partition based partition optimization

Hash prune for single hash-key based inList query. Reduce the values to predicate
by hash-partition match.

Table has P partitions, N records, R rowsets. Inlist predicate has V
values.

Before:
To each tablet, time complexity to complete hash-key based in-list query is:
V * LOG(N/R) * N

After:
Complexity becomes:
V/P * LOG(N/R) * N

E.g.
Hash partition of table profile:
hash(id) profile by id partitions 3, simply use mod as hash function.
select * from list where id in (1,2,3,4,5,6,7,8,9,10)

Before:
Tablet 1: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 2: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 3: id in (1,2,3,4,5,6,7,8,9,10)

After:
Tablet 1: id in (0, 3, 6, 9)
Tablet 2: id in (1, 4, 7, 10)
Tablet 3: id in (2, 5, 8)

Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
---
M src/kudu/common/column_predicate.h
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/scan_spec-test.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/tserver/tablet_service.cc
9 files changed, 280 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/16674/4
-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 4
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
Hello Mahesh Reddy, Tidy Bot, Kudu Jenkins, Andrew Wong, Bankim Bhavsar, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16674

to look at the new patch set (#10).

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................

KUDU-1644 hash-partition based in-list predicate optimization

Hash prune for single hash-key based inList query. Reduce the values to predicate
by hash-partition match.
This patch reduces the IN List predicated values to be pushed to tablet
without change the content to be returned.

Table has P partitions, N records, R rowsets. Inlist predicate has V values.

Before:
To each tablet, time complexity to complete hash-key based in-list query is:
V * LOG(N/R) * N

After:
Complexity becomes:
V/P * LOG(N/R) * N

E.g.
Hash partition of table 'profile':
hash(id) by id partitions 3, simply use mod as hash function.
select * from profile where id in (1,2,3,4,5,6,7,8,9,10)

Before:
Tablet 1: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 2: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 3: id in (1,2,3,4,5,6,7,8,9,10)

After:
Tablet 1: id in (0,3,6,9)
Tablet 2: id in (1,4,7,10)
Tablet 3: id in (2,5,8)

Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java
M src/kudu/common/column_predicate.h
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
M src/kudu/common/partition.cc
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/scan_spec-test.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/tserver/tablet_service.cc
11 files changed, 538 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/16674/10
-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 10
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>

[kudu-CR] KUDU-1644 hash-partition based partition optimization

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16674

to look at the new patch set (#3).

Change subject: KUDU-1644 hash-partition based partition optimization
......................................................................

KUDU-1644 hash-partition based partition optimization

Hash prune for single hash-key based inList query. Reduce the prediated
values by hash-partition calculation.

Table has P partitions, N records, R rowsets. Inlist predicate has V
values.

Before:
Complexity to complete hash-key based in-list query is:
V * LOG(N/R) * N

After:
Complexity becomes:
V/P * LOG(N/R) * N

E.g.
Hash partition of table profile:
hash(id) profile by id partitions 3, simply use mod as hash function.
select * from list where id in (1,2,3,4,5,6,7,8,9,10)

Before:
Tablet 1: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 2: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 3: id in (1,2,3,4,5,6,7,8,9,10)

After:
Tablet 1: id in (0, 3, 6, 9)
Tablet 2: id in (1, 4, 7, 10)
Tablet 3: id in (2, 5, 8)

Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
---
M src/kudu/common/column_predicate.h
M src/kudu/common/partial_row.h
M src/kudu/common/partition.h
M src/kudu/common/scan_spec-test.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/tserver/tablet_service.cc
7 files changed, 76 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/16674/3
-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 3
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16674 )

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................


Patch Set 8:

Tagging Mahesh because he's working on partitioning-related code as well.


-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 8
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>
Gerrit-Comment-Date: Fri, 30 Oct 2020 18:20:56 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-1644 hash-partition based partition optimization

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16674

to look at the new patch set (#5).

Change subject: KUDU-1644 hash-partition based partition optimization
......................................................................

KUDU-1644 hash-partition based partition optimization

Hash prune for single hash-key based inList query. Reduce the values to predicate
by hash-partition match.

Table has P partitions, N records, R rowsets. Inlist predicate has V
values.

Before:
To each tablet, time complexity to complete hash-key based in-list query is:
V * LOG(N/R) * N

After:
Complexity becomes:
V/P * LOG(N/R) * N

E.g.
Hash partition of table profile:
hash(id) profile by id partitions 3, simply use mod as hash function.
select * from list where id in (1,2,3,4,5,6,7,8,9,10)

Before:
Tablet 1: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 2: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 3: id in (1,2,3,4,5,6,7,8,9,10)

After:
Tablet 1: id in (0, 3, 6, 9)
Tablet 2: id in (1, 4, 7, 10)
Tablet 3: id in (2, 5, 8)

Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
---
M src/kudu/common/column_predicate.h
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/scan_spec-test.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/tserver/tablet_service.cc
9 files changed, 338 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/16674/5
-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 5
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
Hello Mahesh Reddy, Tidy Bot, Kudu Jenkins, Andrew Wong, Bankim Bhavsar, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16674

to look at the new patch set (#9).

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................

KUDU-1644 hash-partition based in-list predicate optimization

Hash prune for single hash-key based inList query. Reduce the values to predicate
by hash-partition match.
This patch reduces the IN List predicated values to be pushed to tablet
without change the content to be returned.

Table has P partitions, N records, R rowsets. Inlist predicate has V values.

Before:
To each tablet, time complexity to complete hash-key based in-list query is:
V * LOG(N/R) * N

After:
Complexity becomes:
V/P * LOG(N/R) * N

E.g.
Hash partition of table 'profile':
hash(id) by id partitions 3, simply use mod as hash function.
select * from profile where id in (1,2,3,4,5,6,7,8,9,10)

Before:
Tablet 1: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 2: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 3: id in (1,2,3,4,5,6,7,8,9,10)

After:
Tablet 1: id in (0,3,6,9)
Tablet 2: id in (1,4,7,10)
Tablet 3: id in (2,5,8)

Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java
M src/kudu/common/column_predicate.h
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
M src/kudu/common/partition.cc
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/scan_spec-test.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/tserver/tablet_service.cc
11 files changed, 538 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/16674/9
-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 9
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
Hello Mahesh Reddy, Tidy Bot, Kudu Jenkins, Andrew Wong, Bankim Bhavsar, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16674

to look at the new patch set (#14).

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................

KUDU-1644 hash-partition based in-list predicate optimization

Hash prune for single hash-key based inList query. Reduce the values to predicate
by hash-partition match.
This patch reduces the IN List predicated values to be pushed to tablet
without change the content to be returned.

Table has P partitions, N records, R rowsets. Inlist predicate has V values.

Before:
To each tablet, time complexity to complete hash-key based in-list query is:
V * LOG(N/R) * N

After:
Complexity becomes:
V/P * LOG(N/R) * N

E.g.
Hash partition of table 'profile':
hash(id) by id partitions 3, simply use mod as hash function.
select * from profile where id in (1,2,3,4,5,6,7,8,9,10)

Before:
Tablet 1: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 2: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 3: id in (1,2,3,4,5,6,7,8,9,10)

After:
Tablet 1: id in (0,3,6,9)
Tablet 2: id in (1,4,7,10)
Tablet 3: id in (2,5,8)

Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java
M src/kudu/common/column_predicate.h
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
M src/kudu/common/partition.cc
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/scan_spec-test.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/tserver/tablet_service.cc
11 files changed, 552 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/16674/14
-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 14
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
wangning has posted comments on this change. ( http://gerrit.cloudera.org:8080/16674 )

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................


Patch Set 10:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/partition.h
File src/kudu/common/partition.h:

http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/partition.h@267
PS8, Line 267:     return hash_bucket_schemas_;
> nit: probably better to declare this function in the header file and define
Done


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/partition.h@266
PS8, Line 266: const std::vector<HashBucketSchema>& hash_partition_schemas() const {
             :     return hash_bucket_schemas_;
             :   }
             : 
> +1
Done


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/partition.h@266
PS8, Line 266: const std::vector<HashBucketSchema>& hash_partition_schemas() const {
             :     return hash_bucket_schemas_;
             :   }
             : 
> It'd be better to return bool instead of using as an out parameter.
Tried to get index of hash_bucket_schema, and modified it to use int32_t as return.


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/partition.h@273
PS8, Line 273: atus GetRang
> Shouldn't this check be directly below the call to FindColumn()? What happe
Done


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec-test.cc
File src/kudu/common/scan_spec-test.cc:

http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec-test.cc@518
PS8, Line 518: 
> nit: "partition", same elsewhere
Thanks


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec-test.cc@553
PS8, Line 553:  SCOPED_TRACE(spec_p3.ToString(schema));
             :   EXPECT_EQ("PK >= (int8 a=4, int8 b=10, int8 c=-128) AND "
             :             "PK < (int8 a=8, 
> Could you also add a case for hash(a), hash(b)?
Thx for point out. I didn't thought about it throughly.
 I found it doesn't work well in this case. I change the method to detect is a key is in a partition and patched more cases. I think it may works well now.


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec-test.cc@563
PS8, Line 563:   EXPECT_EQ("PK >= (int8 a=0, int8 b=40, int8 c=-128) AND "
             :             "PK < (int8 a=9, int8 b=71, int8 c=-128) AND "
             :             "a IN (0, 2, 5, 9) AND b IN (40, 60, 70)",
             :             spec_p4.ToString(schema));
             : 
             :   spec_p5.PruneHashForInlistIfPossible(schema, partitions[4], partition_schema);
             :   spec_p5.OptimizeScan(schema, &arena_, true);
             : 
             :   SCOPED_TRACE(spec_p5.ToString(schema));
             :   EXPECT_EQ("PK >= (int8 a=0, int8 b=20, int8 c=-128) AND "
             :             "PK < (int8 a=9, int8 b=51, int8 c=-128) AND "
             :             "a IN (0, 2, 5, 9) AND b IN (20, 30, 50)",
             :             spec_p5.ToString(schema));
             : 
             :   spec_p6.PruneHashForInlistIfPossible(schema, partitions[5], partition_schema);
             :   spec_p6.OptimizeScan(schema, &arena_, true);
             : 
             :   SCOPED_TRACE(spec_p6.ToString(schema));
             :   EXPECT_EQ("PK >= (int8 a=0, int8 b=10, int8 c=-128) AND "
             :             "PK < (int8 a=9, int8 b=81, int8 c=-128) AND "
             :             "a IN (0, 2, 5, 9) AND b IN (10, 80)",
             :             spec_p6.ToString(schema));
             : 
             :   spec_p7.PruneHashForInlistIfPossible(schema, partitions[6], partition_schema);
             :   spec_p7.OptimizeScan(schema, &arena_, true);
> nit: seems like this code is reused a bunch in these tests. Maybe define a 
Done


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.h
File src/kudu/common/scan_spec.h:

http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.h@62
PS8, Line 62:  is pre
> is empty
removed this method and merged into CanShortcuit method


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.h@80
PS8, Line 80:     hash(onekey), has
> I suppose the comment indicates, "supports hash schema with only one key"?
Patched and can you have an more eye on the comment if possible cause I don't know I delivered the idea properly for poor english skill. Thank you!


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc
File src/kudu/common/scan_spec.cc:

http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc@88
PS8, Line 88:                   return predicate.second.predicate_type() == PredicateType::None 
> Why shouldn't we put the InList empty check here?
Done


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc@186
PS8, Line 186: int hash_idx = partition_schema.TryGetSingleColumnHashPartitionIndex(schema, idx);
             :     if (hash_idx == -1) co
> nit: can these lines be merged? same elsewhere with using temp variable for
Added a another short-circuit detect. 
BTW status s is only used here, and the coming one is in lambda function.


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc@191
PS8, Line 191:           [idx, hash_idx, &schema, &partition, &partition_schema](const void* value) {
> We should be able to push this to the front of the loop and avoid a FindCol
Done


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc@193
PS8, Line 193: onst uint
> +1. If we pass 'idx' here, we would save an extra call to FindColumn() in '
Done


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc@193
PS8, Line 193: onst uint
> nit: we've already found the column index -- could we pass 'idx' or the col
Done


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc@205
PS8, Line 205: d_set<string> missing_col_names;
             :   for (const auto& entry : predicates_) {
> nit: can merge these lines
Done


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/tserver/tablet_service.cc@2667
PS8, Line 2667: 
              :   spec.OptimizeScan(tablet_schema, scanner->arena(), true);
              :   VLOG(3) << "After
> If we pushed the emptiness check into CanShortCircuit(), would that avoid t
sure, added


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/tserver/tablet_service.cc@2737
PS8, Line 2737: 
              :   // It's important to keep the reference to the tablet for the case when the
              :   // tablet replica's shutdown is run concurrently with the code below.
              :   shared_ptr<Tablet> tablet;
              :   R
> Hmm, I'm not sure, but could we move this up to right after OptimizeScan()?
Add a extra shortcuit detect now as a temp workaround and I'm looking for the possibility with more time on this part of code.



-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 10
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>
Gerrit-Comment-Date: Tue, 03 Nov 2020 08:26:26 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
Hello Mahesh Reddy, Tidy Bot, Kudu Jenkins, Andrew Wong, Bankim Bhavsar, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16674

to look at the new patch set (#13).

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................

KUDU-1644 hash-partition based in-list predicate optimization

Hash prune for single hash-key based inList query. Reduce the values to predicate
by hash-partition match.
This patch reduces the IN List predicated values to be pushed to tablet
without change the content to be returned.

Table has P partitions, N records, R rowsets. Inlist predicate has V values.

Before:
To each tablet, time complexity to complete hash-key based in-list query is:
V * LOG(N/R) * N

After:
Complexity becomes:
V/P * LOG(N/R) * N

E.g.
Hash partition of table 'profile':
hash(id) by id partitions 3, simply use mod as hash function.
select * from profile where id in (1,2,3,4,5,6,7,8,9,10)

Before:
Tablet 1: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 2: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 3: id in (1,2,3,4,5,6,7,8,9,10)

After:
Tablet 1: id in (0,3,6,9)
Tablet 2: id in (1,4,7,10)
Tablet 3: id in (2,5,8)

Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java
M src/kudu/common/column_predicate.h
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
M src/kudu/common/partition.cc
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/scan_spec-test.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/tserver/tablet_service.cc
11 files changed, 553 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/16674/13
-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 13
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
Hello Mahesh Reddy, Tidy Bot, Kudu Jenkins, Andrew Wong, Bankim Bhavsar, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16674

to look at the new patch set (#11).

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................

KUDU-1644 hash-partition based in-list predicate optimization

Hash prune for single hash-key based inList query. Reduce the values to predicate
by hash-partition match.
This patch reduces the IN List predicated values to be pushed to tablet
without change the content to be returned.

Table has P partitions, N records, R rowsets. Inlist predicate has V values.

Before:
To each tablet, time complexity to complete hash-key based in-list query is:
V * LOG(N/R) * N

After:
Complexity becomes:
V/P * LOG(N/R) * N

E.g.
Hash partition of table 'profile':
hash(id) by id partitions 3, simply use mod as hash function.
select * from profile where id in (1,2,3,4,5,6,7,8,9,10)

Before:
Tablet 1: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 2: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 3: id in (1,2,3,4,5,6,7,8,9,10)

After:
Tablet 1: id in (0,3,6,9)
Tablet 2: id in (1,4,7,10)
Tablet 3: id in (2,5,8)

Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java
M src/kudu/common/column_predicate.h
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
M src/kudu/common/partition.cc
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/scan_spec-test.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/tserver/tablet_service.cc
11 files changed, 540 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/16674/11
-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 11
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
Hello Mahesh Reddy, Tidy Bot, Kudu Jenkins, Andrew Wong, Bankim Bhavsar, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16674

to look at the new patch set (#16).

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................

KUDU-1644 hash-partition based in-list predicate optimization

Hash prune for single hash-key based inList query. Reduce the values to predicate
by hash-partition match.
This patch reduces the IN List predicated values to be pushed to tablet
without change the content to be returned.

Table has P partitions, N records, R rowsets. Inlist predicate has V values.

Before:
To each tablet, time complexity to complete hash-key based in-list query is:
V * LOG(N/R) * N

After:
Complexity becomes:
V/P * LOG(N/R) * N

E.g.
Hash partition of table 'profile':
hash(id) by id partitions 3, simply use mod as hash function.
select * from profile where id in (1,2,3,4,5,6,7,8,9,10)

Before:
Tablet 1: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 2: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 3: id in (1,2,3,4,5,6,7,8,9,10)

After:
Tablet 1: id in (0,3,6,9)
Tablet 2: id in (1,4,7,10)
Tablet 3: id in (2,5,8)

Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java
M src/kudu/common/column_predicate.h
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
M src/kudu/common/partition.cc
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/scan_spec-test.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/tserver/tablet_service.cc
11 files changed, 560 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/16674/16
-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 16
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16674 )

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................


Patch Set 17:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16674/17/src/kudu/common/partition.cc
File src/kudu/common/partition.cc:

http://gerrit.cloudera.org:8080/#/c/16674/17/src/kudu/common/partition.cc@458
PS17, Line 458:   CHECK_EQ(partition.hash_buckets().size(), hash_bucket_schemas_.size());
nit: can you also add this as a DCHECK_EQ() to HashPartitionContainsRowImpl? I found it helpful in orienting myself with this code, to understand the relationship between hash_buckets() and this partition schema.


http://gerrit.cloudera.org:8080/#/c/16674/17/src/kudu/common/partition.cc@460
PS17, Line 460: const HashBucketSchema& hash_bucket_schema = hash_bucket_schemas_[i];
              :     int32_t bucket = -1;
              :     RETURN_NOT_OK(BucketForRow(row, hash_bucket_schema, &bucket));
              : 
              :     if (bucket != partition.hash_buckets()[i]) {
              :       *contains = false;
              :       return Status::OK();
              :     }
nit: could we use HashPartitionContainsRow() here instead, to avoid the code duplication?


http://gerrit.cloudera.org:8080/#/c/16674/17/src/kudu/common/scan_spec.h
File src/kudu/common/scan_spec.h:

http://gerrit.cloudera.org:8080/#/c/16674/17/src/kudu/common/scan_spec.h@80
PS17, Line 80: Supports hash schema with only onekey.
nit: maybe rephrase this as "Only supports pruning for single-column hash schemas."


http://gerrit.cloudera.org:8080/#/c/16674/17/src/kudu/common/scan_spec.cc
File src/kudu/common/scan_spec.cc:

http://gerrit.cloudera.org:8080/#/c/16674/17/src/kudu/common/scan_spec.cc@323
PS17, Line 323:  && lower_bound_key_ != nullptr) {
Was this required for this patch?


http://gerrit.cloudera.org:8080/#/c/16674/14/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/16674/14/src/kudu/tserver/tablet_service.cc@2663
PS14, Line 2663:   if (spec.IsInListPredicateValuesEmpty()) {
               :     // Return if hash predicate values is empty.
               :     *has_more_results = false;
               :     return Status::OK();
               :   }
Looking through the failed test logs, TabletServerTest.TestNonPositiveLimitsShortCircuit should pass if we remove this, since we call it at L2732 anyway. The below lines L2668-L2724 must run for that test to pass because result_collector->InitSerializer() is required to correctly initialize the response data.



-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 17
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>
Gerrit-Comment-Date: Fri, 06 Nov 2020 07:25:08 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16674 )

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................


Patch Set 18: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 18
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>
Gerrit-Comment-Date: Wed, 11 Nov 2020 02:19:07 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16674 )

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................


Patch Set 19: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16674/18//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16674/18//COMMIT_MSG@16
PS18, Line 16: Before:
             : To each tablet, time complexity to complete hash-key based in-list query is:
             : LOG(V) * N
             : 
             : After:
             : Complexity becomes:
             : LOG(V/P) * N
> https://docs.google.com/document/d/1WO4TT2ZqGsvlgogyKOsChpinEeupZCkxn9OI5xu
Thanks for the experiments!

Interpreting the results a bit, for larger lists, the improvement seems less pronounced, as more of the time is spent in copying the larger results set compared to evaluating the predicate.

I would expect that for lists that are on the same order of magnitude as the number of hash buckets (in this case, 5-10), that the improvement may be even greater, as we would short circuit some tablets completely.



-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 19
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>
Gerrit-Comment-Date: Thu, 12 Nov 2020 06:06:03 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16674 )

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................


Patch Set 17:

> Patch Set 17:
> 
> So sorry for patching so much that the msg is so messy for reviewing. I was busy tackling the break things detected by tidy and something else. The correctness and performance were tested in my environment with hours testing. 
> Would you mind me abandon it and re-commit for review convenient?

Multiple revisions are expected, so it's nothing to worry about. I'd prefer keeping reviews on this patch, so it's easy to compare against versions I've already reviewed.


-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 17
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>
Gerrit-Comment-Date: Thu, 05 Nov 2020 21:30:30 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "Mahesh Reddy (Code Review)" <ge...@cloudera.org>.
Mahesh Reddy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16674 )

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................


Patch Set 8:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/partition.h
File src/kudu/common/partition.h:

http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/partition.h@267
PS8, Line 267:   void IsSingleColumnHashParition(const Schema& schema,
nit: probably better to declare this function in the header file and define it in 'partition.cc'. So just move the function body to the source file.


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/partition.h@266
PS8, Line 266: // Set is_hash_partition to true if col_name is one of hash partition schema.
             :   void IsSingleColumnHashParition(const Schema& schema,
             :                                   const std::string& col_name,
             :                                   bool* is_hash_partition) const {
> It'd be better to return bool instead of using as an out parameter.
+1


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/partition.h@273
PS8, Line 273: if (!s.ok())
Shouldn't this check be directly below the call to FindColumn()? What happens if the column is not found and we call column_id()?


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc
File src/kudu/common/scan_spec.cc:

http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc@186
PS8, Line 186: Status s = schema.FindColumn(col_name, &idx);
             :     if (!s.ok()) continue;
nit: can these lines be merged? same elsewhere with using temp variable for Status object on one line and checking on a separate line.


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc@193
PS8, Line 193: col_name,
> nit: we've already found the column index -- could we pass 'idx' or the col
+1. If we pass 'idx' here, we would save an extra call to FindColumn() in 'IsSingleColumnHashPartition()'.



-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 8
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>
Gerrit-Comment-Date: Tue, 03 Nov 2020 01:06:06 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
Hello Mahesh Reddy, Tidy Bot, Kudu Jenkins, Andrew Wong, Bankim Bhavsar, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16674

to look at the new patch set (#17).

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................

KUDU-1644 hash-partition based in-list predicate optimization

Hash prune for single hash-key based inList query. Reduce the values to predicate
by hash-partition match.
This patch reduces the IN List predicated values to be pushed to tablet
without change the content to be returned.

Table has P partitions, N records, R rowsets. Inlist predicate has V values.

Before:
To each tablet, time complexity to complete hash-key based in-list query is:
V * LOG(N/R) * N

After:
Complexity becomes:
V/P * LOG(N/R) * N

E.g.
Hash partition of table 'profile':
hash(id) by id partitions 3, simply use mod as hash function.
select * from profile where id in (1,2,3,4,5,6,7,8,9,10)

Before:
Tablet 1: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 2: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 3: id in (1,2,3,4,5,6,7,8,9,10)

After:
Tablet 1: id in (0,3,6,9)
Tablet 2: id in (1,4,7,10)
Tablet 3: id in (2,5,8)

Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java
M src/kudu/common/column_predicate.h
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
M src/kudu/common/partition.cc
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/scan_spec-test.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/tserver/tablet_service.cc
11 files changed, 558 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/16674/17
-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 17
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
Hello Mahesh Reddy, Tidy Bot, Kudu Jenkins, Andrew Wong, Bankim Bhavsar, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16674

to look at the new patch set (#19).

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................

KUDU-1644 hash-partition based in-list predicate optimization

Hash prune for single hash-key based inList query. Reduce the values to predicate
by hash-partition match.
This patch reduces the IN List predicated values to be pushed to tablet
without change the content to be returned.

Table has P partitions, N records. Inlist predicate has V values.

Before:
To each tablet, time complexity to complete hash-key based in-list query is:
LOG(V) * N

After:
Complexity becomes:
LOG(V/P) * N

E.g.
Hash partition of table 'profile':
hash(id) by id partitions 3, simply use mod as hash function.
select * from profile where id in (1,2,3,4,5,6,7,8,9,10)

Before:
Tablet 1: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 2: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 3: id in (1,2,3,4,5,6,7,8,9,10)

After:
Tablet 1: id in (0,3,6,9)
Tablet 2: id in (1,4,7,10)
Tablet 3: id in (2,5,8)

Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java
M src/kudu/common/column_predicate.h
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
M src/kudu/common/partition.cc
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/scan_spec-test.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/tserver/tablet_service.cc
11 files changed, 552 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/16674/19
-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 19
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>

[kudu-CR] KUDU-1644 hash-partition based partition optimization

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16674

to look at the new patch set (#2).

Change subject: KUDU-1644 hash-partition based partition optimization
......................................................................

KUDU-1644 hash-partition based partition optimization

Hash prune for single hash-key based inList query. Reduce the prediated
values by hash-partition calculation.

Table has P partitions, N records, R rowsets. Inlist predicate has V
values.

Before:
Complexity to complete hash-key based in-list query is:
V * LOG(N/R) * N

After:
Complexity becomes:
V/P * LOG(N/R) * N

E.g.
Hash partition of table profile:
hash(id) profile by id partitions 3, simply use mod as hash function.
select * from list where id in (1,2,3,4,5,6,7,8,9,10)

Before:
Tablet 1: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 2: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 3: id in (1,2,3,4,5,6,7,8,9,10)

After:
Tablet 1: id in (0, 3, 6, 9)
Tablet 2: id in (1, 4, 7, 10)
Tablet 3: id in (2, 5, 8)

Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
---
M src/kudu/common/column_predicate.h
M src/kudu/common/partial_row.h
M src/kudu/common/partition.h
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/tserver/tablet_service.cc
6 files changed, 75 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/16674/2
-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 2
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
wangning has posted comments on this change. ( http://gerrit.cloudera.org:8080/16674 )

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16674/18//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16674/18//COMMIT_MSG@16
PS18, Line 16: Before:
             : To each tablet, time complexity to complete hash-key based in-list query is:
             : V * LOG(N/R) * N
             : 
             : After:
             : Complexity becomes:
             : V/P * LOG(N/R) * N
> I'm curious, do you also have numbers, in the same way that you did in the 
https://docs.google.com/document/d/1WO4TT2ZqGsvlgogyKOsChpinEeupZCkxn9OI5xulryM/edit?usp=sharing
We patched this idea in our company's fork in other form. But it's hard to paste our log here or it is really hard to explain what does the log do. 
So today, I did a little scale experiment to explain this idea with benchmark. 
The number, it's depend on how many partitions the table have as I think. E.g. 9 partitions may have 3 times speed up roughly.
And I also explained why it can not reach this number in my experiment, actually it managed to made the number in our product. 

BTW, 
In commit msg, it's was my mistake to made it wrong.
I removed rowset concept in order to have a better explain.
It should be

Before:
To each tablet, time complexity to complete hash-key based in-list query is:
LOG(V) * N

After:
Complexity becomes:
LOG(V/P) * N



-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 18
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>
Gerrit-Comment-Date: Wed, 11 Nov 2020 13:58:40 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16674 )

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................


Patch Set 18:

(2 comments)

This looks good to me -- just one more question about testing/perf.

http://gerrit.cloudera.org:8080/#/c/16674/18//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16674/18//COMMIT_MSG@16
PS18, Line 16: Before:
             : To each tablet, time complexity to complete hash-key based in-list query is:
             : V * LOG(N/R) * N
             : 
             : After:
             : Complexity becomes:
             : V/P * LOG(N/R) * N
I'm curious, do you also have numbers, in the same way that you did in the Java-client patch? https://gerrit.cloudera.org/c/14706/

The added benefit of doing this server-side is that we should see the benefit in both the Java and the C++ clients.


http://gerrit.cloudera.org:8080/#/c/16674/17/src/kudu/common/scan_spec.cc
File src/kudu/common/scan_spec.cc:

http://gerrit.cloudera.org:8080/#/c/16674/17/src/kudu/common/scan_spec.cc@323
PS17, Line 323: 
> This is not required as I think. 
Ah I see, thanks for explaining! This seems safe, given we dereference lower_bound_key_ below anyway.



-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 18
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>
Gerrit-Comment-Date: Wed, 11 Nov 2020 02:19:04 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16674

to look at the new patch set (#8).

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................

KUDU-1644 hash-partition based in-list predicate optimization

Hash prune for single hash-key based inList query. Reduce the values to predicate
by hash-partition match.

Table has P partitions, N records, R rowsets. Inlist predicate has V values.

Before:
To each tablet, time complexity to complete hash-key based in-list query is:
V * LOG(N/R) * N

After:
Complexity becomes:
V/P * LOG(N/R) * N

E.g.
Hash partition of table 'profile':
hash(id) by id partitions 3, simply use mod as hash function.
select * from profile where id in (1,2,3,4,5,6,7,8,9,10)

Before:
Tablet 1: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 2: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 3: id in (1,2,3,4,5,6,7,8,9,10)

After:
Tablet 1: id in (0,3,6,9)
Tablet 2: id in (1,4,7,10)
Tablet 3: id in (2,5,8)

Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
---
M src/kudu/common/column_predicate.h
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/scan_spec-test.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/tserver/tablet_service.cc
9 files changed, 354 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/16674/8
-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 8
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
Hello Mahesh Reddy, Tidy Bot, Kudu Jenkins, Andrew Wong, Bankim Bhavsar, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16674

to look at the new patch set (#12).

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................

KUDU-1644 hash-partition based in-list predicate optimization

Hash prune for single hash-key based inList query. Reduce the values to predicate
by hash-partition match.
This patch reduces the IN List predicated values to be pushed to tablet
without change the content to be returned.

Table has P partitions, N records, R rowsets. Inlist predicate has V values.

Before:
To each tablet, time complexity to complete hash-key based in-list query is:
V * LOG(N/R) * N

After:
Complexity becomes:
V/P * LOG(N/R) * N

E.g.
Hash partition of table 'profile':
hash(id) by id partitions 3, simply use mod as hash function.
select * from profile where id in (1,2,3,4,5,6,7,8,9,10)

Before:
Tablet 1: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 2: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 3: id in (1,2,3,4,5,6,7,8,9,10)

After:
Tablet 1: id in (0,3,6,9)
Tablet 2: id in (1,4,7,10)
Tablet 3: id in (2,5,8)

Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java
M src/kudu/common/column_predicate.h
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
M src/kudu/common/partition.cc
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/scan_spec-test.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/tserver/tablet_service.cc
11 files changed, 552 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/16674/12
-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 12
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
Hello Mahesh Reddy, Tidy Bot, Kudu Jenkins, Andrew Wong, Bankim Bhavsar, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16674

to look at the new patch set (#18).

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................

KUDU-1644 hash-partition based in-list predicate optimization

Hash prune for single hash-key based inList query. Reduce the values to predicate
by hash-partition match.
This patch reduces the IN List predicated values to be pushed to tablet
without change the content to be returned.

Table has P partitions, N records, R rowsets. Inlist predicate has V values.

Before:
To each tablet, time complexity to complete hash-key based in-list query is:
V * LOG(N/R) * N

After:
Complexity becomes:
V/P * LOG(N/R) * N

E.g.
Hash partition of table 'profile':
hash(id) by id partitions 3, simply use mod as hash function.
select * from profile where id in (1,2,3,4,5,6,7,8,9,10)

Before:
Tablet 1: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 2: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 3: id in (1,2,3,4,5,6,7,8,9,10)

After:
Tablet 1: id in (0,3,6,9)
Tablet 2: id in (1,4,7,10)
Tablet 3: id in (2,5,8)

Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java
M src/kudu/common/column_predicate.h
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
M src/kudu/common/partition.cc
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/scan_spec-test.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/tserver/tablet_service.cc
11 files changed, 552 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/16674/18
-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 18
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
wangning has posted comments on this change. ( http://gerrit.cloudera.org:8080/16674 )

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................


Patch Set 17:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16674/17/src/kudu/common/partition.cc
File src/kudu/common/partition.cc:

http://gerrit.cloudera.org:8080/#/c/16674/17/src/kudu/common/partition.cc@458
PS17, Line 458:   CHECK_EQ(partition.hash_buckets().size(), hash_bucket_schemas_.size());
> nit: can you also add this as a DCHECK_EQ() to HashPartitionContainsRowImpl
Done


http://gerrit.cloudera.org:8080/#/c/16674/17/src/kudu/common/partition.cc@460
PS17, Line 460: const HashBucketSchema& hash_bucket_schema = hash_bucket_schemas_[i];
              :     int32_t bucket = -1;
              :     RETURN_NOT_OK(BucketForRow(row, hash_bucket_schema, &bucket));
              : 
              :     if (bucket != partition.hash_buckets()[i]) {
              :       *contains = false;
              :       return Status::OK();
              :     }
> nit: could we use HashPartitionContainsRow() here instead, to avoid the cod
Done


http://gerrit.cloudera.org:8080/#/c/16674/17/src/kudu/common/scan_spec.h
File src/kudu/common/scan_spec.h:

http://gerrit.cloudera.org:8080/#/c/16674/17/src/kudu/common/scan_spec.h@80
PS17, Line 80: Supports hash schema with only onekey.
> nit: maybe rephrase this as "Only supports pruning for single-column hash s
Done


http://gerrit.cloudera.org:8080/#/c/16674/17/src/kudu/common/scan_spec.cc
File src/kudu/common/scan_spec.cc:

http://gerrit.cloudera.org:8080/#/c/16674/17/src/kudu/common/scan_spec.cc@323
PS17, Line 323:  && lower_bound_key_ != nullptr) {
> Was this required for this patch?
This is not required as I think. 
But adding (type == inlist && predicates == empty) into CanShortCircuit break the tidy detect, I haven't figure out how to guarantee it's always notnull with edit in shortcircuit.


http://gerrit.cloudera.org:8080/#/c/16674/14/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/16674/14/src/kudu/tserver/tablet_service.cc@2663
PS14, Line 2663:   if (spec.IsInListPredicateValuesEmpty()) {
               :     // Return if hash predicate values is empty.
               :     *has_more_results = false;
               :     return Status::OK();
               :   }
> Looking through the failed test logs, TabletServerTest.TestNonPositiveLimit
Yes that's right. Patched and added NO_FATALS(spec.OptimizeScan()) in tests.



-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 17
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>
Gerrit-Comment-Date: Mon, 09 Nov 2020 05:58:47 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
wangning has posted comments on this change. ( http://gerrit.cloudera.org:8080/16674 )

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16674/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16674/7//COMMIT_MSG@2
PS7, Line 2:   ningw <19...@gmail.com>
           : AuthorDate: 
this patch is target on 
select * from where solo_hash_key in (elem1, elem2, xxx ),

However, for those query which can fix hash partitions like
select * from where hashkey1 = CONST_VALUE and hashkey2 in (elem1, elem2). Further investigation is needed for patching all of these scenarios.



-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 7
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>
Gerrit-Comment-Date: Fri, 30 Oct 2020 09:08:32 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16674

to look at the new patch set (#7).

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................

KUDU-1644 hash-partition based in-list predicate optimization

Hash prune for single hash-key based inList query. Reduce the values to predicate
by hash-partition match.

Table has P partitions, N records, R rowsets. Inlist predicate has V values.

Before:
To each tablet, time complexity to complete hash-key based in-list query is:
V * LOG(N/R) * N

After:
Complexity becomes:
V/P * LOG(N/R) * N

E.g.
Hash partition of table 'profile':
hash(id) by id partitions 3, simply use mod as hash function.
select * from profile where id in (1,2,3,4,5,6,7,8,9,10)

Before:
Tablet 1: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 2: id in (1,2,3,4,5,6,7,8,9,10)
Tablet 3: id in (1,2,3,4,5,6,7,8,9,10)

After:
Tablet 1: id in (0,3,6,9)
Tablet 2: id in (1,4,7,10)
Tablet 3: id in (2,5,8)

Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
---
M src/kudu/common/column_predicate.h
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/scan_spec-test.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/tserver/tablet_service.cc
9 files changed, 357 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/16674/7
-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 7
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-1644 hash-partition based in-list predicate optimization

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
wangning has posted comments on this change. ( http://gerrit.cloudera.org:8080/16674 )

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................


Patch Set 19:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16674/18//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16674/18//COMMIT_MSG@16
PS18, Line 16: Before:
             : To each tablet, time complexity to complete hash-key based in-list query is:
             : LOG(V) * N
             : 
             : After:
             : Complexity becomes:
             : LOG(V/P) * N
> Thanks for the experiments!
Thanks for the reviewing and helping! 
I'll patch the todo I leaved in code soon. 
And also I think I can try to investigating more in reducing the impact from copy overhead if possible.



-- 
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 19
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <19...@gmail.com>
Gerrit-Comment-Date: Thu, 12 Nov 2020 06:29:06 +0000
Gerrit-HasComments: Yes