Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2021/11/22 20:48:31 UTC

[kudu-CR] KUDU-2671 number of per-range hash dimensions should be fixed for now

Hello Mahesh Reddy, Tidy Bot, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18045

to look at the new patch set (#2).

Change subject: KUDU-2671 number of per-range hash dimensions should be fixed for now
......................................................................

KUDU-2671 number of per-range hash dimensions should be fixed for now

As it turned out, updating the client's metacache, the system catalog's
logic, and the partition pruner to accommodate partition keys with a
variable-size hash part is a substantial effort in itself.  Even so, we
can still deliver the most frequently requested functionality, changing
the number of hash buckets per range partition, in the short term if we
keep the restriction on the size of the hash-related part of a partition
key.  That translates into the restriction of having the same number of
hash dimensions across all the per-range hash schemas in a table.

So, this patch adds a restriction on the number of hash dimensions for
per-range hash schemas: it should be the same for all the ranges in
a table.  However, the rest of the parameters that define a per-range
hash schema may still differ from range to range:
  * the number of hash buckets
  * the set of columns for the hash bucketing
  * the seed for the hash function
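
A minimal sketch of the invariant described above, for illustration only.
The HashDimension and HashSchema types and the HasUniformHashDimensions()
function are hypothetical stand-ins, not the actual Kudu types or the
check added in catalog_manager.cc:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for one hash dimension of a hash schema.
struct HashDimension {
  std::vector<std::string> columns;  // columns used for the hash bucketing
  int32_t num_buckets;               // number of hash buckets
  uint32_t seed;                     // seed for the hash function
};

// Hypothetical stand-in: a hash schema is a list of hash dimensions.
using HashSchema = std::vector<HashDimension>;

// Returns true if all the per-range hash schemas have the same number of
// hash dimensions. Columns, bucket counts, and seeds are free to differ.
bool HasUniformHashDimensions(const std::vector<HashSchema>& per_range) {
  for (const auto& schema : per_range) {
    if (schema.size() != per_range.front().size()) {
      return false;
    }
  }
  return true;
}
```

With this sketch, two ranges that each have two hash dimensions pass the
check even if their bucket counts, columns, and seeds differ, while adding
a range with a single hash dimension makes it fail.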

As a part of this changelist, a few test scenarios are now disabled:
those are to be re-enabled once the rest of the code in the system
catalog, the client metacache, and the partition pruner is able
to handle varying hash dimensions.  In addition, new test scenarios
have been added to verify that the invariant of the same number of
hash dimensions across all the range partitions is enforced.

Also, I updated the comparison operator for PartitionKey: since the
number of hash dimensions isn't varying across per-range hash schemas,
it's no longer necessary to concatenate the hash and the range parts to
provide the legacy ordering of partition keys for some edge cases.
The comparison operator might change if PartitionKey switches to a
single string under the hood, but at this point I decided to keep the
hash and the range parts separate.  To keep the code future-proof and
easier to review, I'm considering switching to string views (or Slice)
for the range_key() and the hash_key() methods in a follow-up
changelist, regardless of how the serialized partition key is
represented under the hood in PartitionKey.
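
To illustrate the ordering described above, here is a simplified sketch
of a key that stores the hash and the range parts separately and compares
them part by part.  The class below is a hypothetical stand-in written
for this note; it is not the PartitionKey from src/kudu/common/partition.h
and the actual operator may be implemented differently:

```cpp
#include <string>
#include <tuple>
#include <utility>

// Hypothetical, simplified stand-in for a partition key that keeps the
// hash and the range parts in separate strings.
class SketchPartitionKey {
 public:
  SketchPartitionKey(std::string hash_key, std::string range_key)
      : hash_key_(std::move(hash_key)),
        range_key_(std::move(range_key)) {}

  const std::string& hash_key() const { return hash_key_; }
  const std::string& range_key() const { return range_key_; }

  // Compare the hash parts first; fall back to the range parts only when
  // the hash parts are equal. No concatenation is involved.
  bool operator<(const SketchPartitionKey& other) const {
    return std::tie(hash_key_, range_key_) <
           std::tie(other.hash_key_, other.range_key_);
  }

 private:
  std::string hash_key_;
  std::string range_key_;
};
```

When the hash parts of two keys have the same length, this part-by-part
comparison yields the same result as comparing the concatenated strings,
because the first differing byte is found either within the hash parts or,
if those are equal, within the range parts.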

Change-Id: Ic884fa556462b85c64d77385a521d9077d33c7c1
---
M src/kudu/client/flex_partitioning_client-test.cc
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/integration-tests/table_locations-itest.cc
M src/kudu/master/catalog_manager.cc
5 files changed, 424 insertions(+), 78 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/18045/2
-- 
To view, visit http://gerrit.cloudera.org:8080/18045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic884fa556462b85c64d77385a521d9077d33c7c1
Gerrit-Change-Number: 18045
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)