You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2021/09/30 17:47:38 UTC

[kudu-CR] WIP KUDU-2671 introduce PartitionKey

Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17890


Change subject: WIP KUDU-2671 introduce PartitionKey
......................................................................

WIP KUDU-2671 introduce PartitionKey

This changelist introduces a dedicated PartitionKey data structure to
represent a table's partition key.  The crux is in keeping the hash and
the range part of the key separately, so it's possible to properly work
with such an entity without the context of a particular range or tablet.
Before this patch, encoded partition keys were represented as strings that
had the hash and the range parts concatenated: it was always possible to
properly decode such a key having only table-wide information in the
context because every range has the same hash schema.  However, once
custom hash schemas per range were introduced, now it's impossible to
properly decode such a compound key without knowing the particular hash
schema that was used to encode the key.

The ordering of PartitionKeys in this changelist follows the legacy
notation of comparing the hash part of the key first and then the range
part, but it's going to change in a follow-up changelist because that's
the proper way to do so w.r.t. the logic of the PartitionPruner.
Also, the updated comparison operator for PartitionKey requires
corresponding changes in the catalog manager and client's meta-cache.
I decided to split the changes for easier tracking and reviewing.

As a result, a few new tests scenarios related to using custom hash
schemas per range partition is not working as expected yet because they
require the comparison operator for PartitionKey to be updated.  That
is going to be addressed in a follow-up changelist as well.

WIP:
  * address TODOs to verify that the catalog manager properly reports
    an error to a client for GetTableLocations RPC when a table has
    custom hash partition schema per range, but client doesn't populate
    the newly introduced fields
  * either introduce an extra field to ScanTokenPB (similar to the newly
    added GetTableLocationsRequestPB::partition_key_range field) or rely
    on the ScanTokenPB::tablet_metadata field to extract the hash and
    the range parts out of the strings representing lower and upper scan
    boundaries

Change-Id: I00255ec404beeb999117f5265de0d5d8deaf0d68
---
M src/kudu/client/batcher.cc
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/flex_partitioning_client-test.cc
M src/kudu/client/meta_cache.cc
M src/kudu/client/meta_cache.h
M src/kudu/client/partitioner-internal.cc
M src/kudu/client/partitioner-internal.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scan_token-internal.cc
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/common/common.proto
M src/kudu/common/partition-test.cc
M src/kudu/common/partition.cc
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/integration-tests/table_locations-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.proto
M src/kudu/master/master_path_handlers.cc
M src/kudu/tools/table_scanner.cc
M src/kudu/tools/tool_action_table.cc
29 files changed, 996 insertions(+), 688 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/17890/3
-- 
To view, visit http://gerrit.cloudera.org:8080/17890
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I00255ec404beeb999117f5265de0d5d8deaf0d68
Gerrit-Change-Number: 17890
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>

[kudu-CR] KUDU-2671 introduce PartitionKey

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Mahesh Reddy, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17890

to look at the new patch set (#4).

Change subject: KUDU-2671 introduce PartitionKey
......................................................................

KUDU-2671 introduce PartitionKey

This changelist introduces a dedicated PartitionKey data structure to
represent a table's partition key.  The crux is in keeping the hash and
the range part of the key separately, so it's possible to properly work
with such an entity without the context of a particular range or tablet.
Before this patch, encoded partition keys were represented as strings that
had the hash and the range parts concatenated: it was always possible to
properly decode such a key having only table-wide information in the
context because every range had the same hash schema.  However, once
custom hash schemas per range were introduced, now it's impossible to
properly decode such a compound key without knowing the particular hash
schema that was used to encode the key.

The ordering of PartitionKeys in this changelist follows the legacy
notation of comparing the hash part of the key first and then the range
part, but it's going to change in a follow-up changelist because that's
the proper way to do so w.r.t. the logic of the PartitionPruner.  Also,
the updated comparison operator for PartitionKey requires corresponding
changes in the catalog manager and client's meta-cache.
I decided to split the changes for easier tracking and reviewing.

As a result, a few new tests scenarios related to using custom hash
schemas per range partition is not working as expected yet because they
require the comparison operator for PartitionKey to be updated:
  * FlexPartitioningCreateTableTest.NoUpperBoundRangeCustomHashSchema
  * PartitionPrunerTest.TestHashSchemasPerRangePruning
  * PartitionPrunerTest.TestSingleRangeElementAndBoundaryCase
That is going to be addressed in a follow-up changelist as well.

As a by-product of this change, the following methods of the
PartitionSchema class started working as needed in case of a table with
per-range custom hash schemas:
  * PartitionKeyDebugStringImpl()
  * PartitionKeyDebugString()

I was also thinking about introducing an extra field to ScanTokenPB,
like the newly added GetTableLocationsRequestPB::partition_key_range
field, or rely on the ScanTokenPB::tablet_metadata field to extract
the hash and the range parts out of the strings representing lower and
upper scan boundaries, but I realized that deserves to be done in a
separate changelist.

Change-Id: I00255ec404beeb999117f5265de0d5d8deaf0d68
---
M src/kudu/client/batcher.cc
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/flex_partitioning_client-test.cc
M src/kudu/client/meta_cache.cc
M src/kudu/client/meta_cache.h
M src/kudu/client/partitioner-internal.cc
M src/kudu/client/partitioner-internal.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scan_token-internal.cc
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/common/common.proto
M src/kudu/common/partition-test.cc
M src/kudu/common/partition.cc
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/integration-tests/table_locations-itest.cc
M src/kudu/master/catalog_manager-test.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.proto
M src/kudu/master/master_path_handlers.cc
M src/kudu/tools/table_scanner.cc
M src/kudu/tools/tool_action_table.cc
31 files changed, 1,096 insertions(+), 697 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/17890/4
-- 
To view, visit http://gerrit.cloudera.org:8080/17890
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I00255ec404beeb999117f5265de0d5d8deaf0d68
Gerrit-Change-Number: 17890
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>

[kudu-CR] KUDU-2671 introduce PartitionKey

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Mahesh Reddy, Tidy Bot, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17890

to look at the new patch set (#7).

Change subject: KUDU-2671 introduce PartitionKey
......................................................................

KUDU-2671 introduce PartitionKey

This changelist introduces a dedicated PartitionKey data structure to
represent a table's partition key.  The crux is in keeping the hash and
the range part of the key separately, so it's possible to properly work
with such an entity without the context of a particular range or tablet.
Before this patch, encoded partition keys were represented as strings that
had the hash and the range parts concatenated: it was always possible to
properly decode such a key having only table-wide information in the
context because every range had the same hash schema.  However, once
custom hash schemas per range were introduced, now it's impossible to
properly decode such a compound key without knowing the particular hash
schema that was used to encode the key.

The ordering of PartitionKeys in this changelist follows the legacy
notation for comparison operator, first concatenating the hash and the
range parts and then comparing the result strings.  However, it's going
to change in a follow-up changelist.  That change induces corresponding
changes in PartitionPruner.  Also, the updated comparison operator for
PartitionKey requires updating the code in the catalog manager and in
the client's meta-cache.  I decided to split the changes into a few
changelists for easier tracking and reviewing.

As a by-product of this change, the following methods of the
PartitionSchema class started working as needed in case of a table with
per-range custom hash schemas:
  * PartitionKeyDebugStringImpl()
  * PartitionKeyDebugString()

I was also thinking about introducing an extra field to ScanTokenPB,
like the newly added GetTableLocationsRequestPB::partition_key_range
field, or rely on the ScanTokenPB::tablet_metadata field to extract
the hash and the range parts out of the strings representing lower and
upper scan boundaries, but I realized that deserves to be done in a
separate changelist.

Change-Id: I00255ec404beeb999117f5265de0d5d8deaf0d68
---
M src/kudu/client/batcher.cc
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/flex_partitioning_client-test.cc
M src/kudu/client/meta_cache.cc
M src/kudu/client/meta_cache.h
M src/kudu/client/partitioner-internal.cc
M src/kudu/client/partitioner-internal.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scan_token-internal.cc
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/common/common.proto
M src/kudu/common/partition-test.cc
M src/kudu/common/partition.cc
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/integration-tests/table_locations-itest.cc
M src/kudu/master/catalog_manager-test.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.proto
M src/kudu/master/master_path_handlers.cc
M src/kudu/tools/table_scanner.cc
M src/kudu/tools/tool_action_table.cc
31 files changed, 1,074 insertions(+), 694 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/17890/7
-- 
To view, visit http://gerrit.cloudera.org:8080/17890
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I00255ec404beeb999117f5265de0d5d8deaf0d68
Gerrit-Change-Number: 17890
Gerrit-PatchSet: 7
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-2671 introduce PartitionKey

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Mahesh Reddy, Tidy Bot, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17890

to look at the new patch set (#5).

Change subject: KUDU-2671 introduce PartitionKey
......................................................................

KUDU-2671 introduce PartitionKey

This changelist introduces a dedicated PartitionKey data structure to
represent a table's partition key.  The crux is in keeping the hash and
the range part of the key separately, so it's possible to properly work
with such an entity without the context of a particular range or tablet.
Before this patch, encoded partition keys were represented as strings that
had the hash and the range parts concatenated: it was always possible to
properly decode such a key having only table-wide information in the
context because every range had the same hash schema.  However, once
custom hash schemas per range were introduced, now it's impossible to
properly decode such a compound key without knowing the particular hash
schema that was used to encode the key.

The ordering of PartitionKeys in this changelist follows the legacy
notation of comparing the hash part of the key first and then the range
part, but it's going to change in a follow-up changelist because that's
the proper way to do so w.r.t. the logic of the PartitionPruner.  Also,
the updated comparison operator for PartitionKey requires corresponding
changes in the catalog manager and client's meta-cache.
I decided to split the changes for easier tracking and reviewing.

As a result, a few new tests scenarios related to using custom hash
schemas per range partition is not working as expected yet because they
require the comparison operator for PartitionKey to be updated:
  * FlexPartitioningCreateTableTest.NoUpperBoundRangeCustomHashSchema
  * PartitionPrunerTest.TestHashSchemasPerRangePruning
  * PartitionPrunerTest.TestSingleRangeElementAndBoundaryCase
That is going to be addressed in a follow-up changelist as well.

As a by-product of this change, the following methods of the
PartitionSchema class started working as needed in case of a table with
per-range custom hash schemas:
  * PartitionKeyDebugStringImpl()
  * PartitionKeyDebugString()

I was also thinking about introducing an extra field to ScanTokenPB,
like the newly added GetTableLocationsRequestPB::partition_key_range
field, or rely on the ScanTokenPB::tablet_metadata field to extract
the hash and the range parts out of the strings representing lower and
upper scan boundaries, but I realized that deserves to be done in a
separate changelist.

Change-Id: I00255ec404beeb999117f5265de0d5d8deaf0d68
---
M src/kudu/client/batcher.cc
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/flex_partitioning_client-test.cc
M src/kudu/client/meta_cache.cc
M src/kudu/client/meta_cache.h
M src/kudu/client/partitioner-internal.cc
M src/kudu/client/partitioner-internal.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scan_token-internal.cc
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/common/common.proto
M src/kudu/common/partition-test.cc
M src/kudu/common/partition.cc
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/integration-tests/table_locations-itest.cc
M src/kudu/master/catalog_manager-test.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.proto
M src/kudu/master/master_path_handlers.cc
M src/kudu/tools/table_scanner.cc
M src/kudu/tools/tool_action_table.cc
31 files changed, 1,096 insertions(+), 697 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/17890/5
-- 
To view, visit http://gerrit.cloudera.org:8080/17890
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I00255ec404beeb999117f5265de0d5d8deaf0d68
Gerrit-Change-Number: 17890
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-2671 introduce PartitionKey

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Mahesh Reddy, Tidy Bot, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17890

to look at the new patch set (#6).

Change subject: KUDU-2671 introduce PartitionKey
......................................................................

KUDU-2671 introduce PartitionKey

This changelist introduces a dedicated PartitionKey data structure to
represent a table's partition key.  The crux is in keeping the hash and
the range part of the key separately, so it's possible to properly work
with such an entity without the context of a particular range or tablet.
Before this patch, encoded partition keys were represented as strings that
had the hash and the range parts concatenated: it was always possible to
properly decode such a key having only table-wide information in the
context because every range had the same hash schema.  However, once
custom hash schemas per range were introduced, now it's impossible to
properly decode such a compound key without knowing the particular hash
schema that was used to encode the key.

The ordering of PartitionKeys in this changelist follows the legacy
notation of comparing the hash part of the key first and then the range
part, but it's going to change in a follow-up changelist because that's
the proper way to do so w.r.t. the logic of the PartitionPruner.  Also,
the updated comparison operator for PartitionKey requires corresponding
changes in the catalog manager and client's meta-cache.
I decided to split the changes for easier tracking and reviewing.

As a result, a few new tests scenarios related to using custom hash
schemas per range partition is not working as expected yet because they
require the comparison operator for PartitionKey to be updated:
  * FlexPartitioningCreateTableTest.NoUpperBoundRangeCustomHashSchema
  * PartitionPrunerTest.TestHashSchemasPerRangePruning
  * PartitionPrunerTest.TestSingleRangeElementAndBoundaryCase
That is going to be addressed in a follow-up changelist as well.

As a by-product of this change, the following methods of the
PartitionSchema class started working as needed in case of a table with
per-range custom hash schemas:
  * PartitionKeyDebugStringImpl()
  * PartitionKeyDebugString()

I was also thinking about introducing an extra field to ScanTokenPB,
like the newly added GetTableLocationsRequestPB::partition_key_range
field, or rely on the ScanTokenPB::tablet_metadata field to extract
the hash and the range parts out of the strings representing lower and
upper scan boundaries, but I realized that deserves to be done in a
separate changelist.

Change-Id: I00255ec404beeb999117f5265de0d5d8deaf0d68
---
M src/kudu/client/batcher.cc
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/flex_partitioning_client-test.cc
M src/kudu/client/meta_cache.cc
M src/kudu/client/meta_cache.h
M src/kudu/client/partitioner-internal.cc
M src/kudu/client/partitioner-internal.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scan_token-internal.cc
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/common/common.proto
M src/kudu/common/partition-test.cc
M src/kudu/common/partition.cc
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/integration-tests/table_locations-itest.cc
M src/kudu/master/catalog_manager-test.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.proto
M src/kudu/master/master_path_handlers.cc
M src/kudu/tools/table_scanner.cc
M src/kudu/tools/tool_action_table.cc
31 files changed, 1,096 insertions(+), 697 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/17890/6
-- 
To view, visit http://gerrit.cloudera.org:8080/17890
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I00255ec404beeb999117f5265de0d5d8deaf0d68
Gerrit-Change-Number: 17890
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-2671 introduce PartitionKey

Posted by "Mahesh Reddy (Code Review)" <ge...@cloudera.org>.
Mahesh Reddy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17890 )

Change subject: KUDU-2671 introduce PartitionKey
......................................................................


Patch Set 7:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17890/7/src/kudu/client/meta_cache.cc
File src/kudu/client/meta_cache.cc:

http://gerrit.cloudera.org:8080/#/c/17890/7/src/kudu/client/meta_cache.cc@387
PS7, Line 387:   const auto& entry_hash_key = !lower_bound_partition_key().empty()
Maybe I'm missing something, but why do we not check if the upper_bound_partition_key is empty? Is one of the lower or upper bound partition keys guaranteed to be set?


http://gerrit.cloudera.org:8080/#/c/17890/7/src/kudu/common/partition.cc
File src/kudu/common/partition.cc:

http://gerrit.cloudera.org:8080/#/c/17890/7/src/kudu/common/partition.cc@108
PS7, Line 108:   // Check the invariant: valid partition cannot span over multiple sets
Since this looks incomplete, maybe add a TODO here?


http://gerrit.cloudera.org:8080/#/c/17890/7/src/kudu/common/partition.cc@628
PS7, Line 628: << p.begin_.DebugString();
Is this still necessary?


http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/master/catalog_manager.cc@6394
PS4, Line 6394: ,
> Well, this is shell-like notation (particularly, Bourne shell).  Adding a s
Nope I don't feel strongly about it, we can keep it as is.


http://gerrit.cloudera.org:8080/#/c/17890/7/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/17890/7/src/kudu/master/catalog_manager.cc@6399
PS7, Line 6399: start.has_hash_key() || start.has_range_key())
What if only one of hash or range key is set? Would the other part of the PartitionKey just be empty?



-- 
To view, visit http://gerrit.cloudera.org:8080/17890
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I00255ec404beeb999117f5265de0d5d8deaf0d68
Gerrit-Change-Number: 17890
Gerrit-PatchSet: 7
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 07 Oct 2021 22:38:03 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2671 introduce PartitionKey

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17890 )

Change subject: KUDU-2671 introduce PartitionKey
......................................................................


Patch Set 7: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/17890
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I00255ec404beeb999117f5265de0d5d8deaf0d68
Gerrit-Change-Number: 17890
Gerrit-PatchSet: 7
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 07 Oct 2021 19:50:50 +0000
Gerrit-HasComments: No

[kudu-CR] WIP KUDU-2671 introduce PartitionKey

Posted by "Mahesh Reddy (Code Review)" <ge...@cloudera.org>.
Mahesh Reddy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17890 )

Change subject: WIP KUDU-2671 introduce PartitionKey
......................................................................


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17890/3/src/kudu/common/partition.h
File src/kudu/common/partition.h:

http://gerrit.cloudera.org:8080/#/c/17890/3/src/kudu/common/partition.h@102
PS3, Line 102: other < *this;
nit: switch order to *this > other;
Easier to read given the operator is >.


http://gerrit.cloudera.org:8080/#/c/17890/3/src/kudu/common/partition.h@106
PS3, Line 106: return other < *this || other == *this;
nit: same as above.


http://gerrit.cloudera.org:8080/#/c/17890/3/src/kudu/common/partition_pruner-test.cc
File src/kudu/common/partition_pruner-test.cc:

http://gerrit.cloudera.org:8080/#/c/17890/3/src/kudu/common/partition_pruner-test.cc@1260
PS3, Line 1260: 12 - 5
12 - 5 = 7 so isn't this the result we're expecting?


http://gerrit.cloudera.org:8080/#/c/17890/3/src/kudu/master/master.proto
File src/kudu/master/master.proto:

http://gerrit.cloudera.org:8080/#/c/17890/3/src/kudu/master/master.proto@621
PS3, Line 621: message TablePartitionKeyPB {
Would it make sense to split up this protobuf into one message just encapsulating one range key and one hash key?

message TablePartitionKeyPB {
   optional bytes range_key = 1 [(kudu.REDACT) = true];
   optional bytes hash_key = 2;
}

Then in GetTableLocationsRequestPB we can introduce two new fields such as:

optional TablePartitionKeyPB key_range_start = 8;
optional TablePartitionKeyPB key_range_end = 9;



-- 
To view, visit http://gerrit.cloudera.org:8080/17890
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I00255ec404beeb999117f5265de0d5d8deaf0d68
Gerrit-Change-Number: 17890
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Comment-Date: Fri, 01 Oct 2021 05:49:26 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2671 introduce PartitionKey

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17890 )

Change subject: KUDU-2671 introduce PartitionKey
......................................................................


Patch Set 4:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/17890/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17890/4//COMMIT_MSG@29
PS4, Line 29: As a result, a few new tests scenarios related to using custom hash
            : schemas per range partition is not working as expected yet because they
            : require the comparison operator for PartitionKey to be updated:
> I think I missed something -- was there a functional change that led to thi
Good point -- actually, the comparison operator has changed a bit and that's why the difference: the corner case is visible in case when there are ranges with no hash schema with encoded range key greater than any encoded key with at least one hash bucket schema.

Instead of explaining this difference here in the description, I decided to make the comparison operator for PartitionKey really equivalent to the legacy comparison, and now those tests pass.


http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/common/partition.h
File src/kudu/common/partition.h:

http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/common/partition.h@50
PS4, Line 50: class PartitionKey {
> nit: maybe this will change based on the ordering, but maybe worth eventual
Done


http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/common/partition.h@52
PS4, Line 52:   PartitionKey() = default;
> nit: maybe document that this is considered an "-Inf" key, and is valid to 
Done


http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/common/partition.h@92
PS4, Line 92: *this > other || *this == other;
> nit: maybe just !(this < other)? Same with <=?
Done


http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/common/partition.h@112
PS4, Line 112: redable
> nit: readable
Done


http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/common/partition.cc
File src/kudu/common/partition.cc:

http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/common/partition.cc@108
PS4, Line 108: ranges
> nit: I suppose this makes sense too, but to avoid overloading "range" too m
Good point.  I removed this DCHECK along with the comment since I found Partition is used in some hacky ways in tests.


http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/common/partition.cc@627
PS4, Line 627: ));
> nit: maybe print the range keys?
Done


http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/common/partition.cc@664
PS4, Line 664:     // TODO(aserbin): is this really necessary -- zeros in hash key are the
             :     //                minimum possible values, and the range part is already
             :     //                empty?
> This is an interesting observation -- there doesn't seem to be an functiona
Yep, it seems this was just one of various preparations when building sets/maps of ordered  partition keys.  I think to remove this in a separate changelist after making sure I'm not missing some important point here.


http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/master/catalog_manager-test.cc
File src/kudu/master/catalog_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/master/catalog_manager-test.cc@101
PS4, Line 101: TEST(TableInfoTest, GetTableLocationsLegacyCustomHashSchemas) {
> Would it make sense to add a test with multiple ranges and have the start a
Yeah -- those are to be added in a follow-up patch to verify the functionality in the end-to-end manner.  This small unit-test is here to verify one particular point of GetTabletsInRange() implementation: it returns non-OK status when not supplying the range information properly for a table with custom hash partitioning schemas per range.


http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/master/catalog_manager.h
File src/kudu/master/catalog_manager.h:

http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/master/catalog_manager.h@304
PS4, Line 304:  only a
> nit: "This is only applicable..."
Done


http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/master/catalog_manager.cc@5697
PS4, Line 5697: PREDICT_FALSE(req->has_key_start() && req->has_key_end() &&
              :                     req->key_start().has_range_key() &&
              :                     req->key_end().has_range_key() &&
              :                     req->key_start().range_key() > req->key_end().range_key())) {
              :     return Status::InvalidArgument("start partition range key must not be "
              :                                    "greater than the end partition range key");
              :   }
> Is it worth also checking if key_end doesn't have a range key but key_start
Yep, there are some options here: we might consider absent range key part as an empty range key, i.e. inf.  I'd rather postpone checking for invariants for a follow-up changelist -- I guess we might add those uniform checks in several places to validating the input data.


http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/master/catalog_manager.cc@6394
PS4, Line 6394: ,
> nit: space after comma
Well, this is shell-like notation (particularly, Bourne shell).  Adding a space breaks it in shell (i.e. in shell you shouldn't add a space there), so I'd either keep it as.  Let me know if you feel strongly about this -- I can update the comment and remove this shell-ism.


http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/master/master.proto
File src/kudu/master/master.proto:

http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/master/master.proto@649
PS4, Line 649: fields obsolete the
             :   // 'partition_key_start' and 'partition_key_end' fields above
> nit: "these fields make the ... fields above obsolete"
Done


http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/master/master.proto@654
PS4, Line 654: encoraged
> encouraged.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17890
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I00255ec404beeb999117f5265de0d5d8deaf0d68
Gerrit-Change-Number: 17890
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Wed, 06 Oct 2021 05:15:23 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2671 introduce PartitionKey

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17890 )

Change subject: KUDU-2671 introduce PartitionKey
......................................................................


Patch Set 4:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/17890/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17890/4//COMMIT_MSG@29
PS4, Line 29: As a result, a few new tests scenarios related to using custom hash
            : schemas per range partition is not working as expected yet because they
            : require the comparison operator for PartitionKey to be updated:
I think I missed something -- was there a functional change that led to this behavior other than the debug string implementations? I'm confused as to why these no longer pass, given the PartitionKey comparison operators abide by the existing rules.


http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/common/partition.h
File src/kudu/common/partition.h:

http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/common/partition.h@50
PS4, Line 50: class PartitionKey {
nit: maybe this will change based on the ordering, but maybe worth eventually adding a description, like

"Key that identifies the partition that a row belongs to. Rows that belong to a given partition all fall within the same partition key range."


http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/common/partition.h@52
PS4, Line 52:   PartitionKey() = default;
nit: maybe document that this is considered an "-Inf" key, and is valid to be used


http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/common/partition.h@92
PS4, Line 92: *this > other || *this == other;
nit: maybe just !(this < other)? Same with <=?


http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/common/partition.h@112
PS4, Line 112: redable
nit: readable


http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/common/partition.cc
File src/kudu/common/partition.cc:

http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/common/partition.cc@108
PS4, Line 108: ranges
nit: I suppose this makes sense too, but to avoid overloading "range" too much, maybe stick with "buckets"?


http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/common/partition.cc@627
PS4, Line 627: ));
nit: maybe print the range keys?


http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/common/partition.cc@664
PS4, Line 664:     // TODO(aserbin): is this really necessary -- zeros in hash key are the
             :     //                minimum possible values, and the range part is already
             :     //                empty?
This is an interesting observation -- there doesn't seem to be an functional difference between (1, _, _) and (1, 0, "") as far as partitioning is concerned, since it seems incorrect/buggy to operate with partition keys that fall in between (1, _, _) and (1, 0, "").


http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/master/catalog_manager.h
File src/kudu/master/catalog_manager.h:

http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/master/catalog_manager.h@304
PS4, Line 304:  only a
nit: "This is only applicable..."


http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/master/catalog_manager.cc@5697
PS4, Line 5697: PREDICT_FALSE(req->has_key_start() && req->has_key_end() &&
              :                     req->key_start().has_range_key() &&
              :                     req->key_end().has_range_key() &&
              :                     req->key_start().range_key() > req->key_end().range_key())) {
              :     return Status::InvalidArgument("start partition range key must not be "
              :                                    "greater than the end partition range key");
              :   }
Is it worth also checking if key_end doesn't have a range key but key_start does? In general, are there other constraints that are worth documenting for key_start and key_end?

Though perhaps it's worth waiting until the usage logic stabilizes.


http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/master/master.proto
File src/kudu/master/master.proto:

http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/master/master.proto@649
PS4, Line 649: fields obsolete the
             :   // 'partition_key_start' and 'partition_key_end' fields above
nit: "these fields make the ... fields above obsolete"



-- 
To view, visit http://gerrit.cloudera.org:8080/17890
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I00255ec404beeb999117f5265de0d5d8deaf0d68
Gerrit-Change-Number: 17890
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 05 Oct 2021 01:36:44 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2671 introduce PartitionKey

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17890 )

Change subject: KUDU-2671 introduce PartitionKey
......................................................................


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17890/3/src/kudu/common/partition.h
File src/kudu/common/partition.h:

http://gerrit.cloudera.org:8080/#/c/17890/3/src/kudu/common/partition.h@102
PS3, Line 102: other < *this;
> nit: switch order to *this > other;
That would be an infinite recursion.  The idea is to express operator>() via already defined operator<(), hence the order.


http://gerrit.cloudera.org:8080/#/c/17890/3/src/kudu/common/partition.h@106
PS3, Line 106: return other < *this || other == *this;
> nit: same as above.
Yup, here it's possible to use the newly defined operator>() since the latter has just been defined above.


http://gerrit.cloudera.org:8080/#/c/17890/3/src/kudu/common/partition_pruner-test.cc
File src/kudu/common/partition_pruner-test.cc:

http://gerrit.cloudera.org:8080/#/c/17890/3/src/kudu/common/partition_pruner-test.cc@1260
PS3, Line 1260: 12 - 5
> 12 - 5 = 7 so isn't this the result we're expecting?
Ah, that seems to be a typo.  Thanks!


http://gerrit.cloudera.org:8080/#/c/17890/3/src/kudu/master/master.proto
File src/kudu/master/master.proto:

http://gerrit.cloudera.org:8080/#/c/17890/3/src/kudu/master/master.proto@621
PS3, Line 621: message TablePartitionKeyPB {
> Would it make sense to split up this protobuf into one message just encapsu
I think that's a good idea -- that way the new type will mirror PartitionKey in the code.

Done.



-- 
To view, visit http://gerrit.cloudera.org:8080/17890
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I00255ec404beeb999117f5265de0d5d8deaf0d68
Gerrit-Change-Number: 17890
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Comment-Date: Sat, 02 Oct 2021 04:15:08 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2671 introduce PartitionKey

Posted by "Mahesh Reddy (Code Review)" <ge...@cloudera.org>.
Mahesh Reddy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17890 )

Change subject: KUDU-2671 introduce PartitionKey
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/master/catalog_manager-test.cc
File src/kudu/master/catalog_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/master/catalog_manager-test.cc@101
PS4, Line 101: TEST(TableInfoTest, GetTableLocationsLegacyCustomHashSchemas) {
Would it make sense to add a test with multiple ranges and have the start and end partition keys be from different ranges?


http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/master/catalog_manager.cc@6394
PS4, Line 6394: ,
nit: space after comma


http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/master/master.proto
File src/kudu/master/master.proto:

http://gerrit.cloudera.org:8080/#/c/17890/4/src/kudu/master/master.proto@654
PS4, Line 654: encoraged
encouraged.



-- 
To view, visit http://gerrit.cloudera.org:8080/17890
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I00255ec404beeb999117f5265de0d5d8deaf0d68
Gerrit-Change-Number: 17890
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Mon, 04 Oct 2021 22:58:26 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2671 introduce PartitionKey

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17890 )

Change subject: KUDU-2671 introduce PartitionKey
......................................................................


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17890/7/src/kudu/client/meta_cache.cc
File src/kudu/client/meta_cache.cc:

http://gerrit.cloudera.org:8080/#/c/17890/7/src/kudu/client/meta_cache.cc@387
PS7, Line 387:   const auto& entry_hash_key = !lower_bound_partition_key().empty()
> Maybe I'm missing something, but why do we not check if the upper_bound_par
Right: the hash key, if present, is set either in the lower or the upper bound.


http://gerrit.cloudera.org:8080/#/c/17890/7/src/kudu/common/partition.cc
File src/kudu/common/partition.cc:

http://gerrit.cloudera.org:8080/#/c/17890/7/src/kudu/common/partition.cc@108
PS7, Line 108:   // Check the invariant: valid partition cannot span over multiple sets
> Since this looks incomplete, maybe add a TODO here?
I guess I'm going to drop this in the follow-up changelist -- it seems that's the only comment that needs some addressing, so I guess it's easier to do so in next iteration.


http://gerrit.cloudera.org:8080/#/c/17890/7/src/kudu/common/partition.cc@628
PS7, Line 628: << p.begin_.DebugString();
> Is this still necessary?
It's not exactly necessary, but it might help to see what's the PartitionKey if this ever fails -- I added this when addressing Andrew's feedback.


http://gerrit.cloudera.org:8080/#/c/17890/7/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/17890/7/src/kudu/master/catalog_manager.cc@6399
PS7, Line 6399: start.has_hash_key() || start.has_range_key())
> What if only one of hash or range key is set? Would the other part of the P
Yes, that's possible:
  * the range part is empty, but the hash part is non-empty -- that's for unbounded range with some hash bucketing
  * the range part is non-empty, but the hash part is empty -- that the case of a range with exact lower bound, but with no hash bucketing



-- 
To view, visit http://gerrit.cloudera.org:8080/17890
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I00255ec404beeb999117f5265de0d5d8deaf0d68
Gerrit-Change-Number: 17890
Gerrit-PatchSet: 7
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 08 Oct 2021 00:46:53 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2671 introduce PartitionKey

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17890 )

Change subject: KUDU-2671 introduce PartitionKey
......................................................................

KUDU-2671 introduce PartitionKey

This changelist introduces a dedicated PartitionKey data structure to
represent a table's partition key.  The crux is in keeping the hash and
the range part of the key separately, so it's possible to properly work
with such an entity without the context of a particular range or tablet.
Before this patch, encoded partition keys were represented as strings that
had the hash and the range parts concatenated: it was always possible to
properly decode such a key having only table-wide information in the
context because every range had the same hash schema.  However, once
custom hash schemas per range were introduced, now it's impossible to
properly decode such a compound key without knowing the particular hash
schema that was used to encode the key.

The ordering of PartitionKeys in this changelist follows the legacy
notation for comparison operator, first concatenating the hash and the
range parts and then comparing the result strings.  However, it's going
to change in a follow-up changelist.  That change induces corresponding
changes in PartitionPruner.  Also, the updated comparison operator for
PartitionKey requires updating the code in the catalog manager and in
the client's meta-cache.  I decided to split the changes into a few
changelists for easier tracking and reviewing.

As a by-product of this change, the following methods of the
PartitionSchema class started working as needed in case of a table with
per-range custom hash schemas:
  * PartitionKeyDebugStringImpl()
  * PartitionKeyDebugString()

I was also thinking about introducing an extra field to ScanTokenPB,
like the newly added GetTableLocationsRequestPB::partition_key_range
field, or rely on the ScanTokenPB::tablet_metadata field to extract
the hash and the range parts out of the strings representing lower and
upper scan boundaries, but I realized that deserves to be done in a
separate changelist.

Change-Id: I00255ec404beeb999117f5265de0d5d8deaf0d68
Reviewed-on: http://gerrit.cloudera.org:8080/17890
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <aw...@cloudera.com>
---
M src/kudu/client/batcher.cc
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/flex_partitioning_client-test.cc
M src/kudu/client/meta_cache.cc
M src/kudu/client/meta_cache.h
M src/kudu/client/partitioner-internal.cc
M src/kudu/client/partitioner-internal.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scan_token-internal.cc
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/common/common.proto
M src/kudu/common/partition-test.cc
M src/kudu/common/partition.cc
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/integration-tests/table_locations-itest.cc
M src/kudu/master/catalog_manager-test.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.proto
M src/kudu/master/master_path_handlers.cc
M src/kudu/tools/table_scanner.cc
M src/kudu/tools/tool_action_table.cc
31 files changed, 1,074 insertions(+), 694 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Andrew Wong: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/17890
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I00255ec404beeb999117f5265de0d5d8deaf0d68
Gerrit-Change-Number: 17890
Gerrit-PatchSet: 8
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)