You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2019/07/04 18:49:11 UTC

[kudu-CR] KUDU-2823 Place tablet replicas based on deminsion

Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13632 )

Change subject: KUDU-2823 Place tablet replicas based on deminsion
......................................................................


Patch Set 10:

(34 comments)

I skimmed trough, and overall it looks reasonable.  There is a few question on the determining the load of a tablet server when using dimension labeling for new replicas and it would be nice to have a bit more of the unit test coverage.

http://gerrit.cloudera.org:8080/#/c/13632/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13632/10//COMMIT_MSG@7
PS10, Line 7: deminsion
dimension


http://gerrit.cloudera.org:8080/#/c/13632/10//COMMIT_MSG@9
PS10, Line 9: the newly
            : created tablet to be evenly distributed on each tablet server
maybe, rephrase for clarity:

... replicas of the newly created tablet to be evenly distributed across all available tablet servers.


http://gerrit.cloudera.org:8080/#/c/13632/10//COMMIT_MSG@25
PS10, Line 25: deminsion
dimension


http://gerrit.cloudera.org:8080/#/c/13632/10//COMMIT_MSG@27
PS10, Line 27: deminsion
dimension


http://gerrit.cloudera.org:8080/#/c/13632/10//COMMIT_MSG@30
PS10, Line 30: deminsion
dimension


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/client/client.h
File src/kudu/client/client.h:

http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/client/client.h@882
PS10, Line 882: the tablet
Which tablet?  Or you meant 'table'?


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/client/client.h@1210
PS10, Line 1210: that was created
to be created ?


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/integration-tests/create-table-itest.cc
File src/kudu/integration-tests/create-table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/integration-tests/create-table-itest.cc@228
PS10, Line 228:   vector<string> ts_flags;
maybe, drop this since it's empty anyways.


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/integration-tests/create-table-itest.cc@229
PS10, Line 229:   vector<string> master_flags;
              :   master_flags.emplace_back("--master_place_tablet_replicas_based_on_dimension=true");
              :   master_flags.emplace_back("--tserver_last_replica_creations_halflife_ms=10");
nit: the shorter form might be

const vector<string> master_flags = {
  "--master_place_tablet_replicas_based_on_dimension",
  "--tserver_last_replica_creations_halflife_ms=10",
};


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/integration-tests/create-table-itest.cc@233
PS10, Line 233: ts_flags
nit: {}


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/integration-tests/create-table-itest.cc@321
PS10, Line 321: 10
kNumServers ?


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/catalog_manager.cc@708
PS10, Line 708: boost::optional<string>*
why boost::optional<string>* ?  It seems string* as a type for the output parameter is good enough here, no?


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/catalog_manager.cc@3733
PS10, Line 3733: range
dimension ?


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/catalog_manager.cc@3748
PS10, Line 3748:     if (PREDICT_TRUE(s.ok())) {
               :       PlacementPolicy policy(std::move(ts_descs), rng_);
               :       s = policy.PlaceExtraTabletReplica(std::move(existing), range, &extra_replica);
               :     }
I think that in case of failure to get partition's debug string it's safe enough to just log a warning and proceed with empty/none dimension label.

The idea is that this method is used for re-replication as well, and it's better to place a replica maybe a bit off from the dimension-dictated destination, but still preserve availability of the tablet's data.


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/catalog_manager.cc@4688
PS10, Line 4688: range
dimension


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/catalog_manager.cc@4688
PS10, Line 4688: enable
is enabled


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/master.proto
File src/kudu/master/master.proto:

http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/master.proto@321
PS10, Line 321: when 
drop


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc
File src/kudu/master/placement_policy-test.cc:

http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@904
PS10, Line 904:           { "ts1", 1000, { { "A", 1000 }, } },
              :           { "ts2", 1000, { { "A", 1000 }, } },
It would be nice to add a few scenarios where:

* tablets for multiple dimensions label are being added (like 5 different dimensions) for the same table
* the initial distribution of tablet replicas has a few 'deeps' and 'peaks' around some average level: e.g., 10, 20, 30
* the replicas are placed for tablets with replication factor of 3 (i.e. the first argument for PlaceTabletReplicas() is 3, not only 1).
* verify how PlaceExtraTabletReplica() works as well
* there scenarios should cover the cases where the replication factor is a) equal to the number of tablet servers b) replication factor + 1 (e.g., for RF=3 that's 4 tablet servers) c) two times of the replication factor (e.g. 6 tablet servers or more for RF=3).

In the end, it's necessary to validate the overall distribution of the replicas cluster-wise and label-wise.


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@910
PS10, Line 910: fuc
nit: func


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@930
PS10, Line 930: boost::none
It would be nice to have a coverage for the scenario when label-specific replicas are added in the case of huge imbalance of total number of replicas.


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@933
PS10, Line 933: ASSERT_TRUE(ts_uuid == "ts0" || ts_uuid == "ts1" || ts_uuid == "ts2") << ts_uuid;
drop this: it does not provide much of the semantically meaningful coverage


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@941
PS10, Line 941: ASSERT_GE(stddev, 20.0);
Did you run it many times (like thousands) to be sure that's a good enough threshold to avoid flakiness in the future runs?


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@944
PS10, Line 944: B
For clarity, it's better to use some other label which is not easily conflated with table names ('A' is used as a table name in the initial distribution of replicas for the test).  Maybe, something like 'label0' ?


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@944
PS10, Line 944: Ask for
Place


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@944
PS10, Line 944: with 
with dimension label ?


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@954
PS10, Line 954: 1
It would be nice to add a test scenario that corresponds to a replication factors of 3 and 5 as well.


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@957
PS10, Line 957: ASSERT_TRUE(ts_uuid == "ts0" || ts_uuid == "ts1" || ts_uuid == "ts2") << ts_uuid;
I'm not sure this assert has some semantically meaningful coverage: sure, the selected tablet server is one of the registered ones (there are no others).  Also, the ASSERT() at line 961 covers this as well.


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy-test.cc@964
PS10, Line 964: LOG(INFO) << "stddev = " << stddev;
Is it crucial to output from the test?  If that was just for debugging purposes, consider removing this once the test is ready.


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy.cc
File src/kudu/master/placement_policy.cc:

http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy.cc@56
PS10, Line 56:   // TODO (oclarms): get the number of times this tablet server has recently been
             :   //  selected to create a tablet replica by dimension.
Is this about somehow amplifying the numbers already accounted for by RecentReplicaCreations()?  I'm curious why that is needed.  Does it mean the current way of accounting is not good enough?


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy.cc@58
PS10, Line 58: return desc->RecentReplicaCreations() + desc->num_live_replicas(std::move(dimension));
What if measuring the load of two tablet servers, where nothing has been created so far (so both  RecentReplicaCreations() and num_live_replicas(label) are 0), but first has 10, but second 20 tablet replicas total?  I think the former one should be considered less loaded than the second.  Or you want those dimension to be completely independent from the overall load?

Maybe, add num_live_replicas() / 10000 or something like that into the sum?  Ideally, that should be something that expresses the preference of using dimension-only count against  weighted 'all other counts'.  I'm not sure what's the best way to express it here, though.  Probably, using 1/10000 is good enough initially given it's not expected to host that many replicas per tablet server :)


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/placement_policy.cc@73
PS10, Line 73: If
nit:  If --> if


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/ts_descriptor.h
File src/kudu/master/ts_descriptor.h:

http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/master/ts_descriptor.h@136
PS10, Line 136: number
total number


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/tserver/ts_tablet_manager.h
File src/kudu/tserver/ts_tablet_manager.h:

http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/tserver/ts_tablet_manager.h@204
PS10, Line 204: Return
nit: it's not the result value of this method per se, so maybe replace 'Return' with simply 'Get' here.


http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/13632/10/src/kudu/tserver/ts_tablet_manager.cc@1248
PS10, Line 1248: CHECK
nit: it's safe to use DCHECK here since in release build it would crash with SIGSEGV anyways at line 1261 if num_live_tablets_by_dimension was nullptr.



-- 
To view, visit http://gerrit.cloudera.org:8080/13632
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I48a225e221eb42ef2f5489687e80a151d8dc1a42
Gerrit-Change-Number: 13632
Gerrit-PatchSet: 10
Gerrit-Owner: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Comment-Date: Thu, 04 Jul 2019 18:49:11 +0000
Gerrit-HasComments: Yes