Posted to dev@pegasus.apache.org by GitBox <gi...@apache.org> on 2022/06/02 14:03:45 UTC

[GitHub] [incubator-pegasus] empiredan commented on issue #865: Feature: support to increase the replication factor of each table

empiredan commented on issue #865:
URL: https://github.com/apache/incubator-pegasus/issues/865#issuecomment-1144905936

   > > > > > > Good enhancement~
   > > > > > > I have some questions about updating table replica count:
   > > > > > > 
   > > > > > > 1. There's also a cluster-level config called `max_replica_count`; will there be a conflict if the cluster `max_replica_count` is not equal to the table `max_replica_count`? Will important features such as user read/write, learn, load balance and rolling_update still work?
   > > > > > > 2. The table's `app_info` file is stored on each replica's disk (_dir/.app_info); if `max_replica_count` is updated, this file should also be updated. How will it be updated?
   > > > > > > 3. In your description, `cure` (executed every 10s) will delete redundant replicas or add missing replicas. Could you please add some manual test results?
   > > > > > > 
   > > > > > > Expecting your reply~
   > > > > > 
   > > > > > 
   > > > > > Ok, let's discuss one by one:
   > > > > > 
   > > > > > 1. Does the cluster-level config mean `max_allowed_replica_count` in [feat: restrict the replication factor while creating app XiaoMi/rdsn#963](https://github.com/XiaoMi/rdsn/pull/963)? Actually `max_replica_count` is the RF, while `max_allowed_replica_count` is the upper bound to which an RF can be set. Or should we rename `max_replica_count` to `replication_factor` to avoid misunderstanding?
   > > > > > 2. Good question! This implementation hasn't written the new value into `.app_info`; this is a bug! To guarantee atomicity, I think the `set_replica_count` request can first be replicated to each replica; on receiving the request, each replica writes it into the `slog`, and then writes the new `max_replica_count` into `.app_info` (a minimal sketch of this ordering follows the list below). Even if a replica server fails during the process, the request will not be lost.
   > > > > > 3. As described in this issue, increasing `max_replica_count` has been tested sufficiently, but decreasing `max_replica_count` has not. I'll test decreasing later to see whether redundant replicas can be processed correctly.
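   > > > > > 
   > > > > > A minimal sketch of the ordering described in point 2; `app_info_stub`, `append_to_slog` and `write_app_info_file` are illustrative stand-ins here, not the real rdsn interfaces:
   > > > > > 
   > > > > > ```c++
   > > > > > struct app_info_stub
   > > > > > {
   > > > > >     int max_replica_count = 3;
   > > > > > };
   > > > > > 
   > > > > > void append_to_slog(int new_count);                  // stub: durable write into the shared log
   > > > > > void write_app_info_file(const app_info_stub &info); // stub: rewrite _dir/.app_info
   > > > > > 
   > > > > > void on_set_replica_count(app_info_stub &info, int new_count)
   > > > > > {
   > > > > >     // Log the request first, so that it can be replayed if the
   > > > > >     // replica server fails before .app_info is rewritten.
   > > > > >     append_to_slog(new_count);
   > > > > > 
   > > > > >     // Only then persist the new value into the on-disk .app_info.
   > > > > >     info.max_replica_count = new_count;
   > > > > >     write_app_info_file(info);
   > > > > > }
   > > > > > ```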
   > > > > 
   > > > > 
   > > > > Thanks for your reply~ Continuing the discussion:
   > > > > 
   > > > > 1. I mean the meta option called `max_replicas_in_group`; you can find it here:
   > > > >    https://github.com/XiaoMi/rdsn/blob/289eb4609ae781e3bcc6bbc6bfeae64dfd8fa785/src/meta/meta_options.cpp#L160-L161
   > > > > 2. I suggest updating `max_replica_count` in `.app_info` through `on_config_sync`.
   > > > > 3. I think decreasing the replica count may have some problems; expecting your test results~
   > > > 
   > > > 
   > > > Thanks for the suggestion!
   > > > 
   > > > 1. I think we can eliminate `max_replicas_in_group` and switch to the partition-level `max_replica_count`. I'll explain below:
   > > > 
   > > > Currently `max_replicas_in_group` is only used to update `config_context::MAX_REPLICA_COUNT_IN_GRROUP`, and `MAX_REPLICA_COUNT_IN_GRROUP` in turn is only used in:
   > > > ```c++
   > > > void config_context::check_size()
   > > > {
   > > >     // when add learner, it is possible that replica_count > max_replica_count, so we
   > > >     // need to remove things from dropped only when it's not empty.
   > > >     while (replica_count(*config_owner) + dropped.size() > MAX_REPLICA_COUNT_IN_GRROUP &&
   > > >            !dropped.empty()) {
   > > >         dropped.erase(dropped.begin());
   > > >         prefered_dropped = (int)dropped.size() - 1;
   > > >     }
   > > > }
   > > > ```
   > > > 
   > > > However, `MAX_REPLICA_COUNT_IN_GRROUP` can be replaced with the partition-level `max_replica_count`, as below:
   > > > ```c++
   > > > void config_context::check_size()
   > > > {
   > > >     // when add learner, it is possible that replica_count > max_replica_count, so we
   > > >     // need to remove things from dropped only when it's not empty.
   > > >     while (replica_count(*config_owner) + dropped.size() > config_owner->max_replica_count &&
   > > >            !dropped.empty()) {
   > > >         dropped.erase(dropped.begin());
   > > >         prefered_dropped = (int)dropped.size() - 1;
   > > >     }
   > > > }
   > > > ```
   > > > 
   > > > The `partition_configuration` pointed to by `config_owner` is updated in `server_state::on_update_configuration_on_remote_reply`, which will be called by `set_replica_count`.
   > > > 
   > > > 2. Good idea! Every `config_sync_interval_ms` (typically 30 seconds) the replica server sends a `query_configuration_by_node` request to the meta server; on receiving the response, if the ballot in the response is newer than the local one, the replica server will update every table-level and partition-level `max_replica_count` to the one in the response (a rough sketch follows this list). Is that right?
   > > > 3. Actually, decreasing the replica count has been forbidden by us until now, since decreasing is more error-prone than increasing. If we decide to support decreasing the replica count, I'll test it sufficiently.
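   > > > 
   > > > A rough sketch of the config-sync handling I mean in point 2; the types and helpers here are illustrative stand-ins, not the real rdsn interfaces:
   > > > 
   > > > ```c++
   > > > struct synced_partition_config
   > > > {
   > > >     long long ballot = 0;
   > > >     int max_replica_count = 3;
   > > > };
   > > > 
   > > > struct local_partition_state
   > > > {
   > > >     long long ballot = 0;
   > > >     int max_replica_count = 3;
   > > > };
   > > > 
   > > > void rewrite_app_info_file(int new_max_replica_count); // stub: rewrite _dir/.app_info
   > > > 
   > > > // Called on the replica server when the query_configuration_by_node
   > > > // response arrives from the meta server.
   > > > void on_config_sync_reply(local_partition_state &local, const synced_partition_config &remote)
   > > > {
   > > >     // Only a response carrying a newer ballot may overwrite the local state.
   > > >     if (remote.ballot > local.ballot) {
   > > >         local.ballot = remote.ballot;
   > > >         local.max_replica_count = remote.max_replica_count;
   > > >         rewrite_app_info_file(local.max_replica_count);
   > > >     }
   > > > }
   > > > ```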
   > > 
   > > 
   > > Thanks for your reply~
   > > 
   > > 1. If you have already verified that the table-level `replica_count` can replace the cluster-level `max_replicas_in_group`, just go ahead~
   > > 2. Yes, I recommend updating the `.app_info` file through the `on_config_sync` RPC.
   > > 3. Okay. How about updating this issue title from `update the replication factor` to `increase the replication factor`, since decreasing is not supported?
   > 
   > Ok, we can first support increasing the replication factor. Decreasing can be supported later if needed.
   
   Our implementation now supports decreasing the replication factor.
   
   Note that the redundant replica data will be garbage-collected only when the meta function level is lively, which means `set_meta_level lively` should be executed first, as shown in the example below.
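   
   For example, assuming a standard Pegasus shell session (using the `set_meta_level` admin command named above):
   
   ```
   >>> set_meta_level lively
   ```
   
   Once the redundant replicas have been garbage-collected, the meta level can be switched back to `steady` if automatic balancing is no longer wanted.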
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org