Posted to issues@kudu.apache.org by "Alexey Serbin (Jira)" <ji...@apache.org> on 2019/11/22 20:15:00 UTC
[jira] [Commented] (KUDU-3008) Don't put all replicas into one location with 2 locations and odd replica factor.
[ https://issues.apache.org/jira/browse/KUDU-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980483#comment-16980483 ]
Alexey Serbin commented on KUDU-3008:
-------------------------------------
Good catch!
Yes, splitting the replicas 1 + 2 is definitely better than putting all three in one location (3 + 0 or 0 + 3), even though the location placement policy constraint (no majority of replicas in a single location) doesn't hold in either case:
* 1 + 2 is better than 3 + 0: if all servers in the first location become unavailable, the tablet is still available with 1 + 2 (two of the three replicas remain, which is a majority), while with 3 + 0 it is not.
* 1 + 2 is better than 0 + 3: if the second location fails unrecoverably, no replica survives and all data is lost with 0 + 3, while with 1 + 2 one replica is left and may be used for manual data recovery.
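
To make the comparison above concrete, here is a minimal sketch of the arithmetic. It is not Kudu code: the function names are made up for illustration, and a split is simply a pair (replicas in the first location, replicas in the second location).

```python
# Illustrative only: plain arithmetic on replica counts, not Kudu code.
# The function names below are hypothetical.

def survivors(split, lost_location):
    """Replicas left after every server in one location (index 0 or 1) is lost."""
    return sum(count for index, count in enumerate(split) if index != lost_location)


def stays_available(split, lost_location):
    """True if the surviving replicas still form a strict majority."""
    return survivors(split, lost_location) > sum(split) // 2


if __name__ == "__main__":
    for split in [(1, 2), (3, 0), (0, 3)]:
        for lost in (0, 1):
            print(split, "lose location", lost + 1,
                  "-> survivors:", survivors(split, lost),
                  "available:", stays_available(split, lost))
```

With 1 + 2, losing the first location still leaves a majority, and losing the second still leaves one replica. With 3 + 0 or 0 + 3, losing the populated location leaves nothing.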
> Don't put all replicas into one location with 2 locations and odd replica factor.
> ---------------------------------------------------------------------------------
>
> Key: KUDU-3008
> URL: https://issues.apache.org/jira/browse/KUDU-3008
> Project: Kudu
> Issue Type: Improvement
> Reporter: ZhangYao
> Assignee: ZhangYao
> Priority: Minor
>
> I accidentally found that Kudu puts all replicas of a table into one location when there are only 2 locations and the replica factor is odd. Here is the case:
> {{location /DEFAULT/22254 has 3 tservers}}
> {{location /DEFAULT/22255 has 3 tservers}}
> {{Table created: replica factor = 3, tablet = 8.}}
> {{Before I create the table, the ksck tablet summary is:}}
>
> {code:java}
> Tablet Replica Count by Tablet Server
>  UUID                             | Host                               | Replica Count | Location
> ----------------------------------+------------------------------------+---------------+----------------
>  5f5ddec364834ce59282d37388010f06 | opencomputeoffline.xxxxxx.net:7056 | 10            | /DEFAULT/22255
>  00f24c36d39a49e8b77ff43b3bcbf0c9 | opencomputeoffline.xxxxxx.net:7054 | 10            | /DEFAULT/22255
>  d0091ae869704458865b9b079ad2389e | opencomputeoffline.xxxxxx.net:7055 | 9             | /DEFAULT/22255
>  507547dd183c4474855d55f7bdd9d526 | opencomputeoffline.xxxxxx.net:7052 | 7             | /DEFAULT/22254
>  c6a2b6e99f0a43308d9e5773b2d8c729 | opencomputeoffline.xxxxxx.net:7053 | 6             | /DEFAULT/22254
>  031808c37385477fb063e50fbc614f44 | opencomputeoffline.xxxxxx.net:7050 | 6             | /DEFAULT/22254
> {code}
> {{After I create the table, the ksck tablet summary is:}}
>
> {code:java}
> Tablet Replica Count by Tablet Server
>  UUID                             | Host                               | Replica Count | Location
> ----------------------------------+------------------------------------+---------------+----------------
>  507547dd183c4474855d55f7bdd9d526 | opencomputeoffline.xxxxxx.net:7052 | 15            | /DEFAULT/22254
>  c6a2b6e99f0a43308d9e5773b2d8c729 | opencomputeoffline.xxxxxx.net:7053 | 14            | /DEFAULT/22254
>  031808c37385477fb063e50fbc614f44 | opencomputeoffline.xxxxxx.net:7050 | 14            | /DEFAULT/22254
>  5f5ddec364834ce59282d37388010f06 | opencomputeoffline.xxxxxx.net:7056 | 10            | /DEFAULT/22255
>  00f24c36d39a49e8b77ff43b3bcbf0c9 | opencomputeoffline.xxxxxx.net:7054 | 10            | /DEFAULT/22255
>  d0091ae869704458865b9b079ad2389e | opencomputeoffline.xxxxxx.net:7055 | 9             | /DEFAULT/22255
> {code}
> I found that /DEFAULT/22255 received no new replicas: all of them were placed in /DEFAULT/22254. Looking into the code, I found that PlacementPolicy::SelectLocation, when the number of locations is 2, only handles an even replica factor, in which case it tries to spread replicas evenly across the 2 locations. I think we should also handle an odd replica factor. With 2 locations, one location must hold more than half of the replicas, but that is still better than one location holding all of them.
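
One way to picture the behavior the issue asks for is the sketch below. It is an assumption-laden illustration, not the actual logic of PlacementPolicy::SelectLocation: the function name, the use of a per-location replica total as the "load", and the tie-breaking rule are all hypothetical. The load figures are the per-location sums from the "before" ksck summary (7 + 6 + 6 = 19 and 10 + 10 + 9 = 29).

```python
# Hypothetical sketch, not Kudu's implementation: split an odd or even
# replica factor across exactly two locations, giving the extra replica
# (if any) to the less loaded location.

def split_across_two_locations(num_replicas, load_by_location):
    """Return {location: replica_count} for exactly two locations."""
    if len(load_by_location) != 2:
        raise ValueError("this sketch handles exactly two locations")
    # Less loaded location first; ties are broken by name for determinism.
    lighter, heavier = sorted(
        load_by_location, key=lambda loc: (load_by_location[loc], loc))
    smaller_share = num_replicas // 2
    return {lighter: num_replicas - smaller_share, heavier: smaller_share}


if __name__ == "__main__":
    # Per-location replica totals taken from the "before" summary above.
    load = {"/DEFAULT/22254": 19, "/DEFAULT/22255": 29}
    print(split_across_two_locations(3, load))
```

Under these assumptions, a replica factor of 3 yields a 2 + 1 split with the larger share in /DEFAULT/22254, rather than 3 + 0. A real policy would also have to respect the number of tablet servers available in each location, which this sketch ignores.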