Posted to issues@kudu.apache.org by "Bankim Bhavsar (Jira)" <ji...@apache.org> on 2019/10/31 17:24:00 UTC

[jira] [Comment Edited] (KUDU-2963) Catalog manager never gives up on CreateTablet RPCs

    [ https://issues.apache.org/jira/browse/KUDU-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964225#comment-16964225 ] 

Bankim Bhavsar edited comment on KUDU-2963 at 10/31/19 5:23 PM:
----------------------------------------------------------------

Resolving this JIRA as "Not a Bug". See the explanation above.
 Commit [698a7f94f12913b27947a4855a1b82a2d74823e4|https://github.com/apache/kudu/commit/698a7f94f12913b27947a4855a1b82a2d74823e4] added a unit test case that fixes the test CreateTableITest_TestCreateWhenMajorityOfReplicasFailCreation and also verifies that CreateTablet RPCs are not retried indefinitely.



> Catalog manager never gives up on CreateTablet RPCs
> ---------------------------------------------------
>
>                 Key: KUDU-2963
>                 URL: https://issues.apache.org/jira/browse/KUDU-2963
>             Project: Kudu
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 1.11.0
>            Reporter: Adar Dembo
>            Assignee: Bankim Bhavsar
>            Priority: Major
>              Labels: newbie
>             Fix For: NA
>
>
> This is a problem when there aren't enough live tservers upon which to place a tablet's replicas, or when a chosen tserver doesn't create the replica quickly enough. If the catalog manager decides to replace the tablet, the replaced tablet's CreateTablet RPCs continue to retry ad infinitum. If the previously dead tservers then come back to life, they must needlessly process the CreateTablet RPCs.
> The tablets are eventually deleted, either through explicit DeleteTablet RPCs (triggered by the catalog manager replacement process), or by heartbeating, but it's an unnecessary drain on cluster resources.
> We should probably abort CreateTablet RPCs for tablets that have been removed from their table.
> CreateTableITest_TestCreateWhenMajorityOfReplicasFailCreation demonstrates this acutely.
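
The abort condition proposed in the description can be sketched as follows. This is a minimal illustration in Python, not Kudu's catalog manager code (which is C++); Table, send_create_tablet, try_send, and max_attempts are all hypothetical names introduced here, and the retry bound is an assumption rather than a Kudu setting.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical stand-in for the master's view of a table."""
    tablet_ids: set = field(default_factory=set)


def send_create_tablet(table, tablet_id, try_send, max_attempts=10):
    """Retry a CreateTablet-style request until it succeeds, the tablet is
    removed from its table, or max_attempts is exhausted.

    try_send is a hypothetical callable that returns True on success.
    Returns (outcome, attempts), where outcome is "created", "aborted",
    or "gave_up".
    """
    attempts = 0
    while attempts < max_attempts:
        # Abort check: a replaced tablet no longer belongs to its table,
        # so there is no point in sending further requests for it.
        if tablet_id not in table.tablet_ids:
            return "aborted", attempts
        attempts += 1
        if try_send(tablet_id):
            return "created", attempts
    return "gave_up", attempts


# Usage: the tablet is replaced while the third attempt is in flight.
table = Table(tablet_ids={"t1"})
calls = []


def flaky_send(tablet_id):
    calls.append(tablet_id)
    if len(calls) == 3:
        # Simulate the catalog manager replacing the tablet.
        table.tablet_ids.discard(tablet_id)
    return False  # the tserver never creates the replica


result = send_create_tablet(table, "t1", flaky_send)
```

In this sketch the membership check runs before every attempt, so a tablet that has been removed from its table stops generating requests on the next iteration instead of retrying until the bound is reached.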


