Posted to issues@kudu.apache.org by "Grant Henke (Jira)" <ji...@apache.org> on 2020/06/03 02:47:00 UTC

[jira] [Resolved] (KUDU-2410) Add auto-repair function to ksck to repair "stuck tablet" situations common on older versions

     [ https://issues.apache.org/jira/browse/KUDU-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Henke resolved KUDU-2410.
-------------------------------
    Fix Version/s: NA
       Resolution: Won't Fix

> Add auto-repair function to ksck to repair "stuck tablet" situations common on older versions
> ---------------------------------------------------------------------------------------------
>
>                 Key: KUDU-2410
>                 URL: https://issues.apache.org/jira/browse/KUDU-2410
>             Project: Kudu
>          Issue Type: Improvement
>          Components: supportability
>    Affects Versions: 1.7.0
>            Reporter: William Berkeley
>            Assignee: William Berkeley
>            Priority: Major
>             Fix For: NA
>
>
> There are two common situations where tablets get stuck and can't recover automatically, characterized by the following ksck outputs:
> Tombstone + eviction in-flight:
> {noformat}
> Tablet 796d3d67d6e0429fb5f91c2c7bbd486d of table 'loadgen_auto_802e774c09d74a208330db4c108a7d30' is under-replicated: 1 replica(s) not RUNNING
>   16204380dc404171bebd99af2504cb14 (wdb-k015-2:7050): RUNNING
>   61dec96f5aed4cd2a47814de42d721e6 (wdb-k015-3:7050): RUNNING [LEADER]
>   d1689e073948415a901c64a9e9269416 (wdb-k015-1:7050): bad state
>     State:       NOT_STARTED
>     Data state:  TABLET_DATA_TOMBSTONED
>     Last status: Tablet initializing...
> 2 replicas' active configs differ from the master's.
>   All the peers reported by the master and tablet servers are:
>   A = 16204380dc404171bebd99af2504cb14
>   B = 61dec96f5aed4cd2a47814de42d721e6
>   C = d1689e073948415a901c64a9e9269416
> The consensus matrix is:
>  Config source |         Voters         | Current term | Config index | Committed?
> ---------------+------------------------+--------------+--------------+------------
>  master        | A   B*  C              |              |              | Yes
>  A             | A   B   C              | 2            | 305          | Yes
>  B             |     B   C              | 2            | 307          | No
>  C             | [config not available] |              |              |
> {noformat}
> Permanently failed + eviction in-flight:
> {noformat}
> Tablet 796d3d67d6e0429fb5f91c2c7bbd486d of table 'loadgen_auto_802e774c09d74a208330db4c108a7d30' is under-replicated: 1 replica(s) not RUNNING
>   16204380dc404171bebd99af2504cb14 (wdb-k015-2:7050): RUNNING
>   61dec96f5aed4cd2a47814de42d721e6 (wdb-k015-3:7050): RUNNING [LEADER]
>   d1689e073948415a901c64a9e9269416 (wdb-k015-1:7050): missing
> 2 replicas' active configs differ from the master's.
>   All the peers reported by the master and tablet servers are:
>   A = 16204380dc404171bebd99af2504cb14
>   B = 61dec96f5aed4cd2a47814de42d721e6
>   C = d1689e073948415a901c64a9e9269416
> The consensus matrix is:
>  Config source |         Voters         | Current term | Config index | Committed?
> ---------------+------------------------+--------------+--------------+------------
>  master        | A   B*  C              |              |              | Yes
>  A             | A   B   C              | 2            | 305          | Yes
>  B             |     B   C              | 2            | 307          | No
>  C             | [config not available] |              |              |
> {noformat}
> Tombstoned voting (KUDU-871) resolves the former case, while 3-4-3 replication (KUDU-1097) makes the latter much less likely.
> However, tablets still get stuck on older versions, and it shouldn't be too hard to enhance ksck to detect these two situations and fix them automatically: by tablet copying B -> C in the former case, and by aborting the config change on B in the latter.
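
The detection described above could be sketched roughly as follows. This is only an illustration in Python, not ksck code and not a Kudu API: `ReplicaView`, `classify_stuck_tablet`, `ksck_example`, the state strings, and the returned labels are all hypothetical names chosen for this sketch. It assumes a tablet is "stuck" when the leader holds an uncommitted config whose voters cannot form a majority from RUNNING replicas, which matches both consensus matrices shown (B's pending config is {B, C} while C is unavailable).

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class ReplicaView:
    """One row of the consensus matrix plus the replica's health (hypothetical type)."""
    uuid: str
    state: str                                # "RUNNING", "TOMBSTONED" or "MISSING"
    voters: Optional[FrozenSet[str]] = None   # None when the config is not available
    committed: Optional[bool] = None
    is_leader: bool = False


# Repairs as proposed in the issue description; the keys are labels made up here.
SUGGESTED_REPAIR = {
    "tombstoned_eviction_in_flight": "tablet copy from the leader to the tombstoned replica",
    "failed_eviction_in_flight": "abort the pending config change on the leader",
}


def classify_stuck_tablet(replicas: Iterable[ReplicaView]) -> Optional[str]:
    """Return a label for the stuck situation, or None if the tablet is not stuck."""
    replicas = list(replicas)
    leader = next((r for r in replicas if r.is_leader), None)
    # Without a leader holding a pending (uncommitted) config, neither case applies.
    if leader is None or leader.voters is None or leader.committed is not False:
        return None

    state_by_uuid = {r.uuid: r.state for r in replicas}
    running = [u for u in leader.voters if state_by_uuid.get(u) == "RUNNING"]
    majority = len(leader.voters) // 2 + 1
    if len(running) >= majority:
        return None  # the pending config can still be committed

    # The pending config cannot commit; look at why its voters are unavailable.
    bad_states = {state_by_uuid.get(u, "MISSING")
                  for u in leader.voters if u not in running}
    if "TOMBSTONED" in bad_states:
        return "tombstoned_eviction_in_flight"
    return "failed_eviction_in_flight"


def ksck_example(c_state: str) -> list:
    """Build the three replicas from the ksck outputs above, with C in c_state."""
    return [
        ReplicaView("A", "RUNNING", frozenset({"A", "B", "C"}), committed=True),
        ReplicaView("B", "RUNNING", frozenset({"B", "C"}), committed=False,
                    is_leader=True),
        ReplicaView("C", c_state),  # config not available
    ]


if __name__ == "__main__":
    for c_state in ("TOMBSTONED", "MISSING"):
        label = classify_stuck_tablet(ksck_example(c_state))
        print(c_state, "->", label, "->", SUGGESTED_REPAIR.get(label))
```

A real implementation would also need to parse ksck's output and confirm the state is stable before attempting any repair; the sketch covers only the classification step.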


