Posted to issues@kudu.apache.org by "Alexey Serbin (Jira)" <ji...@apache.org> on 2020/04/02 00:58:00 UTC

[jira] [Assigned] (KUDU-3082) tablets in "CONSENSUS_MISMATCH" state for a long time

     [ https://issues.apache.org/jira/browse/KUDU-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Serbin reassigned KUDU-3082:
-----------------------------------

    Assignee: Alexey Serbin

> tablets in "CONSENSUS_MISMATCH" state for a long time
> -----------------------------------------------------
>
>                 Key: KUDU-3082
>                 URL: https://issues.apache.org/jira/browse/KUDU-3082
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus
>    Affects Versions: 1.10.1
>            Reporter: YifanZhang
>            Assignee: Alexey Serbin
>            Priority: Major
>         Attachments: master_leader.log, ts25.info.gz, ts26.log.gz
>
>
> Recently we found that a few tablets in one of our clusters were unhealthy. The ksck output looks like this:
>  
> {code:java}
> Tablet Summary
> Tablet 7404240f458f462d92b6588d07583a52 of table '' is conflicted: 3 replicas' active configs disagree with the leader master's
>   7380d797d2ea49e88d71091802fb1c81 (kudu-ts26): RUNNING
>   d1952499f94a4e6087bee28466fcb09f (kudu-ts25): RUNNING
>   47af52df1adc47e1903eb097e9c88f2e (kudu-ts27): RUNNING [LEADER]
> All reported replicas are:
>   A = 7380d797d2ea49e88d71091802fb1c81
>   B = d1952499f94a4e6087bee28466fcb09f
>   C = 47af52df1adc47e1903eb097e9c88f2e
>   D = 08beca5ed4d04003b6979bf8bac378d2
> The consensus matrix is:
>  Config source |     Replicas     | Current term | Config index | Committed?
> ---------------+------------------+--------------+--------------+------------
>  master        | A   B   C*       |              |              | Yes
>  A             | A   B   C*       | 5            | -1           | Yes
>  B             | A   B   C        | 5            | -1           | Yes
>  C             | A   B   C*  D~   | 5            | 54649        | No
> Tablet 6d9d3fb034314fa7bee9cfbf602bcdc8 of table '' is conflicted: 2 replicas' active configs disagree with the leader master's
>   d1952499f94a4e6087bee28466fcb09f (kudu-ts25): RUNNING
>   47af52df1adc47e1903eb097e9c88f2e (kudu-ts27): RUNNING [LEADER]
>   5a8aeadabdd140c29a09dabcae919b31 (kudu-ts21): RUNNING
> All reported replicas are:
>   A = d1952499f94a4e6087bee28466fcb09f
>   B = 47af52df1adc47e1903eb097e9c88f2e
>   C = 5a8aeadabdd140c29a09dabcae919b31
>   D = 14632cdbb0d04279bc772f64e06389f9
> The consensus matrix is:
>  Config source |     Replicas     | Current term | Config index | Committed?
> ---------------+------------------+--------------+--------------+------------
>  master        | A   B*  C        |              |              | Yes
>  A             | A   B*  C        | 5            | 5            | Yes
>  B             | A   B*  C   D~   | 5            | 96176        | No
>  C             | A   B*  C        | 5            | 5            | Yes
> Tablet bf1ec7d693b94632b099dc0928e76363 of table '' is conflicted: 1 replicas' active configs disagree with the leader master's
>   a9eaff3cf1ed483aae849549999d649a (kudu-ts23): RUNNING
>   f75df4a6b5ce404884313af5f906b392 (kudu-ts19): RUNNING
>   47af52df1adc47e1903eb097e9c88f2e (kudu-ts27): RUNNING [LEADER]
> All reported replicas are:
>   A = a9eaff3cf1ed483aae849549999d649a
>   B = f75df4a6b5ce404884313af5f906b392
>   C = 47af52df1adc47e1903eb097e9c88f2e
>   D = d1952499f94a4e6087bee28466fcb09f
> The consensus matrix is:
>  Config source |     Replicas     | Current term | Config index | Committed?
> ---------------+------------------+--------------+--------------+------------
>  master        | A   B   C*       |              |              | Yes
>  A             | A   B   C*       | 1            | -1           | Yes
>  B             | A   B   C*       | 1            | -1           | Yes
>  C             | A   B   C*  D~   | 1            | 2            | No
> Tablet 3190a310857e4c64997adb477131488a of table '' is conflicted: 3 replicas' active configs disagree with the leader master's
>   47af52df1adc47e1903eb097e9c88f2e (kudu-ts27): RUNNING [LEADER]
>   f0f7b2f4b9d344e6929105f48365f38e (kudu-ts24): RUNNING
>   f75df4a6b5ce404884313af5f906b392 (kudu-ts19): RUNNING
> All reported replicas are:
>   A = 47af52df1adc47e1903eb097e9c88f2e
>   B = f0f7b2f4b9d344e6929105f48365f38e
>   C = f75df4a6b5ce404884313af5f906b392
>   D = d1952499f94a4e6087bee28466fcb09f
> The consensus matrix is:
>  Config source |     Replicas     | Current term | Config index | Committed?
> ---------------+------------------+--------------+--------------+------------
>  master        | A*  B   C        |              |              | Yes
>  A             | A*  B   C   D~   | 1            | 1991         | No
>  B             | A*  B   C        | 1            | 4            | Yes
>  C             | A*  B   C        | 1            | 4            | Yes{code}
> These tablets did not recover for a couple of days, until we restarted kudu-ts27.
> I found many repeated log messages on kudu-ts27 like these:
> {code:java}
> I0314 04:38:41.511279 65731 raft_consensus.cc:937] T 7404240f458f462d92b6588d07583a52 P 47af52df1adc47e1903eb097e9c88f2e [term 3 LEADER]: attempt to promote peer 08beca5ed4d04003b6979bf8bac378d2: there is already a config change operation in progress. Unable to promote follower until it completes. Doing nothing.
> I0314 04:38:41.751009 65453 raft_consensus.cc:937] T 6d9d3fb034314fa7bee9cfbf602bcdc8 P 47af52df1adc47e1903eb097e9c88f2e [term 5 LEADER]: attempt to promote peer 14632cdbb0d04279bc772f64e06389f9: there is already a config change operation in progress. Unable to promote follower until it completes. Doing nothing.
> {code}
> There seem to be some RaftConfig change operations that somehow cannot complete.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)