Posted to issues@kudu.apache.org by "Alexey Serbin (Jira)" <ji...@apache.org> on 2020/04/02 00:58:00 UTC
[jira] [Assigned] (KUDU-3082) tablets in "CONSENSUS_MISMATCH" state for a long time
[ https://issues.apache.org/jira/browse/KUDU-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Serbin reassigned KUDU-3082:
-----------------------------------
Assignee: Alexey Serbin
> tablets in "CONSENSUS_MISMATCH" state for a long time
> -----------------------------------------------------
>
> Key: KUDU-3082
> URL: https://issues.apache.org/jira/browse/KUDU-3082
> Project: Kudu
> Issue Type: Bug
> Components: consensus
> Affects Versions: 1.10.1
> Reporter: YifanZhang
> Assignee: Alexey Serbin
> Priority: Major
> Attachments: master_leader.log, ts25.info.gz, ts26.log.gz
>
>
> Recently we found that a few tablets in one of our clusters are unhealthy; the ksck output looks like this:
>
> {code:java}
> Tablet Summary
> Tablet 7404240f458f462d92b6588d07583a52 of table '' is conflicted: 3 replicas' active configs disagree with the leader master's
> 7380d797d2ea49e88d71091802fb1c81 (kudu-ts26): RUNNING
> d1952499f94a4e6087bee28466fcb09f (kudu-ts25): RUNNING
> 47af52df1adc47e1903eb097e9c88f2e (kudu-ts27): RUNNING [LEADER]
> All reported replicas are:
> A = 7380d797d2ea49e88d71091802fb1c81
> B = d1952499f94a4e6087bee28466fcb09f
> C = 47af52df1adc47e1903eb097e9c88f2e
> D = 08beca5ed4d04003b6979bf8bac378d2
> The consensus matrix is:
>  Config source |     Replicas     | Current term | Config index | Committed?
> ---------------+------------------+--------------+--------------+------------
>  master        | A B C*           |              |              | Yes
>  A             | A B C*           | 5            | -1           | Yes
>  B             | A B C            | 5            | -1           | Yes
>  C             | A B C* D~        | 5            | 54649        | No
> Tablet 6d9d3fb034314fa7bee9cfbf602bcdc8 of table '' is conflicted: 2 replicas' active configs disagree with the leader master's
> d1952499f94a4e6087bee28466fcb09f (kudu-ts25): RUNNING
> 47af52df1adc47e1903eb097e9c88f2e (kudu-ts27): RUNNING [LEADER]
> 5a8aeadabdd140c29a09dabcae919b31 (kudu-ts21): RUNNING
> All reported replicas are:
> A = d1952499f94a4e6087bee28466fcb09f
> B = 47af52df1adc47e1903eb097e9c88f2e
> C = 5a8aeadabdd140c29a09dabcae919b31
> D = 14632cdbb0d04279bc772f64e06389f9
> The consensus matrix is:
>  Config source |     Replicas     | Current term | Config index | Committed?
> ---------------+------------------+--------------+--------------+------------
>  master        | A B* C           |              |              | Yes
>  A             | A B* C           | 5            | 5            | Yes
>  B             | A B* C D~        | 5            | 96176        | No
>  C             | A B* C           | 5            | 5            | Yes
> Tablet bf1ec7d693b94632b099dc0928e76363 of table '' is conflicted: 1 replicas' active configs disagree with the leader master's
> a9eaff3cf1ed483aae849549999d649a (kudu-ts23): RUNNING
> f75df4a6b5ce404884313af5f906b392 (kudu-ts19): RUNNING
> 47af52df1adc47e1903eb097e9c88f2e (kudu-ts27): RUNNING [LEADER]
> All reported replicas are:
> A = a9eaff3cf1ed483aae849549999d649a
> B = f75df4a6b5ce404884313af5f906b392
> C = 47af52df1adc47e1903eb097e9c88f2e
> D = d1952499f94a4e6087bee28466fcb09f
> The consensus matrix is:
>  Config source |     Replicas     | Current term | Config index | Committed?
> ---------------+------------------+--------------+--------------+------------
>  master        | A B C*           |              |              | Yes
>  A             | A B C*           | 1            | -1           | Yes
>  B             | A B C*           | 1            | -1           | Yes
>  C             | A B C* D~        | 1            | 2            | No
> Tablet 3190a310857e4c64997adb477131488a of table '' is conflicted: 3 replicas' active configs disagree with the leader master's
> 47af52df1adc47e1903eb097e9c88f2e (kudu-ts27): RUNNING [LEADER]
> f0f7b2f4b9d344e6929105f48365f38e (kudu-ts24): RUNNING
> f75df4a6b5ce404884313af5f906b392 (kudu-ts19): RUNNING
> All reported replicas are:
> A = 47af52df1adc47e1903eb097e9c88f2e
> B = f0f7b2f4b9d344e6929105f48365f38e
> C = f75df4a6b5ce404884313af5f906b392
> D = d1952499f94a4e6087bee28466fcb09f
> The consensus matrix is:
>  Config source |     Replicas     | Current term | Config index | Committed?
> ---------------+------------------+--------------+--------------+------------
>  master        | A* B C           |              |              | Yes
>  A             | A* B C D~        | 1            | 1991         | No
>  B             | A* B C           | 1            | 4            | Yes
>  C             | A* B C           | 1            | 4            | Yes
> {code}
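All four matrices share one pattern: a single replica, the leader on kudu-ts27 (47af52df1adc47e1903eb097e9c88f2e), reports an uncommitted config that adds a member (D) unknown to the master. The sketch below is a minimal, hypothetical way to flag that pattern from the matrix rows. It assumes ksck's notation where `*` marks the leader and `~` marks a non-voter; `parse_members` and `stuck_additions` are made-up helper names, and this does not reproduce ksck's own comparison (ksck reports more disagreeing replicas than the member lists alone show).

```python
from typing import Dict, Set, Tuple


def parse_members(cell: str) -> Tuple[Set[str], Set[str], str]:
    """Parse a ksck 'Replicas' cell such as 'A B C* D~'.

    Assumed notation: '*' marks the leader, '~' marks a non-voter.
    Returns (voters, non_voters, leader); leader is "" if unmarked.
    """
    voters: Set[str] = set()
    non_voters: Set[str] = set()
    leader = ""
    for token in cell.split():
        name = token.rstrip("*~")
        if "*" in token:
            leader = name
        if "~" in token:
            non_voters.add(name)
        else:
            voters.add(name)
    return voters, non_voters, leader


def stuck_additions(
    master_cell: str, rows: Dict[str, Tuple[str, bool]]
) -> Dict[str, Set[str]]:
    """Map each replica reporting an uncommitted config to the members
    that config adds relative to the master's view.

    `rows` maps a replica letter to (replicas_cell, committed).
    """
    m_voters, m_non_voters, _ = parse_members(master_cell)
    known = m_voters | m_non_voters
    stuck: Dict[str, Set[str]] = {}
    for replica, (cell, committed) in rows.items():
        voters, non_voters, _ = parse_members(cell)
        added = (voters | non_voters) - known
        if not committed and added:
            stuck[replica] = added
    return stuck


# Rows copied from the first tablet (7404240f...) in the ksck output.
tablet_7404 = {
    "A": ("A B C*", True),
    "B": ("A B C", True),
    "C": ("A B C* D~", False),
}
print(stuck_additions("A B C*", tablet_7404))  # {'C': {'D'}}
```

Applied to each of the four tablets, the flagged replica is always the one hosted on kudu-ts27, which is also the node emitting the repeated promotion messages.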
> These tablets did not recover for a couple of days, until we restarted kudu-ts27.
> I found many duplicated log messages in kudu-ts27, like these:
> {code:java}
> I0314 04:38:41.511279 65731 raft_consensus.cc:937] T 7404240f458f462d92b6588d07583a52 P 47af52df1adc47e1903eb097e9c88f2e [term 3 LEADER]: attempt to promote peer 08beca5ed4d04003b6979bf8bac378d2: there is already a config change operation in progress. Unable to promote follower until it completes. Doing nothing.
> I0314 04:38:41.751009 65453 raft_consensus.cc:937] T 6d9d3fb034314fa7bee9cfbf602bcdc8 P 47af52df1adc47e1903eb097e9c88f2e [term 5 LEADER]: attempt to promote peer 14632cdbb0d04279bc772f64e06389f9: there is already a config change operation in progress. Unable to promote follower until it completes. Doing nothing.
> {code}
> There seem to be some RaftConfig change operations that somehow cannot complete.
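To gauge how often each blocked promotion repeats, one option is to count the quoted message per (tablet, peer) pair. This is a minimal sketch: the regular expression is derived only from the two log lines quoted above, so other Kudu versions may word the message differently, and `count_blocked_promotions` is a hypothetical helper name.

```python
import re
from collections import Counter
from typing import Iterable

# Pattern derived only from the two quoted log lines; other Kudu versions
# may word this message differently.
PROMOTE_BLOCKED = re.compile(
    r"\] T (?P<tablet>[0-9a-f]+) P (?P<leader>[0-9a-f]+) "
    r"\[term (?P<term>\d+) LEADER\]: attempt to promote peer (?P<peer>[0-9a-f]+): "
    r"there is already a config change operation in progress"
)


def count_blocked_promotions(lines: Iterable[str]) -> Counter:
    """Count 'promotion blocked' messages per (tablet_id, peer_uuid)."""
    counts: Counter = Counter()
    for line in lines:
        match = PROMOTE_BLOCKED.search(line)
        if match:
            counts[(match["tablet"], match["peer"])] += 1
    return counts


# Hypothetical usage; the log path is a placeholder:
# with open("kudu-tserver.INFO") as f:
#     for (tablet, peer), n in count_blocked_promotions(f).most_common(10):
#         print(tablet, peer, n)
```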
>
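The repeated message says the leader refuses to promote a follower while another config change operation is in progress. The toy model below is a hypothetical illustration of that gating rule (the class and method names are made up; this is not Kudu's RaftConsensus code). It shows why, if the pending change that added D as a non-voter never commits, every promotion attempt is a no-op and the same message repeats.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ToyLeader:
    """Toy model of a leader that allows one pending config change at a time.

    Hypothetical illustration only; not Kudu's RaftConsensus implementation.
    """
    voters: Set[str]
    non_voters: Set[str] = field(default_factory=set)
    pending_index: Optional[int] = None  # log index of the uncommitted change
    messages: List[str] = field(default_factory=list)

    def add_non_voter(self, peer: str, index: int) -> bool:
        """Start a config change that adds `peer` as a non-voter."""
        if self.pending_index is not None:
            return False
        self.non_voters.add(peer)
        self.pending_index = index
        return True

    def commit_up_to(self, index: int) -> None:
        """Mark entries up to `index` committed, releasing the gate if covered."""
        if self.pending_index is not None and index >= self.pending_index:
            self.pending_index = None

    def try_promote(self, peer: str, index: int) -> bool:
        """Promote a non-voter to voter, unless another change is in flight."""
        if self.pending_index is not None:
            self.messages.append(
                f"attempt to promote peer {peer}: config change in progress; doing nothing"
            )
            return False
        if peer not in self.non_voters:
            return False
        self.non_voters.discard(peer)
        self.voters.add(peer)
        self.pending_index = index
        return True


leader = ToyLeader(voters={"A", "B", "C"})
leader.add_non_voter("D", index=54649)
for _ in range(3):  # each retry is refused while 54649 is uncommitted
    leader.try_promote("D", index=54650)
print(len(leader.messages))  # 3
leader.commit_up_to(54649)  # once the change commits, promotion proceeds
print(leader.try_promote("D", index=54650))  # True
```

In this model, restarting the leader would discard its in-memory pending state, which is at least consistent with the observation that the tablets recovered only after kudu-ts27 was restarted; whether that is the actual mechanism in Kudu is not established by this report.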
--
This message was sent by Atlassian Jira
(v8.3.4#803005)