Posted to issues@kudu.apache.org by "Mike Percy (Jira)" <ji...@apache.org> on 2020/06/19 01:23:00 UTC
[jira] [Updated] (KUDU-639) Leader doesn't overwrite demoted follower's log properly
[ https://issues.apache.org/jira/browse/KUDU-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mike Percy updated KUDU-639:
----------------------------
Fix Version/s: M5
This was fixed in 2015. Please file a separate Jira to track the task if it seems likely that someone will add a test for this.
> Leader doesn't overwrite demoted follower's log properly
> --------------------------------------------------------
>
> Key: KUDU-639
> URL: https://issues.apache.org/jira/browse/KUDU-639
> Project: Kudu
> Issue Type: Bug
> Components: consensus
> Affects Versions: M4.5
> Reporter: David Alves
> Assignee: Todd Lipcon
> Priority: Minor
> Fix For: M5
>
>
> We just ran into this situation in the YCSB cluster; it appears to be a log divergence.
> We have nodes a, b, c (corresponding to nodes 33c8fb1dc4434df0938ccc27ecfd58a1/a1219, 4ed2e09f80e04d198edeb53e15b3539e/a1220, ab8ed89f9041495a95b8d2b77591c9d7/a1215).
> Node a is leader for term 3, then times out.
> Node b is elected leader for term 5 with votes from b and c.
> When b is elected leader the log state is:
> State: All replicated op: 3.6546, Majority replicated op: 3.6533, Committed index: 3.6533, Last appended: 3.6546, Current term: 5
> b never actually replicates anything and eventually loses leadership to node a, again.
> When b loses leadership its WAL is in the following state:
> State: All replicated op: 0.0, Majority replicated op: 3.6533, Committed index: 3.6533, Last appended: 5.6547, Current term: 5
> That is, b appended a message in term 5 but never actually got to commit it.
> However, if we look at b's log we find a message in term 5 committed:
> 3.6546@99404 REPLICATE WRITE_OP
> COMMIT 3.6533
> 5.6547@99789 REPLICATE CHANGE_CONFIG_OP
> COMMIT 3.6535
> COMMIT 3.6536
> COMMIT 3.6537
> COMMIT 3.6538
> COMMIT 3.6534
> COMMIT 3.6541
> COMMIT 3.6540
> COMMIT 3.6543
> COMMIT 3.6542
> COMMIT 3.6545
> COMMIT 3.6546
> COMMIT 3.6544
> COMMIT 3.6539
> COMMIT 5.6547
> 3.6548@99430 REPLICATE WRITE_OP
> 6.6549@99795 REPLICATE CHANGE_CONFIG_OP
> More problematically, that diverges from the other two nodes' logs:
> 3.6546@99404 REPLICATE WRITE_OP
> COMMIT 3.6533
> COMMIT 3.6536
> COMMIT 3.6537
> COMMIT 3.6535
> COMMIT 3.6539
> COMMIT 3.6538
> COMMIT 3.6534
> COMMIT 3.6541
> COMMIT 3.6540
> COMMIT 3.6543
> COMMIT 3.6542
> COMMIT 3.6544
> 3.6547@99429 REPLICATE WRITE_OP
> 3.6548@99430 REPLICATE WRITE_OP
> 6.6549@99795 REPLICATE CHANGE_CONFIG_OP
> 6.6550@99878 REPLICATE WRITE_OP
> 6.6551@99879 REPLICATE WRITE_OP
> 6.6552@99880 REPLICATE WRITE_OP
> COMMIT 3.6545
> COMMIT 3.6548
> COMMIT 3.6547
> COMMIT 3.6546
> COMMIT 6.6549
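
The divergence described in the quoted report can be checked mechanically. The following is a minimal Python sketch, not Kudu code: the helper names (parse_log, find_divergence, conflicting_commits) are hypothetical, and the line formats are assumed from the excerpts above ("term.index@timestamp REPLICATE op_type" and "COMMIT term.index"). The two excerpts are abbreviated to the REPLICATE entries and the COMMIT entries at or after index 6545; all values are copied from the logs quoted above.

```python
import re

# Abbreviated excerpts from the report above (node b vs. the other two nodes).
LOG_B = """\
3.6546@99404 REPLICATE WRITE_OP
5.6547@99789 REPLICATE CHANGE_CONFIG_OP
COMMIT 3.6545
COMMIT 3.6546
COMMIT 5.6547
3.6548@99430 REPLICATE WRITE_OP
6.6549@99795 REPLICATE CHANGE_CONFIG_OP
"""

LOG_OTHERS = """\
3.6546@99404 REPLICATE WRITE_OP
3.6547@99429 REPLICATE WRITE_OP
3.6548@99430 REPLICATE WRITE_OP
6.6549@99795 REPLICATE CHANGE_CONFIG_OP
6.6550@99878 REPLICATE WRITE_OP
6.6551@99879 REPLICATE WRITE_OP
6.6552@99880 REPLICATE WRITE_OP
COMMIT 3.6545
COMMIT 3.6548
COMMIT 3.6547
COMMIT 3.6546
COMMIT 6.6549
"""

# Assumed line formats, inferred from the excerpts only.
REPLICATE_RE = re.compile(r"^(\d+)\.(\d+)@\d+ REPLICATE (\w+)$")
COMMIT_RE = re.compile(r"^COMMIT (\d+)\.(\d+)$")


def parse_log(text):
    """Return two dicts mapping log index -> term: replicated and committed."""
    replicated, committed = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        m = REPLICATE_RE.match(line)
        if m:
            replicated[int(m.group(2))] = int(m.group(1))
            continue
        m = COMMIT_RE.match(line)
        if m:
            committed[int(m.group(2))] = int(m.group(1))
    return replicated, committed


def _mismatches(x, y):
    """Indexes present in both dicts whose terms differ, as (index, x_term, y_term)."""
    return [(i, x[i], y[i]) for i in sorted(x.keys() & y.keys()) if x[i] != y[i]]


def find_divergence(log_x, log_y):
    """REPLICATE entries at the same index but with different terms."""
    return _mismatches(parse_log(log_x)[0], parse_log(log_y)[0])


def conflicting_commits(log_x, log_y):
    """COMMIT entries at the same index but with different terms."""
    return _mismatches(parse_log(log_x)[1], parse_log(log_y)[1])


if __name__ == "__main__":
    print(find_divergence(LOG_B, LOG_OTHERS))      # [(6547, 5, 3)]
    print(conflicting_commits(LOG_B, LOG_OTHERS))  # [(6547, 5, 3)]
```

On these excerpts both checks flag index 6547: b holds (and has committed) a term 5 entry there, while the other two nodes hold and have committed a term 3 entry at the same index.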
--
This message was sent by Atlassian Jira
(v8.3.4#803005)