Posted to issues@kudu.apache.org by "Mike Percy (Jira)" <ji...@apache.org> on 2020/06/19 01:23:00 UTC

[jira] [Updated] (KUDU-639) Leader doesn't overwrite demoted follower's log properly

     [ https://issues.apache.org/jira/browse/KUDU-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Percy updated KUDU-639:
----------------------------
    Fix Version/s: M5

This was fixed in 2015. Please file a separate Jira to track the task if it seems likely that someone will add a test for this.

> Leader doesn't overwrite demoted follower's log properly
> --------------------------------------------------------
>
>                 Key: KUDU-639
>                 URL: https://issues.apache.org/jira/browse/KUDU-639
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus
>    Affects Versions: M4.5
>            Reporter: David Alves
>            Assignee: Todd Lipcon
>            Priority: Minor
>             Fix For: M5
>
>
> We just ran into this situation in the YCSB cluster, which is apparently a log divergence.
> We have nodes a, b, c (corresponding to nodes 33c8fb1dc4434df0938ccc27ecfd58a1/a1219, 4ed2e09f80e04d198edeb53e15b3539e/a1220, ab8ed89f9041495a95b8d2b77591c9d7/a1215).
> Node a is leader for term 3, times out
> Node b is elected leader for term 5 with votes from b, c
> When b is elected leader the log state is:
> State: All replicated op: 3.6546, Majority replicated op: 3.6533, Committed index: 3.6533, Last appended: 3.6546, Current term: 5
> b never actually replicates anything and eventually loses leadership to node a, again.
> When b loses leadership, its WAL is in the following state:
> State: All replicated op: 0.0, Majority replicated op: 3.6533, Committed index: 3.6533, Last appended: 5.6547, Current term: 5
> That is, b appended a message in term 5 but never actually got to commit it.
> However, if we look at b's log we find a message in term 5 committed:
> 3.6546@99404	REPLICATE WRITE_OP
> COMMIT 3.6533
> 5.6547@99789	REPLICATE CHANGE_CONFIG_OP
> COMMIT 3.6535
> COMMIT 3.6536
> COMMIT 3.6537
> COMMIT 3.6538
> COMMIT 3.6534
> COMMIT 3.6541
> COMMIT 3.6540
> COMMIT 3.6543
> COMMIT 3.6542
> COMMIT 3.6545
> COMMIT 3.6546
> COMMIT 3.6544
> COMMIT 3.6539
> COMMIT 5.6547
> 3.6548@99430	REPLICATE WRITE_OP
> 6.6549@99795	REPLICATE CHANGE_CONFIG_OP
> More problematically, that diverges from the other two nodes' logs:
> 3.6546@99404	REPLICATE WRITE_OP
> COMMIT 3.6533
> COMMIT 3.6536
> COMMIT 3.6537
> COMMIT 3.6535
> COMMIT 3.6539
> COMMIT 3.6538
> COMMIT 3.6534
> COMMIT 3.6541
> COMMIT 3.6540
> COMMIT 3.6543
> COMMIT 3.6542
> COMMIT 3.6544
> 3.6547@99429	REPLICATE WRITE_OP
> 3.6548@99430	REPLICATE WRITE_OP
> 6.6549@99795	REPLICATE CHANGE_CONFIG_OP
> 6.6550@99878	REPLICATE WRITE_OP
> 6.6551@99879	REPLICATE WRITE_OP
> 6.6552@99880	REPLICATE WRITE_OP
> COMMIT 3.6545
> COMMIT 3.6548
> COMMIT 3.6547
> COMMIT 3.6546
> COMMIT 6.6549
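
The divergence described above can be checked mechanically. The following is a minimal Python sketch, not Kudu code: it takes only the REPLICATE op ids quoted in the two log excerpts, finds the first index at which the terms disagree, and shows what a Raft-style overwrite of the follower's suffix would produce. The names OpId, parse, terms_non_decreasing, first_divergence and overwrite_follower are hypothetical and exist only for this illustration.

```python
from typing import List, NamedTuple, Optional


class OpId(NamedTuple):
    term: int
    index: int


def parse(text: str) -> OpId:
    """Parse an op id written as 'term.index', e.g. '5.6547'."""
    term, index = text.split(".")
    return OpId(int(term), int(index))


def terms_non_decreasing(log: List[OpId]) -> bool:
    """A well-formed log never steps back to an older term."""
    return all(a.term <= b.term for a, b in zip(log, log[1:]))


def first_divergence(follower: List[OpId], leader: List[OpId]) -> Optional[int]:
    """Return the lowest index present in both logs with different terms."""
    leader_by_index = {op.index: op for op in leader}
    for op in sorted(follower, key=lambda o: o.index):
        other = leader_by_index.get(op.index)
        if other is not None and other.term != op.term:
            return op.index
    return None


def overwrite_follower(follower: List[OpId], leader: List[OpId]) -> List[OpId]:
    """Drop the follower's suffix from the first mismatch and take the leader's."""
    idx = first_divergence(follower, leader)
    if idx is None:
        # No conflict: only append what the follower is missing.
        last = max((op.index for op in follower), default=-1)
        return follower + [op for op in leader if op.index > last]
    kept = [op for op in follower if op.index < idx]
    return kept + [op for op in leader if op.index >= idx]


# REPLICATE entries copied from the excerpts above, in appended order.
NODE_B = [parse(s) for s in ["3.6546", "5.6547", "3.6548", "6.6549"]]
OTHERS = [parse(s) for s in [
    "3.6546", "3.6547", "3.6548", "6.6549", "6.6550", "6.6551", "6.6552",
]]

if __name__ == "__main__":
    print("b terms non-decreasing:", terms_non_decreasing(NODE_B))
    print("first divergent index:", first_divergence(NODE_B, OTHERS))
    print("b after overwrite:", overwrite_follower(NODE_B, OTHERS))
```

On the quoted data, the mismatch is at index 6547, where b holds a term 5 CHANGE_CONFIG_OP and the other two nodes hold a term 3 WRITE_OP. The sketch ignores COMMIT records entirely; it only illustrates that b kept 5.6547 while accepting 3.6548 after it, rather than having that entry replaced.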


