
Posted to issues@kudu.apache.org by "Mike Percy (JIRA)" <ji...@apache.org> on 2016/02/26 13:51:19 UTC

[jira] [Updated] (KUDU-1188) For snapshot read correctness, enforce simple form of leader leases

     [ https://issues.apache.org/jira/browse/KUDU-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Percy updated KUDU-1188:
-----------------------------
    Parent: KUDU-430

> For snapshot read correctness, enforce simple form of leader leases
> -------------------------------------------------------------------
>
>                 Key: KUDU-1188
>                 URL: https://issues.apache.org/jira/browse/KUDU-1188
>             Project: Kudu
>          Issue Type: Sub-task
>          Components: tserver
>    Affects Versions: Public beta
>            Reporter: David Alves
>
> Since Raft doesn't allow holes in the log, a new leader is guaranteed to have all the writes that preceded its election, and to have them in flight when elected (meaning MVCC will have those transactions in flight, so a snapshot read will wait for them to complete). So, for writes, leases aren't really necessary. This is in contrast to Paxos in Spanner, where there is no timestamp propagation, the log might have holes, and leases are required to enforce write correctness.
> However, some form of lease is necessary to enforce read consistency, in particular in the following case:
> Leader A accepts a write at time 10, which commits and has no following writes. It then serves a snapshot read at time 15 and crashes.
> Leader B is elected but has a slow clock, which reads 11 when it is ready to serve writes. It then accepts a write at time 13.
> The snapshot read at 15 is now broken, since a write with a timestamp below 15 has appeared after the read was served.
> A simple way to avoid this is to have each replica promise, on each ack, that if it is ever elected leader it won't accept writes or serve snapshot reads until a certain period, say 2 seconds, has passed since that ack. On the leader side, the leader is only allowed to serve snapshot reads up to 2 seconds after _a majority_ of replicas has ack'd, which in practice usually means 1 replica.
> With such a mechanism in place, if the lease is 5 (counted from the ack of the write at time 10), then leader B wouldn't accept the write at time 13 and would instead wait until 15 had passed, so the snapshot read would not be broken.
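
The scenario above can be replayed with a small sketch. This is only an illustration of the arithmetic in the description, not Kudu code: the function names, the LEASE constant, and the use of a single integer timeline (ignoring how skew affects the recorded ack time) are all assumptions made for the example. Waiting out the promise is modelled by pushing the write timestamp just past the end of the lease.

```python
LEASE = 5  # hypothetical lease length, in the abstract time units of the example


def leader_may_serve_read(read_ts: int, majority_ack_ts: int, lease: int = LEASE) -> bool:
    """Old leader's rule: serve snapshot reads only up to `lease` after a majority ack."""
    return read_ts <= majority_ack_ts + lease


def new_leader_write_ts(clock: int, last_ack_ts: int, lease: int = LEASE,
                        enforce: bool = True) -> int:
    """Timestamp a newly elected leader would give its first write."""
    if not enforce:
        # No lease: the (possibly slow) local clock is used as-is.
        return clock
    # With the lease, the new leader keeps its promise and waits until the
    # lease window has passed; modelled here as bumping the timestamp past it.
    return max(clock, last_ack_ts + lease + 1)


def breaks_snapshot(write_ts: int, read_ts: int) -> bool:
    """A write at or below an already-served snapshot timestamp invalidates that read."""
    return write_ts <= read_ts


# Replay of the example: A's write is ack'd at 10, A serves a read at 15, then crashes.
ack_ts, read_ts = 10, 15
read_allowed = leader_may_serve_read(read_ts, ack_ts)

# B's clock is slow and it receives a write when its clock reads 13.
ts_without_lease = new_leader_write_ts(13, ack_ts, enforce=False)
ts_with_lease = new_leader_write_ts(13, ack_ts)

print("read at 15 allowed on A:", read_allowed)
print("without lease: write ts", ts_without_lease,
      "breaks read:", breaks_snapshot(ts_without_lease, read_ts))
print("with lease:    write ts", ts_with_lease,
      "breaks read:", breaks_snapshot(ts_with_lease, read_ts))
```

Under these assumptions the two rules line up: the old leader stops serving snapshot reads at exactly the point (ack time plus lease) before which the new leader refuses to assign write timestamps, so no write can land at or below a snapshot that was already served.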


