You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kudu.apache.org by "Grant Henke (Jira)" <ji...@apache.org> on 2020/06/02 13:00:00 UTC
[jira] [Resolved] (KUDU-1702) Document/Implement read-your-writes for Impala/Spark etc.

     [ https://issues.apache.org/jira/browse/KUDU-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Henke resolved KUDU-1702.
-------------------------------
    Fix Version/s: NA
       Resolution: Fixed

Resolving because the Spark work is done and IMPALA-7184 tracks the Impala side work. 

> Document/Implement read-your-writes for Impala/Spark etc.
> ---------------------------------------------------------
>
>                 Key: KUDU-1702
>                 URL: https://issues.apache.org/jira/browse/KUDU-1702
>             Project: Kudu
>          Issue Type: Sub-task
>          Components: client, tablet, tserver
>    Affects Versions: 1.1.0
>            Reporter: David Alves
>            Assignee: David Alves
>            Priority: Major
>             Fix For: NA
>
>
> Engines like Impala/Spark use many independent client instances, so we should provide a way to have read-your-writes across many independent client instances, which translates to providing a way to get linearizable behavior. 
> At first this can be done using the APIs that are already available. For instance if the objective is to be sure to have the results of a write in a a following scan, the following steps can be taken:
> - After a write the engine should collect the last observed timestamps from kudu clients
> - The engine's coordinator then takes the max of those timestamps, adds 1 and uses that as a snapshot scan timestamp.
> One important pre-requisite of the behavior above is that scans be done in READ_AT_SNAPSHOT mode. Also the steps above currently don't actually guarantee the expected behavior, but should as the currently anomalies are taken care of (as part of KUDU-430).
> In the immediate future we'll add APIs to the Kudu client so as to make the inner workings of getting this behavior oblivious to the engine. The steps will still be the same, i.e. timestamps or timestamp tokens will still be passed around, but the kudu client will encapsulate the choice of the timestamp for the scan.
> Later we will add a way to obtain this behavior without timestamp propagation, either by doing a write-side commit-wait, where clients wait out the clock error after/during the last write thus making sure any future operation will have a higher timestamp; or by making read-side commit wait, where we provide an api on the kudu client for the engine to perform a similar call before the scan call to obtain a scan timestamp.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)