Posted to issues@kudu.apache.org by "Alexey Serbin (Jira)" <ji...@apache.org> on 2023/04/18 04:00:03 UTC

[jira] [Commented] (KUDU-3466) I hope kudu can support snapshot management like iceberg, can create, query, expiration snapshots, support query data changes between two snapshots.

    [ https://issues.apache.org/jira/browse/KUDU-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713374#comment-17713374 ] 

Alexey Serbin commented on KUDU-3466:
-------------------------------------

[~c8679724@163.com],

Kudu has had the ability to scan data at a snapshot from the very beginning, since it uses MVCC.  You could check the following client APIs:
* C++: https://kudu.apache.org/cpp-client-api/classkudu_1_1client_1_1KuduScanner.html#a5b36a405daf09399438d5501b25b9f9f
* Java: https://kudu.apache.org/apidocs/org/apache/kudu/client/AbstractKuduScannerBuilder.html#snapshotTimestampMicros-long-

Would that be enough for your use case?
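
To illustrate what scanning at a snapshot means under MVCC, here is a minimal toy sketch in Python. It is not Kudu code and does not use the Kudu client; the class ToyMvccStore and its methods are hypothetical names made up for this illustration. It assumes that writes for a given key arrive in increasing timestamp order, and it ignores deletes. The snapshot_ts_micros argument plays the same role as the timestamp passed to snapshotTimestampMicros in the Java client linked above.

```python
from bisect import bisect_right


class ToyMvccStore:
    """Toy multi-version store: every write is kept with its commit timestamp."""

    def __init__(self):
        # key -> list of (commit_ts_micros, value), appended in timestamp order
        self._versions = {}

    def write(self, key, value, ts_micros):
        # Never overwrite in place: append a new version instead.
        self._versions.setdefault(key, []).append((ts_micros, value))

    def scan_at(self, snapshot_ts_micros):
        """Return {key: value} as visible at the given snapshot timestamp."""
        result = {}
        for key, versions in self._versions.items():
            timestamps = [ts for ts, _ in versions]
            # Index of the first version committed strictly after the snapshot.
            idx = bisect_right(timestamps, snapshot_ts_micros)
            if idx > 0:
                # The latest version committed at or before the snapshot.
                result[key] = versions[idx - 1][1]
        return result


store = ToyMvccStore()
store.write("a", 1, ts_micros=100)
store.write("b", 2, ts_micros=200)
store.write("a", 3, ts_micros=300)

early = store.scan_at(150)   # only the first write to "a" is visible
late = store.scan_at(300)    # the rewrite of "a" and the write to "b" are visible
```

Because old versions are retained rather than overwritten, a reader can pick any timestamp and see a consistent view of the data as of that moment, independent of later writes.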

As for the difference between two snapshots, that functionality seems to have been present since backup/restore was implemented, but it's not exposed via the client API.  It's used for incremental backup/restore in Kudu.  Is it something that might be useful in your use case?  If so, what sort of client API would you like to have exposed for that purpose?
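
As a rough illustration of what a difference between two snapshots could look like from the caller's side, here is a small self-contained Python sketch. It is a toy model only: it says nothing about how Kudu's incremental backup computes differences internally, and the function names snapshot_at and diff_between, as well as the (commit_ts, key, value) history format, are assumptions made up for this example. A value of None stands for a delete.

```python
def snapshot_at(history, ts):
    """Replay (commit_ts, key, value) entries up to and including ts.

    A value of None marks a delete. Returns the visible {key: value} state.
    """
    state = {}
    for commit_ts, key, value in sorted(history, key=lambda e: e[0]):
        if commit_ts > ts:
            break
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


def diff_between(history, start_ts, end_ts):
    """Return {key: (old_value, new_value)} for keys that differ between
    the snapshot at start_ts and the snapshot at end_ts."""
    before = snapshot_at(history, start_ts)
    after = snapshot_at(history, end_ts)
    changes = {}
    for key in before.keys() | after.keys():
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


history = [
    (100, "a", 1),
    (200, "b", 2),
    (300, "a", 3),
    (400, "b", None),  # delete of "b"
]

changes = diff_between(history, start_ts=250, end_ts=450)
```

In this sketch an old value of None means the key was inserted within the interval, and a new value of None means it was deleted. A real client API would also need to decide how to report intermediate versions and how far back history is retained.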

Thanks!

> I hope kudu can support snapshot management like iceberg, can create, query, expiration snapshots, support query data changes between two snapshots. 
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KUDU-3466
>                 URL: https://issues.apache.org/jira/browse/KUDU-3466
>             Project: Kudu
>          Issue Type: New Feature
>          Components: api, cfile
>            Reporter: sky
>            Priority: Major
>
> I hope Kudu can support snapshot management like Iceberg: creating, querying, and expiring snapshots, and querying the data changes between two snapshots. This is very much needed. When we write data to Kudu in real time, we often want to see the data as it was in the past, query the data as of a fixed point in time, or query the changes between two snapshots to support real-time or incremental computation. With data lakes becoming more and more of a trend, real-time/incremental computation and snapshot management are much-needed new features for Kudu.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)