Posted to issues@ozone.apache.org by "Xu Shao Hong (Jira)" <ji...@apache.org> on 2022/04/29 09:50:00 UTC

[jira] [Assigned] (HDDS-6510) Incremental Checkpointing Support

     [ https://issues.apache.org/jira/browse/HDDS-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xu Shao Hong reassigned HDDS-6510:
----------------------------------

    Assignee: Xu Shao Hong

> Incremental Checkpointing Support
> ---------------------------------
>
>                 Key: HDDS-6510
>                 URL: https://issues.apache.org/jira/browse/HDDS-6510
>             Project: Apache Ozone
>          Issue Type: New Feature
>            Reporter: Xu Shao Hong
>            Assignee: Xu Shao Hong
>            Priority: Major
>         Attachments: 2022-03-15 7.58.44.png
>
>
> Currently, installing a snapshot on an OM or SCM follower means taking a full checkpoint of RDB and sending it to the follower. As the data stored in RDB grows, transmitting the whole checkpoint takes longer and longer. If the follower then finds that the leader has already truncated newer raft logs, it needs to install a new snapshot, so it may end up installing snapshots repeatedly.
> For example, in an OM test where the raft log index was 570767469, it took around 13 minutes for the follower to install the snapshot. Ozone is designed to overcome the limits of in-memory metadata, so it should be able to hold far more than on the order of a hundred million keys. Once the OM reaches that scale, every snapshot installation becomes a serious problem: while it runs, only two raft peers are working (in a 3-node HA setup), and that state is fragile.
> Another statistic: for 16 hundred million (1.6 billion) keys, the om.db directory is 45 GB, i.e. around 2.8 GB per hundred million keys. This was measured with keys written through the createKey API.
> To solve the problem, we should support incremental checkpointing: send a small increment instead of the whole RDB checkpoint, and thus reduce the transmission time. I recommend referring to the implementation in Flink, but we need to store the diff between checkpoints locally instead of in another storage system.
>  
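
The incremental idea in the description can be illustrated with a minimal sketch. It assumes that RDB is the RocksDB-backed store and that a checkpoint is a directory of SST files that never change once written, which is the property Flink's incremental checkpoints rely on. The class CheckpointDiff, its Delta type, and the file names are hypothetical and are not Ozone or Ratis code. Mutable metadata files (MANIFEST, CURRENT, OPTIONS) are outside the sketch and would presumably have to be sent every time.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch, not Ozone code: diff two checkpoints by SST file name. */
final class CheckpointDiff {

    /** Result of the comparison: files to transfer and files to drop. */
    static final class Delta {
        final Set<String> toSend;    // present on the leader, missing on the follower
        final Set<String> toDelete;  // present on the follower, gone from the leader

        Delta(Set<String> toSend, Set<String> toDelete) {
            this.toSend = toSend;
            this.toDelete = toDelete;
        }
    }

    /**
     * Assumption: an SST file with a given name is immutable, so a name that
     * exists on both sides refers to identical content and need not be resent.
     */
    static Delta diff(Set<String> followerFiles, Set<String> leaderFiles) {
        Set<String> toSend = new TreeSet<>(leaderFiles);
        toSend.removeAll(followerFiles);

        Set<String> toDelete = new TreeSet<>(followerFiles);
        toDelete.removeAll(leaderFiles);

        return new Delta(toSend, toDelete);
    }

    public static void main(String[] args) {
        // Made-up file names, for illustration only.
        Set<String> follower = new TreeSet<>(
            Arrays.asList("000010.sst", "000011.sst", "000012.sst"));
        Set<String> leader = new TreeSet<>(
            Arrays.asList("000011.sst", "000012.sst", "000013.sst", "000014.sst"));

        Delta d = diff(follower, leader);
        System.out.println("send:   " + d.toSend);
        System.out.println("delete: " + d.toDelete);
    }
}
```

With the made-up file sets in main, only 000013.sst and 000014.sst would be transferred and 000010.sst would be dropped on the follower, instead of sending all four files again. How much this saves in practice depends on how many SST files compaction has rewritten since the follower's last checkpoint.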



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org