Posted to issues@ratis.apache.org by "Song Ziyang (Jira)" <ji...@apache.org> on 2022/07/02 15:08:00 UTC

[jira] [Closed] (RATIS-1597) Delay snapshot MD5 computing to InstallSnapshot stream process

     [ https://issues.apache.org/jira/browse/RATIS-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Song Ziyang closed RATIS-1597.
------------------------------

> Delay snapshot MD5 computing to InstallSnapshot stream process
> --------------------------------------------------------------
>
>                 Key: RATIS-1597
>                 URL: https://issues.apache.org/jira/browse/RATIS-1597
>             Project: Ratis
>          Issue Type: Improvement
>          Components: performance
>            Reporter: Song Ziyang
>            Assignee: Song Ziyang
>            Priority: Major
>             Fix For: 3.0.0
>
>         Attachments: 661_review.patch
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> The leader's LogAppender while-loop checks the latest snapshot info to decide whether to send a snapshot to a follower. The SnapshotInfo includes every snapshot file together with its MD5 digest. Therefore, the StateMachine is required to compute the MD5 digest each time it takes a snapshot.
>  
> However, for database workloads, snapshot files may contain GBs of data, which makes computing MD5 a very time-consuming task. Since the MD5 digest is only used when the leader sends InstallSnapshot to a follower, it would be better to compute it as part of the InstallSnapshot stream process.
>  
> Currently, the InstallSnapshot stream process breaks each snapshot file into fixed-size chunks and sends them to followers one by one. Is it possible to calculate the MD5 digest while reading each chunk? Such an implementation would avoid precomputing MD5 and minimize the IO cost.
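
The idea in the description can be illustrated with a minimal sketch in plain Java. It is not the Ratis implementation and does not use any Ratis API: the class name ChunkedMd5Sketch, the method streamWithMd5, and the chunkSender callback are hypothetical names chosen for illustration. The sketch reads a stream in fixed-size chunks, feeds each chunk to a running MD5 digest, and hands the chunk to the sender, so the file is read only once.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.function.Consumer;

// Hypothetical illustration only; names here are not part of Ratis.
final class ChunkedMd5Sketch {

  /**
   * Reads the stream in chunks of at most chunkSize bytes, updates a running
   * MD5 digest with each chunk, and passes a copy of the chunk to chunkSender.
   * Returns the MD5 digest of all bytes read.
   */
  static byte[] streamWithMd5(InputStream in, int chunkSize, Consumer<byte[]> chunkSender) {
    final MessageDigest md5;
    try {
      md5 = MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 is not available", e);
    }

    final byte[] buffer = new byte[chunkSize];
    try {
      int n;
      // readNBytes (Java 9+) fills the buffer unless the stream ends first,
      // so every chunk except possibly the last has exactly chunkSize bytes.
      while ((n = in.readNBytes(buffer, 0, chunkSize)) > 0) {
        md5.update(buffer, 0, n);                      // digest the chunk just read
        chunkSender.accept(Arrays.copyOf(buffer, n));  // "send" the same chunk
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return md5.digest();
  }

  /** Formats a byte array as a lowercase hex string. */
  static String toHex(byte[] bytes) {
    final StringBuilder sb = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
      sb.append(String.format("%02x", b & 0xff));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // In-memory stand-in for a snapshot file.
    final byte[] data = "example snapshot bytes".getBytes(StandardCharsets.UTF_8);
    final byte[] digest = streamWithMd5(
        new ByteArrayInputStream(data), 8,
        chunk -> System.out.println("sending chunk of " + chunk.length + " bytes"));
    System.out.println("md5 = " + toHex(digest));
  }
}
```

In this sketch the digest is only known after the last chunk has been read, so a real protocol would presumably have to carry it with the final chunk or in a trailing message; how Ratis handles that is not stated in this issue.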



--
This message was sent by Atlassian Jira
(v8.20.10#820010)