You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by "Duo Zhang (Jira)" <ji...@apache.org> on 2022/03/12 16:01:00 UTC

[jira] [Resolved] (HBASE-26323) Introduce a SnapshotProcedure

     [ https://issues.apache.org/jira/browse/HBASE-26323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang resolved HBASE-26323.
-------------------------------
    Fix Version/s: 2.6.0
                   3.0.0-alpha-3
     Hadoop Flags: Reviewed
       Resolution: Fixed

Pushed to master and branch-2.

Thanks [~frostruan] for contributing!

Please fill the release note as this is a big improvement.

> Introduce a SnapshotProcedure
> -----------------------------
>
>                 Key: HBASE-26323
>                 URL: https://issues.apache.org/jira/browse/HBASE-26323
>             Project: HBase
>          Issue Type: New Feature
>          Components: proc-v2, snapshots
>            Reporter: ruanhui
>            Assignee: ruanhui
>            Priority: Major
>             Fix For: 2.6.0, 3.0.0-alpha-3
>
>
> Currently,snapshot in hbase uses zk as coordinator. It has some limitations, 
>  a. Snapshot maybe fails when there are region server crashes.
>  b. Snapshot maybe failed when master restarts.
>  c. Only one snapshot per table can be taken in a time.
>  d. Snapshot verify will be handled by master, which may take long time when our table has a large number of regions, for example 10000.
>  
> Since we have procedure v2 framework now, it is possible to solve the above problems. So here is a procedure2-based snapshot implementation. It has some goals,
>  a. Snapshot can continue when there are region server crashes.
>  b. Snapshot can continue when master restarts.
>  c. More than one snapshot per table can be taken in a time.
>  d. We can use region servers to verify snapshot to accelerate procedure.
>  
> Here are some details about implementation.
>  *SnapshotProcedure*
>  SnapshotProcedure is used to take snapshot on a table. It acquires shared table lock on the snapshot table and hold the shared lock during suspend and yield. 
>  *SnapshotRegionProcedure*
>  SnapshotRegionProcedure is used to take snapshot on a specific region of the snapshot table. It acquires exclusive region lock and releases lock during suspend and yield. Before dispatch remote snapshot operations to region server, it will check target region in RIT or not. If target region is in RIT, it will sleep some time and retry.
>  *SnapshotVerifyProcedure*
>  SnapshotVerifyProcedure is used to send snapshot verify request to region server. If snapshot is corrupted, it will notify parent snapshot to retry. When remote region server is crashed, it will choose another online server and retry.
>  
> I would be very grateful for any advice and guidance. Is anyone interested in taking a look?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)