Posted to issues@hbase.apache.org by "Andrew Kyle Purtell (Jira)" <ji...@apache.org> on 2022/07/01 20:13:00 UTC
[jira] [Closed] (HBASE-15227) HBase Backup Phase 3: Fault tolerance (client/server) support
[ https://issues.apache.org/jira/browse/HBASE-15227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Kyle Purtell closed HBASE-15227.
---------------------------------------
> HBase Backup Phase 3: Fault tolerance (client/server) support
> -------------------------------------------------------------
>
> Key: HBASE-15227
> URL: https://issues.apache.org/jira/browse/HBASE-15227
> Project: HBase
> Issue Type: Task
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Priority: Major
> Labels: backup
> Attachments: HBASE-15227-v3.patch, HBASE-15277-v1.patch
>
>
> The system must be tolerant to faults:
> # Backup operations MUST be atomic (no partial completion state in the backup system table)
> # The process must detect any type of failure that can result in data loss (partial backup or partial restore)
> # On failure, the backup system table state must be restored and cleanup performed
> # An additional utility that repairs the backup system table and performs the corresponding file system cleanup must be implemented
> h3. Backup
> h4. General FT framework implementation
> Before the actual backup operation starts, a snapshot of the backup system table is taken and the system table is updated with the *ACTIVE_SNAPSHOT* flag. The flag is removed upon backup completion.
> In case of *any* server-side failure, the client catches the errors/exceptions and handles them:
> # Cleans up backup destination (removes partial backup data)
> # Cleans up any temporary data
> # Deletes any active snapshots of the tables being backed up (during a full backup we snapshot the tables)
> # Restores backup system table from snapshot
> # Deletes the backup system table snapshot (the snapshot name is read from the backup system table beforehand)
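The client-side handling of a server-side failure described above can be sketched as a try/catch around the backup run. This is only an illustration: `BackupEnvironment`, `FaultTolerantBackup`, and `RecordingEnvironment` are hypothetical names invented for this sketch, not HBase APIs, and the recording implementation simply logs the order of the steps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction over the steps listed in the description.
// None of these method names are taken from the HBase code base.
interface BackupEnvironment {
    String snapshotSystemTable();              // takes the snapshot, sets ACTIVE_SNAPSHOT, returns the snapshot name
    void runBackup() throws Exception;         // the actual backup; may fail server-side
    void cleanDestination();                   // removes partial backup data
    void cleanTemporaryData();
    void deleteTableSnapshots();               // snapshots of the tables being backed up
    void restoreSystemTable(String snapshotName);
    void deleteSystemTableSnapshot(String snapshotName);
    void clearActiveSnapshotFlag();            // done on successful completion
}

final class FaultTolerantBackup {
    private FaultTolerantBackup() {}

    // Returns true if the backup completed, false if it failed and was rolled back.
    static boolean run(BackupEnvironment env) {
        String snapshotName = env.snapshotSystemTable();
        try {
            env.runBackup();
        } catch (Exception e) {
            // Rollback in the order given in the description.
            env.cleanDestination();
            env.cleanTemporaryData();
            env.deleteTableSnapshots();
            env.restoreSystemTable(snapshotName);
            env.deleteSystemTableSnapshot(snapshotName);
            return false;
        }
        env.clearActiveSnapshotFlag();
        return true;
    }
}

// Test double that records which steps ran, in order.
final class RecordingEnvironment implements BackupEnvironment {
    final List<String> log = new ArrayList<>();
    private final boolean failBackup;

    RecordingEnvironment(boolean failBackup) { this.failBackup = failBackup; }

    public String snapshotSystemTable() { log.add("snapshot+flag"); return "sys-snapshot"; }
    public void runBackup() throws Exception {
        log.add("backup");
        if (failBackup) throw new Exception("simulated server-side failure");
    }
    public void cleanDestination() { log.add("clean-destination"); }
    public void cleanTemporaryData() { log.add("clean-temp"); }
    public void deleteTableSnapshots() { log.add("delete-table-snapshots"); }
    public void restoreSystemTable(String snapshotName) { log.add("restore:" + snapshotName); }
    public void deleteSystemTableSnapshot(String snapshotName) { log.add("delete:" + snapshotName); }
    public void clearActiveSnapshotFlag() { log.add("clear-flag"); }
}
```

Whether the system table snapshot is also deleted after a successful backup is not stated in the description, so the sketch only clears the flag on success.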
> In case of *any* client-side failures:
> Before any backup or restore operation runs, the backup system table is checked for the *ACTIVE_SNAPSHOT* flag. If the flag is present, the operation aborts with a message that the backup repair tool (see below) must be run.
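The pre-flight check for a leftover flag might look roughly like the following. The backup system table is modelled here as a plain `Map`, which is an assumption made only to keep the sketch self-contained; `ActiveSnapshotGuard` and its method are hypothetical names.

```java
import java.util.Map;

final class ActiveSnapshotGuard {
    // Flag name taken from the description; storing it as a map key is an assumption.
    static final String ACTIVE_SNAPSHOT = "ACTIVE_SNAPSHOT";

    private ActiveSnapshotGuard() {}

    // Throws if a previous session left the flag behind, i.e. the client
    // died before it could finish or roll back the backup.
    static void checkNoActiveSnapshot(Map<String, String> systemTable) {
        if (systemTable.containsKey(ACTIVE_SNAPSHOT)) {
            throw new IllegalStateException(
                "Previous backup session did not complete; run the backup repair tool first");
        }
    }
}
```

A caller would invoke `checkNoActiveSnapshot` before starting any backup or restore and surface the exception message to the user.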
> h4. Backup repair tool
> The command-line tool *backup repair* executes the following steps:
> # Reads information about the last failed backup session
> # Cleans up backup destination (removes partial backup data)
> # Cleans up any temporary data
> # Deletes any active snapshots of the tables being backed up (during a full backup we snapshot the tables)
> # Restores backup system table from snapshot
> # Deletes the backup system table snapshot (the snapshot name is read from the backup system table beforehand)
> h4. Detection of a partial loss of data
> h5. Full backup
> Export snapshot operation (?).
> We count files and check their sizes before and after the DistCp run.
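A count-and-size comparison of this kind can be sketched with the local file system standing in for HDFS. `CopyVerifier` and `Summary` are hypothetical names, and comparing only the file count and the total byte size (rather than per-file sizes) is an assumption of this sketch, not necessarily what the patch does.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

final class CopyVerifier {
    private CopyVerifier() {}

    // Number of regular files and their combined size under a directory tree.
    static final class Summary {
        final long fileCount;
        final long totalBytes;

        Summary(long fileCount, long totalBytes) {
            this.fileCount = fileCount;
            this.totalBytes = totalBytes;
        }
    }

    // Walks the tree rooted at 'root' and sums up regular files only.
    static Summary summarize(Path root) throws IOException {
        long count = 0;
        long bytes = 0;
        try (Stream<Path> paths = Files.walk(root)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                if (Files.isRegularFile(p)) {
                    count++;
                    bytes += Files.size(p);
                }
            }
        }
        return new Summary(count, bytes);
    }

    // The copy is treated as complete only if both numbers agree.
    static boolean matches(Summary before, Summary after) {
        return before.fileCount == after.fileCount
            && before.totalBytes == after.totalBytes;
    }
}
```

In use, the source would be summarized before the copy and the destination after it; a mismatch would be treated as a partial backup and trigger the failure handling.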
> h5. Incremental backup
> During conversion of WALs to HFiles, when a WAL file is moved from the active to the archive directory. Code is already in place to handle this situation.
> During the DistCp run (same as above).
> h3. Restore
> This operation does not modify the backup system table and is idempotent. No special fault tolerance (FT) handling is required.
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)