Posted to issues@hbase.apache.org by "Vladimir Rodionov (JIRA)" <ji...@apache.org> on 2018/04/03 23:24:00 UTC

[jira] [Resolved] (HBASE-15227) HBase Backup Phase 3: Fault tolerance (client/server) support

     [ https://issues.apache.org/jira/browse/HBASE-15227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Rodionov resolved HBASE-15227.
---------------------------------------
    Resolution: Fixed

Done.

> HBase Backup Phase 3: Fault tolerance (client/server) support
> -------------------------------------------------------------
>
>                 Key: HBASE-15227
>                 URL: https://issues.apache.org/jira/browse/HBASE-15227
>             Project: HBase
>          Issue Type: Task
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>            Priority: Major
>              Labels: backup
>         Attachments: HBASE-15227-v3.patch, HBASE-15277-v1.patch
>
>
> The system must tolerate faults:
> # Backup operations MUST be atomic (no partial completion state in the backup system table)
> # The process must detect any type of failure that can result in data loss (partial backup or partial restore)
> # On failure, the system table state must be properly restored and cleaned up
> # An additional utility that repairs the backup system table and cleans up the corresponding file system must be implemented
> h3. Backup
> h4. General FT framework implementation 
> Before the actual backup operation starts, a snapshot of the backup system table is taken and the system table is updated with the *ACTIVE_SNAPSHOT* flag. The flag is removed upon backup completion.
> In case of *any* server-side failure, the client catches the errors/exceptions and handles them:
> # Cleans up the backup destination (removes partial backup data)
> # Cleans up any temporary data
> # Deletes any active snapshots of the tables being backed up (during a full backup we snapshot the tables)
> # Restores the backup system table from the snapshot
> # Deletes the backup system table snapshot (the snapshot name is read from the backup system table beforehand)
> In case of *any* client-side failure:
> Before any backup or restore operation runs, we check the backup system table for the *ACTIVE_SNAPSHOT* flag. If the flag is present, the operation aborts with a message that the backup repair tool (see below) must be run.
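The flow described above can be sketched as follows. This is only an illustrative outline, not the code from the attached patches: the `SystemTable` and `Cleanup` interfaces, the `Recorder` stub, and every method name are hypothetical stand-ins for the real backup system table operations. The sketch also assumes that the system table snapshot is taken before the flag is set (so restoring it drops the flag) and that the snapshot is deleted on success.

```java
import java.util.ArrayList;
import java.util.List;

public class BackupSessionSketch {

    /** Hypothetical view of the backup system table; not the HBase API. */
    public interface SystemTable {
        boolean hasActiveSnapshotFlag();
        void snapshotSystemTable();
        void setActiveSnapshotFlag();
        void clearActiveSnapshotFlag();
        void restoreSystemTableFromSnapshot();
        void deleteSystemTableSnapshot();
    }

    /** Hypothetical cleanup hooks for the backup destination and temporary data. */
    public interface Cleanup {
        void removePartialBackupData();
        void removeTemporaryData();
        void deleteTableSnapshots();
    }

    public static void runBackup(SystemTable sys, Cleanup cleanup, Runnable backupWork) {
        // Client-side failure detection: a leftover flag means an earlier
        // session died before it could clean up after itself.
        if (sys.hasActiveSnapshotFlag()) {
            throw new IllegalStateException("ACTIVE_SNAPSHOT is set; run backup repair first");
        }
        sys.snapshotSystemTable();
        sys.setActiveSnapshotFlag();
        try {
            backupWork.run();
        } catch (RuntimeException e) {
            // Server-side failure: undo everything in the order listed above.
            cleanup.removePartialBackupData();
            cleanup.removeTemporaryData();
            cleanup.deleteTableSnapshots();
            sys.restoreSystemTableFromSnapshot();
            sys.deleteSystemTableSnapshot();
            throw e;
        }
        sys.clearActiveSnapshotFlag();
        sys.deleteSystemTableSnapshot(); // assumption: the snapshot is dropped on success
    }

    /** In-memory stub that records the order of the steps. */
    public static final class Recorder implements SystemTable, Cleanup {
        public final List<String> steps = new ArrayList<>();
        public boolean flag;

        public boolean hasActiveSnapshotFlag() { return flag; }
        public void snapshotSystemTable() { steps.add("snapshot system table"); }
        public void setActiveSnapshotFlag() { flag = true; steps.add("set flag"); }
        public void clearActiveSnapshotFlag() { flag = false; steps.add("clear flag"); }
        public void restoreSystemTableFromSnapshot() {
            flag = false; // the snapshot predates the flag, so restoring removes it
            steps.add("restore system table");
        }
        public void deleteSystemTableSnapshot() { steps.add("delete system table snapshot"); }
        public void removePartialBackupData() { steps.add("remove partial backup data"); }
        public void removeTemporaryData() { steps.add("remove temporary data"); }
        public void deleteTableSnapshots() { steps.add("delete table snapshots"); }
    }
}
```

If the client itself dies between setting the flag and finishing, none of the `catch` logic runs; the flag simply stays in the system table, and the next call is rejected by the guard at the top until the repair steps have been performed.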
> h4. Backup repair tool
> The command line tool *backup repair* executes the following steps:
> # Reads the info of the last failed backup session
> # Cleans up the backup destination (removes partial backup data)
> # Cleans up any temporary data
> # Deletes any active snapshots of the tables being backed up (during a full backup we snapshot the tables)
> # Restores the backup system table from the snapshot
> # Deletes the backup system table snapshot (the snapshot name is read from the backup system table beforehand)
> h4. Detection of a partial loss of data
> h5. Full backup  
> Export snapshot operation (?).
> We count files and check sizes before and after the DistCp run.
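A minimal sketch of such a count-and-size check, assuming "before" means the source tree and "after" means the destination tree once the copy has finished. It uses `java.nio.file` on a local file system purely for illustration; the real check would run against HDFS around DistCp, and the `CopyVerifier` class and its method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

public class CopyVerifier {

    /** Maps each regular file under root (by relative path) to its size in bytes. */
    public static Map<String, Long> summarize(Path root) throws IOException {
        Map<String, Long> sizes = new TreeMap<>();
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(Files::isRegularFile).forEach(p -> {
                try {
                    sizes.put(root.relativize(p).toString(), Files.size(p));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        return sizes;
    }

    /** True only if both trees hold the same files with the same sizes. */
    public static boolean copyIsComplete(Path source, Path destination) throws IOException {
        return summarize(source).equals(summarize(destination));
    }
}
```

Comparing the per-file maps covers both the file count and the sizes in one step: a missing, extra, or truncated file makes the maps unequal. It does not detect corruption that leaves the size unchanged.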
> h5. Incremental backup 
> During conversion of WALs to HFiles, a WAL file may be moved from the active to the archive directory. Code is already in place to handle this situation.
> During the DistCp run (same as above).
> h3. Restore
> This operation does not modify the backup system table and is idempotent. No special fault tolerance (FT) handling is required.
>  
>      



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)