Posted to issues@hbase.apache.org by "Duo Zhang (JIRA)" <ji...@apache.org> on 2018/11/16 03:39:00 UTC

[jira] [Commented] (HBASE-21480) Taking snapshot when RS crashes may hang

    [ https://issues.apache.org/jira/browse/HBASE-21480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16688985#comment-16688985 ] 

Duo Zhang commented on HBASE-21480:
-----------------------------------

Changed the title; let's focus on the most critical problem first. The UT hangs because we hold the exclusive lock on the table for the entire time we are taking a snapshot, which prevents the SCP from bringing the regions online and causes a deadlock.

I think we can first acquire the exclusive lock, to let the previous merge/split procedures finish, and then switch to a shared lock, to prevent other operations on the table such as ModifyTableProcedure or DisableTableProcedure. Merge/split procedures would first check whether a snapshot operation is ongoing and, if so, give up and roll back.
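
A minimal sketch of the lock ordering proposed above, using java.util.concurrent.locks.ReentrantReadWriteLock as a stand-in for the table lock. This is not the HBase procedure lock API: the class name, method names and the snapshotInProgress flag are all hypothetical, and real procedure locks are owned by procedures rather than threads. It assumes merge/split and region assignment take the shared lock, while ModifyTable/DisableTable need the exclusive lock.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a per-table lock; not an HBase class.
class SnapshotLockSketch {
    private final ReentrantReadWriteLock tableLock = new ReentrantReadWriteLock();
    private final AtomicBoolean snapshotInProgress = new AtomicBoolean(false);

    // Snapshot start: take the exclusive lock so that in-flight shared-lock
    // holders (e.g. merge/split) finish first, mark the snapshot as running,
    // then downgrade to the shared lock before releasing the exclusive one.
    void beginSnapshot() {
        tableLock.writeLock().lock();
        try {
            snapshotInProgress.set(true);
            tableLock.readLock().lock();
        } finally {
            tableLock.writeLock().unlock();
        }
    }

    // Snapshot end: clear the flag and drop the shared lock.
    // Must be called from the thread that called beginSnapshot().
    void endSnapshot() {
        snapshotInProgress.set(false);
        tableLock.readLock().unlock();
    }

    // Merge/split: under the shared lock, check for a running snapshot and
    // give up (the caller would roll back) if there is one.
    boolean tryMergeOrSplit(Runnable work) {
        tableLock.readLock().lock();
        try {
            if (snapshotInProgress.get()) {
                return false;
            }
            work.run();
            return true;
        } finally {
            tableLock.readLock().unlock();
        }
    }

    // Region assignment (as an SCP would do): only needs the shared lock,
    // so it can proceed while a snapshot holds the shared lock.
    boolean tryAssignRegion(Runnable work) {
        if (!tableLock.readLock().tryLock()) {
            return false;
        }
        try {
            work.run();
            return true;
        } finally {
            tableLock.readLock().unlock();
        }
    }

    // ModifyTable/DisableTable-style operation: needs the exclusive lock,
    // so it is refused while any shared lock (including the snapshot's) is held.
    boolean tryExclusiveTableOp(Runnable work) {
        if (!tableLock.writeLock().tryLock()) {
            return false;
        }
        try {
            work.run();
            return true;
        } finally {
            tableLock.writeLock().unlock();
        }
    }
}
```

With this ordering, a snapshot that has reached the shared-lock phase no longer blocks region assignment, while table-level modifications and new merge/split attempts are still kept out until endSnapshot() is called.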

> Taking snapshot when RS crashes may hang
> ----------------------------------------
>
>                 Key: HBASE-21480
>                 URL: https://issues.apache.org/jira/browse/HBASE-21480
>             Project: HBase
>          Issue Type: Improvement
>          Components: snapshots
>            Reporter: Duo Zhang
>            Priority: Major
>             Fix For: 3.0.0, 2.2.0
>
>         Attachments: HBASE-21480-UT.patch
>
>
> The current implementation is not good enough. It holds the exclusive lock for the entire snapshot, which could hurt availability, because we need to hold the shared lock when assigning regions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)