Posted to issues@hbase.apache.org by "Duo Zhang (Jira)" <ji...@apache.org> on 2023/02/06 02:42:00 UTC

[jira] [Commented] (HBASE-27614) Region Reopen failure when the openNum has issue

    [ https://issues.apache.org/jira/browse/HBASE-27614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684390#comment-17684390 ] 

Duo Zhang commented on HBASE-27614:
-----------------------------------

The seqNum should not go backward. If changing the TTL can cause the seqNum to go backward, then we have a critical issue here, not only for reopening the region...

> Region Reopen failure when the openNum has issue
> ------------------------------------------------
>
>                 Key: HBASE-27614
>                 URL: https://issues.apache.org/jira/browse/HBASE-27614
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Dong0829
>            Assignee: Dong0829
>            Priority: Major
>
> We hit this issue when changing the TTL for an HBase table: many regions kept reopening and a large number of TRSPs were created. After troubleshooting, we found a logic issue in the region reopen procedure.
> During the reopen process, the seqNum is checked to confirm whether the region reopened successfully. If the seqNum has accidentally become larger than what the current HFiles and WAL contain (because of data loss), the check keeps failing and the region goes into an unnecessary close/open loop.
>  
> We should be able to optimize the logic. More details:
> In regionOpenedWithoutPersistingToMeta, should we update the OpenSeqNum only when the new one is bigger than the old one?
> As the region is already opened, we should update the OpenSeqNum no matter whether it is bigger or smaller; otherwise, we should not just log a WARN but fail the open, right?
> [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/OpenRegionProcedure.java#L81]
>  
> The above matters because in checkReopened([https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java#L312]), if the seqNum is smaller, the region is returned and keeps reopening. So we should update the logic in either regionOpenedWithoutPersistingToMeta or checkReopened to make sure the region reopen works properly when the seqNum has an issue.
>  
>  
> Reproduce steps:
>  
> 1. Create a test table (for example, test) and put some data:
> create 'test', 'info'
> put 'test', 'fool', 'info:cat', 'test'
> 2. Manually update one region row for this test table in hbase:meta on the column, for example:
> put 'hbase:meta', 'test,,1673406566311.3eb4d3e0258bd06f4639a595920c7673.', 'info:seqnumDuringOpen', "\x00\x00\x00\x00\x00\x10\x00\x05"
> 3. Modify the table TTL:
> alter 'test', {NAME=>'info', TTL => '63244800'}
>  
> You will see the region keep reopening
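
The loop described in the report can be illustrated with a small, self-contained model. This is only a sketch of the behavior as the reporter describes it, not the HBase source: the class ReopenLoopSketch and all of its method names are hypothetical, and it assumes that (a) the in-memory open sequence number is only updated when the newly reported value is not smaller, and (b) a reopen counts as confirmed only when the stored value becomes strictly larger than the value seen before the reopen. The bytes written to hbase:meta in step 2, read as a big-endian long, are 0x100005 = 1048581.

```java
/** Simplified model of the reopen check described in the report; NOT HBase source code. */
final class ReopenLoopSketch {

    /** Hypothetical stand-in for the master's in-memory state of one region. */
    static final class RegionNode {
        long openSeqNum;

        RegionNode(long openSeqNum) {
            this.openSeqNum = openSeqNum;
        }
    }

    /** Assumption (a): a smaller reported seqNum is ignored (the report says only a WARN is logged). */
    static void reportOpened(RegionNode node, long reportedSeqNum) {
        if (reportedSeqNum < node.openSeqNum) {
            return; // stored value stays at the old, larger number
        }
        node.openSeqNum = reportedSeqNum;
    }

    /** Assumption (b): the region still needs a reopen unless its seqNum moved strictly forward. */
    static boolean needsReopen(RegionNode node, long seqNumBeforeReopen) {
        return node.openSeqNum <= seqNumBeforeReopen;
    }

    /**
     * Returns the number of reopen attempts needed until the reopen is confirmed,
     * or -1 if it is never confirmed within maxAttempts.
     */
    static int attemptsUntilConfirmed(long seqNumInMeta, long realSeqNum, int maxAttempts) {
        RegionNode node = new RegionNode(seqNumInMeta);
        long before = node.openSeqNum;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            realSeqNum += 1; // each open reports a slightly larger real seqNum
            reportOpened(node, realSeqNum);
            if (!needsReopen(node, before)) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Healthy case: the value in meta is behind the real seqNum.
        System.out.println(attemptsUntilConfirmed(10L, 20L, 5));
        // Case from the reproduce steps: meta holds 0x100005, far ahead of the real seqNum.
        System.out.println(attemptsUntilConfirmed(0x100005L, 20L, 5));
    }
}
```

Under these assumptions the healthy case is confirmed on the first attempt, while the case with the inflated value in meta is never confirmed. That matches the reported symptom, and it also shows why a seqNum that can go backward is the underlying problem rather than the reopen check alone.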



--
This message was sent by Atlassian Jira
(v8.20.10#820010)