Posted to issues@hbase.apache.org by "Anoop Sam John (Jira)" <ji...@apache.org> on 2020/05/08 04:02:00 UTC
[jira] [Commented] (HBASE-24189) Regionserver recreates region folders in HDFS after replaying WAL with removed table entries

    [ https://issues.apache.org/jira/browse/HBASE-24189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102230#comment-17102230 ] 

Anoop Sam John commented on HBASE-24189:
----------------------------------------

Another possible solution would be this:
When we open a region, we create the recovered.edits directory under it.  So for the WALSplitter to write the edits file under the region, there is ideally no need to create any dirs.  At the very least, it doesn't need to create the region dir.  But in the code, if the region/recovered.edits dir is not there, we create it using mkdirs.  So even if the region dir itself is not there, we end up creating it.  We can avoid doing this mkdirs, and just log at INFO and skip all edits for that region.  Sounds like a simple and less risky fix (?)
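
A minimal sketch of the idea, not the actual WALSplitter code. To stay runnable on its own it uses java.nio.file instead of Hadoop's FileSystem, and the class and method names (RecoveredEditsSketch, recoveredEditsDirFor) are hypothetical. It assumes the check happens at the point where the splitter would otherwise call mkdirs:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical sketch: java.nio.file stands in for Hadoop's FileSystem API.
final class RecoveredEditsSketch {
    static final String RECOVERED_EDITS_DIR = "recovered.edits";

    /**
     * Returns the recovered.edits directory for a region, or empty if the
     * region directory is gone and the edits for it should be skipped.
     */
    static Optional<Path> recoveredEditsDirFor(Path regionDir) throws IOException {
        if (!Files.isDirectory(regionDir)) {
            // Region was removed (e.g. its table was deleted): do not recreate it.
            System.out.println("INFO: region dir " + regionDir
                + " does not exist; skipping all edits for this region");
            return Optional.empty();
        }
        Path editsDir = regionDir.resolve(RECOVERED_EDITS_DIR);
        if (!Files.isDirectory(editsDir)) {
            // createDirectory creates only the leaf and never missing parents,
            // unlike a mkdirs-style call.
            Files.createDirectory(editsDir);
        }
        return Optional.of(editsDir);
    }

    public static void main(String[] args) throws IOException {
        Path tableDir = Files.createTempDirectory("table");
        Path liveRegion = Files.createDirectory(tableDir.resolve("live-region"));
        Path deletedRegion = tableDir.resolve("deleted-region");

        System.out.println(recoveredEditsDirFor(liveRegion).isPresent());
        System.out.println(recoveredEditsDirFor(deletedRegion).isPresent());
        System.out.println(Files.exists(deletedRegion));
    }
}
```

The point of the sketch is only the ordering: check for the region dir first, and never use a create-parents call on the recovered.edits path. Whether the leaf directory still needs to be created at all depends on region open having already done it, as noted above.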

> Regionserver recreates region folders in HDFS after replaying WAL with removed table entries
> --------------------------------------------------------------------------------------------
>
>                 Key: HBASE-24189
>                 URL: https://issues.apache.org/jira/browse/HBASE-24189
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, wal
>    Affects Versions: 2.2.4
>         Environment: * HDFS 3.1.3
>  * HBase 2.1.4
>  * OpenJDK 8
>            Reporter: Andrey Elenskiy
>            Assignee: Anoop Sam John
>            Priority: Major
>
> Under the following scenario region directories in HDFS can be recreated with only recovered.edits in them:
>  # Create table "test"
>  # Put into "test"
>  # Delete table "test"
>  # Create table "test" again
>  # Crash the regionserver that received the put, to force the WAL replay
>  # Region directory in old table is recreated in new table
>  # hbase hbck returns inconsistency
> This appears to happen due to the fact that WALs are not cleaned up once a table is deleted and they still contain the edits from old table. I've tried wal_roll command on the regionserver before crashing it, but it doesn't seem to help as under some circumstances there are still WAL files around. The only solution that works consistently is to restart regionserver before creating the table at step 4 because that triggers log cleanup on startup: https://github.com/apache/hbase/blob/f3ee9b8aa37dd30d34ff54cd39fb9b4b6d22e683/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/store/wal/WALProcedureStore.java#L508
>  
> Truncating a table would also be a workaround, but in our case it's a no-go as we create and delete tables in our tests, which run back to back (create table at the beginning of the test and delete it at the end of the test).
> A nice option in our case would be to provide an hbase shell utility to force cleanup of log files manually, as I realize that it's not really viable to clean all of those up every time some table is removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)