You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by "Allan Yang (JIRA)" <ji...@apache.org> on 2019/01/21 14:08:00 UTC

[jira] [Created] (HBASE-21751) WAL create fails during region open may cause region assign forever fail

Allan Yang created HBASE-21751:
----------------------------------

             Summary: WAL create fails during region open may cause region assign forever fail
                 Key: HBASE-21751
                 URL: https://issues.apache.org/jira/browse/HBASE-21751
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.0.4, 2.1.2
            Reporter: Allan Yang
            Assignee: Allan Yang
             Fix For: 2.2.0, 2.1.3, 2.0.5


During the first region opens on the RS, WALFactory will create a WAL file, but if the wal creation fails, in some cases, HDFS will leave a empty file in the dir(e.g. disk full, file is created succesfully but block allocation fails). We have a check in AbstractFSWAL that if WAL belong to the same factory exists, then a error will be throw. Thus, the region can never be open on this RS later.
{code:java}
2019-01-17 02:15:53,320 ERROR [RS_OPEN_META-regionserver/server003:16020-0] handler.OpenRegionHandler(301): Failed open of region=hbase:meta,,1.1588230740
java.io.IOException: Target WAL already exists within directory hdfs://cluster/hbase/WALs/server003.hbase.hostname.com,16020,1545269815888
        at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.<init>(AbstractFSWAL.java:382)
        at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.<init>(AsyncFSWAL.java:210)
        at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:72)
        at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:47)
        at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:138)
        at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:57)
        at org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:264)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getWAL(HRegionServer.java:2085)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:284)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:108)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
        at java.lang.Thread.run(Thread.java:834)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)