Posted to issues@hbase.apache.org by "Hudson (Jira)" <ji...@apache.org> on 2020/08/03 14:07:00 UTC

[jira] [Commented] (HBASE-21751) WAL creation fails during region open may cause region assign forever fail

    [ https://issues.apache.org/jira/browse/HBASE-21751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170053#comment-17170053 ] 

Hudson commented on HBASE-21751:
--------------------------------

Results for branch master
	[build #1802 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/1802/]: (x) *{color:red}-1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/1802/General_20Nightly_20Build_20Report/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/1802/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://builds.apache.org/job/HBase%20Nightly/job/master/1802/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(x) {color:red}-1 client integration test{color}
-- Failed when running client tests on top of Hadoop 2. [see log for details|https://builds.apache.org/job/HBase%20Nightly/job/master/1802//artifact/output-integration/hadoop-2.log]. (note that this means we didn't run on Hadoop 3)


> WAL creation fails during region open may cause region assign forever fail
> --------------------------------------------------------------------------
>
>                 Key: HBASE-21751
>                 URL: https://issues.apache.org/jira/browse/HBASE-21751
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.1.2, 2.0.4
>            Reporter: Allan Yang
>            Assignee: Bing Xiao
>            Priority: Major
>             Fix For: 2.3.0, 2.2.1, 2.1.6
>
>         Attachments: HBASE-21751-branch-2.1-v1.patch, HBASE-21751-branch-2.1-v2.patch, HBASE-21751-branch-2.1-v3.patch, HBASE-21751.patch, HBASE-21751.v2.patch, HBASE-21751.v3.patch, HBASE-21751v2.patch
>
>
> During the first region open on the RS, WALFactory creates a WAL file. If the WAL creation fails, HDFS can in some cases leave an empty file in the directory (e.g. the disk is full: the file is created successfully but block allocation fails). AbstractFSWAL has a check that throws an error if a WAL belonging to the same factory already exists. As a result, the region can never be opened on this RS afterwards.
> {code:java}
> 2019-01-17 02:15:53,320 ERROR [RS_OPEN_META-regionserver/server003:16020-0] handler.OpenRegionHandler(301): Failed open of region=hbase:meta,,1.1588230740
> java.io.IOException: Target WAL already exists within directory hdfs://cluster/hbase/WALs/server003.hbase.hostname.com,16020,1545269815888
>         at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.<init>(AbstractFSWAL.java:382)
>         at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.<init>(AsyncFSWAL.java:210)
>         at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:72)
>         at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:47)
>         at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:138)
>         at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:57)
>         at org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:264)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getWAL(HRegionServer.java:2085)
>         at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:284)
>         at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:108)
>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
>         at java.lang.Thread.run(Thread.java:834)
> {code}
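
The failure mode described above can be reproduced in miniature on a local filesystem. The following is only a sketch under stated assumptions: it uses java.nio instead of the HDFS client, and the class name, method names, and file prefix are hypothetical. It is not the HBase code, and the cleanup step is one possible mitigation, not necessarily the fix committed for this issue.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration; none of these names come from HBase.
class WalDirCheckSketch {

    // Stand-in for the "Target WAL already exists" check: fail if any file
    // carrying this factory's prefix is already present in the WAL directory.
    static void strictCheck(Path walDir, String prefix) throws IOException {
        try (Stream<Path> files = Files.list(walDir)) {
            if (files.anyMatch(p -> p.getFileName().toString().startsWith(prefix))) {
                throw new IOException("Target WAL already exists within directory " + walDir);
            }
        }
    }

    // Assumed mitigation: delete zero-length leftovers with the same prefix
    // before the check runs. Files that contain data are left untouched.
    static int removeEmptyLeftovers(Path walDir, String prefix) throws IOException {
        List<Path> candidates;
        try (Stream<Path> files = Files.list(walDir)) {
            candidates = files
                .filter(p -> p.getFileName().toString().startsWith(prefix))
                .collect(Collectors.toList());
        }
        int removed = 0;
        for (Path p : candidates) {
            if (Files.size(p) == 0) {
                Files.delete(p);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        Path walDir = Files.createTempDirectory("wal-sketch");
        String prefix = "rs-wal"; // hypothetical prefix

        // Simulate a failed creation: the file exists but holds no data.
        Files.createFile(walDir.resolve(prefix + ".1"));

        try {
            strictCheck(walDir, prefix);
        } catch (IOException e) {
            // Every later open attempt hits the same leftover file.
            System.out.println("open fails: " + e.getMessage());
        }

        removeEmptyLeftovers(walDir, prefix);
        strictCheck(walDir, prefix); // no longer throws
        System.out.println("open can proceed");
    }
}
```

Only zero-length files are removed in this sketch, because a file with data may be a WAL that still needs to be replayed.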



--
This message was sent by Atlassian Jira
(v8.3.4#803005)