Posted to issues@hbase.apache.org by "Ted Yu (JIRA)" <ji...@apache.org> on 2017/03/23 15:31:41 UTC

[jira] [Comment Edited] (HBASE-17287) Master becomes a zombie if filesystem object closes

    [ https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15938534#comment-15938534 ] 

Ted Yu edited comment on HBASE-17287 at 3/23/17 3:30 PM:
---------------------------------------------------------

In getLogDirs(), abort the master when we detect that the filesystem has been closed.

Comments are welcome.
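
For illustration only, here is a minimal sketch of the idea, not the attached patch. It assumes the closed filesystem can be recognized by the "Filesystem closed" IOException message that appears in the stack trace quoted below. MasterAbortHook, LogDirLister and ClosedFsGuard are hypothetical stand-ins so the sketch compiles without HBase or Hadoop on the classpath; they are not the real HBase types.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical stand-in for the master's abort hook; not HBase's actual interface. */
interface MasterAbortHook {
    void abort(String why, Throwable cause);
}

/** Hypothetical stand-in for the directory listing that getLogDirs() performs. */
interface LogDirLister {
    List<String> list() throws IOException;
}

final class ClosedFsGuard {
    // Message taken from the stack trace in the issue description (assumed to be stable).
    static final String FS_CLOSED = "Filesystem closed";

    /** Walks the cause chain looking for an IOException with the "Filesystem closed" message. */
    static boolean isFilesystemClosed(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof IOException && FS_CLOSED.equals(c.getMessage())) {
                return true;
            }
        }
        return false;
    }

    /**
     * Lists the log directories. If the listing fails because the filesystem
     * object is closed, asks the master to abort instead of retrying forever.
     * The original exception is rethrown in every failure case.
     */
    static List<String> getLogDirs(LogDirLister lister, MasterAbortHook master) throws IOException {
        try {
            return lister.list();
        } catch (IOException e) {
            if (isFilesystemClosed(e)) {
                master.abort("Filesystem closed while listing log dirs", e);
            }
            throw e;
        }
    }
}
```

Matching on the exception message is brittle. It is used here only because the message is the one signal visible in the reported logs. Other IOExceptions are rethrown without aborting, so transient failures still go through the existing retry path.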


was (Author: yuzhihong@gmail.com):
Tentative patch.

> Master becomes a zombie if filesystem object closes
> ---------------------------------------------------
>
>                 Key: HBASE-17287
>                 URL: https://issues.apache.org/jira/browse/HBASE-17287
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>            Reporter: Clay B.
>            Assignee: Ted Yu
>         Attachments: 17287.v2.txt
>
>
> We have seen an issue whereby if the HDFS is unstable and the HBase master's HDFS client is unable to stabilize before {{dfs.client.failover.max.attempts}} then the master's filesystem object closes. This seems to result in an HBase master which will continue to run (process and znode exist) but no meaningful work can be done (e.g. assigning meta).
>
> What we saw in our HBase master logs was:
> {code}
> 2016-12-01 19:19:08,192 ERROR org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler: Caught M_META_SERVER_SHUTDOWN, count=1
> java.io.IOException: failed log splitting for cluster-r5n12.bloomberg.com,60200,1480632863218, will retry
>     at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.process(MetaServerShutdownHandler.java:84)
>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Filesystem closed
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)