Posted to issues@hbase.apache.org by "Duo Zhang (JIRA)" <ji...@apache.org> on 2016/02/19 05:05:18 UTC

[jira] [Commented] (HBASE-15265) Implement an asynchronous FSHLog

    [ https://issues.apache.org/jira/browse/HBASE-15265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153676#comment-15153676 ] 

Duo Zhang commented on HBASE-15265:
-----------------------------------

There are two problems here; see the comments in this test file.

https://github.com/Apache9/hbase/blob/HBASE-15265/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestAsyncLogRolling.java

First, {{FanOutOneBlockAsyncDFSOutput}} is fail-fast, which means that creating it is fail-fast too. But in the current log rolling architecture, we abort the RS if log rolling fails. With the old {{FSHLog}} implementation this is not a problem, because {{DFSClient}} and {{DFSOutputStream}} already retry many times when a call to the namenode or a connection to a datanode fails. Now we just throw the exception out, so a failure during rolling can abort the RS. We need to solve this: maybe change the abort logic of {{LogRoller}}, or add retries in {{AsyncFSWAL}}?
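
As a rough sketch of the second option (retries in {{AsyncFSWAL}}), the fail-fast creation could be wrapped in a bounded retry loop with backoff. Everything named below ({{RetryingWriterFactory}}, {{WriterSupplier}}, {{createWithRetry}}, the 10 second backoff cap) is hypothetical and only for illustration; none of it is an existing HBase API, and the supplier merely stands in for whatever call creates the output.

```java
import java.io.IOException;

/** Hypothetical helper for illustration only; not part of HBase. */
final class RetryingWriterFactory {

  /** Stand-in for a fail-fast creation call that may throw on the first error. */
  @FunctionalInterface
  interface WriterSupplier<W> {
    W create() throws IOException;
  }

  private RetryingWriterFactory() {}

  /**
   * Calls the supplier up to maxAttempts times, sleeping between attempts and
   * doubling the sleep each time (capped at 10 seconds, an arbitrary choice).
   * Rethrows the last IOException once all attempts are used up.
   */
  static <W> W createWithRetry(WriterSupplier<W> supplier, int maxAttempts,
      long initialBackoffMs) throws IOException, InterruptedException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    long backoffMs = initialBackoffMs;
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return supplier.create();
      } catch (IOException e) {
        last = e;
        if (attempt < maxAttempts) {
          Thread.sleep(backoffMs);
          backoffMs = Math.min(backoffMs * 2, 10_000L);
        }
      }
    }
    // Only reached after at least one failed attempt, so last is non-null.
    throw last;
  }
}
```

With this shape, only a failure that survives every attempt would reach the caller, so the abort decision in the roller would be made on a persistent error rather than a single transient one.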

Second, {{AsyncFSWAL}} will not fail any sync request; instead, it will try rolling the WALWriter and then try again. But in the test case, this can lead to an infinite wait during shutdown. The shutdown ordering is a little strange. We first mark the RS as stopped, and then close all regions on this RS. If the abort flag is false, we flush each region, which needs to write something to the WAL. If the WAL writer breaks at just this moment, {{AsyncFSWAL}} will try rolling the WAL writer. But as said above, the RS is already marked as stopped, so {{LogRoller}} may have already exited; the rolling will never succeed and the shutdown process hangs.
Yes, I think {{AsyncFSWAL}} should be able to quit the infinite wait, since we know it will never succeed, but I also think we should revisit the shutdown ordering, since lots of modules in the RS depend on the stopped flag of the RS.
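
One possible shape for "quit the infinite wait" is sketched below, under stated assumptions: the waiter polls for sync completion and gives up once it learns that the roller has exited. The names {{StoppableSyncWait}}, {{awaitSync}} and {{rollerExited}} are hypothetical, the latch only stands in for however a sync request signals completion, and this is not how {{AsyncFSWAL}} is actually structured.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical helper for illustration only; not part of HBase. */
final class StoppableSyncWait {

  private StoppableSyncWait() {}

  /**
   * Waits until syncDone reaches zero. Every pollMs milliseconds it checks
   * rollerExited; if that returns true, nothing can replace the broken
   * writer any more, so the wait is abandoned with an IOException instead
   * of blocking forever.
   */
  static void awaitSync(CountDownLatch syncDone, BooleanSupplier rollerExited,
      long pollMs) throws IOException, InterruptedException {
    while (!syncDone.await(pollMs, TimeUnit.MILLISECONDS)) {
      if (rollerExited.getAsBoolean()) {
        throw new IOException(
            "sync abandoned: log roller has exited, writer will not be replaced");
      }
    }
  }
}
```

Failing the pending sync this way would let the region close path see an error and proceed, rather than hang; whether that is acceptable depends on how the shutdown ordering question above is resolved.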

Thanks.

> Implement an asynchronous FSHLog
> --------------------------------
>
>                 Key: HBASE-15265
>                 URL: https://issues.apache.org/jira/browse/HBASE-15265
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>



