Posted to issues@hbase.apache.org by "Atri Sharma (JIRA)" <ji...@apache.org> on 2016/10/13 06:45:20 UTC

[jira] [Commented] (HBASE-12074) TestLogRollingNoCluster#testContendedLogRolling() failed

    [ https://issues.apache.org/jira/browse/HBASE-12074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571053#comment-15571053 ] 

Atri Sharma commented on HBASE-12074:
-------------------------------------

Could a possible fix be to have rollWriter acquire the zig-zag latch and call doReplaceWriter as its first operation, before it attempts to flush and close the old log file? New HLog writer threads would then see newPath already set and would not have to wait for the flush, and cleanup of the old file could run in a background thread.
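
To make the ordering concrete, here is a rough sketch of "swap first, clean up later". It is not FSHLog code: SketchWriter, SwapFirstRoller and roll() are hypothetical names invented for illustration, and the zig-zag latch is left out entirely. It only shows the new writer being published before the old one is flushed and closed on a background thread.

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a WAL writer; NOT the HBase Writer interface.
class SketchWriter {
    final String path;
    private volatile boolean closed = false;

    SketchWriter(String path) { this.path = path; }

    // Fails once the writer has been closed, like a sync on a closed stream.
    void sync() throws IOException {
        if (closed) throw new IOException("writer closed: " + path);
    }

    void close() { closed = true; }

    boolean isClosed() { return closed; }
}

// Hypothetical roller: publishes the new writer first, retires the old one later.
class SwapFirstRoller {
    private final AtomicReference<SketchWriter> current;
    private final ExecutorService cleaner = Executors.newSingleThreadExecutor();

    SwapFirstRoller(SketchWriter initial) {
        current = new AtomicReference<>(initial);
    }

    // What appender threads would call to get the writer to use.
    SketchWriter writer() { return current.get(); }

    // Step 1: swap in the new writer so callers see the new path immediately.
    // Step 2: flush and close the old writer off the caller's thread.
    SketchWriter roll(String newPath) {
        SketchWriter old = current.getAndSet(new SketchWriter(newPath));
        cleaner.execute(() -> {
            try {
                old.sync();
            } catch (IOException ignored) {
                // Sketch only; real code would have to surface this failure.
            }
            old.close();
        });
        return old;
    }

    // Waits for background cleanup; returns false on timeout or interrupt.
    boolean shutdown() {
        cleaner.shutdown();
        try {
            return cleaner.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        SwapFirstRoller roller = new SwapFirstRoller(new SketchWriter("wal.1"));
        SketchWriter old = roller.roll("wal.2");
        roller.shutdown();
        System.out.println(roller.writer().path + " active, old closed: " + old.isClosed());
    }
}
```

One caveat with this sketch: a thread that fetched the old writer just before the swap can still call sync() on it after the background close, so the real change would still need the latch (or something equivalent) to drain in-flight appends before the old writer is closed.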

> TestLogRollingNoCluster#testContendedLogRolling() failed
> --------------------------------------------------------
>
>                 Key: HBASE-12074
>                 URL: https://issues.apache.org/jira/browse/HBASE-12074
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Stephen Yuan Jiang
>
> TestLogRollingNoCluster#testContendedLogRolling() failed on a 0.98 run. I am trying to understand the context. 
> The failure is this: 
> {code}
> java.lang.AssertionError
> 	at org.junit.Assert.fail(Assert.java:86)
> 	at org.junit.Assert.assertTrue(Assert.java:41)
> 	at org.junit.Assert.assertFalse(Assert.java:64)
> 	at org.junit.Assert.assertFalse(Assert.java:74)
> 	at org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster.testContendedLogRolling(TestLogRollingNoCluster.java:80)
> {code}
> It is caused by one of the Appenders calling FSHLog.sync(), which threw an IOException because of a concurrent close: 
> {code}
> 4-09-23 16:36:39,530 FATAL [pool-1-thread-1-WAL.AsyncSyncer0] wal.FSHLog$AsyncSyncer(1246): Error while AsyncSyncer sync, request close of hlog 
> java.io.IOException: java.lang.NullPointerException
> 	at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:168)
> 	at org.apache.hadoop.hbase.regionserver.wal.FSHLog$AsyncSyncer.run(FSHLog.java:1241)
> 	at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.NullPointerException
> 	at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:165)
> 	... 2 more
> 2014-09-23 16:36:39,531 INFO  [32] wal.TestLogRollingNoCluster$Appender(137): Caught exception from Appender:32
> java.io.IOException: java.lang.NullPointerException
> 	at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:168)
> 	at org.apache.hadoop.hbase.regionserver.wal.FSHLog$AsyncSyncer.run(FSHLog.java:1241)
> 	at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.NullPointerException
> 	at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:165)
> 	... 2 more
> 2014-09-23 16:36:39,532 INFO  [19] wal.TestLogRollingNoCluster$Appender(137): Caught exception from Appender:19
> java.io.IOException: java.lang.NullPointerException
> 	at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:168)
> 	at org.apache.hadoop.hbase.regionserver.wal.FSHLog$AsyncSyncer.run(FSHLog.java:1241)
> 	at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.NullPointerException
> 	at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:165)
> 	... 2 more
> {code}
> The code is: 
> {code}
>   public void sync() throws IOException {
>     try {
>       this.output.flush();
>       this.output.sync();
>     } catch (NullPointerException npe) {
>       // Concurrent close...
>       throw new IOException(npe);
>     }
>   }
> {code}
> I think the test case was written exactly to catch this case: 
> {code}
>    * Spin up a bunch of threads and have them all append to a WAL.  Roll the
>    * WAL frequently to try and trigger NPE.
> {code}
> I am reporting this because I don't have much context. It may not be a test issue, but an actual bug. 
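
For reference, the failure mode in the quoted sync() can be reproduced in isolation. The sketch below is single-threaded and deterministic, and it rests on an assumption: that close() sets the output field to null, which is what the "Concurrent close..." comment suggests. NullingWriterSketch is a hypothetical class, not ProtobufLogWriter, and it uses a ByteArrayOutputStream in place of the HDFS output stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical stand-in; mirrors only the shape of the quoted sync() method.
class NullingWriterSketch {
    private ByteArrayOutputStream output = new ByteArrayOutputStream();

    // Same pattern as the quoted code: an NPE is rewrapped as an IOException.
    void sync() throws IOException {
        try {
            this.output.flush();
        } catch (NullPointerException npe) {
            // Concurrent close...
            throw new IOException(npe);
        }
    }

    // ASSUMPTION: close() drops the reference, so a later sync() sees null.
    void close() throws IOException {
        if (output != null) {
            output.close();
            output = null;
        }
    }

    // Returns true if sync() after close() fails with an IOException whose
    // cause is a NullPointerException, as in the stack traces above.
    static boolean syncAfterCloseFails() {
        NullingWriterSketch w = new NullingWriterSketch();
        try {
            w.sync();   // works while the stream is open
            w.close();
            w.sync();   // output is now null
            return false;
        } catch (IOException e) {
            return e.getCause() instanceof NullPointerException;
        }
    }

    public static void main(String[] args) {
        System.out.println("sync after close fails: " + syncAfterCloseFails());
    }
}
```

In the test the close comes from the rolling thread rather than from the same thread, but the resulting exception chain has the same shape.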


