Posted to dev@hbase.apache.org by "Duo Zhang (JIRA)" <ji...@apache.org> on 2018/02/06 00:40:00 UTC

[jira] [Reopened] (HBASE-19927) TestFullLogReconstruction flakey

     [ https://issues.apache.org/jira/browse/HBASE-19927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang reopened HBASE-19927:
-------------------------------

This is a bit strange:

{noformat}
2018-02-05 19:05:43,537 INFO  [Time-limited test] regionserver.HRegionServer(2116): ***** STOPPING region server 'asf903.gq1.ygridcore.net,57911,1517857533524' *****
2018-02-05 19:05:43,537 INFO  [Time-limited test] regionserver.HRegionServer(2130): STOPPED: Shutdown requested
2018-02-05 19:05:43,538 INFO  [Time-limited test] regionserver.HRegionServer(2116): ***** STOPPING region server 'asf903.gq1.ygridcore.net,50054,1517857533606' *****
2018-02-05 19:05:43,538 INFO  [RS:0;asf903:57911] regionserver.SplitLogWorker(160): Sending interrupt to stop the worker thread
2018-02-05 19:05:43,538 INFO  [Time-limited test] regionserver.HRegionServer(2130): STOPPED: Shutdown requested
2018-02-05 19:05:43,538 INFO  [Time-limited test] regionserver.HRegionServer(2116): ***** STOPPING region server 'asf903.gq1.ygridcore.net,42069,1517857533678' *****

2018-02-05 19:05:43,974 ERROR [regionserver/asf903:0.logRoller] helpers.MarkerIgnoringBase(159): ***** ABORTING region server asf903.gq1.ygridcore.net,57911,1517857533524: IOE in log roller *****
{noformat}

The abort still happens after the region servers have already been asked to stop during shutdown. Let me check.

> TestFullLogReconstruction flakey
> --------------------------------
>
>                 Key: HBASE-19927
>                 URL: https://issues.apache.org/jira/browse/HBASE-19927
>             Project: HBase
>          Issue Type: Sub-task
>          Components: wal
>            Reporter: stack
>            Assignee: Duo Zhang
>            Priority: Major
>             Fix For: 2.0.0-beta-2
>
>         Attachments: HBASE-19927.patch, js, out
>
>
> Fails pretty frequently in hadoopqa builds.
> There is a recent hang in org.apache.hadoop.hbase.TestFullLogReconstruction.tearDownAfterClass(TestFullLogReconstruction.java:68), seen in this build:
> https://builds.apache.org/job/PreCommit-HBASE-Build/11363/testReport/org.apache.hadoop.hbase/TestFullLogReconstruction/org_apache_hadoop_hbase_TestFullLogReconstruction/
> The thread dump shows:
> Thread 1250 (RS_CLOSE_META-edd281aedb18:59863-0):
>   State: TIMED_WAITING
>   Blocked count: 92
>   Waited count: 278
>   Stack:
>     java.lang.Object.wait(Native Method)
>     org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:133)
>     org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:718)
>     org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:605)
>     org.apache.hadoop.hbase.regionserver.wal.WALUtil.doFullAppendTransaction(WALUtil.java:154)
>     org.apache.hadoop.hbase.regionserver.wal.WALUtil.writeFlushMarker(WALUtil.java:81)
>     org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2645)
>     org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2356)
>     org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2328)
>     org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2319)
>     org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1531)
>     org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1437)
>     org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:104)
>     org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
>     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     java.lang.Thread.run(Thread.java:748)
> Did we miss a signal? Do we need to interrupt the thread? The hadoopqa builds do not keep the full log, so it is hard to see everything that is going on. This test is not in the flakey set either.
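
To make the "missed signal" question concrete, here is a minimal sketch of the general pattern: a waiter that loops on a timed wait() shows up as TIMED_WAITING and never returns if nobody ever completes the future. This is not the HBase SyncFuture; SimpleSyncFuture, waitOutcome and the timeout values are hypothetical names chosen for illustration only. It assumes the completer simply never runs (for example because it died first), and it shows one possible way out, an overall deadline on the wait.

```java
import java.io.IOException;
import java.util.concurrent.TimeoutException;

class SyncWaitSketch {

    /** Hypothetical stand-in for a WAL sync future; NOT the HBase class. */
    static final class SimpleSyncFuture {
        private boolean done;
        private Throwable error;

        /** Called by the (hypothetical) sync thread once the sync finished or failed. */
        synchronized void complete(Throwable t) {
            done = true;
            error = t;
            notifyAll();
        }

        /**
         * Waits in short timed slices until complete() is called.
         * Without the deadline check this loop would spin forever if
         * complete() is never called, which is what a stuck TIMED_WAITING
         * thread looks like in a thread dump.
         */
        synchronized void get(long timeoutMs)
                throws InterruptedException, TimeoutException, IOException {
            long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
            while (!done) {
                long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMs <= 0) {
                    throw new TimeoutException("sync not completed within " + timeoutMs + " ms");
                }
                // Never pass 0 here: wait(0) means "wait without timeout".
                wait(Math.min(remainingMs, 50L));
            }
            if (error != null) {
                throw new IOException("sync failed", error);
            }
        }
    }

    /** Runs one wait, with or without a thread that eventually completes the future. */
    static String waitOutcome(boolean completerRuns) {
        SimpleSyncFuture future = new SimpleSyncFuture();
        if (completerRuns) {
            Thread completer = new Thread(() -> {
                try {
                    Thread.sleep(20);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                future.complete(null);
            });
            completer.setDaemon(true);
            completer.start();
        }
        try {
            future.get(200);
            return "completed";
        } catch (TimeoutException e) {
            return "timed out";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } catch (IOException e) {
            return "failed";
        }
    }

    public static void main(String[] args) {
        System.out.println(waitOutcome(true));
        System.out.println(waitOutcome(false));
    }
}
```

Whether the real fix should be a deadline, an interrupt, or failing the pending futures when the WAL is shut down is exactly the open question above; the sketch only shows why a waiter with no completer cannot make progress on its own.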



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)