Posted to issues@hbase.apache.org by "Duo Zhang (Jira)" <ji...@apache.org> on 2023/10/16 14:09:00 UTC

[jira] [Created] (HBASE-28154) TestZooKeeper could hang forever

Duo Zhang created HBASE-28154:
---------------------------------

             Summary: TestZooKeeper could hang forever
                 Key: HBASE-28154
                 URL: https://issues.apache.org/jira/browse/HBASE-28154
             Project: HBase
          Issue Type: Bug
          Components: test
            Reporter: Duo Zhang


I have recently seen this several times in pre-commit results.

According to the log output, the test is stuck in testRegionServerSessionExpired.

After replaying the recovered edits for the meta region, we need to flush the memstore. That flush gets stuck, which causes the test to time out.

This is the last log message for opening hbase:meta:
{noformat}
2023-10-15T14:37:46,704 INFO  [RS_OPEN_META-regionserver/2c0085825d5f:0-0 {event_type=M_RS_OPEN_META, pid=9}] regionserver.HRegion(2885): Flushing 1588230740 4/4 column families, dataSize=74 B heapSize=1.22 KB
{noformat}

When the test timed out, about ten minutes later, we saw this:
{noformat}
2023-10-15T14:47:57,360 WARN  [RS_OPEN_META-regionserver/2c0085825d5f:0-0 {event_type=M_RS_OPEN_META, pid=9}] regionserver.HStore(846): Failed flushing store file for 1588230740/ns, retrying num=0
java.nio.channels.ClosedChannelException: null
	at org.apache.hadoop.hdfs.ExceptionLastSeen.throwException4Close(ExceptionLastSeen.java:73) ~[hadoop-hdfs-client-3.2.4.jar:?]
	at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:153) ~[hadoop-hdfs-client-3.2.4.jar:?]
	at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:105) ~[hadoop-common-3.2.4.jar:?]
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57) ~[hadoop-common-3.2.4.jar:?]
	at java.io.DataOutputStream.write(DataOutputStream.java:107) ~[?:1.8.0_352]
	at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.finishBlockAndWriteHeaderAndData(HFileBlock.java:1045) ~[classes/:?]
	at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.writeHeaderAndData(HFileBlock.java:1032) ~[classes/:?]
	at org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.writeInlineBlocks(HFileWriterImpl.java:539) ~[classes/:?]
	at org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:615) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:377) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:70) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:74) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:828) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1969) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:3012) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2720) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:5458) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1032) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:966) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7774) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7729) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7704) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7663) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7619) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:138) ~[classes/:?]
	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104) ~[classes/:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_352]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_352]
	at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352]
{noformat}
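
To make the length of the hang concrete, the gap between the two log timestamps quoted above can be computed with a small sketch like the one below. The class name FlushGap and the method between are hypothetical helpers written for this illustration, not part of HBase. The timestamp pattern is an assumption based only on the two log lines shown here.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for this issue only; not part of HBase.
public class FlushGap {
  // Assumed timestamp layout, taken from the quoted log lines,
  // e.g. 2023-10-15T14:37:46,704
  private static final DateTimeFormatter LOG_TS =
    DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss,SSS");

  /** Returns the time elapsed between two log timestamps. */
  public static Duration between(String start, String end) {
    LocalDateTime from = LocalDateTime.parse(start, LOG_TS);
    LocalDateTime to = LocalDateTime.parse(end, LOG_TS);
    return Duration.between(from, to);
  }

  public static void main(String[] args) {
    // "Flushing 1588230740 ..." line vs. the "Failed flushing store file" line
    Duration gap = between("2023-10-15T14:37:46,704", "2023-10-15T14:47:57,360");
    System.out.println(gap); // prints PT10M10.656S
  }
}
```

So the flush made no visible progress for a little over ten minutes between starting and failing with ClosedChannelException.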

The flush is stuck writing data to HDFS.

I am not sure what the root cause is yet; this needs more digging.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)