Posted to issues@hbase.apache.org by "Duo Zhang (Jira)" <ji...@apache.org> on 2023/10/16 14:09:00 UTC
[jira] [Created] (HBASE-28154) TestZooKeeper could hang forever
Duo Zhang created HBASE-28154:
---------------------------------
Summary: TestZooKeeper could hang forever
Key: HBASE-28154
URL: https://issues.apache.org/jira/browse/HBASE-28154
Project: HBase
Issue Type: Bug
Components: test
Reporter: Duo Zhang
Recently I have seen this failure several times in pre-commit results.
Checking the log output, the test is stuck in testRegionServerSessionExpired.
When replaying the recovered edits for the meta region, at the end we need to flush the memstore, and the flush gets stuck, which causes the test to time out.
This is the last log message for opening hbase:meta:
{noformat}
2023-10-15T14:37:46,704 INFO [RS_OPEN_META-regionserver/2c0085825d5f:0-0 {event_type=M_RS_OPEN_META, pid=9}] regionserver.HRegion(2885): Flushing 1588230740 4/4 column families, dataSize=74 B heapSize=1.22 KB
{noformat}
And when the test timed out, we saw this
{noformat}
2023-10-15T14:47:57,360 WARN [RS_OPEN_META-regionserver/2c0085825d5f:0-0 {event_type=M_RS_OPEN_META, pid=9}] regionserver.HStore(846): Failed flushing store file for 1588230740/ns, retrying num=0
java.nio.channels.ClosedChannelException: null
at org.apache.hadoop.hdfs.ExceptionLastSeen.throwException4Close(ExceptionLastSeen.java:73) ~[hadoop-hdfs-client-3.2.4.jar:?]
at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:153) ~[hadoop-hdfs-client-3.2.4.jar:?]
at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:105) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57) ~[hadoop-common-3.2.4.jar:?]
at java.io.DataOutputStream.write(DataOutputStream.java:107) ~[?:1.8.0_352]
at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.finishBlockAndWriteHeaderAndData(HFileBlock.java:1045) ~[classes/:?]
at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.writeHeaderAndData(HFileBlock.java:1032) ~[classes/:?]
at org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.writeInlineBlocks(HFileWriterImpl.java:539) ~[classes/:?]
at org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:615) ~[classes/:?]
at org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:377) ~[classes/:?]
at org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:70) ~[classes/:?]
at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:74) ~[classes/:?]
at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:828) ~[classes/:?]
at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1969) ~[classes/:?]
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:3012) ~[classes/:?]
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2720) ~[classes/:?]
at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:5458) ~[classes/:?]
at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1032) ~[classes/:?]
at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:966) ~[classes/:?]
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7774) ~[classes/:?]
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7729) ~[classes/:?]
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7704) ~[classes/:?]
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7663) ~[classes/:?]
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7619) ~[classes/:?]
at org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:138) ~[classes/:?]
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104) ~[classes/:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_352]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_352]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352]
{noformat}
It is stuck writing data to HDFS: the DFSOutputStream is already closed, so the flush attempt fails with ClosedChannelException and the store flusher retries.
Not sure what the root cause is yet; need to dig more...
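For illustration only (not HBase's actual HStore code; the class and method names below are hypothetical), here is a minimal Java sketch of the hang pattern the log suggests: if the underlying stream is permanently closed, every flush attempt throws ClosedChannelException, so a retry loop with no upper bound never exits. Bounding the retries surfaces the failure instead of hanging the test.

```java
import java.nio.channels.ClosedChannelException;

// Hypothetical sketch of a store-flush retry loop. With a permanent
// failure (stream already closed), an unbounded loop hangs forever;
// this bounded variant gives up after maxRetries attempts.
public class FlushRetrySketch {
    interface Flusher {
        void flush() throws Exception;
    }

    static boolean flushWithRetries(Flusher f, int maxRetries) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            try {
                f.flush();
                return true; // flush succeeded
            } catch (Exception e) {
                // Mirrors the "Failed flushing store file ... retrying num=N" log line.
                System.out.println("Failed flushing store file, retrying num=" + attempt);
            }
        }
        return false; // permanent failure is surfaced to the caller
    }

    public static void main(String[] args) {
        // Simulate a permanently closed channel: every attempt fails.
        boolean ok = flushWithRetries(() -> {
            throw new ClosedChannelException();
        }, 3);
        System.out.println("flush succeeded: " + ok);
    }
}
```

If the retry count were unbounded, the loop above would spin (and log) forever, which matches a test that hangs until the surefire timeout kills it.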
--
This message was sent by Atlassian Jira
(v8.20.10#820010)