Posted to dev@hbase.apache.org by "Chia-Ping Tsai (JIRA)" <ji...@apache.org> on 2017/12/25 19:47:00 UTC
[jira] [Created] (HBASE-19624) TestIOFencing hangs
Chia-Ping Tsai created HBASE-19624:
--------------------------------------
Summary: TestIOFencing hangs
Key: HBASE-19624
URL: https://issues.apache.org/jira/browse/HBASE-19624
Project: HBase
Issue Type: Bug
Reporter: Chia-Ping Tsai
Assignee: Chia-Ping Tsai
Fix For: 2.0.0
The RS calls CompactSplit#join to stop all CompactSplit threads.
{code:title=CompactSplit.java}
  private void waitFor(ThreadPoolExecutor t, String name) {
    boolean done = false;
    while (!done) {
      try {
        done = t.awaitTermination(60, TimeUnit.SECONDS);
        LOG.info("Waiting for " + name + " to finish...");
        if (!done) {
          t.shutdownNow();
        }
      } catch (InterruptedException ie) {
        LOG.warn("Interrupted waiting for " + name + " to finish...");
      }
    }
  }
{code}
In the meantime, the async WAL may be waiting for the sync signal. However, that signal never arrives because the WAL sync has failed.
{code}
  synchronized long get(long timeoutNs) throws InterruptedException,
      ExecutionException, TimeoutIOException {
    final long done = System.nanoTime() + timeoutNs;
    while (!isDone()) {
      wait(1000);
      if (System.nanoTime() >= done) {
        throw new TimeoutIOException(
            "Failed to get sync result after " + TimeUnit.NANOSECONDS.toMillis(timeoutNs)
                + " ms for txid=" + this.txid + ", WAL system stuck?");
      }
    }
    if (this.throwable != null) {
      throw new ExecutionException(this.throwable);
    }
    return this.doneTxid;
  }
{code}
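To make the blocking behaviour easier to see in isolation, here is a minimal, self-contained sketch of a timed wait of the same shape. It is not HBase's class: MiniSyncFuture, complete() and outcome() are hypothetical names, and java.util.concurrent.TimeoutException stands in for TimeoutIOException. A caller of get() on a future that is never completed stays blocked until the deadline passes or the thread is interrupted.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for a sync future; not the HBase implementation.
public class MiniSyncFuture {
  private boolean done;
  private long doneTxid;
  private Throwable throwable;

  // Marks the future complete and wakes up every waiter.
  synchronized void complete(long txid, Throwable t) {
    this.doneTxid = txid;
    this.throwable = t;
    this.done = true;
    notifyAll();
  }

  // Blocks until complete() is called, the deadline passes, or the thread is interrupted.
  synchronized long get(long timeoutNs)
      throws InterruptedException, ExecutionException, TimeoutException {
    final long deadline = System.nanoTime() + timeoutNs;
    while (!done) {
      long remainingNs = deadline - System.nanoTime();
      if (remainingNs <= 0) {
        throw new TimeoutException("No sync result after "
            + TimeUnit.NANOSECONDS.toMillis(timeoutNs) + " ms");
      }
      // Wait for at least 1 ms, because wait(0) would block without a timeout.
      wait(Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNs)));
    }
    if (throwable != null) {
      throw new ExecutionException(throwable);
    }
    return doneTxid;
  }

  // Maps each way get() can end to a short label.
  static String outcome(MiniSyncFuture f, long timeoutMs) {
    try {
      return "txid=" + f.get(TimeUnit.MILLISECONDS.toNanos(timeoutMs));
    } catch (TimeoutException e) {
      return "timeout";
    } catch (ExecutionException e) {
      return "failed";
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "interrupted";
    }
  }

  public static void main(String[] args) {
    // Never completed: models a sync signal that does not arrive.
    System.out.println(outcome(new MiniSyncFuture(), 50)); // prints "timeout"

    MiniSyncFuture ok = new MiniSyncFuture();
    ok.complete(7, null);
    System.out.println(outcome(ok, 50)); // prints "txid=7"
  }
}
```

With a short timeout the stuck caller gives up quickly. With a timeout of several minutes, the thread stays alive for that long unless something interrupts it.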
When we shut down the mini cluster, JVMClusterUtil#shutdown sends the interrupt signal to all RS threads. The interrupt makes awaitTermination throw an InterruptedException before the !done check is reached, so CompactSplit catches the exception, skips #shutdownNow, and starts waiting again. Hence the CompactSplit threads stay up until the timeout (default is 5 min).
{code}
  for (int i = 0; i < 100; ++i) {
    boolean atLeastOneLiveServer = false;
    for (RegionServerThread t : regionservers) {
      if (t.isAlive()) {
        atLeastOneLiveServer = true;
        try {
          LOG.warn("RegionServerThreads remaining, give one more chance before interrupting");
          t.join(1000);
        } catch (InterruptedException e) {
          wasInterrupted = true;
        }
      }
    }
    if (!atLeastOneLiveServer) break;
    for (RegionServerThread t : regionservers) {
      if (t.isAlive()) {
        LOG.warn("RegionServerThreads taking too long to stop, interrupting");
        t.interrupt();
      }
    }
  }
{code}
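One possible way to avoid the hang is to treat an interrupt inside the wait loop as a request to stop the pool immediately. The sketch below only illustrates that idea and is not the patch for this issue. InterruptAwareWait and demo() are hypothetical names, and the blocked task merely stands in for a compaction stuck waiting on a WAL sync.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not taken from the HBase code base.
public class InterruptAwareWait {

  // Variant of waitFor in which an interrupt triggers shutdownNow()
  // instead of silently restarting the 60-second wait.
  static boolean waitFor(ThreadPoolExecutor pool, String name) {
    boolean interrupted = false;
    boolean done = false;
    while (!done) {
      try {
        done = pool.awaitTermination(60, TimeUnit.SECONDS);
        if (!done) {
          pool.shutdownNow();
        }
      } catch (InterruptedException ie) {
        System.out.println("Interrupted waiting for " + name + ", forcing shutdown");
        interrupted = true;
        pool.shutdownNow(); // interrupts the workers so blocked tasks can exit
      }
    }
    if (interrupted) {
      Thread.currentThread().interrupt(); // restore the flag for the caller
    }
    return interrupted;
  }

  // Runs one blocked task, interrupts the waiting thread, and reports
  // whether the pool terminated.
  static boolean demo() {
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(1, 1, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    CountDownLatch started = new CountDownLatch(1);
    CountDownLatch neverReleased = new CountDownLatch(1);
    pool.execute(() -> {
      started.countDown();
      try {
        neverReleased.await(); // blocks until the worker is interrupted
      } catch (InterruptedException e) {
        // shutdownNow() interrupted the worker; the task ends here.
      }
    });
    try {
      started.await();
      pool.shutdown(); // graceful shutdown alone does not unblock the task
      Thread waiter = new Thread(() -> waitFor(pool, "demo pool"));
      waiter.start();
      waiter.interrupt(); // plays the role of JVMClusterUtil#shutdown
      waiter.join(10_000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return pool.isTerminated();
  }

  public static void main(String[] args) {
    System.out.println("pool terminated: " + demo());
  }
}
```

The sketch assumes the blocked task responds to interruption. A task that swallows the interrupt and keeps waiting would still keep the loop spinning, so the real fix may need to address the stuck sync as well.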