Posted to yarn-issues@hadoop.apache.org by "Peter Bacsko (Jira)" <ji...@apache.org> on 2020/10/14 10:46:00 UTC
[jira] [Updated] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail
[ https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Bacsko updated YARN-10460:
--------------------------------
Description:
In our downstream build environment, we're using JUnit 4.13. Recently, we discovered a truly weird test failure in TestNodeStatusUpdater.
The problem is that timeout handling changed in JUnit 4.13. Compare {{FailOnTimeout.evaluate()}} in the two versions:
4.12
{noformat}
@Override
public void evaluate() throws Throwable {
    CallableStatement callable = new CallableStatement();
    FutureTask<Throwable> task = new FutureTask<Throwable>(callable);
    threadGroup = new ThreadGroup("FailOnTimeoutGroup");
    Thread thread = new Thread(threadGroup, task, "Time-limited test");
    thread.setDaemon(true);
    thread.start();
    callable.awaitStarted();
    Throwable throwable = getResult(task, thread);
    if (throwable != null) {
        throw throwable;
    }
}
{noformat}
4.13
{noformat}
@Override
public void evaluate() throws Throwable {
    CallableStatement callable = new CallableStatement();
    FutureTask<Throwable> task = new FutureTask<Throwable>(callable);
    ThreadGroup threadGroup = new ThreadGroup("FailOnTimeoutGroup");
    Thread thread = new Thread(threadGroup, task, "Time-limited test");
    try {
        thread.setDaemon(true);
        thread.start();
        callable.awaitStarted();
        Throwable throwable = getResult(task, thread);
        if (throwable != null) {
            throw throwable;
        }
    } finally {
        try {
            thread.join(1);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        try {
            threadGroup.destroy(); // <---- This
        } catch (IllegalThreadStateException e) {
            // If a thread from the group is still alive, the ThreadGroup cannot be destroyed.
            // Swallow the exception to keep the same behavior prior to this change.
        }
    }
}
{noformat}
The change comes from [https://github.com/junit-team/junit4/pull/1517].
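The effect of the new {{threadGroup.destroy()}} call can be reproduced without JUnit or Hadoop. Below is a minimal standalone sketch (the class and method names are made up for illustration): it runs a short-lived "test" thread in a fresh {{ThreadGroup}}, destroys the group the way JUnit 4.13 does, and then tries to create another thread in the same group, which is what a long-lived executor does later on. On JDKs where {{destroy()}} still takes effect, the last step throws {{IllegalThreadStateException}} from {{ThreadGroup.addUnstarted}}, matching the top of the stack trace below. As far as I know, {{ThreadGroup.destroy()}} was degraded to a no-op in JDK 19, so on newer JDKs the sketch reports that thread creation still works.

```java
/**
 * Standalone reproduction of JUnit 4.13's FailOnTimeout clean-up: once a thread group
 * has been destroyed, it no longer accepts new threads.
 * Assumption: destroy() is only effective on JDKs older than 19 (degraded to a no-op later).
 */
public class DestroyedThreadGroupDemo {

    /** Runs a short "test" in a fresh group, then destroys the group like JUnit 4.13 does. */
    @SuppressWarnings({"deprecation", "removal"})
    public static ThreadGroup runTestAndDestroyGroup() throws InterruptedException {
        ThreadGroup group = new ThreadGroup("FailOnTimeoutGroup");
        Thread testThread = new Thread(group, () -> { }, "Time-limited test");
        testThread.start();
        // JUnit only waits 1 ms here; we wait fully so that destroy() succeeds deterministically.
        testThread.join();
        try {
            group.destroy();
        } catch (IllegalThreadStateException e) {
            // The group still had live threads; JUnit swallows this, and so do we.
        }
        return group;
    }

    /** Returns true if a new (never started) thread can still be created in the given group. */
    public static boolean canCreateThreadIn(ThreadGroup group) {
        try {
            new Thread(group, () -> { }, "late worker");
            return true;
        } catch (IllegalThreadStateException e) {
            // On JDKs that still honour destroy(), ThreadGroup.addUnstarted() throws this.
            return false;
        }
    }

    @SuppressWarnings({"deprecation", "removal"})
    public static void main(String[] args) throws InterruptedException {
        ThreadGroup group = runTestAndDestroyGroup();
        System.out.println("destroyed=" + group.isDestroyed()
                + ", canCreateThread=" + canCreateThreadIn(group));
    }
}
```

On a JDK where {{destroy()}} still takes effect, {{main}} should print {{destroyed=true, canCreateThread=false}}.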
Unfortunately, destroying the thread group breaks later tests, because the IPC layer caches all sorts of objects across test cases. The exception is:
{noformat}
java.lang.IllegalThreadStateException
	at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:867)
	at java.lang.Thread.init(Thread.java:402)
	at java.lang.Thread.init(Thread.java:349)
	at java.lang.Thread.<init>(Thread.java:675)
	at java.util.concurrent.Executors$DefaultThreadFactory.newThread(Executors.java:613)
	at com.google.common.util.concurrent.ThreadFactoryBuilder$1.newThread(ThreadFactoryBuilder.java:163)
	at java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:612)
	at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:925)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368)
	at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
	at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1136)
	at org.apache.hadoop.ipc.Client.call(Client.java:1458)
	at org.apache.hadoop.ipc.Client.call(Client.java:1405)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
	at com.sun.proxy.$Proxy81.startContainers(Unknown Source)
	at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128)
	at org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.startContainer(TestNodeManagerShutdown.java:251)
	at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown(TestNodeStatusUpdater.java:1576)
{noformat}
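Both the {{clientExecutor}} in {{org.apache.hadoop.ipc.Client}} and the client object in {{ProtobufRpcEngine}}/{{ProtobufRpcEngine2}} are cached and reused for as long as they're needed, so they outlive the test that created them. The executor's thread factory ({{Executors$DefaultThreadFactory}} in the trace above) remembers the thread group of the thread that created it, which here is the {{FailOnTimeoutGroup}} of a previous test. Since that group has already been destroyed, no new threads can be created in it, and the next attempt to start an IPC worker thread fails.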
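Why a thread pool created in one test is tied to that test's thread group can be shown with plain {{java.util.concurrent}}. In this sketch, the class and method names are illustrative, and {{Executors.newCachedThreadPool()}} stands in for the Guava-built {{clientExecutor}}; both end up creating threads through {{Executors$DefaultThreadFactory}}, as the stack trace shows. The pool is created from a thread in a group named {{FailOnTimeoutGroup}}, but the task is submitted later from a different thread. The worker that runs the task still lands in {{FailOnTimeoutGroup}}, because the default thread factory records the creating thread's group (when no security manager is installed) and uses it for every new worker.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CachedExecutorThreadGroupDemo {

    /** Creates a cached thread pool from inside a thread that belongs to {@code group}. */
    public static ExecutorService createExecutorFrom(ThreadGroup group) throws InterruptedException {
        ExecutorService[] holder = new ExecutorService[1];
        Thread creator = new Thread(group, () -> holder[0] = Executors.newCachedThreadPool(), "creator");
        creator.start();
        creator.join(); // join() also makes holder[0] visible to the calling thread
        return holder[0];
    }

    /** Runs a task on the pool and reports which ThreadGroup the worker thread belongs to. */
    public static String workerGroupName(ExecutorService executor) throws Exception {
        return executor.submit(() -> Thread.currentThread().getThreadGroup().getName()).get();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = createExecutorFrom(new ThreadGroup("FailOnTimeoutGroup"));
        try {
            // Submitted from the main thread, yet the worker is created in FailOnTimeoutGroup.
            System.out.println(workerGroupName(executor));
        } finally {
            executor.shutdown();
        }
    }
}
```

Running {{main}} prints {{FailOnTimeoutGroup}}. If that group had been destroyed in the meantime, the pool could not create the worker, which is the failure seen in the trace.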
A quick workaround is to stop the clients and completely clear the {{ClientCache}} in {{ProtobufRpcEngine}} before each test case. I tried this, and it solves the problem, but it feels hacky. I'm not sure whether there is a better approach.
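The idea behind the workaround can be sketched generically. {{SharedExecutorCache}} below is a hypothetical stand-in for a process-wide cache such as the IPC client cache; it is not a Hadoop API, and the actual workaround stops the Hadoop clients and clears {{ClientCache}} instead. Resetting the cache before each test forces the next user to rebuild the executor from the current test's thread, so the new thread factory captures a live thread group rather than one destroyed by an earlier test.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical process-wide executor cache, standing in for a cached IPC client. */
public class SharedExecutorCache {
    private static ExecutorService executor; // guarded by SharedExecutorCache.class

    /** Lazily creates the shared pool; its thread factory captures the caller's ThreadGroup. */
    public static synchronized ExecutorService get() {
        if (executor == null) {
            executor = Executors.newCachedThreadPool();
        }
        return executor;
    }

    /** Test-only hook, meant to be called before each test case (e.g. from a @Before method). */
    public static synchronized void resetForTesting() {
        if (executor != null) {
            executor.shutdownNow();
            executor = null;
        }
    }

    /** Runs a task on the shared pool and returns the worker thread's group name. */
    public static String workerGroupName() throws Exception {
        return get().submit(() -> Thread.currentThread().getThreadGroup().getName()).get();
    }
}
```

Without the reset, a later test would keep using a pool whose factory points at the first test's group, and once that group is destroyed, creating the next worker can fail.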
> Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail
> ---------------------------------------------------------------------
>
> Key: YARN-10460
> URL: https://issues.apache.org/jira/browse/YARN-10460
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager, test
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.3.4#803005)