Posted to yarn-issues@hadoop.apache.org by "Ahmed Hussein (Jira)" <ji...@apache.org> on 2020/06/30 22:22:00 UTC

[jira] [Created] (YARN-10334) TestDistributedShell leaks resources on timeout/failure

Ahmed Hussein created YARN-10334:
------------------------------------

             Summary: TestDistributedShell leaks resources on timeout/failure
                 Key: YARN-10334
                 URL: https://issues.apache.org/jira/browse/YARN-10334
             Project: Hadoop YARN
          Issue Type: Bug
          Components: distributed-shell, test, yarn
            Reporter: Ahmed Hussein


{{TestDistributedShell}} times out on trunk. I found that the application and its containers keep running in the background long after the unit test has failed.
This causes other test cases to fail and produces several false-positive failures because:
* Ports stay busy, so other test cases fail to launch.
* Unit tests fail because of memory restrictions.

Although the unit test is already broken on trunk, we do not want its failures to affect other unit tests.
{{TestDistributedShell}} needs to be revisited to make sure that all {{YarnClients}} and {{YarnApplications}} are closed properly at the end of each unit test (including on exceptions and timeouts).
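
One possible shape for such cleanup is sketched below. This is only an illustration, not the actual fix: {{TestResourceTracker}} and its methods are hypothetical names, and the sketch uses plain {{AutoCloseable}} objects as stand-ins so that it compiles without Hadoop on the classpath. In the real test the tracked objects would presumably be the YARN clients and handles for the submitted applications.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical helper (not part of Hadoop): remembers every resource a test
 * opens and closes all of them in reverse order, even if some closes throw.
 */
class TestResourceTracker {
  private final Deque<AutoCloseable> resources = new ArrayDeque<>();

  /** Registers a resource and returns it so the call can be inlined. */
  <T extends AutoCloseable> T track(T resource) {
    resources.push(resource);
    return resource;
  }

  /**
   * Closes everything that was tracked, most recent first. Failures are
   * counted instead of aborting, so one bad close cannot leak the rest.
   * Returns the number of resources that failed to close.
   */
  int closeAll() {
    int failures = 0;
    while (!resources.isEmpty()) {
      try {
        resources.pop().close();
      } catch (Exception e) {
        failures++;
      }
    }
    return failures;
  }
}
```

In a test class, {{closeAll()}} would be called from the teardown method so that it runs whether the test passed or threw. Whether teardown also runs after a timeout depends on how the timeout is applied, so that case would still need to be verified separately.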

Steps to reproduce:



{code:bash}
mvn test -Dtest=TestDistributedShell#testDSShellWithOpportunisticContainers

## this will time out with:
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 90.234 s <<< FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
[ERROR] testDSShellWithOpportunisticContainers(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)  Time elapsed: 90.018 s  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 90000 milliseconds
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.yarn.applications.distributedshell.Client.monitorApplication(Client.java:1117)
        at org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:1089)
        at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers(TestDistributedShell.java:1438)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
        at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.lang.Thread.run(Thread.java:748)

[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Errors: 
[ERROR]   TestDistributedShell.testDSShellWithOpportunisticContainers:1438 » TestTimedOut
[INFO] 
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
{code}


Using the {{ps}} command, you can see that the YARN processes are still running in the background:

{code:bash}
/bin/bash -c $JRE_HOME/bin/java -Xmx512m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_type OPPORTUNISTIC --container_memory 128 --container_vcores 1 --num_containers 2 --priority 0 --appname DistributedShell --homedir file:/Users/ahussein 1>$WORK_DIR8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1593554710896_0001/container_1593554710896_0001_01_000001/AppMaster.stdout 2>$WORK_DIR8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1593554710896_0001/container_1593554710896_0001_01_000001/AppMaster.stderr


$JRE_HOME/bin/java -Xmx512m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_type OPPORTUNISTIC --container_memory 128 --container_vcores 1 --num_containers 2 --priority 0 --appname DistributedShell --homedir file:/Users/ahussein
{code}
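
The same check could be automated, for example as a sanity check after a test run. The sketch below is an assumption-laden illustration rather than part of the test suite: {{LeakedProcessFinder}} is a hypothetical name, it relies on {{ProcessHandle}}, which needs Java 9 or later (the stack trace above is from Java 8), and some platforms do not expose command lines for every process.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: looks for leftover ApplicationMaster processes. */
class LeakedProcessFinder {
  // Main class of the process seen in the ps output above.
  static final String AM_CLASS =
      "org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster";

  /** Pure helper: keeps only the command lines that mention the marker. */
  static List<String> matching(List<String> commandLines, String marker) {
    return commandLines.stream()
        .filter(cmd -> cmd.contains(marker))
        .collect(Collectors.toList());
  }

  /** Command lines of all processes visible to this JVM (Java 9+). */
  static List<String> visibleCommandLines() {
    return ProcessHandle.allProcesses()
        // An empty string stands in for processes that hide their command line.
        .map(p -> p.info().commandLine().orElse(""))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    for (String cmd : matching(visibleCommandLines(), AM_CLASS)) {
      System.out.println("possible leaked process: " + cmd);
    }
  }
}
```

Running {{main}} after the timed-out test should list the same two processes that {{ps}} shows, provided the platform reports their command lines.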




