Posted to issues@tez.apache.org by "László Bodor (Jira)" <ji...@apache.org> on 2020/05/10 11:45:00 UTC
[jira] [Comment Edited] (TEZ-4149) Speed up TezRecovery tests

    [ https://issues.apache.org/jira/browse/TEZ-4149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17103751#comment-17103751 ] 

László Bodor edited comment on TEZ-4149 at 5/10/20, 11:44 AM:
--------------------------------------------------------------

Some gaps found in [^org.apache.tez.test.TestRecovery-output.txt] that could probably be reduced:

7s
{code}
2020-05-10 12:39:59,406 INFO  [NM ContainerManager dispatcher] loghandler.NonAggregatingLogHandler (NonAggregatingLogHandler.java:handle(173)) - Scheduling Log Deletion for application: application_1589107183822_0001, with delay of 10800 seconds
2020-05-10 12:40:06,424 INFO  [Listener at MacBook-Pro.local/54645] containermanager.ContainerManagerImpl (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(762)) - Done waiting for Applications to be Finished. Still alive: [application_1589107183822_0001]
{code}

10s
{code}
2020-05-10 12:40:06,433 INFO  [IPC Server Responder] ipc.Server (Server.java:run(1466)) - Stopping IPC Server Responder
2020-05-10 12:40:16,451 INFO  [Listener at MacBook-Pro.local/54645] ipc.Server (Server.java:stop(3360)) - Stopping server on 54645
{code}

5s
{code}
2020-05-10 12:40:16,453 INFO  [Listener at MacBook-Pro.local/54645] nodemanager.NodeResourceMonitorImpl (NodeResourceMonitorImpl.java:isEnabled(85)) - Node Resource monitoring interval is <=0. org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl is disabled.
2020-05-10 12:40:21,475 WARN  [Listener at MacBook-Pro.local/54645] server.MiniYARNCluster (MiniYARNCluster.java:waitForAppMastersToFinish(526)) - Stopping RM while some app masters are still alive
{code}

This single test case took 51s, but ~20s of it seemed to be "idle"; some minicluster / YARN configuration might help.

[the last 5s cannot be reduced by simple config|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java#L524]
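
For reference, gaps like the ones above can be located mechanically rather than by eye. The following is only a minimal sketch, not the method actually used for this comment: it assumes every relevant log line starts with a `yyyy-MM-dd HH:mm:ss,SSS` timestamp as in the excerpts above, and the names `find_gaps`, `SAMPLE`, and the 3-second default threshold are hypothetical choices made for illustration.

```python
import re
from datetime import datetime

# Leading timestamp of a log line, e.g. "2020-05-10 12:40:06,433".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def find_gaps(lines, threshold_s=3.0):
    """Return (gap_seconds, previous_line, next_line) for every pair of
    consecutive timestamped lines that are at least threshold_s apart.
    The 3-second default is an arbitrary assumption."""
    gaps = []
    prev_time = prev_line = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if match is None:
            continue  # skip lines without a timestamp, e.g. stack traces
        current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        current = current.replace(microsecond=int(match.group(2)) * 1000)
        if prev_time is not None:
            delta = (current - prev_time).total_seconds()
            if delta >= threshold_s:
                gaps.append((delta, prev_line, line))
        prev_time, prev_line = current, line
    return gaps


# Two lines copied from the "10s" excerpt above.
SAMPLE = [
    "2020-05-10 12:40:06,433 INFO  [IPC Server Responder] ipc.Server "
    "(Server.java:run(1466)) - Stopping IPC Server Responder",
    "2020-05-10 12:40:16,451 INFO  [Listener at MacBook-Pro.local/54645] "
    "ipc.Server (Server.java:stop(3360)) - Stopping server on 54645",
]

for delta, before, after in find_gaps(SAMPLE):
    print(f"{delta:.3f}s between:")
    print(f"  {before}")
    print(f"  {after}")
```

To scan the whole attachment, pass an open file object instead of `SAMPLE`. Only consecutive timestamped lines are compared, so a gap says that nothing was logged in that window, not which component was waiting.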



> Speed up TezRecovery tests
> --------------------------
>
>                 Key: TEZ-4149
>                 URL: https://issues.apache.org/jira/browse/TEZ-4149
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Jonathan Turner Eagles
>            Assignee: László Bodor
>            Priority: Major
>         Attachments: org.apache.tez.test.TestRecovery-output.txt
>
>
> Currently, approximately 50% of the test cases are chosen to run, because there are many failure points at which recovery is tested.
> This can lead to the introduction of bugs into the code, as not all test cases are run for every Tez QA run.
> In addition, this can be a real development bottleneck, as tests take around 20 minutes per cycle if all tests are run (10 minutes if 50% of the tests are run, as usual).


