Posted to issues@tez.apache.org by "Jonathan Eagles (JIRA)" <ji...@apache.org> on 2018/05/08 20:51:00 UTC

[jira] [Commented] (TEZ-3932) TaskSchedulerManager can throw NullPointerException during DAGAppMaster container cleanup race

    [ https://issues.apache.org/jira/browse/TEZ-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467951#comment-16467951 ] 

Jonathan Eagles commented on TEZ-3932:
--------------------------------------

[~vserrao], thank you for providing the test logs. With them I was able to create a reliable test case that reproduces this issue. I have an initial patch that removes the intermittent failure you have been facing, and I will work with the community to get it checked in. These logs show that this is not just a test issue but could happen in practice during shutdown scenarios.

> TaskSchedulerManager can throw NullPointerException during DAGAppMaster container cleanup race
> ----------------------------------------------------------------------------------------------
>
>                 Key: TEZ-3932
>                 URL: https://issues.apache.org/jira/browse/TEZ-3932
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.10.0
>         Environment: arch: x86 and ppc
> java: openjdk version "1.8.0_161"
>          OpenJDK Runtime Environment (build 1.8.0_161-b14)
>          OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)
>            Reporter: Valencia Edna Serrao
>            Assignee: Jonathan Eagles
>            Priority: Major
>              Labels: ppc, x86
>         Attachments: TEZ-3932.001.patch, TEZ-3932.fail.patch, org.apache.tez.test.TestExceptionPropagation-output.txt
>
>
> Test org.apache.tez.test.TestExceptionPropagation.testExceptionPropagationSession fails on x86 and ppc. I found the related JIRAs TEZ-3746 and TEZ-3748. Although they are marked as resolved, the issue still exists. Below are the error details:
> {code:java}
> -------------------------------------------------------------------------------
> Test set: org.apache.tez.test.TestExceptionPropagation
> -------------------------------------------------------------------------------
> Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 96.433 sec <<< FAILURE!
> testExceptionPropagationSession(org.apache.tez.test.TestExceptionPropagation)  Time elapsed: 52.7 sec  <<< ERROR!
> org.apache.tez.dag.api.SessionNotRunning: Application not running, applicationId=application_1525667420557_0001, yarnApplicationState=FAILED, finalApplicationStatus=FAILED, trackingUrl=N/A, diagnostics=[DAG completed with an ERROR state. Shutting down AM, Session stats:submittedDAGs=11, successfulDAGs=0, failedDAGs=12, killedDAGs=0]
>         at org.apache.tez.client.TezClientUtils.getAMProxy(TezClientUtils.java:910)
>         at org.apache.tez.client.TezClient.getAMProxy(TezClient.java:1024)
>         at org.apache.tez.client.TezClient.waitForProxy(TezClient.java:1034)
>         at org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:652)
>         at org.apache.tez.client.TezClient.submitDAG(TezClient.java:588)
>         at org.apache.tez.test.TestExceptionPropagation.testExceptionPropagationSession(TestExceptionPropagation.java:227
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)