Posted to issues@tez.apache.org by "Jonathan Eagles (JIRA)" <ji...@apache.org> on 2018/05/08 20:51:00 UTC
[jira] [Commented] (TEZ-3932) TaskSchedulerManager can throw
NullPointerException during DAGAppMaster container cleanup race
[ https://issues.apache.org/jira/browse/TEZ-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467951#comment-16467951 ]
Jonathan Eagles commented on TEZ-3932:
--------------------------------------
[~vserrao], thank you for providing the test logs. With them I was able to create a reliable test case that reproduces this issue. I have an initial patch that removes the intermittent failure you have been facing, and I will work with the community to get it checked in. These logs show that this is not just a test issue but could happen in practice during shutdown scenarios.
> TaskSchedulerManager can throw NullPointerException during DAGAppMaster container cleanup race
> ----------------------------------------------------------------------------------------------
>
> Key: TEZ-3932
> URL: https://issues.apache.org/jira/browse/TEZ-3932
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.10.0
> Environment: arch: x86 and ppc
> java: openjdk version "1.8.0_161"
> OpenJDK Runtime Environment (build 1.8.0_161-b14)
> OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)
> Reporter: Valencia Edna Serrao
> Assignee: Jonathan Eagles
> Priority: Major
> Labels: ppc, x86
> Attachments: TEZ-3932.001.patch, TEZ-3932.fail.patch, org.apache.tez.test.TestExceptionPropagation-output.txt
>
>
> Test org.apache.tez.test.TestExceptionPropagation.testExceptionPropagationSession fails on x86 and ppc. I found the related JIRAs TEZ-3746 and TEZ-3748. Though both are marked as resolved, the issue still exists. Below are the error details:
> {code:java}
> -------------------------------------------------------------------------------
> Test set: org.apache.tez.test.TestExceptionPropagation
> -------------------------------------------------------------------------------
> Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 96.433 sec <<< FAILURE!
> testExceptionPropagationSession(org.apache.tez.test.TestExceptionPropagation) Time elapsed: 52.7 sec <<< ERROR!
> org.apache.tez.dag.api.SessionNotRunning: Application not running, applicationId=application_1525667420557_0001, yarnApplicationState=FAILED, finalApplicationStatus=FAILED, trackingUrl=N/A, diagnostics=[DAG completed with an ERROR state. Shutting down AM, Session stats:submittedDAGs=11, successfulDAGs=0, failedDAGs=12, killedDAGs=0]
> at org.apache.tez.client.TezClientUtils.getAMProxy(TezClientUtils.java:910)
> at org.apache.tez.client.TezClient.getAMProxy(TezClient.java:1024)
> at org.apache.tez.client.TezClient.waitForProxy(TezClient.java:1034)
> at org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:652)
> at org.apache.tez.client.TezClient.submitDAG(TezClient.java:588)
> at org.apache.tez.test.TestExceptionPropagation.testExceptionPropagationSession(TestExceptionPropagation.java:227
> {code}