Posted to issues@tez.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2015/02/12 00:44:13 UTC

[jira] [Commented] (TEZ-2082) Failing test: TestPreemption::testPreemptionWithSession/

    [ https://issues.apache.org/jira/browse/TEZ-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317224#comment-14317224 ] 

Bikas Saha commented on TEZ-2082:
---------------------------------

This is likely a race condition introduced in TEZ-2045, so I am removing the 0.6.1 target version and reducing the priority.
Explanation below. /cc [~sseth]

In TaskAttemptListenerImpTezDag.getTask(TaskAttemptListenerImpTezDag.java)
{code}
=== registeredContainers.containsKey(containerId) returns true here ===
      if (!registeredContainers.containsKey(containerId)) {
        if(context.getAllContainers().get(containerId) == null) {
          LOG.info("Container with id: " + containerId
              + " is invalid and will be killed");
        } else {
          LOG.info("Container with id: " + containerId
              + " is valid, but no longer registered, and will be killed");
        }
        task = TASK_FOR_INVALID_JVM;
      } else {
        pingContainerHeartbeatHandler(containerId);
=== the registeredContainers lookup returns null for the same containerId inside getContainerTask ===
=== so getContainerTask returns TASK_FOR_INVALID_JVM, but the code below only checks for null ===
        task = getContainerTask(containerId);
        if (task == null) {
          if (LOG.isDebugEnabled()) {
            LOG.debug("No task current assigned to Container with id: " + containerId);
          }
        } else {
            context.getEventHandler().handle(
=== so it crashes here with a NullPointerException on getTaskSpec().getTaskAttemptID(), since the task spec is null for TASK_FOR_INVALID_JVM ===
                new TaskAttemptEventStartedRemotely(task.getTaskSpec()
                    .getTaskAttemptID(), containerId, context
                    .getApplicationACLs()));
            LOG.info("Container with id: " + containerId + " given task: "
                + task.getTaskSpec().getTaskAttemptID());
        }
      }{code}

I can't think of any way to test for this race condition, so I added a precondition that will help catch it more easily if it occurs again.
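
To make the sequence above concrete, here is a minimal, simplified sketch of the check-then-act race and one possible guard. It is not the actual Tez code or the attached patch: the class SentinelRaceSketch, the String-based ids, the betweenCheckAndLookup hook, and the choice of IllegalStateException are all assumptions made for illustration. The hook stands in for another thread unregistering the container between the containsKey check and the lookup.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, simplified model of the listener; not the real Tez classes.
class SentinelRaceSketch {
    static final class ContainerTask {
        final String taskAttemptId; // null only for the sentinel below
        ContainerTask(String taskAttemptId) { this.taskAttemptId = taskAttemptId; }
    }

    // Sentinel returned for containers that are not (or no longer) registered.
    static final ContainerTask TASK_FOR_INVALID_JVM = new ContainerTask(null);

    // containerId -> task currently assigned to that container
    private final ConcurrentMap<String, ContainerTask> registeredContainers =
            new ConcurrentHashMap<>();

    void register(String containerId, String taskAttemptId) {
        registeredContainers.put(containerId, new ContainerTask(taskAttemptId));
    }

    void unregister(String containerId) {
        registeredContainers.remove(containerId);
    }

    // Falls back to the sentinel when the container is missing from the map.
    ContainerTask getContainerTask(String containerId) {
        ContainerTask task = registeredContainers.get(containerId);
        return task == null ? TASK_FOR_INVALID_JVM : task;
    }

    // Returns the attempt id handed out, or null if the container should be killed.
    // betweenCheckAndLookup is an illustrative hook that simulates a concurrent
    // unregister landing inside the race window.
    String getTask(String containerId, Runnable betweenCheckAndLookup) {
        if (!registeredContainers.containsKey(containerId)) {
            return null; // never registered, or already unregistered
        }
        betweenCheckAndLookup.run(); // race window
        ContainerTask task = getContainerTask(containerId);
        if (task == TASK_FOR_INVALID_JVM) {
            return null; // container vanished after the check: treat as invalid
        }
        // Precondition: fail with a clear message instead of a bare NPE later.
        if (task.taskAttemptId == null) {
            throw new IllegalStateException(
                    "Task for container " + containerId + " has no attempt id");
        }
        return task.taskAttemptId;
    }

    public static void main(String[] args) {
        SentinelRaceSketch listener = new SentinelRaceSketch();
        listener.register("container_1", "attempt_1");
        // Normal path: nothing happens inside the race window.
        System.out.println(listener.getTask("container_1", () -> { }));
        // Race path: the container is unregistered after the containsKey check.
        System.out.println(
                listener.getTask("container_1", () -> listener.unregister("container_1")));
    }
}
```

In this sketch the sentinel is compared by identity before any of its fields are dereferenced, which is the step the quoted code skips. The precondition only turns a late NullPointerException into an earlier, more descriptive failure; it does not remove the race itself.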

> Failing test: TestPreemption::testPreemptionWithSession/
> --------------------------------------------------------
>
>                 Key: TEZ-2082
>                 URL: https://issues.apache.org/jira/browse/TEZ-2082
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Hitesh Shah
>            Assignee: Bikas Saha
>         Attachments: TEZ-2082.1.patch
>
>
> From https://builds.apache.org/job/Tez-Build/891/testReport/junit/org.apache.tez.dag.app/TestPreemption/testPreemptionWithSession/
> Exception in thread "Thread-27" java.lang.NullPointerException
> 	at org.apache.tez.dag.app.TaskAttemptListenerImpTezDag.getTask(TaskAttemptListenerImpTezDag.java:222)
> 	at org.apache.tez.dag.app.MockDAGAppMaster$MockContainerLauncher.run(MockDAGAppMaster.java:230)
> 	at java.lang.Thread.run(Thread.java:662)


