Posted to issues@tez.apache.org by "Jeff Zhang (JIRA)" <ji...@apache.org> on 2014/07/11 10:52:04 UTC

[jira] [Commented] (TEZ-1122) Race between canCommit and Task moving into RUNNING state

    [ https://issues.apache.org/jira/browse/TEZ-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058560#comment-14058560 ] 

Jeff Zhang commented on TEZ-1122:
---------------------------------

Attaching the patch.

Changes in this patch:
* Move the task state transition into getTask() and make it a synchronous call rather than an asynchronous one. The transition is very lightweight (LaunchTransition in TaskImpl is empty), so it won't block the getTask() RPC call.
* Update the test case.
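A minimal sketch of the approach in this patch, assuming a simplified model rather than Tez's real classes: `SyncLaunchTask`, `getTask()`, `canCommit()` and `pendingAsyncEvents()` are hypothetical stand-ins for `TaskImpl` and the umbilical calls. The launch transition is applied inline in `getTask()`, while other events may still be queued for the dispatcher, so a later `canCommit()` for the launched attempt sees RUNNING even if that queue has not been drained yet.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical, simplified task model; NOT the Tez TaskImpl API.
class SyncLaunchTask {
    enum State { SCHEDULED, RUNNING }

    private State state = State.SCHEDULED;
    private String launchedAttempt;
    // Stand-in for events that still go through the async dispatcher.
    private final Queue<String> asyncEvents = new ArrayDeque<>();

    // Called on an RPC handler thread when the container asks for work.
    synchronized void getTask(String attemptId) {
        // The fix: apply the cheap launch transition inline, before returning.
        if (state == State.SCHEDULED) {
            state = State.RUNNING;
            launchedAttempt = attemptId;
        }
        // Non-critical follow-up work can remain asynchronous
        // (event name borrowed from the log in this issue).
        asyncEvents.add("TA_STARTED_REMOTELY " + attemptId);
    }

    // Synchronous umbilical call: only the launched attempt of a RUNNING task may commit.
    synchronized boolean canCommit(String attemptId) {
        return state == State.RUNNING && attemptId.equals(launchedAttempt);
    }

    synchronized State state() { return state; }

    synchronized int pendingAsyncEvents() { return asyncEvents.size(); }
}
```

The design leans on the point made above: because the launch transition does essentially nothing, running it on the RPC thread costs little, and it removes the window in which canCommit() can observe the task still in SCHEDULED.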

> Race between canCommit and Task moving into RUNNING state
> ---------------------------------------------------------
>
>                 Key: TEZ-1122
>                 URL: https://issues.apache.org/jira/browse/TEZ-1122
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: Siddharth Seth
>            Assignee: Jeff Zhang
>            Priority: Critical
>         Attachments: Tez-1122.patch
>
>
> A task moves into the RUNNING state via async events generated after a TaskAttempt moves into RUNNING, which is triggered by getTask().
> canCommit() is a synchronous call on the umbilical, so for short-running tasks a canCommit can arrive before the async events are handled.
> {code}
> 2014-05-15 13:21:15,531 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.TaskAttemptImpl: TaskAttempt: [attempt_1400183444139_0007_1_00_000000_0] started. Is using containerId: [container_1400183444139_0007_01_000002] on NM: []
> 2014-05-15 13:21:15,533 INFO [AsyncDispatcher event handler] org.apache.tez.dag.history.HistoryEventHandler: [HISTORY][DAG:dag_1400183444139_0007_1][Event:TASK_ATTEMPT_STARTED]: vertexName=datagen, taskAttemptId=attempt_1400183444139_0007_1_00_000000_0, startTime=1400185273335, containerId=container_1400183444139_0007_01_000002, nodeId=, inProgressLogs=/node/containerlogs/container_1400183444139_0007_01_000002/, completedLogs=localhost:19888/jobhistory/logs///container_1400183444139_0007_01_000002/v_datagen_attempt_1400183444139_0007_1_00_000000_0/
> 2014-05-15 13:21:15,534 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.TaskAttemptImpl: attempt_1400183444139_0007_1_00_000000_0 TaskAttempt Transitioned from START_WAIT to RUNNING due to event TA_STARTED_REMOTELY
> 2014-05-15 13:21:15,534 INFO [IPC Server handler 6 on 61779] org.apache.tez.dag.app.dag.impl.TaskImpl: Task not running. Issuing kill to bad commit attempt attempt_1400183444139_0007_1_00_000000_0
> 2014-05-15 13:21:15,534 INFO [AMRM Callback Handler Thread] org.apache.tez.dag.app.rm.TaskScheduler: App total resource memory: 0 cpu: -1 taskAllocations: 1
> 2014-05-15 13:21:15,537 INFO [AsyncDispatcher event handler] org.apache.tez.common.counters.Limits: Counter limits initialized with parameters:  GROUP_NAME_MAX=128, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=1200
> 2014-05-15 13:21:15,541 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.TaskImpl: task_1400183444139_0007_1_00_000000 Task Transitioned from SCHEDULED to RUNNING
> 2014-05-15 13:21:15,544 INFO [AsyncDispatcher event handler] org.apache.tez.dag.history.HistoryEventHandler: [HISTORY][DAG:dag_1400183444139_0007_1][Event:TASK_ATTEMPT_FINISHED]: vertexName=datagen, taskAttemptId=attempt_1400183444139_0007_1_00_000000_0, startTime=1400185273335, finishTime=1400185275542, timeTaken=2207, status=KILLED, diagnostics=, counters=Counters: 0
> 2014-05-15 13:21:15,544 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.TaskAttemptImpl: attempt_1400183444139_0007_1_00_000000_0 TaskAttempt Transitioned from RUNNING to KILL_IN_PROGRESS due to event TA_KILL_REQUEST
> 2014-05-15 13:21:15,546 INFO [TaskSchedulerEventHandlerThread] org.apache.tez.dag.app.rm.TaskSchedulerEventHandler: Processing the event EventType: S_TA_ENDED
> {code}
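The interleaving in the log above can be replayed deterministically with a small sketch. This is a hypothetical, single-threaded model, not Tez code: `AsyncLaunchTask`, `getTask()`, `canCommit()` and `drainDispatcher()` are illustrative names, and the queue stands in for the AsyncDispatcher. Because the task's own RUNNING transition is only enqueued as a follow-up of the attempt's transition, a `canCommit()` answered on the RPC thread before the queue drains sees the task as not running.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical model of the pre-patch behaviour; names are illustrative only.
// Single-threaded on purpose: it replays one interleaving, it is not concurrent code.
class AsyncLaunchTask {
    enum State { SCHEDULED, RUNNING }

    private State taskState = State.SCHEDULED;
    // Stand-in for the AsyncDispatcher's event queue.
    private final Queue<Runnable> dispatcherQueue = new ArrayDeque<>();

    // RPC handler: getTask() only enqueues work and returns immediately.
    void getTask() {
        // First event: attempt START_WAIT -> RUNNING, which in turn enqueues
        // a second event that finally moves the task SCHEDULED -> RUNNING.
        dispatcherQueue.add(() -> dispatcherQueue.add(() -> taskState = State.RUNNING));
    }

    // Synchronous umbilical call, answered on the RPC thread.
    boolean canCommit() {
        return taskState == State.RUNNING;
    }

    // Simulates the dispatcher thread catching up with all queued events.
    void drainDispatcher() {
        Runnable event;
        while ((event = dispatcherQueue.poll()) != null) {
            event.run();
        }
    }

    State taskState() { return taskState; }
}
```

Calling `getTask()` and then `canCommit()` returns false, matching the "Task not running" line in the log; after `drainDispatcher()` the same call returns true.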



--
This message was sent by Atlassian JIRA
(v6.2#6252)