You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tez.apache.org by "Hitesh Shah (JIRA)" <ji...@apache.org> on 2015/05/14 22:33:00 UTC

[jira] [Updated] (TEZ-1122) Race between canCommit and Task moving into RUNNING state

     [ https://issues.apache.org/jira/browse/TEZ-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Shah updated TEZ-1122:
-----------------------------
    Target Version/s: 0.8.0  (was: 0.6.0)

> Race between canCommit and Task moving into RUNNING state
> ---------------------------------------------------------
>
>                 Key: TEZ-1122
>                 URL: https://issues.apache.org/jira/browse/TEZ-1122
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: Siddharth Seth
>            Assignee: Jeff Zhang
>            Priority: Critical
>         Attachments: Tez-1122.patch
>
>
> A task moves into RUNNING state via async events generated after a TaskAttempt moves into RUNNING state, which is triggered by getTask().
> canCommit() is a synchronous call on the umbilical - for short running tasks, a canCommit can come in before the async events are handled.
> {code}
> 2014-05-15 13:21:15,531 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.TaskAttemptImpl: TaskAttempt: [attempt_1400183444139_0007_1_00_000000_0] started. Is using containerId: [container_1400183444139_0007_01_000002] on NM: []
> 2014-05-15 13:21:15,533 INFO [AsyncDispatcher event handler] org.apache.tez.dag.history.HistoryEventHandler: [HISTORY][DAG:dag_1400183444139_0007_1][Event:TASK_ATTEMPT_STARTED]: vertexName=datagen, taskAttemptId=attempt_1400183444139_0007_1_00_000000_0, startTime=1400185273335, containerId=container_1400183444139_0007_01_000002, nodeId=, inProgressLogs=/node/containerlogs/container_1400183444139_0007_01_000002/, completedLogs=localhost:19888/jobhistory/logs///container_1400183444139_0007_01_000002/v_datagen_attempt_1400183444139_0007_1_00_000000_0/
> 2014-05-15 13:21:15,534 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.TaskAttemptImpl: attempt_1400183444139_0007_1_00_000000_0 TaskAttempt Transitioned from START_WAIT to RUNNING due to event TA_STARTED_REMOTELY
> 2014-05-15 13:21:15,534 INFO [IPC Server handler 6 on 61779] org.apache.tez.dag.app.dag.impl.TaskImpl: Task not running. Issuing kill to bad commit attempt attempt_1400183444139_0007_1_00_000000_0
> 2014-05-15 13:21:15,534 INFO [AMRM Callback Handler Thread] org.apache.tez.dag.app.rm.TaskScheduler: App total resource memory: 0 cpu: -1 taskAllocations: 1
> 2014-05-15 13:21:15,537 INFO [AsyncDispatcher event handler] org.apache.tez.common.counters.Limits: Counter limits initialized with parameters:  GROUP_NAME_MAX=128, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=1200
> 2014-05-15 13:21:15,541 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.TaskImpl: task_1400183444139_0007_1_00_000000 Task Transitioned from SCHEDULED to RUNNING
> 2014-05-15 13:21:15,544 INFO [AsyncDispatcher event handler] org.apache.tez.dag.history.HistoryEventHandler: [HISTORY][DAG:dag_1400183444139_0007_1][Event:TASK_ATTEMPT_FINISHED]: vertexName=datagen, taskAttemptId=attempt_1400183444139_0007_1_00_000000_0, startTime=1400185273335, finishTime=1400185275542, timeTaken=2207, status=KILLED, diagnostics=, counters=Counters: 0
> 2014-05-15 13:21:15,544 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.TaskAttemptImpl: attempt_1400183444139_0007_1_00_000000_0 TaskAttempt Transitioned from RUNNING to KILL_IN_PROGRESS due to event TA_KILL_REQUEST
> 2014-05-15 13:21:15,546 INFO [TaskSchedulerEventHandlerThread] org.apache.tez.dag.app.rm.TaskSchedulerEventHandler: Processing the event EventType: S_TA_ENDED
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)