Posted to issues@tez.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2015/04/14 11:02:12 UTC

[jira] [Updated] (TEZ-2317) Successful task attempts getting killed

     [ https://issues.apache.org/jira/browse/TEZ-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated TEZ-2317:
------------------------------------
    Attachment: AM-taskkill.log

 For a complex DAG that generated a lot of events, the AM could not process them fast enough. We (me and [~bikassaha]) saw that many tasks were killed because only TA_SCHEDULE had been processed, and before the RUNNING event was processed the task got a commit go/no-go request, which is a separate async call that does not go through the event queue. These issues were mostly with the ONE-ONE edges Pig uses for distributed order by with sampling; since those tasks do little except partitioning, they were also finishing very fast.

Issues to fix:
   - Optimize by not sending a commit go/no-go request if there is no HDFS output (DataSink) involved. In the above case, the output is always intermediate.
   - Handle the commit go/no-go request only after processing the events already in the event queue. Maybe something like asking the task to come back after some time.
   - We saw that for 3058 KilledTaskAttempts there were 383519 TA_KILL_REQUEST events (roughly 125 per killed attempt). This is far too high.
   - The attached AM-taskkill.log, which has the grepped statements for a single task that was killed, has 327 repeats of the message below. Need to see why there are so many and fix that.
{code}
2015-04-13 23:19:11,126 INFO [IPC Server handler 22 on 53043] app.TaskAttemptListenerImpTezDag: Commit go/no-go request from attempt_1428329756093_374362_1_29_008426_0
2015-04-13 23:19:11,126 INFO [IPC Server handler 22 on 53043] impl.TaskImpl: Task not running. Issuing kill to bad commit attempt attempt_1428329756093_374362_1_29_008426_0
{code}
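
To make the first two proposals above concrete, here is a minimal sketch of what the commit gate could look like. It is not Tez code: the class CommitGateSketch, the Task type, the Event and CommitReply enums, and the RETRY_LATER and NOT_NEEDED replies are all hypothetical names chosen for illustration, and the real event queue is modelled by a simple in-memory queue. The idea is only that an attempt is not killed while the RUNNING event is still waiting in the queue, and that no commit decision is needed when there is no DataSink.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch, not the Tez implementation.
public class CommitGateSketch {
    public enum Event { ATTEMPT_SCHEDULED, ATTEMPT_RUNNING }
    public enum TaskState { SCHEDULED, RUNNING }
    public enum CommitReply { GO, NO_GO, RETRY_LATER, NOT_NEEDED }

    public static final class Task {
        private final boolean hasDataSink;              // false for intermediate-only output
        private final Queue<Event> pendingEvents = new ArrayDeque<>();
        private TaskState state = TaskState.SCHEDULED;
        private String committer;                       // attempt allowed to commit, if any

        public Task(boolean hasDataSink) {
            this.hasDataSink = hasDataSink;
        }

        // Stand-in for an event arriving on the (slow) event queue.
        public synchronized void enqueue(Event e) {
            pendingEvents.add(e);
        }

        // Stand-in for the event-processing thread catching up.
        public synchronized void drainEvents() {
            Event e;
            while ((e = pendingEvents.poll()) != null) {
                if (e == Event.ATTEMPT_RUNNING) {
                    state = TaskState.RUNNING;
                }
            }
        }

        // Stand-in for the async commit go/no-go call that bypasses the queue.
        public synchronized CommitReply canCommit(String attemptId) {
            if (!hasDataSink) {
                return CommitReply.NOT_NEEDED;          // proposal 1: nothing to commit
            }
            if (state != TaskState.RUNNING) {
                // Proposal 2: if events are still queued, ask the attempt to
                // come back later instead of issuing a kill.
                return pendingEvents.isEmpty() ? CommitReply.NO_GO : CommitReply.RETRY_LATER;
            }
            if (committer == null) {
                committer = attemptId;                  // first asker wins the commit
            }
            return committer.equals(attemptId) ? CommitReply.GO : CommitReply.NO_GO;
        }
    }

    public static void main(String[] args) {
        Task task = new Task(true);
        task.enqueue(Event.ATTEMPT_RUNNING);
        System.out.println(task.canCommit("attempt_0")); // asked before the queue is drained
        task.drainEvents();
        System.out.println(task.canCommit("attempt_0")); // asked after the queue is drained
    }
}
```

In this sketch the attempt that receives RETRY_LATER would simply ask again after a delay, so it is only refused once the task state is known to be up to date.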

Please create separate jiras as required.

> Successful task attempts getting killed
> ---------------------------------------
>
>                 Key: TEZ-2317
>                 URL: https://issues.apache.org/jira/browse/TEZ-2317
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>         Attachments: AM-taskkill.log
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)