Posted to issues@tez.apache.org by "Jeff Zhang (JIRA)" <ji...@apache.org> on 2014/08/11 05:17:11 UTC

[jira] [Commented] (TEZ-1358) Display better diagnostics when tasks fail to launch

    [ https://issues.apache.org/jira/browse/TEZ-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092388#comment-14092388 ] 

Jeff Zhang commented on TEZ-1358:
---------------------------------

I verified the 2 cases (bad environment settings, localization failures) that cause container launch to fail. The error messages look clear to me, so I don't think anything needs fixing. [~hitesh], please help confirm.

The following are the error messages:

*bad environment settings*
{code}
DAG diagnostics: [Vertex failed, vertexName=tokenizer, vertexId=vertex_1407719224092_0019_1_00, diagnostics=[Task failed, taskId=task_1407719224092_0019_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Container container_1407719224092_0019_01_000002 finished with diagnostics set to [Container failed. Invalid environment variable name: "na=me"
]], TaskAttempt 1 failed, info=[Container container_1407719224092_0019_01_000003 finished with diagnostics set to [Container failed. Invalid environment variable name: "na=me"
]], TaskAttempt 2 failed, info=[Container container_1407719224092_0019_01_000004 finished with diagnostics set to [Container failed. Invalid environment variable name: "na=me"
]], TaskAttempt 3 failed, info=[Container container_1407719224092_0019_01_000005 finished with diagnostics set to [Container failed. Invalid environment variable name: "na=me"
]]], Vertex failed as one or more tasks failed. failedTasks:1], Vertex killed, vertexName=summer, vertexId=vertex_1407719224092_0019_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as other vertex failed. failedTasks:0], DAG failed due to vertex failure. failedVertices:1 killedVertices:1]
{code}

*localization failures*
{code}
DAG diagnostics: [Vertex failed, vertexName=tokenizer, vertexId=vertex_1407719224092_0018_1_00, diagnostics=[Task failed, taskId=task_1407719224092_0018_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Container container_1407719224092_0018_01_000002 finished with diagnostics set to [Container failed. Resource hdfs://0.0.0.0:9000/user/jzhang/tez-mapreduce-0.5.0-SNAPSHOT.jar changed on src filesystem (expected 1407725193190, was 1407725193090
]], TaskAttempt 1 failed, info=[Container container_1407719224092_0018_01_000003 finished with diagnostics set to [Container failed. Resource hdfs://0.0.0.0:9000/user/jzhang/tez-mapreduce-0.5.0-SNAPSHOT.jar changed on src filesystem (expected 1407725193190, was 1407725193090
]], TaskAttempt 2 failed, info=[Container container_1407719224092_0018_01_000004 finished with diagnostics set to [Container failed. Resource hdfs://0.0.0.0:9000/user/jzhang/tez-mapreduce-0.5.0-SNAPSHOT.jar changed on src filesystem (expected 1407725193190, was 1407725193090
]], TaskAttempt 3 failed, info=[Container container_1407719224092_0018_01_000005 finished with diagnostics set to [Container failed. Resource hdfs://0.0.0.0:9000/user/jzhang/tez-mapreduce-0.5.0-SNAPSHOT.jar changed on src filesystem (expected 1407725193190, was 1407725193090
]]], Vertex failed as one or more tasks failed. failedTasks:1], Vertex killed, vertexName=summer, vertexId=vertex_1407719224092_0018_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as other vertex failed. failedTasks:0], DAG failed due to vertex failure. failedVertices:1 killedVertices:1]
{code}
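
In both outputs above, the same container message is repeated once per task attempt. As a rough illustration only (this is not part of Tez; the helper name and the regex are assumptions made for this sketch), the distinct failure reasons could be pulled out of such a diagnostics string like this:

```python
import re

# Hypothetical helper, not a Tez API. It assumes each attempt's message
# starts with "Container failed. " and ends at the closing "]]" seen in
# the diagnostics output quoted in this comment.
_REASON = re.compile(r"Container failed\. (.*?)\s*\]\]", re.DOTALL)


def distinct_container_failures(diagnostics):
    """Return the distinct container failure reasons, in first-seen order."""
    seen = []
    for reason in _REASON.findall(diagnostics):
        if reason not in seen:
            seen.append(reason)
    return seen


# Shortened excerpt of the "bad environment settings" output (two attempts).
sample = (
    "diagnostics=[TaskAttempt 0 failed, info=[Container "
    "container_1407719224092_0019_01_000002 finished with diagnostics set to "
    '[Container failed. Invalid environment variable name: "na=me"\n'
    "]], TaskAttempt 1 failed, info=[Container "
    "container_1407719224092_0019_01_000003 finished with diagnostics set to "
    '[Container failed. Invalid environment variable name: "na=me"\n'
    "]]], Vertex failed as one or more tasks failed. failedTasks:1]"
)

print(distinct_container_failures(sample))
```

For the sample, the repeated attempts collapse to a single reason, which is the part a user actually needs to read.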


> Display better diagnostics when tasks fail to launch 
> -----------------------------------------------------
>
>                 Key: TEZ-1358
>                 URL: https://issues.apache.org/jira/browse/TEZ-1358
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Hitesh Shah
>            Assignee: Jeff Zhang
>
> Tasks could fail to launch due to various issues - bad environment settings, localization failures.


