Posted to issues@tez.apache.org by "Prabhu Joseph (JIRA)" <ji...@apache.org> on 2017/06/09 11:25:18 UTC

[jira] [Created] (TEZ-3756) Tez Query fails because of a weed node and all four attempts are placed on same node

Prabhu Joseph created TEZ-3756:
----------------------------------

             Summary: Tez Query fails because of a weed node and all four attempts are placed on same node
                 Key: TEZ-3756
                 URL: https://issues.apache.org/jira/browse/TEZ-3756
             Project: Apache Tez
          Issue Type: Bug
    Affects Versions: 0.7.1
            Reporter: Prabhu Joseph


A Tez query fails because one task fails on all four attempts with "Error: Could not find or load main class org.apache.tez.runtime.task.TezChild". There is one bad node on which all containers fail with this error, because the cached Tez library tez.tar.gz is corrupt on that machine. The concern is that all four attempts are placed on the same problematic node.

{code}
HW12691:TEZ pjoseph$ cat application_1495721159191_10342.log | grep attempt_1495721159191_10342_6_00_001808 | grep "Assigning container"

Assigning container to task: containerId=container_1495721159191_10342_01_000395, task=attempt_1495721159191_10342_6_00_001808_0

Assigning container to task: containerId=container_1495721159191_10342_01_000397, task=attempt_1495721159191_10342_6_00_001808_1

Assigning container to task: containerId=container_1495721159191_10342_01_000399, task=attempt_1495721159191_10342_6_00_001808_2

Assigning container to task: containerId=container_1495721159191_10342_01_000401, task=attempt_1495721159191_10342_6_00_001808_3

All four containers are placed on the same NodeManager:

Container: container_1495721159191_10342_01_000395 on xxx_45454
Container: container_1495721159191_10342_01_000397 on xxx_45454
Container: container_1495721159191_10342_01_000399 on xxx_45454
Container: container_1495721159191_10342_01_000401 on xxx_45454

{code}
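
The manual grep above can be generalized to flag every task whose attempts all landed on one node. The following is only a rough sketch: it assumes the two line shapes shown in the excerpt above ("Assigning container to task: ..." and "Container: ... on ...") appear in the lines being scanned, and the function names nodes_per_task and single_node_tasks are made up for illustration and are not part of Tez.

```python
import re
from collections import defaultdict

# Patterns assumed from the log excerpt in this report; other Tez/YARN
# versions may format these lines differently.
ASSIGN_RE = re.compile(
    r"Assigning container to task: containerId=(container_\w+), task=(attempt_\w+)"
)
NODE_RE = re.compile(r"Container: (container_\w+) on (\S+)")


def nodes_per_task(lines):
    """Map each task ID to the nodes of its attempts, ordered by attempt number."""
    container_to_attempt = {}
    container_to_node = {}
    for line in lines:
        m = ASSIGN_RE.search(line)
        if m:
            container_to_attempt[m.group(1)] = m.group(2)
            continue
        m = NODE_RE.search(line)
        if m:
            container_to_node[m.group(1)] = m.group(2)

    attempts = defaultdict(list)
    for container, attempt in container_to_attempt.items():
        # An attempt ID is the task ID plus a trailing "_<attempt number>".
        task, _, attempt_no = attempt.rpartition("_")
        node = container_to_node.get(container, "unknown")
        attempts[task].append((int(attempt_no), node))
    return {task: [node for _, node in sorted(v)] for task, v in attempts.items()}


def single_node_tasks(lines):
    """Return tasks with more than one attempt where every attempt ran on one node."""
    return {
        task: nodes[0]
        for task, nodes in nodes_per_task(lines).items()
        if len(nodes) > 1 and len(set(nodes)) == 1
    }
```

Feeding it the lines from the excerpt above would report the task attempt_1495721159191_10342_6_00_001808 as pinned to xxx_45454. A task whose container has no matching "Container: ... on ..." line is recorded with the node "unknown" rather than dropped.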





