Posted to issues@tez.apache.org by "László Bodor (Jira)" <ji...@apache.org> on 2023/02/13 13:17:00 UTC

[jira] [Updated] (TEZ-4475) VertexStatus is missing in TestLocalMode if DAG finishes too early - causing NPE in unit test

     [ https://issues.apache.org/jira/browse/TEZ-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor updated TEZ-4475:
------------------------------
    Description: 
This problem is very hard to reproduce, but when I managed to reproduce it, the logs looked like this:
{code}
2023-02-13 11:32:16,302 INFO  [DAGAppMaster Thread] app.DAGAppMaster (DAGAppMaster.java:startDAG(2545)) - Running DAG: testMultipleClientsWithoutSession2_useDfs
...
2023-02-13 11:32:16,406 INFO  [Thread-675] client.DAGClientImpl (DAGClientImpl.java:getVertexStatusInternal(280)) - getVertexStatusInternal for Sleep, dagCompleted: true, in cache: false
{code}

In this case, the latter log message was added [here|https://github.com/apache/tez/blob/e3e91a150dad44a9daa3102da04542e2e365203d/tez-api/src/main/java/org/apache/tez/dag/api/client/DAGClientImpl.java#L305] as:
{code}
    LOG.info("getVertexStatusInternal for {}, dagCompleted: {}, in cache: {}", vertexName, dagCompleted,
        cachedVertexStatus.containsKey(vertexName));
{code}

So the DAG had already completed, but no vertex status updates had arrived yet (the cache was empty), and the unit test failed with an unhelpful NPE.

This bug was always there, but it was only exposed by the unit tests added in TEZ-4447.
The easiest fix is to wait for DAG completion with a Tez client call that also collects vertex status, i.e. waitForCompletionWithStatusUpdates instead of waitForCompletion.
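
To make the race easier to picture, here is a minimal, self-contained sketch. It is a toy model and not the real DAGClientImpl: ToyDagClient, its string statuses, and the assumption that a cache miss after DAG completion returns null are all hypothetical, chosen only to mirror the "dagCompleted: true, in cache: false" log line above. Only the two method names are borrowed from the Tez client API.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Toy model of the race described above. Every class and method here is a
 * hypothetical stand-in; none of this is the real DAGClientImpl.
 */
final class VertexStatusRaceSketch {

    /** Hypothetical client: once the DAG has completed, only the cache can answer. */
    static final class ToyDagClient {
        private final Map<String, String> cachedVertexStatus = new ConcurrentHashMap<>();
        private final Set<String> vertexNames;
        private boolean dagCompleted;

        ToyDagClient(Set<String> vertexNames) {
            this.vertexNames = vertexNames;
        }

        /** Models waitForCompletion: waits for the DAG only, never fills the cache. */
        void waitForCompletion() {
            dagCompleted = true;
        }

        /** Models waitForCompletionWithStatusUpdates: also records each vertex status. */
        void waitForCompletionWithStatusUpdates() {
            for (String name : vertexNames) {
                cachedVertexStatus.put(name, "SUCCEEDED");
            }
            dagCompleted = true;
        }

        /** Assumed behavior: a cache miss after completion yields null. */
        String getVertexStatus(String vertexName) {
            if (!dagCompleted) {
                return "RUNNING";
            }
            return cachedVertexStatus.get(vertexName);
        }
    }

    public static void main(String[] args) {
        ToyDagClient racy = new ToyDagClient(Set.of("Sleep"));
        racy.waitForCompletion();
        // Cache miss: dereferencing this result is what would throw the NPE in a test.
        System.out.println("without status updates: " + racy.getVertexStatus("Sleep"));

        ToyDagClient fixed = new ToyDagClient(Set.of("Sleep"));
        fixed.waitForCompletionWithStatusUpdates();
        System.out.println("with status updates: " + fixed.getVertexStatus("Sleep"));
    }
}
```

In this model the first client hands back nothing for "Sleep" because the DAG finished before any status was cached, while the second always has an entry. That is the property the proposed test change relies on.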

  was:
this problem is very hard to reproduce, but when I was able to do so, it was like:
{code}
2023-02-13 11:32:16,302 INFO  [DAGAppMaster Thread] app.DAGAppMaster (DAGAppMaster.java:startDAG(2545)) - Running DAG: testMultipleClientsWithoutSession2_useDfs
...
2023-02-13 11:32:16,406 INFO  [Thread-675] client.DAGClientImpl (DAGClientImpl.java:getVertexStatusInternal(280)) - getVertexStatusInternal for Sleep, dagCompleted: true, in cache: false
{code}

in this case, the latter log message was added [here|https://github.com/apache/tez/blob/e3e91a150dad44a9daa3102da04542e2e365203d/tez-api/src/main/java/org/apache/tez/dag/api/client/DAGClientImpl.java#L305] as:
{code}
    LOG.info("getVertexStatusInternal for {}, dagCompleted: {}, in cache: {}", vertexName, dagCompleted,
        cachedVertexStatus.containsKey(vertexName));
{code}

so, the dag has already completed, but there were no vertex status updates yet (cache was empty), so unit tests failed with an inconvenient NPE

this bug was always there, but got exposed by TEZ-4447
the easiest way to solve this is to simply wait for dag completion by a tez client call which collects vertex status as well, like: waitForCompletionWithStatusUpdates (instead of waitForCompletion)


> VertexStatus is missing in TestLocalMode if DAG finishes too early - causing NPE in unit test
> ---------------------------------------------------------------------------------------------
>
>                 Key: TEZ-4475
>                 URL: https://issues.apache.org/jira/browse/TEZ-4475
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>             Fix For: 0.10.3
>
>


