Posted to issues@tez.apache.org by "László Bodor (Jira)" <ji...@apache.org> on 2022/08/19 11:49:00 UTC

[jira] [Assigned] (TEZ-4440) When tez app run in yarn fed cluster, may throw NPE

     [ https://issues.apache.org/jira/browse/TEZ-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor reassigned TEZ-4440:
---------------------------------

    Assignee: zhengchenyu

> When tez app run in yarn fed cluster, may throw NPE
> ---------------------------------------------------
>
>                 Key: TEZ-4440
>                 URL: https://issues.apache.org/jira/browse/TEZ-4440
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: zhengchenyu
>            Assignee: zhengchenyu
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> For Hadoop versions before YARN-8933, when a Tez app runs in a YARN federation cluster, getAvailableResources may return null, which then causes an NPE.
> {code:java}
> 2022-08-03 01:40:12,069 [ERROR] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Got Error from RMClient
> java.lang.NullPointerException
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.fitsIn(YarnTaskSchedulerService.java:1445)
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.preemptIfNeeded(YarnTaskSchedulerService.java:1218)
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.getProgress(YarnTaskSchedulerService.java:916)
>     at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:428)
> 2022-08-03 01:40:12,075 [ERROR] [AMRM Callback Handler Thread] |yarn.YarnUncaughtExceptionHandler|: Thread Thread[AMRM Callback Handler Thread,5,main] threw an Exception.
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.NullPointerException
>     at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:432)
> Caused by: java.lang.NullPointerException
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.fitsIn(YarnTaskSchedulerService.java:1445)
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.preemptIfNeeded(YarnTaskSchedulerService.java:1218)
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.getProgress(YarnTaskSchedulerService.java:916)
>     at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:428){code}
> In YARN federation, the AMRMProxy connects to multiple RMs asynchronously, so AllocateResponse::getAvailableResources may return null, which then leads to an NPE.
> In my PR, I replace the null value with Resource.newInstance(0,0). Because null probably means YARN is busy, treating the available resources as 0 is reasonable.
>  
>  
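A minimal, self-contained sketch of the null guard described in the issue. It deliberately avoids Tez and Hadoop classes so it runs on its own: SimpleResource, orZero and this simplified fitsIn are hypothetical stand-ins (in real YARN code the zero value would come from Resource.newInstance(0, 0)). It illustrates the idea only and is not the actual patch.

```java
// Hypothetical sketch: SimpleResource stands in for YARN's Resource
// so the example runs without Hadoop on the classpath.
public class NullSafeResources {

    /** Minimal stand-in for YARN's Resource (memory in MB, virtual cores). */
    static final class SimpleResource {
        final long memoryMb;
        final int vCores;

        SimpleResource(long memoryMb, int vCores) {
            this.memoryMb = memoryMb;
            this.vCores = vCores;
        }

        /** Zero resources, analogous in spirit to Resource.newInstance(0, 0). */
        static SimpleResource zero() {
            return new SimpleResource(0, 0);
        }
    }

    /** Hypothetical helper: treat a missing (null) report as "nothing available". */
    static SimpleResource orZero(SimpleResource available) {
        return available == null ? SimpleResource.zero() : available;
    }

    /** Simplified check: true if 'toFit' fits within 'capacity' in both dimensions. */
    static boolean fitsIn(SimpleResource toFit, SimpleResource capacity) {
        return toFit.memoryMb <= capacity.memoryMb && toFit.vCores <= capacity.vCores;
    }

    public static void main(String[] args) {
        SimpleResource request = new SimpleResource(1024, 1);
        SimpleResource reported = null; // e.g. no headroom reported yet under federation
        // Without orZero, fitsIn(request, reported) would throw a NullPointerException.
        boolean fits = fitsIn(request, orZero(reported));
        System.out.println("fits = " + fits); // prints "fits = false"
    }
}
```

In this sketch a null report simply makes every request look like it does not fit, instead of throwing a NullPointerException on the AMRM callback thread as in the stack trace.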



--
This message was sent by Atlassian Jira
(v8.20.10#820010)