Posted to issues@tez.apache.org by "László Bodor (Jira)" <ji...@apache.org> on 2022/08/29 11:52:00 UTC

[jira] [Commented] (TEZ-4440) When tez app run in yarn fed cluster, may throw NPE

    [ https://issues.apache.org/jira/browse/TEZ-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597165#comment-17597165 ] 

László Bodor commented on TEZ-4440:
-----------------------------------

merged to master and pushed to branch-0.9
thanks [~zhengchenyu] for the patch!

> When tez app run in yarn fed cluster, may throw NPE
> ---------------------------------------------------
>
>                 Key: TEZ-4440
>                 URL: https://issues.apache.org/jira/browse/TEZ-4440
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: zhengchenyu
>            Assignee: zhengchenyu
>            Priority: Major
>             Fix For: 0.9.3, 0.10.3
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> For Hadoop versions before YARN-8933, when a Tez app runs in a YARN Federation cluster, getAvailableResources may return null, which then causes an NPE.
> {code:java}
> 2022-08-03 01:40:12,069 [ERROR] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Got Error from RMClient
> java.lang.NullPointerException
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.fitsIn(YarnTaskSchedulerService.java:1445)
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.preemptIfNeeded(YarnTaskSchedulerService.java:1218)
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.getProgress(YarnTaskSchedulerService.java:916)
>     at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:428)
> 2022-08-03 01:40:12,075 [ERROR] [AMRM Callback Handler Thread] |yarn.YarnUncaughtExceptionHandler|: Thread Thread[AMRM Callback Handler Thread,5,main] threw an Exception.
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.NullPointerException
>     at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:432)
> Caused by: java.lang.NullPointerException
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.fitsIn(YarnTaskSchedulerService.java:1445)
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.preemptIfNeeded(YarnTaskSchedulerService.java:1218)
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.getProgress(YarnTaskSchedulerService.java:916)
>     at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:428){code}
> In YARN Federation, the AMRMProxy connects to multiple RMs asynchronously, so AllocateResponse::getAvailableResources may return null, which then causes an NPE.
> In my PR, I replace null with Resource.newInstance(0, 0). Because null may mean YARN is busy, returning zero resources is reasonable.
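A minimal Java sketch of the null guard described above. It is not the actual Tez patch: `Res` is a hypothetical stand-in for YARN's `Resource` so the example compiles without Hadoop on the classpath, and `availableOrZero` and the simplified `fitsIn` are illustrative names. It shows how substituting zero resources for a null headroom report turns the NPE into an ordinary "does not fit" result.

```java
// Illustrative sketch only, not Tez code. Res is a hypothetical stand-in
// for org.apache.hadoop.yarn.api.records.Resource.
public class Main {

    // Hypothetical minimal resource type: memory in MB plus virtual cores.
    static final class Res {
        final long memoryMb;
        final int vCores;

        private Res(long memoryMb, int vCores) {
            this.memoryMb = memoryMb;
            this.vCores = vCores;
        }

        static Res newInstance(long memoryMb, int vCores) {
            return new Res(memoryMb, vCores);
        }
    }

    // Treat a missing (null) headroom report as "no resources available"
    // instead of passing null on to callers such as fitsIn.
    static Res availableOrZero(Res reported) {
        return reported == null ? Res.newInstance(0, 0) : reported;
    }

    // Simplified capacity check: true if toFit fits inside available.
    // Both arguments are dereferenced, so a null `available` would throw
    // a NullPointerException, the failure mode in the stack trace above.
    static boolean fitsIn(Res toFit, Res available) {
        return toFit.memoryMb <= available.memoryMb
                && toFit.vCores <= available.vCores;
    }

    public static void main(String[] args) {
        Res request = Res.newInstance(1024, 1);

        // No headroom reported yet: no NPE, the request simply does not fit.
        System.out.println(fitsIn(request, availableOrZero(null)));

        // Normal case: the reported headroom is used unchanged.
        System.out.println(fitsIn(request, availableOrZero(Res.newInstance(4096, 4))));
    }
}
```

Returning an empty resource rather than null keeps every caller free of null checks; the trade-off is that "unknown" and "genuinely zero" headroom become indistinguishable, which matches the reasoning in the description that a null most likely means YARN is busy.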



--
This message was sent by Atlassian Jira
(v8.20.10#820010)