Posted to issues@flink.apache.org by "Yang Wang (Jira)" <ji...@apache.org> on 2020/01/09 09:43:00 UTC

[jira] [Commented] (FLINK-15534) YARNSessionCapacitySchedulerITCase#perJobYarnClusterWithParallelism failed due to NPE

    [ https://issues.apache.org/jira/browse/FLINK-15534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011620#comment-17011620 ] 

Yang Wang commented on FLINK-15534:
-----------------------------------

After digging into the YARN code, I found that this is a known YARN bug. See [YARN-7007|https://issues.apache.org/jira/browse/YARN-7007].

It has been fixed in branch-2 of the Hadoop repository; however, no new 2.8.x version has been released since the fix was merged. I suggest closing this issue, since the NPE happens inside the YARN ResourceManager and there is nothing we can do about it in Flink. Once we upgrade to a newer Hadoop version (2.9 or 3.x), it will no longer be a problem.

> YARNSessionCapacitySchedulerITCase#perJobYarnClusterWithParallelism failed due to NPE
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-15534
>                 URL: https://issues.apache.org/jira/browse/FLINK-15534
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.11.0
>            Reporter: Yu Li
>            Priority: Blocker
>
> As titled, the Travis run fails with the error below:
> {code}
> 07:29:22.417 [ERROR] perJobYarnClusterWithParallelism(org.apache.flink.yarn.YARNSessionCapacitySchedulerITCase)  Time elapsed: 16.263 s  <<< ERROR!
> java.lang.NullPointerException: 
> java.lang.NullPointerException
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics.getAggregateAppResourceUsage(RMAppAttemptMetrics.java:128)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getApplicationResourceUsageReport(RMAppAttemptImpl.java:900)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:660)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:930)
> 	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:273)
> 	at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:507)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:847)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:790)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2486)
> 	at org.apache.flink.yarn.YARNSessionCapacitySchedulerITCase.perJobYarnClusterWithParallelism(YARNSessionCapacitySchedulerITCase.java:405)
> Caused by: org.apache.hadoop.ipc.RemoteException: 
> java.lang.NullPointerException
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics.getAggregateAppResourceUsage(RMAppAttemptMetrics.java:128)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getApplicationResourceUsageReport(RMAppAttemptImpl.java:900)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:660)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:930)
> 	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:273)
> 	at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:507)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:847)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:790)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2486)
> 	at org.apache.flink.yarn.YARNSessionCapacitySchedulerITCase.perJobYarnClusterWithParallelism(YARNSessionCapacitySchedulerITCase.java:405)
> {code}
> https://api.travis-ci.org/v3/job/634588108/log.txt


