Posted to issues@flink.apache.org by "Yang Wang (Jira)" <ji...@apache.org> on 2020/01/09 11:39:00 UTC

[jira] [Comment Edited] (FLINK-15534) YARNSessionCapacitySchedulerITCase#perJobYarnClusterWithParallelism failed due to NPE

    [ https://issues.apache.org/jira/browse/FLINK-15534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011620#comment-17011620 ] 

Yang Wang edited comment on FLINK-15534 at 1/9/20 11:38 AM:
------------------------------------------------------------

After diving into the YARN code, I found that this is a known YARN bug. See YARN-7007.

It has been fixed in branch-2 of the Hadoop repository, but no new 2.8.x version has been released since the fix was merged. I suggest closing this issue, since the NPE happens inside the YARN ResourceManager and we cannot do anything about it in Flink.

Once the new Hadoop 2.8 release (2.8.6) is out, we need to bump the flink-shaded-hadoop version to 2.8.6. Using Hadoop 2.9.x or 3.x instead also works.


was (Author: fly_in_gis):
After diving into the YARN code, I found that this is a known YARN bug. See [YARN-7007|https://issues.apache.org/jira/browse/YARN-7007].

It has been fixed in branch-2 of the Hadoop repository, but no new 2.8.x version has been released since the fix was merged. I suggest closing this issue, since the NPE happens inside the YARN ResourceManager and we cannot do anything about it in Flink. After we upgrade to a newer Hadoop version (2.9 or 3.x), it will no longer be a problem.

> YARNSessionCapacitySchedulerITCase#perJobYarnClusterWithParallelism failed due to NPE
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-15534
>                 URL: https://issues.apache.org/jira/browse/FLINK-15534
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.11.0
>            Reporter: Yu Li
>            Priority: Blocker
>
> As titled, the Travis run fails with the error below:
> {code}
> 07:29:22.417 [ERROR] perJobYarnClusterWithParallelism(org.apache.flink.yarn.YARNSessionCapacitySchedulerITCase)  Time elapsed: 16.263 s  <<< ERROR!
> java.lang.NullPointerException: 
> java.lang.NullPointerException
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics.getAggregateAppResourceUsage(RMAppAttemptMetrics.java:128)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getApplicationResourceUsageReport(RMAppAttemptImpl.java:900)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:660)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:930)
> 	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:273)
> 	at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:507)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:847)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:790)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2486)
> 	at org.apache.flink.yarn.YARNSessionCapacitySchedulerITCase.perJobYarnClusterWithParallelism(YARNSessionCapacitySchedulerITCase.java:405)
> Caused by: org.apache.hadoop.ipc.RemoteException: 
> java.lang.NullPointerException
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics.getAggregateAppResourceUsage(RMAppAttemptMetrics.java:128)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getApplicationResourceUsageReport(RMAppAttemptImpl.java:900)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:660)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:930)
> 	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:273)
> 	at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:507)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:847)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:790)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2486)
> 	at org.apache.flink.yarn.YARNSessionCapacitySchedulerITCase.perJobYarnClusterWithParallelism(YARNSessionCapacitySchedulerITCase.java:405)
> {code}
> https://api.travis-ci.org/v3/job/634588108/log.txt


