Posted to yarn-issues@hadoop.apache.org by "Szilard Nemeth (Jira)" <ji...@apache.org> on 2022/02/24 12:33:00 UTC
[jira] [Updated] (YARN-10438) Handle null containerId in ClientRMService#getContainerReport()
[ https://issues.apache.org/jira/browse/YARN-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Szilard Nemeth updated YARN-10438:
----------------------------------
Description:
Here is the exception trace we are seeing. We suspect that this exception causes the RM to reach a state where it no longer allows any new jobs to run on the cluster.
{code:java}
2020-09-15 07:08:15,496 WARN ipc.Server: IPC Server handler 18 on default port 8032, call Call#1463486 Retry#0 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getContainerReport from 10.39.91.205:49564
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getContainerReport(ClientRMService.java:520)
    at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getContainerReport(ApplicationClientProtocolPBServiceImpl.java:466)
    at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:639)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
{code}
We are seeing this issue on this specific node only; we run this cluster at a scale of around 500 nodes.
was:
Here is the exception trace we are seeing. We suspect that this exception causes the RM to reach a state where it no longer allows any new jobs to run on the cluster.
{noformat}
2020-09-15 07:08:15,496 WARN ipc.Server: IPC Server handler 18 on default port 8032, call Call#1463486 Retry#0 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getContainerReport from 10.39.91.205:49564
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getContainerReport(ClientRMService.java:520)
    at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getContainerReport(ApplicationClientProtocolPBServiceImpl.java:466)
    at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:639)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
{noformat}
We are seeing this issue on this specific node only; we run this cluster at a scale of around 500 nodes.
> Handle null containerId in ClientRMService#getContainerReport()
> ---------------------------------------------------------------
>
> Key: YARN-10438
> URL: https://issues.apache.org/jira/browse/YARN-10438
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 3.2.1
> Reporter: Raghvendra Singh
> Assignee: Shubham Gupta
> Priority: Major
> Fix For: 3.4.0, 2.10.2, 3.2.3, 3.3.2
>
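The issue title asks for ClientRMService#getContainerReport() to handle a null containerId. Below is a minimal sketch of that kind of guard under a simplified model: ContainerReportSketch, its nested ContainerNotFoundException, and the register() helper are hypothetical stand-ins, not the ResourceManager's actual classes; container IDs are modeled as plain strings; and the error message is illustrative. The idea is to validate the request field before dereferencing it, so a malformed client request gets a descriptive error instead of a NullPointerException inside the IPC handler.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a container report lookup; NOT the actual
// ResourceManager ClientRMService. Container IDs are modeled as Strings.
final class ContainerReportSketch {

    // Hypothetical checked exception returned for null or unknown IDs.
    static final class ContainerNotFoundException extends Exception {
        public ContainerNotFoundException(String message) {
            super(message);
        }
    }

    private final Map<String, String> reports = new HashMap<>();

    // Hypothetical helper used only to populate the sketch's lookup table.
    public void register(String containerId, String report) {
        reports.put(Objects.requireNonNull(containerId, "containerId"), report);
    }

    public String getContainerReport(String containerId)
            throws ContainerNotFoundException {
        // Guard the request field before using it, so a malformed request
        // yields a descriptive error instead of a NullPointerException.
        if (containerId == null) {
            throw new ContainerNotFoundException("Invalid container id: null");
        }
        String report = reports.get(containerId);
        if (report == null) {
            throw new ContainerNotFoundException(
                    "Container " + containerId + " not found");
        }
        return report;
    }

    public static void main(String[] args) {
        ContainerReportSketch service = new ContainerReportSketch();
        service.register("container_1", "RUNNING");
        try {
            service.getContainerReport(null);
        } catch (ContainerNotFoundException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
{code}

Running main prints "Rejected: Invalid container id: null"; the caller receives a message naming the bad field rather than a bare NullPointerException.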
--
This message was sent by Atlassian Jira
(v8.20.1#820001)