Posted to mapreduce-issues@hadoop.apache.org by "Alejandro Abdelnur (Created) (JIRA)" <ji...@apache.org> on 2012/04/05 11:23:33 UTC

[jira] [Created] (MAPREDUCE-4109) availability of a job info in HS should be atomic

availability of a job info in HS should be atomic
-------------------------------------------------

                 Key: MAPREDUCE-4109
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4109
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: applicationmaster, jobhistoryserver, mrv2
    Affects Versions: 2.0.0
            Reporter: Alejandro Abdelnur
            Priority: Blocker
             Fix For: 2.0.0


It seems that the HS starts serving info about a job before it has all the info available.

In the trace below, a RunningJob throws an NPE when trying to access the counters.

This happens on and off, so I assume it is related to either the AM not flushing all job info to HDFS before notifying the HS, or the HS not loading all the job info from HDFS before it starts serving it.

In case it helps diagnose the issue, this is happening in a secure cluster.

This causes Oozie to mark jobs as failed.

{code}
java.lang.NullPointerException
	at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$MRClientProtocolHandler.getCounters(HistoryClientService.java:214)
	at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getCounters(MRClientProtocolPBServiceImpl.java:149)
	at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:206)
	at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:355)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1660)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1656)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1654)
 at LocalTrace: 
	org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: 
	at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:163)
	at $Proxy31.getCounters(Unknown Source)
	at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getCounters(MRClientProtocolPBClientImpl.java:162)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:296)
	at org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:325)
	at org.apache.hadoop.mapred.YARNRunner.getJobCounters(YARNRunner.java:472)
	at org.apache.hadoop.mapreduce.Job$8.run(Job.java:714)
	at org.apache.hadoop.mapreduce.Job$8.run(Job.java:711)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:416)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
	at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:711)
	at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:396)
	at org.apache.oozie.action.hadoop.LauncherMapper.hasIdSwap(LauncherMapper.java:296)
	at org.apache.oozie.action.hadoop.JavaActionExecutor.check(JavaActionExecutor.java:886)
	at org.apache.oozie.command.wf.ActionCheckXCommand.execute(ActionCheckXCommand.java:162)
	at org.apache.oozie.command.wf.ActionCheckXCommand.execute(ActionCheckXCommand.java:51)
	at org.apache.oozie.command.XCommand.call(XCommand.java:260)
	at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:166)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:679)
{code}
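
Until the root cause is confirmed, a client-side workaround is to retry the counters lookup instead of failing the action on the first error. The following is a minimal sketch in plain Java, not Oozie's or Hadoop's actual code; the retry count and back-off values are assumptions.

{code}
import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.RunningJob;

// Illustrative helper only: retry RunningJob.getCounters() a few times,
// since the HS may briefly report the job before its counters are served.
public class CountersFetcher {
  public static Counters getCountersWithRetry(RunningJob job, int attempts, long sleepMs)
      throws Exception {
    Exception last = null;
    for (int i = 0; i < attempts; i++) {
      try {
        Counters counters = job.getCounters();
        if (counters != null) {
          return counters;
        }
      } catch (Exception e) {
        // e.g. the remote NPE surfaced as a YarnRemoteException
        last = e;
      }
      Thread.sleep(sleepMs); // back off before asking the HS again
    }
    throw last != null ? last : new IllegalStateException("counters not available");
  }
}
{code}

A caller such as the Oozie check path shown in the trace could use a wrapper like this in place of a direct getCounters() call, at the cost of delaying a genuine failure by a few retries.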
 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (MAPREDUCE-4109) availability of a job info in HS should be atomic

Posted by "Alejandro Abdelnur (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur resolved MAPREDUCE-4109.
-------------------------------------------

    Resolution: Invalid

After looking at the code, my assumptions proved incorrect; such a scenario is not possible.

What may be happening is MAPREDUCE-3972.
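
For reference, the kind of guard that would turn a missing or partially loaded job into a descriptive error rather than an opaque NPE looks roughly like the following. This is a hypothetical sketch; JobStore and JobInfo are stand-ins, not the real HistoryClientService types.

{code}
import java.io.IOException;

// Hypothetical illustration only -- not the actual HistoryClientService code.
public class SafeCounters {
  interface JobInfo { Object getCounters(); }
  interface JobStore { JobInfo getJob(String jobId); }

  static Object getCountersOrFail(JobStore store, String jobId) throws IOException {
    JobInfo job = store.getJob(jobId);
    if (job == null) {
      // Report a descriptive error instead of letting a null dereference
      // reach the client as an NPE.
      throw new IOException("Job " + jobId + " is not (yet) available in the history server");
    }
    Object counters = job.getCounters();
    if (counters == null) {
      throw new IOException("Counters for " + jobId + " are not loaded yet");
    }
    return counters;
  }
}
{code}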
                
> availability of a job info in HS should be atomic
> -------------------------------------------------
>
>                 Key: MAPREDUCE-4109
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4109
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, jobhistoryserver, mrv2
>    Affects Versions: 2.0.0
>            Reporter: Alejandro Abdelnur
>            Priority: Blocker
>             Fix For: 2.0.0
>
