Posted to mapreduce-issues@hadoop.apache.org by "Robert Joseph Evans (JIRA)" <ji...@apache.org> on 2012/06/20 22:25:42 UTC

[jira] [Commented] (MAPREDUCE-4300) OOM in AM can turn it into a zombie.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397833#comment-13397833 ] 

Robert Joseph Evans commented on MAPREDUCE-4300:
------------------------------------------------

I would like some feedback on this.  I see several different threads that hit OOM errors.  None of them handled the error gracefully and shut down.  I am not sure that the Metrics System, the HDFS library, or the default speculator thread should try to shut down the process in such a case.  I would personally prefer that we install a defaultUncaughtExceptionHandler for all threads.  If a thread throws a Throwable that is not caught, implying that we did not expect to see it, we can try to shut down the process cleanly.

I don't really want to start putting catch(Throwable t) everywhere; it is too easy to miss a single thread and then be back in the same boat.
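
To make the proposal concrete, here is a minimal sketch of a process-wide handler installed with Thread.setDefaultUncaughtExceptionHandler.  It is not taken from any patch on this issue.  The class name UncaughtHandlerSketch, the ExitAction interface, and the policy of halting on Error but exiting normally otherwise are all assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; none of these names come from Hadoop.
final class UncaughtHandlerSketch {

    /** Hypothetical callback that decides how the process goes down. */
    interface ExitAction {
        void exit(Thread thread, Throwable cause);
    }

    /**
     * Assumed policy: on an Error (such as OutOfMemoryError) skip shutdown
     * hooks with halt(), since they may need memory we no longer have;
     * otherwise go through the normal System.exit() path.
     */
    static final ExitAction TERMINATE_JVM = (thread, cause) -> {
        if (cause instanceof Error) {
            Runtime.getRuntime().halt(-1);
        } else {
            System.exit(-1);
        }
    };

    /** Installs one handler that covers every thread without its own handler. */
    static void install(ExitAction action) {
        Thread.setDefaultUncaughtExceptionHandler((thread, cause) -> {
            try {
                // Logging may itself fail under memory pressure.
                System.err.println("Thread " + thread.getName()
                        + " died with uncaught " + cause);
            } finally {
                // Run the exit action even if logging threw.
                action.exit(thread, cause);
            }
        });
    }

    public static void main(String[] args) throws InterruptedException {
        // Demo only: record the failure instead of killing the JVM.
        AtomicReference<Throwable> seen = new AtomicReference<>();
        install((thread, cause) -> seen.set(cause));

        Thread worker = new Thread(() -> {
            throw new IllegalStateException("simulated failure");
        }, "worker");
        worker.start();
        worker.join();

        System.out.println("handler saw: " + seen.get());
    }
}
```

A real process would call install(UncaughtHandlerSketch.TERMINATE_JVM) once at startup, before any worker threads are created.  The point of the design is that a single registration covers threads owned by libraries as well, so nobody has to remember to wrap each run() method in catch(Throwable t).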
                
> OOM in AM can turn it into a zombie.
> ------------------------------------
>
>                 Key: MAPREDUCE-4300
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4300
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 0.23.3
>            Reporter: Robert Joseph Evans
>
> It looks like 4 threads in the AM died with OOM but not the one pinging the RM.
> stderr for this AM
> {noformat}
> WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
> May 30, 2012 4:49:55 AM com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider get
> WARNING: You are attempting to use a deprecated API (specifically, attempting to @Inject ServletContext inside an eagerly created singleton. While we allow this for backwards compatibility, be warned that this MAY have unexpected behavior if you have more than one injector (with ServletModule) running in the same JVM. Please consult the Guice documentation at http://code.google.com/p/google-guice/wiki/Servlets for more information.
> May 30, 2012 4:49:55 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
> INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class
> May 30, 2012 4:49:55 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
> INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
> May 30, 2012 4:49:55 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
> INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a root resource class
> May 30, 2012 4:49:55 AM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
> INFO: Initiating Jersey application, version 'Jersey: 1.8 06/24/2011 12:17 PM'
> May 30, 2012 4:49:55 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
> INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope "Singleton"
> May 30, 2012 4:49:56 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
> INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton"
> May 30, 2012 4:49:56 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
> INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest"
> Exception in thread "ResponseProcessor for block BP-1114822160-<IP>-1322528669066:blk_-6528896407411719649_34227308" java.lang.OutOfMemoryError: Java heap space
> 	at com.google.protobuf.CodedInputStream.<init>(CodedInputStream.java:538)
> 	at com.google.protobuf.CodedInputStream.newInstance(CodedInputStream.java:55)
> 	at com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:201)
> 	at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:738)
> 	at org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos$PipelineAckProto.parseFrom(DataTransferProtos.java:7287)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:95)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:656)
> Exception in thread "DefaultSpeculator background processing" java.lang.OutOfMemoryError: Java heap space
> 	at java.util.HashMap.resize(HashMap.java:462)
> 	at java.util.HashMap.addEntry(HashMap.java:755)
> 	at java.util.HashMap.put(HashMap.java:385)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.getTasks(JobImpl.java:632)
> 	at org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.maybeScheduleASpeculation(DefaultSpeculator.java:465)
> 	at org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.maybeScheduleAMapSpeculation(DefaultSpeculator.java:433)
> 	at org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.computeSpeculations(DefaultSpeculator.java:509)
> 	at org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.access$100(DefaultSpeculator.java:56)
> 	at org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator$1.run(DefaultSpeculator.java:176)
> 	at java.lang.Thread.run(Thread.java:619)
> Exception in thread "Timer for 'MRAppMaster' metrics system" java.lang.OutOfMemoryError: Java heap space
> Exception in thread "Socket Reader #4 for port 50500" java.lang.OutOfMemoryError: Java heap space
> {noformat}
