Posted to common-issues@hadoop.apache.org by "Xiao Chen (JIRA)" <ji...@apache.org> on 2017/08/05 04:03:00 UTC

[jira] [Comment Edited] (HADOOP-14727) Socket not closed properly when reading Configurations with BlockReaderRemote

    [ https://issues.apache.org/jira/browse/HADOOP-14727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115260#comment-16115260 ] 

Xiao Chen edited comment on HADOOP-14727 at 8/5/17 4:02 AM:
------------------------------------------------------------

+1. The checkstyle warnings are related to the patch, but they're only 80-character line-length issues and could be fixed during commit IMO.
The branch-2 backport's conflicts are also trivial, import-only.

I'll commit to trunk and branch-2 on Monday morning in case Steve and others want to comment. :)


was (Author: xiaochen):
+1. The checkstyle warnings are related to the patch, but they're only 80-character line-length issues and could be fixed during commit.

> Socket not closed properly when reading Configurations with BlockReaderRemote
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-14727
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14727
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: conf
>    Affects Versions: 2.9.0, 3.0.0-alpha4
>            Reporter: Xiao Chen
>            Assignee: Jonathan Eagles
>            Priority: Blocker
>         Attachments: HADOOP-14727.001-branch-2.patch, HADOOP-14727.001.patch, HADOOP-14727.002.patch
>
>
> This was caught by Cloudera's internal testing of the alpha4 release.
> We got reports that some hosts ran out of FDs. Triaging that, we found that both the Oozie server and the YARN JobHistoryServer had a large number of sockets stuck in {{CLOSE_WAIT}} state.
> [~haibochen] helped narrow this down to a consistent reproduction: simply visit the JHS web UI and click through a job and its logs.
> I then looked at {{BlockReaderRemote}} and the related code, and didn't spot any leaks in the implementation. After adding a debug log whenever a {{Peer}} is created, closed, or moved into/out of the {{PeerCache}}, it looks like all the {{CLOSE_WAIT}} sockets are created from this call stack:
> {noformat}
> 2017-08-02 13:58:59,901 INFO org.apache.hadoop.hdfs.client.impl.BlockReaderFactory: ____ associated peer NioInetPeer(Socket[addr=/10.17.196.28,port=20002,localport=42512]) with blockreader org.apache.hadoop.hdfs.client.impl.BlockReaderRemote@717ce109
> java.lang.Exception: test
>         at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:745)
>         at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:385)
>         at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:636)
>         at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:566)
>         at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:749)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:807)
>         at java.io.DataInputStream.read(DataInputStream.java:149)
>         at com.ctc.wstx.io.StreamBootstrapper.ensureLoaded(StreamBootstrapper.java:482)
>         at com.ctc.wstx.io.StreamBootstrapper.resolveStreamEncoding(StreamBootstrapper.java:306)
>         at com.ctc.wstx.io.StreamBootstrapper.bootstrapInput(StreamBootstrapper.java:167)
>         at com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:573)
>         at com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:633)
>         at com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:647)
>         at com.ctc.wstx.stax.WstxInputFactory.createXMLStreamReader(WstxInputFactory.java:366)
>         at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2649)
>         at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2697)
>         at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2662)
>         at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2545)
>         at org.apache.hadoop.conf.Configuration.get(Configuration.java:1076)
>         at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1126)
>         at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1344)
>         at org.apache.hadoop.mapreduce.counters.Limits.init(Limits.java:45)
>         at org.apache.hadoop.mapreduce.counters.Limits.reset(Limits.java:130)
>         at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.loadFullHistoryData(CompletedJob.java:363)
>         at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.<init>(CompletedJob.java:105)
>         at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo.loadJob(HistoryFileManager.java:473)
>         at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.loadJob(CachedHistoryStorage.java:180)
>         at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.access$000(CachedHistoryStorage.java:52)
>         at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage$1.load(CachedHistoryStorage.java:103)
>         at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage$1.load(CachedHistoryStorage.java:100)
>         at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
>         at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
>         at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
>         at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
>         at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
>         at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
>         at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
>         at com.google.common.cache.LocalCache$LocalManualCache.getUnchecked(LocalCache.java:4834)
>         at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getFullJob(CachedHistoryStorage.java:193)
>         at org.apache.hadoop.mapreduce.v2.hs.JobHistory.getJob(JobHistory.java:220)
>         at org.apache.hadoop.mapreduce.v2.app.webapp.AppController.requireJob(AppController.java:416)
>         at org.apache.hadoop.mapreduce.v2.app.webapp.AppController.attempts(AppController.java:277)
>         at org.apache.hadoop.mapreduce.v2.hs.webapp.HsController.attempts(HsController.java:152)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:162)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>         at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287)
>         at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277)
>         at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182)
>         at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>         at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)
>         at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941)
>         at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875)
>         at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829)
>         at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
>         at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119)
>         at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133)
>         at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130)
>         at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203)
>         at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130)
>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>         at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>         at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>         at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1552)
>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>         at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>         at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>         at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>         at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>         at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>         at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>         at org.eclipse.jetty.server.Server.handle(Server.java:534)
>         at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>         at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>         at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>         at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>         at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>         at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>         at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>         at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>         at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>         at java.lang.Thread.run(Thread.java:748)
> {noformat}
> I was able to further confirm this theory by backing out the 4 recent commits to {{Configuration}} on alpha3, after which the {{CLOSE_WAIT}} sockets no longer appeared:
> - HADOOP-14501
> - HADOOP-14399 (only reverted to make the other reverts easier)
> - HADOOP-14216 addendum
> - HADOOP-14216
> It's not clear to me who is responsible for closing the InputStream, though.
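> For illustration only (not the patch attached to this JIRA): a minimal sketch of the ownership pattern that avoids the leak, assuming whoever opens the HDFS stream is also the one that closes it. {{XMLStreamReader.close()}} does not close the underlying {{InputStream}}, so the stream needs to be closed explicitly, e.g. via try-with-resources; otherwise the {{DFSInputStream}} is never closed, its {{Peer}} is not returned to the {{PeerCache}}, and the socket can linger in {{CLOSE_WAIT}}, consistent with the stack trace above. The class and method names below are hypothetical.
> {noformat}
> import java.io.InputStream;
> import javax.xml.stream.XMLInputFactory;
> import javax.xml.stream.XMLStreamReader;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class ParseXmlFromHdfs {
>   // Hypothetical helper, not Configuration.loadResource itself: parse an XML
>   // file stored on HDFS and make sure the stream (and with it the peer/socket)
>   // is always released.
>   static void parse(Configuration conf, Path xmlOnHdfs) throws Exception {
>     FileSystem fs = FileSystem.get(conf);
>     try (InputStream in = fs.open(xmlOnHdfs)) {   // this caller owns the stream
>       XMLStreamReader reader =
>           XMLInputFactory.newInstance().createXMLStreamReader(in);
>       try {
>         while (reader.hasNext()) {
>           reader.next();   // a real parser would inspect element names/values here
>         }
>       } finally {
>         reader.close();    // closes the reader only, not the InputStream
>       }
>     }                      // try-with-resources closes the DFSInputStream
>   }
> }
> {noformat}
> Whether that close should live in {{Configuration}} itself or in the caller that hands it the resource is exactly the open question above.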


