You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@openwhisk.apache.org by GitBox <gi...@apache.org> on 2018/11/20 11:40:08 UTC

[GitHub] chetanmeh opened a new pull request #4126: Fix netty leak in CosmosDBArtifactStore`

chetanmeh opened a new pull request #4126: Fix netty leak in CosmosDBArtifactStore`
URL: https://github.com/apache/incubator-openwhisk/pull/4126
 
 
   Fixes a netty leak due to how query is done in `ComosDBArtifactStore`
   
   ## Description
   
   In some of the setups we are observing Netty leak warnings like 
   
   ```
   [2018-11-15T17:59:45.025Z] [ERROR] LEAK: ByteBuf.release() was not called before it's garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.
   ```
   
   Further in some cases Controller process got killed with `OutOfDirectMemoryError`
   
   ```
   [2018-11-16T11:08:34.576Z] [WARN] An exception was thrown by io.reactivex.netty.pipeline.ssl.SslCompletionHandler$1.operationComplete()
   io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 7147094023, max: 7158628352)
       at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:640)
       at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:594)
       at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:764)
       at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:740)
       at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:244)
       at io.netty.buffer.PoolArena.allocate(PoolArena.java:226)
       at io.netty.buffer.PoolArena.allocate(PoolArena.java:146)
       at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:324)
   ```
   
   Per [Netty Leak Detection][1] documentation this warning would be emitted if its resource leak detection logic (which works by sampling 1% allocations) detects a leak. 
   
   ### Root cause and Fix
   
   To determine the leak source one can set `-Dio.netty.leakDetection.level=PARANOID` and run the setup/test. After setting this property the test still passed fine and no warning was seen. Then we reduced the memory to `-Xmx32m` and re ran `CosmosDBArtifactStoreTests` and the warning showed up in 1 of 10 runs.
   
   Digging further issue was determined to be present in the way we current apply limit
   
   https://github.com/apache/incubator-openwhisk/blob/f35e23ae19b87bf6c3d27f579ca42a4da8ebb2b0/common/scala/src/main/scala/org/apache/openwhisk/core/database/cosmosdb/CosmosDBArtifactStore.scala#L221-L239
   
   Here we use the `take` operator to ensure that stream does not consume more than `limit` items. If there are more item then source is cancelled. Its so happening if we set limit=1 (which is the case for Identity/auth related queries) then this prevents proper cleanup of underlying source. 
   
   As a fix we now consume whole stream and then apply limit on the resulting list. This should be safe 
   
   1. For query related to `actovations` and `whisks` limit is made part of query also via `TOP X` clause. So underlying db does not return more than limit item
   2. For query related to subject we cannot pass `TOP` as we need to perform join which may result in more entries in output. But here the result set would not be more than 1 in most cases
   
   ### Test Setup
   
   To recreate the issuse a new test `CosmosDBLeakTests` is added which repeatedly performs the auth query multiple times. This test fails much often when running within IDE (with `-Xmx20m`) however I was not able to make it fail when running via command line. Post fix test consistently passes in IDE also. In case of failure it dumps the stack trace where the leaked resource was accessed
   
   ```
   [2018-11-19T12:48:34.810Z] [ERROR] LEAK: ByteBuf.release() was not called before it's garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.
   Recent access records: 
   #1:
       io.netty.util.ReferenceCountUtil.release(ReferenceCountUtil.java:88)
       io.reactivex.netty.protocol.http.client.ClientRequestResponseConverter.invokeContentOnNext(ClientRequestResponseConverter.java:274)
       io.reactivex.netty.protocol.http.client.ClientRequestResponseConverter.channelRead(ClientRequestResponseConverter.java:162)
       io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
       io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
       io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
       io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
       io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
       io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
       io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
       io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:438)
       io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
       io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284)
       io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:253)
       io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
       io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
       io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
       io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
       io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
       io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
       io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
       io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1389)
       io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1159)
       io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1203)
       io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
       io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
       io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
       io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
       io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
       io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
       io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
       io.reactivex.netty.metrics.BytesInspector.channelRead(BytesInspector.java:59)
       io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
       io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
       io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
       io.reactivex.netty.pipeline.InternalReadTimeoutHandler.channelRead(InternalReadTimeoutHandler.java:108)
       io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
       io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
       io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
       io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1414)
       io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
       io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
       io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:945)
       io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:146)
       io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
       io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
       io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
       io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
       io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886)
       io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
       java.lang.Thread.run(Thread.java:748)
   ```
   
   The stack trace for now does not provide much clue as why stream cancellation lead to leak.
   
   This PR also introduces some leak detection logic in `CosmosDBArtifactStoreTests` which for now work fine with and without fix. It may help in surfacing such issues later
   
   
   ## Related issue and scope
   <!--- Please include a link to a related issue if there is one. -->
   - [ ] I opened an issue to propose and discuss this change (#????)
   
   ## My changes affect the following components
   <!--- Select below all system components are affected by your change. -->
   <!--- Enter an `x` in all applicable boxes. -->
   - [ ] API
   - [x] Controller
   - [ ] Message Bus (e.g., Kafka)
   - [ ] Loadbalancer
   - [ ] Invoker
   - [ ] Intrinsic actions (e.g., sequences, conductors)
   - [x] Data stores (e.g., CouchDB)
   - [ ] Tests
   - [ ] Deployment
   - [ ] CLI
   - [ ] General tooling
   - [ ] Documentation
   
   ## Types of changes
   <!--- What types of changes does your code introduce? Use `x` in all the boxes that apply: -->
   - [x] Bug fix (generally a non-breaking change which closes an issue).
   - [ ] Enhancement or new feature (adds new functionality).
   - [ ] Breaking change (a bug fix or enhancement which changes existing behavior).
   
   ## Checklist:
   <!--- Please review the points below which help you make sure you've covered all aspects of the change you're making. -->
   
   - [x] I signed an [Apache CLA](https://github.com/apache/incubator-openwhisk/blob/master/CONTRIBUTING.md).
   - [x] I reviewed the [style guides](https://github.com/apache/incubator-openwhisk/wiki/Contributing:-Git-guidelines#code-readiness) and followed the recommendations (Travis CI will check :).
   - [x] I added tests to cover my changes.
   - [ ] My changes require further changes to the documentation.
   - [ ] I updated the documentation where necessary.
   
   [1]: https://netty.io/wiki/reference-counted-objects.html#troubleshooting-buffer-leaks

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services