Posted to server-dev@james.apache.org by "Benoit Tellier (Jira)" <se...@james.apache.org> on 2021/12/06 07:23:00 UTC

[jira] [Closed] (JAMES-3676) S3 blobStore leaks HTTP connections when reading streams partially (even when closing them)

     [ https://issues.apache.org/jira/browse/JAMES-3676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benoit Tellier closed JAMES-3676.
---------------------------------
    Resolution: Fixed

> S3 blobStore leaks HTTP connections when reading streams partially (even when closing them)
> -------------------------------------------------------------------------------------------
>
>                 Key: JAMES-3676
>                 URL: https://issues.apache.org/jira/browse/JAMES-3676
>             Project: James Server
>          Issue Type: Improvement
>          Components: Blob
>    Affects Versions: 3.6.0
>            Reporter: Benoit Tellier
>            Priority: Major
>              Labels: bug
>             Fix For: 3.7.0, 3.6.1
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> h3. Symptoms
> Recently I found these delightful logs on one of my production environments:
> {code:java}
> 	
> {
>   "timestamp": "2021-11-25T04:19:56.897Z",
>   "level": "ERROR",
>   "thread": "imapserver-executor-560",
>   "mdc": {
>     "fetchedData": "FetchData{flags=false, uid=true, internalDate=false, size=true, envelope=false, body=false, bodyStructure=false, setSeen=false, bodyElements=[BodyFetchElement{firstOctet=null, numberOfOctets=null, name=BODY[], sectionType=CONTENT, fieldNames=Optional.empty}], modSeq=false, changedSince=-1, vanished=false}",
>     "protocol": "IMAP",
>     "useUid": "true",
>     "action": "FETCH",
>     "sessionId": "SID-cdkbuyectzcc",
>     "user": "xxx@linagora.com"
>   },
>   "logger": "org.apache.james.imap.processor.AbstractMailboxProcessor",
>   "message": "Unexpected error during IMAP processing",
>   "context": "default",
>   "exception": "software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Acquire operation took longer than the configured maximum time. This indicates that a request cannot get a connection from the pool within the specified maximum time. This can be due to high request rate.\nConsider taking any of the following actions to mitigate the issue: increase max connections, increase acquire timeout, or slowing the request rate.\nIncreasing the max connections can increase client throughput (unless the network interface is already fully utilized), but can eventually start to hit operation system limitations on the number of file descriptors used by the process. If you already are fully utilizing your network interface or cannot further increase your connection count, increasing the acquire timeout gives extra time for requests to acquire a connection before timing out. If the connections doesn't free up, the subsequent requests will still timeout.\nIf the above mechanisms are not able to fix the issue, try smoothing out your requests so that large traffic bursts cannot overload the client, being more efficient with the number of times you need to call AWS, or by increasing the number of hosts sending requests.\n\tat software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:98)\n\tat software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:43)\n\tat software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:204)\n\tat software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:200)\n\tat software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeRetryExecute(AsyncRetryableStage.java:179)\n\tat software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.lambda$attemptExecute$1(AsyncRetryableStage.java:159)\n\tat java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)\n\tat java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)\n\tat java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)\n\tat java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)\n\tat software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:76)\n\tat java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)\n\tat java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)\n\tat java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)\n\tat java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)\n\tat software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$null$0(MakeAsyncHttpRequestStage.java:104)\n\tat java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)\n\tat java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)\n\tat java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)\n\tat java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)\n\tat 
software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage$WrappedErrorForwardingResponseHandler.onError(MakeAsyncHttpRequestStage.java:158)\n\tat software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.handleFailure(NettyRequestExecutor.java:300)\n\tat software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.makeRequestListener(NettyRequestExecutor.java:172)\n\tat io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)\n\tat io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:552)\n\tat io.netty.util.concurrent.DefaultPromise.access$200(DefaultPromise.java:35)\n\tat io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:502)\n\tat io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)\n\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat java.base/java.lang.Thread.run(Unknown Source)\nCaused by: java.lang.Throwable: Acquire operation took longer than the configured maximum time. This indicates that a request cannot get a connection from the pool within the specified maximum time. This can be due to high request rate.\nConsider taking any of the following actions to mitigate the issue: increase max connections, increase acquire timeout, or slowing the request rate.\nIncreasing the max connections can increase client throughput (unless the network interface is already fully utilized), but can eventually start to hit operation system limitations on the number of file descriptors used by the process. If you already are fully utilizing your network interface or cannot further increase your connection count, increasing the acquire timeout gives extra time for requests to acquire a connection before timing out. If the connections doesn't free up, the subsequent requests will still timeout.\nIf the above mechanisms are not able to fix the issue, try smoothing out your requests so that large traffic bursts cannot overload the client, being more efficient with the number of times you need to call AWS, or by increasing the number of hosts sending requests.\n\tat software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.decorateException(NettyRequestExecutor.java:306)\n\tat software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.handleFailure(NettyRequestExecutor.java:299)\n\t... 11 common frames omitted\nCaused by: java.util.concurrent.TimeoutException: Acquire operation took longer than 10000 milliseconds.\n\tat software.amazon.awssdk.http.nio.netty.internal.HealthCheckedChannelPool.timeoutAcquire(HealthCheckedChannelPool.java:77)\n\tat software.amazon.awssdk.http.nio.netty.internal.HealthCheckedChannelPool.lambda$acquire$0(HealthCheckedChannelPool.java:67)\n\tat org.glowroot.agent.plugin.executor.RunnableWrapper.run(RunnableWrapper.java:35)\n\tat io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)\n\tat io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)\n\t... 6 common frames omitted\n"
> }
> {code}
> TL;DR: the S3 driver cannot get an HTTP connection from the pool and fails randomly, which is very inconvenient.
> I attempted a reboot, which temporarily solved the issue, but we eventually ran out of HTTP connections again after some time...
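> As a stopgap (the real fix is plugging the leak described below), the pool limits mentioned in the error message can be raised on the AWS SDK v2 async client. A minimal sketch, assuming the S3AsyncClient is built with the Netty HTTP client; the values are illustrative only:
> {code:java}
> import java.time.Duration;
>
> import software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient;
> import software.amazon.awssdk.services.s3.S3AsyncClient;
>
> S3AsyncClient s3 = S3AsyncClient.builder()
>     .httpClientBuilder(NettyNioAsyncHttpClient.builder()
>         .maxConcurrency(200)                                    // connection pool size, defaults to 50
>         .connectionAcquisitionTimeout(Duration.ofSeconds(30)))  // defaults to 10s, matching the log above
>     .build();
> {code}
> With a leak, though, any limit is eventually exhausted; this only buys time.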
> h3. Diagnosis
> This sounds like a leak to me!
> I quickly wrote a unit test reproducing the exact same issue by partially reading many blobs. Whether or not the resulting streams were closed made no difference.
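> A sketch of such a test (not the committed one), assuming a testee BlobStore wired to S3 with a small connection pool and the usual James BlobStore contract (reactive save, blocking read):
> {code:java}
> @Test
> void partialReadsShouldNotExhaustTheConnectionPool() throws Exception {
>     // `testee`, `bucketName` and LOW_COST come from the (assumed) test fixture
>     byte[] payload = "0123456789ab".repeat(1024).getBytes(StandardCharsets.US_ASCII);
>     BlobId blobId = Mono.from(testee.save(bucketName, payload, LOW_COST)).block();
>
>     for (int i = 0; i < 1000; i++) {
>         try (InputStream in = testee.read(bucketName, blobId)) {
>             in.read(new byte[16]); // partial read only, then close
>         }
>     }
>     // Before the fix, this loop ends up failing with
>     // "Acquire operation took longer than the configured maximum time"
> }
> {code}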
> This pointed me to a defect within ReactorUtils.toInputStream: it was not cancelling the related publisher when the stream was closed; only full consumption released the subscription! Once again, an easy unit test clearly demonstrates the issue.
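> A sketch of that test, assuming ReactorUtils.toInputStream consumes a Flux<ByteBuffer>:
> {code:java}
> @Test
> void closeShouldCancelTheUnderlyingSubscription() throws Exception {
>     AtomicBoolean cancelled = new AtomicBoolean(false);
>     Flux<ByteBuffer> source = Flux.just(ByteBuffer.wrap("chunk".getBytes(StandardCharsets.US_ASCII)))
>         .repeat() // endless source: full consumption can never release it
>         .doOnCancel(() -> cancelled.set(true));
>
>     InputStream in = ReactorUtils.toInputStream(source);
>     in.read();  // partial consumption
>     in.close();
>
>     // Fails before the fix: only full consumption was releasing the subscription
>     assertThat(cancelled.get()).isTrue();
> }
> {code}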
> Propagating the "close" from the SequencedInputStream back to the subscribed publisher cannot be done with an iterator intermediate representation, but adopting a Java8 Stream instead allows to propagate closure and cancel the subscription if need be. This approach solves both failing tests.
> Possible causes that "revealed" this issue include:
>   - S3 driver update <3
>   - Additional usage of the BlobStore stream API for JMAP blob uploads?
> h3. Complementary work
> We might benefit from investing in tools for automatic leak detection.
> I found some Java agents to do so: https://file-leak-detector.kohsuke.org/ could do the job!
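> For reference, such an agent is typically attached at JVM startup; a hypothetical invocation (option syntax to be double-checked against the tool's documentation):
> {code}
> java -javaagent:path/to/file-leak-detector.jar=http=19999 -jar james-server.jar
> {code}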



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org