You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Antonio Vespoli (Jira)" <ji...@apache.org> on 2023/03/30 16:43:00 UTC

[jira] [Updated] (FLINK-31675) Deadlock in AWS Connectors following content-length AWS SDK exception

     [ https://issues.apache.org/jira/browse/FLINK-31675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antonio Vespoli updated FLINK-31675:
------------------------------------
    Description: 
Connector calls to AWS services can hang on a canceled future following a content-length mismatch that isn't handled gracefully by the SDK:

 
{code:java}
org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.FutureCancelledException: java.io.IOException: Response had content-length of 31 bytes, but only received 0 bytes before the connection was closed.
at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.lambda$null$3(NettyRequestExecutor.java:136)
at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.PromiseTask.run(PromiseTask.java:106)
at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
at org.apache.flink.kinesis.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at org.apache.flink.kinesis.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.IOException: Response had content-length of 31 bytes, but only received 0 bytes before the connection was closed.
at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.validateResponseContentLength(ResponseHandler.java:163)
at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.access$700(ResponseHandler.java:75)
at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$PublisherAdapter$1.onComplete(ResponseHandler.java:369)
at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.complete(HandlerPublisher.java:447)
at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.channelInactive(HandlerPublisher.java:430)
at org.apache.flink.kinesis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
at org.apache.flink.kinesis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
at org.apache.flink.kinesis.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
at org.apache.flink.kinesis.shaded.io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)
at org.apache.flink.kinesis.shaded.io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
at org.apache.flink.kinesis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
at org.apache.flink.kinesis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
at org.apache.flink.kinesis.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
at org.apache.flink.kinesis.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
at org.apache.flink.kinesis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
at org.apache.flink.kinesis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
at org.apache.flink.kinesis.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
at org.apache.flink.kinesis.shaded.io.netty.handler.codec.http2.AbstractHttp2StreamChannel$Http2ChannelUnsafe$2.run(AbstractHttp2StreamChannel.java:737)
... 6 more {code}
Related AWS SDK issue: [https://github.com/aws/aws-sdk-java-v2/issues/3335]

AWS SDK fix: [https://github.com/aws/aws-sdk-java-v2/pull/3855/files]

 

This mishandled exception creates a deadlock situation that prevents the connectors from making any progress.

We should update the AWS SDK v2 to 2.20.32: [https://github.com/aws/aws-sdk-java-v2/commit/eb5619e24e4eaca6f80effa1c43c0cd409cdd53e]

  was:
Connector calls to AWS services can hang on a canceled future following a content-length mismatch that isn't handled gracefully by the SDK:

 
{code:java}
org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.FutureCancelledException: java.io.IOException: Response had content-length of 31 bytes, but only received 0 bytes before the connection was closed.
at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.lambda$null$3(NettyRequestExecutor.java:136)
at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.PromiseTask.run(PromiseTask.java:106)
at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
at org.apache.flink.kinesis.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at org.apache.flink.kinesis.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.IOException: Response had content-length of 31 bytes, but only received 0 bytes before the connection was closed.
at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.validateResponseContentLength(ResponseHandler.java:163)
at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.access$700(ResponseHandler.java:75)
at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$PublisherAdapter$1.onComplete(ResponseHandler.java:369)
at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.complete(HandlerPublisher.java:447)
at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.channelInactive(HandlerPublisher.java:430)
at org.apache.flink.kinesis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
at org.apache.flink.kinesis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
at org.apache.flink.kinesis.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
at org.apache.flink.kinesis.shaded.io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)
at org.apache.flink.kinesis.shaded.io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
at org.apache.flink.kinesis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
at org.apache.flink.kinesis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
at org.apache.flink.kinesis.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
at org.apache.flink.kinesis.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
at org.apache.flink.kinesis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
at org.apache.flink.kinesis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
at org.apache.flink.kinesis.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
at org.apache.flink.kinesis.shaded.io.netty.handler.codec.http2.AbstractHttp2StreamChannel$Http2ChannelUnsafe$2.run(AbstractHttp2StreamChannel.java:737)
... 6 more {code}

Related AWS SDK issue: [https://github.com/aws/aws-sdk-java-v2/issues/3335]

AWS SDK fix: [https://github.com/aws/aws-sdk-java-v2/pull/3855/files]

 

This mishandled exception creates a deadlock situation that prevents the connectors from making any progress.

We should update the AWS SDK v2 to 20.20.32: [https://github.com/aws/aws-sdk-java-v2/commit/eb5619e24e4eaca6f80effa1c43c0cd409cdd53e]


> Deadlock in AWS Connectors following content-length AWS SDK exception
> ---------------------------------------------------------------------
>
>                 Key: FLINK-31675
>                 URL: https://issues.apache.org/jira/browse/FLINK-31675
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / AWS
>    Affects Versions: 1.17.0, 1.16.1, 1.15.4
>            Reporter: Antonio Vespoli
>            Priority: Major
>             Fix For: 1.16.2, 1.17.1, 1.15.5
>
>
> Connector calls to AWS services can hang on a canceled future following a content-length mismatch that isn't handled gracefully by the SDK:
>  
> {code:java}
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.FutureCancelledException: java.io.IOException: Response had content-length of 31 bytes, but only received 0 bytes before the connection was closed.
> at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.lambda$null$3(NettyRequestExecutor.java:136)
> at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
> at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.PromiseTask.run(PromiseTask.java:106)
> at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
> at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
> at org.apache.flink.kinesis.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
> at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
> at org.apache.flink.kinesis.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.io.IOException: Response had content-length of 31 bytes, but only received 0 bytes before the connection was closed.
> at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.validateResponseContentLength(ResponseHandler.java:163)
> at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.access$700(ResponseHandler.java:75)
> at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$PublisherAdapter$1.onComplete(ResponseHandler.java:369)
> at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.complete(HandlerPublisher.java:447)
> at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.channelInactive(HandlerPublisher.java:430)
> at org.apache.flink.kinesis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
> at org.apache.flink.kinesis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
> at org.apache.flink.kinesis.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
> at org.apache.flink.kinesis.shaded.io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)
> at org.apache.flink.kinesis.shaded.io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
> at org.apache.flink.kinesis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
> at org.apache.flink.kinesis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
> at org.apache.flink.kinesis.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
> at org.apache.flink.kinesis.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
> at org.apache.flink.kinesis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
> at org.apache.flink.kinesis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
> at org.apache.flink.kinesis.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
> at org.apache.flink.kinesis.shaded.io.netty.handler.codec.http2.AbstractHttp2StreamChannel$Http2ChannelUnsafe$2.run(AbstractHttp2StreamChannel.java:737)
> ... 6 more {code}
> Related AWS SDK issue: [https://github.com/aws/aws-sdk-java-v2/issues/3335]
> AWS SDK fix: [https://github.com/aws/aws-sdk-java-v2/pull/3855/files]
>  
> This mishandled exception creates a deadlock situation that prevents the connectors from making any progress.
> We should update the AWS SDK v2 to 2.20.32: [https://github.com/aws/aws-sdk-java-v2/commit/eb5619e24e4eaca6f80effa1c43c0cd409cdd53e]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)