Posted to user@flink.apache.org by Chen Qin <qi...@gmail.com> on 2017/12/09 19:35:24 UTC

asyncIO & TM akka response

Hi there,

Recently, our production Flink jobs have run into a strange performance issue.
When a job tailing a Kafka source fails and then tries to catch up, asyncIO
after the event trigger gets a much higher load on the task thread. Since each
TM is allocated two virtual CPUs in Docker, my assumption was that Akka
messages between the JM and TM shouldn't be impacted.

What I observed was that the TM gets closed and keeps restarting with the same
error message below. Any suggestions are appreciated!


> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connection unexpectedly closed by remote task manager 'xxxxxxx/xxxxxxx:5841'. This might indicate that the remote task manager was lost.
> at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:115)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:223)
> at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:223)
> at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:294)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:223)
> at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:829)
> at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:610)
> at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
> at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> at java.lang.Thread.run(Thread.java:748)


Chen

Re: asyncIO & TM akka response

Posted by Piotr Nowojski <pi...@data-artisans.com>.
Hi,

Please search the task manager logs for the cause of the failure or disconnect around the time you got this error on the job manager. There should be a clearly visible exception.
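
One way to narrow down that search is a small script that keeps only the log entries near the time of the error. The following is a rough sketch, not part of Flink: the function name `find_suspects` and the five-minute window are made up for illustration, and it assumes each log entry starts with a timestamp and level such as `2017-12-09 19:34:58,120 ERROR`. A different logging pattern would need a different regular expression.

```python
import re
from datetime import datetime, timedelta

# Assumption: entries start with "yyyy-MM-dd HH:mm:ss,SSS LEVEL".
# Adjust this pattern if the task manager uses another log layout.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+(\w+)")


def find_suspects(lines, around, window_minutes=5):
    """Return WARN/ERROR/exception entries logged within the window around `around`.

    Lines without a timestamp (for example stack trace frames) are treated as
    continuations of the previous entry and kept only if that entry was kept.
    """
    lo = around - timedelta(minutes=window_minutes)
    hi = around + timedelta(minutes=window_minutes)
    hits = []
    keep_continuation = False
    for raw in lines:
        line = raw.rstrip("\n")
        m = TS_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            level = m.group(2)
            suspicious = level in ("WARN", "ERROR") or "Exception" in line
            keep_continuation = (lo <= ts <= hi) and suspicious
            if keep_continuation:
                hits.append(line)
        elif keep_continuation:
            hits.append(line)
    return hits
```

It could be called with an open log file and the time the job manager reported the error, converted to the task manager's log time zone, for example `find_suspects(open(path), datetime(2017, 12, 9, 19, 35))`, where `path` is a placeholder for the task manager log file.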

Thanks, Piotrek
