Posted to user-zh@flink.apache.org by 潘明文 <pa...@163.com> on 2022/03/07 02:18:28 UTC

io.network.netty.exception

Hi, my Flink job reads from Kafka and writes to HBase and Kafka.
The job frequently fails with the following error:

org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connection unexpectedly closed by remote task manager 'cdh02/xxx:42892'. This might indicate that the remote task manager was lost.

Re: Re: io.network.netty.exception

Posted by yue ma <ma...@gmail.com>.
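Hi, fixing this requires finding the root cause first. As the previous reply noted, many things can cause this error, such as GC pauses or network problems.
I suggest checking the relevant logs first to pinpoint the actual cause, and then deciding how to fix it. For a GC problem, for example, you can increase the memory or optimize the code.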
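
As a rough illustration of the "increase the memory" option and of collecting the GC log, the fragment below is a minimal flink-conf.yaml sketch. It assumes Flink 1.10 or later (where the `taskmanager.memory.*` options apply) and a Java 8 runtime. All values and the log path are placeholders chosen for illustration, not recommendations from this thread.

```yaml
# flink-conf.yaml (illustrative values only; tune to your own workload)

# Total memory of the TaskManager process, including JVM overhead.
# On YARN/Kubernetes this should match the size of the requested container.
taskmanager.memory.process.size: 4096m

# Write a GC log so long pauses can be lined up with the time of the failure.
# These are Java 8 flags; Java 9 and later use unified logging (-Xlog:gc*).
# The log path is a placeholder.
env.java.opts.taskmanager: "-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/tmp/taskmanager-gc.log"

# Heartbeat timeout in milliseconds (default 50000). Raising it only hides
# long GC pauses; it does not remove them.
heartbeat.timeout: 120000
```

If the container is being killed for exceeding its memory limit, a larger process size may help. If the GC log shows long pauses, reducing state or object allocation in the job is more likely to be the lasting fix.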


潘明文 <pa...@163.com> wrote on Tue, Mar 8, 2022 at 09:24:

> Hi,
>   Thanks. Is there a good solution to this problem?
>
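> On 2022-03-08 02:20:57, "Zhilong Hong" <zh...@gmail.com> wrote:
> >Hi 明文,
> >
> >This error actually means that the TaskManager (TM) was lost, usually because it was killed. You can find out why from the TM's Flink log and GC log, plus the cluster-level logs: the NodeManager (NM) log on YARN, or the Kubernetes logs. Common causes are a long GC pause that makes the TM's heartbeat time out so the TM gets killed, or the TM exceeding its memory limit so the container/pod gets killed.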
> >
> >Best.
> >Zhilong
> >
> >On Mon, Mar 7, 2022 at 10:18 AM 潘明文 <pa...@163.com> wrote:
> >
> >> Hi, my Flink job reads from Kafka and writes to HBase and Kafka.
> >> The job frequently fails with the following error:
> >>
> >> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> >> Connection unexpectedly closed by remote task manager 'cdh02/xxx:42892'.
> >> This might indicate that the remote task manager was lost.
>

Re:Re: io.network.netty.exception

Posted by 潘明文 <pa...@163.com>.
Hi,
  Thanks. Is there a good solution to this problem?

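On 2022-03-08 02:20:57, "Zhilong Hong" <zh...@gmail.com> wrote:
>Hi 明文,
>
>This error actually means that the TaskManager (TM) was lost, usually because it was killed. You can find out why from the TM's Flink log and GC log, plus the cluster-level logs: the NodeManager (NM) log on YARN, or the Kubernetes logs. Common causes are a long GC pause that makes the TM's heartbeat time out so the TM gets killed, or the TM exceeding its memory limit so the container/pod gets killed.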
>
>Best.
>Zhilong
>
>On Mon, Mar 7, 2022 at 10:18 AM 潘明文 <pa...@163.com> wrote:
>
>> Hi, my Flink job reads from Kafka and writes to HBase and Kafka.
>> The job frequently fails with the following error:
>>
>> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
>> Connection unexpectedly closed by remote task manager 'cdh02/xxx:42892'.
>> This might indicate that the remote task manager was lost.

Re: io.network.netty.exception

Posted by Zhilong Hong <zh...@gmail.com>.
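Hi 明文,

This error actually means that the TaskManager (TM) was lost, usually because it was killed. You can find out why from the TM's Flink log and GC log, plus the cluster-level logs: the NodeManager (NM) log on YARN, or the Kubernetes logs. Common causes are a long GC pause that makes the TM's heartbeat time out so the TM gets killed, or the TM exceeding its memory limit so the container/pod gets killed.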

Best.
Zhilong

On Mon, Mar 7, 2022 at 10:18 AM 潘明文 <pa...@163.com> wrote:

> Hi, my Flink job reads from Kafka and writes to HBase and Kafka.
> The job frequently fails with the following error:
>
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager 'cdh02/xxx:42892'.
> This might indicate that the remote task manager was lost.