Posted to user-zh@flink.apache.org by 陈卓宇 <25...@qq.com.INVALID> on 2022/04/20 07:05:04 UTC

The file STDOUT does not exist on the TaskExecutor exception

Hello experts,
I would like to ask what causes this exception, what impact it has on production, and how to get rid of it.

java.util.concurrent.CompletionException: org.apache.flink.util.FlinkException: The file STDOUT does not exist on the TaskExecutor.
	at org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$requestFileUploadByFilePath$24(TaskExecutor.java:2064) ~[flink-dist_2.11-1.13.1.jar:1.13.1]
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) ~[?:1.8.0_202]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_202]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_202]
	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_202]
Caused by: org.apache.flink.util.FlinkException: The file STDOUT does not exist on the TaskExecutor.
	... 5 more

2022-04-20 14:51:47,370 ERROR org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerStdoutFileHandler [] - Unhandled exception.
org.apache.flink.util.FlinkException: The file STDOUT does not exist on the TaskExecutor.
	at org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$requestFileUploadByFilePath$24(TaskExecutor.java:2064) ~[flink-dist_2.11-1.13.1.jar:1.13.1]
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) ~[?:1.8.0_202]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_202]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_202]
	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_202]
卓宇



Re: The file STDOUT does not exist on the TaskExecutor exception

Posted by 陈卓宇 <25...@qq.com.INVALID>.
Thank you for your answer.


陈卓宇


------------------ Original Message ------------------
From: "user-zh" <zhlonghong@gmail.com>;
Sent: Wednesday, April 20, 2022, 4:50 PM
To: "user-zh" <user-zh@flink.apache.org>;

Subject: Re: The file STDOUT does not exist on the TaskExecutor exception



Hello, 卓宇:

This is an error from the REST API. It means you clicked the Stdout tab on the TaskManager page of the Flink Dashboard, but the stdout file could not be accessed on the corresponding TaskManager, hence the error. It does not affect the normal running of the job and can be ignored.

Best,
Zhilong


Re: Flink: Job leader for job id xxxx lost leadership

Posted by huweihua <hu...@gmail.com>.
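One more case to add: check whether the hudi connector uses an OperatorCoordinator to interact with hudi. Those operations run in the JobMaster main thread, and if they take a long time they can cause the TaskManager to lose its connection to the JobMaster.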
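
The effect described above, a long call on a single main thread delaying heartbeat handling, can be illustrated with a toy model. This is only a sketch and not Flink code: the function name, the 10 s heartbeat interval, the 50 s timeout, and the 60 s blocking call are all assumptions chosen for illustration (check heartbeat.interval and heartbeat.timeout in your own configuration).

```python
def max_heartbeat_gap_ms(interval_ms, blocking_start_ms, blocking_ms, horizon_ms):
    """Largest gap between two consecutive heartbeats handled by ONE thread.

    Toy model (hypothetical, not Flink's implementation): heartbeats fall due
    every `interval_ms`; a single blocking call of `blocking_ms` is queued at
    `blocking_start_ms`; everything runs sequentially on the same thread.
    """
    clock = 0            # time at which the single thread is next free
    handled = []         # times at which each heartbeat is actually handled
    blocked = False
    for due in range(0, horizon_ms + 1, interval_ms):
        # The blocking call runs before any heartbeat that falls due after it starts.
        if not blocked and due >= blocking_start_ms:
            clock = max(clock, blocking_start_ms) + blocking_ms
            blocked = True
        # Handling a heartbeat is treated as instantaneous once the thread is free.
        clock = max(clock, due)
        handled.append(clock)
    return max((b - a for a, b in zip(handled, handled[1:])), default=0)


HEARTBEAT_INTERVAL_MS = 10_000   # assumed value
HEARTBEAT_TIMEOUT_MS = 50_000    # assumed value

gap = max_heartbeat_gap_ms(HEARTBEAT_INTERVAL_MS, 15_000, 60_000, 120_000)
print(f"max gap: {gap} ms, exceeds timeout: {gap > HEARTBEAT_TIMEOUT_MS}")
```

In this model a short blocking call leaves the gap at one heartbeat interval, while a call longer than the timeout stretches the gap past it, which is the situation in which the peer would give up on the connection.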

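> On April 20, 2022, at 6:42 PM, Zhanghao Chen <zh...@outlook.com> wrote:
> 
> "Job leader for job id xxxx lost" means that the JM leader's session on ZK has timed out. Possible causes:
> 
>  1.  The network connection between the JM and ZK is flaky and the ZK connection enters the suspended state, and either you have not configured tolerance for suspended ZK connections (on 1.14 and later, set the high-availability.zookeeper.client.tolerate-suspended-connections option), or you have configured it but the session timeout is too short, so leadership is lost.
>  2.  The JM really does crash frequently.
>  3.  JM GC is severe, which disrupts the connection to ZK and puts it into the suspended state.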
> 
> Best,
> Zhanghao Chen


Re: Flink: Job leader for job id xxxx lost leadership

Posted by Zhanghao Chen <zh...@outlook.com>.
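"Job leader for job id xxxx lost" means that the JM leader's session on ZK has timed out. Possible causes:

  1.  The network connection between the JM and ZK is flaky and the ZK connection enters the suspended state, and either you have not configured tolerance for suspended ZK connections (on 1.14 and later, set the high-availability.zookeeper.client.tolerate-suspended-connections option), or you have configured it but the session timeout is too short, so leadership is lost.
  2.  The JM really does crash frequently.
  3.  JM GC is severe, which disrupts the connection to ZK and puts it into the suspended state.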
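
The first cause in the list above can be checked against a configuration file with a small script. This is a minimal sketch under stated assumptions: it treats the file as flat `key: value` lines, assumes the session timeout is written as a plain number of milliseconds under high-availability.zookeeper.client.session-timeout, and uses a hypothetical 60000 ms threshold for "too short". The helper names are made up for this example.

```python
TOLERATE = "high-availability.zookeeper.client.tolerate-suspended-connections"
SESSION_TIMEOUT = "high-availability.zookeeper.client.session-timeout"


def parse_flat_conf(text):
    """Parse flat 'key: value' lines, ignoring blank lines and '#' comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)   # split once so 'host:2181' values survive
        conf[key.strip()] = value.strip()
    return conf


def zk_ha_warnings(conf, min_session_timeout_ms=60000):
    """Return warnings for the ZK settings named above (threshold is an assumption)."""
    warnings = []
    if conf.get(TOLERATE, "false").lower() != "true":
        warnings.append(f"{TOLERATE} is not enabled")
    timeout = conf.get(SESSION_TIMEOUT)
    # Only plain integer millisecond values are checked in this sketch.
    if timeout is not None and timeout.isdigit() and int(timeout) < min_session_timeout_ms:
        warnings.append(f"{SESSION_TIMEOUT} is {timeout} ms, below {min_session_timeout_ms} ms")
    return warnings


sample = """
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-1:2181,zk-2:2181   # hypothetical hosts
high-availability.zookeeper.client.session-timeout: 10000
"""
for warning in zk_ha_warnings(parse_flat_conf(sample)):
    print("WARN:", warning)
```

The script only reports settings; causes 2 and 3 (JM crashes and JM GC pauses) have to be confirmed from the JM logs and GC logs instead.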

Best,
Zhanghao Chen
________________________________
From: magic <gu...@foxmail.com>
Sent: Wednesday, April 20, 2022 17:49
To: user-zh <us...@flink.apache.org>
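Subject: Flink: Job leader for job id xxxx lost leadership

Hi all,
When we use Flink to consume Kafka data and write it to hudi, we frequently get the error: Job leader for job id xxxx lost leadership. Other Flink jobs on the same cluster have no such problem. Could anyone explain what causes this? It does not look much like a ZK problem.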

Flink: Job leader for job id xxxx lost leadership

Posted by magic <gu...@foxmail.com>.
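Hi all,
When we use Flink to consume Kafka data and write it to hudi, we frequently get the error: Job leader for job id xxxx lost leadership. Other Flink jobs on the same cluster have no such problem. Could anyone explain what causes this? It does not look much like a ZK problem.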

Re: The file STDOUT does not exist on the TaskExecutor exception

Posted by Zhilong Hong <zh...@gmail.com>.
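Hello, 卓宇:

This is an error from the REST API. It means you clicked the Stdout tab on the TaskManager page of the Flink Dashboard, but the stdout file could not be accessed on the corresponding TaskManager, hence the error. It does not affect the normal running of the job and can be ignored.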
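
The same request can be reproduced outside the dashboard to confirm that only the stdout lookup fails. The sketch below is hedged: the path /taskmanagers/<id>/stdout is assumed to be what the Stdout tab requests, and the base URL http://localhost:8081 and the TaskManager id are placeholders to replace with your own values.

```python
import urllib.error
import urllib.request


def taskmanager_stdout_url(rest_base, taskmanager_id):
    """Build the (assumed) REST URL behind the dashboard's Stdout tab."""
    return f"{rest_base.rstrip('/')}/taskmanagers/{taskmanager_id}/stdout"


def fetch_taskmanager_stdout(rest_base, taskmanager_id, timeout_s=5.0):
    """Return the stdout text, or None if the REST API answers with an HTTP error."""
    url = taskmanager_stdout_url(rest_base, taskmanager_id)
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # An error status here corresponds to the handler failing to get the file.
        print(f"stdout not available ({err.code}): {url}")
        return None


# Placeholder values; no request is sent until fetch_taskmanager_stdout is called.
print(taskmanager_stdout_url("http://localhost:8081", "my-taskmanager-id"))
```

If the call returns None while the job keeps running normally, that matches the explanation above: the error comes from the file lookup, not from the job itself.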

Best,
Zhilong
