Posted to user-zh@flink.apache.org by SmileSmile <a5...@163.com> on 2022/07/18 12:40:59 UTC

Flink on YARN job crashes and restarts repeatedly

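Hi all,
I've hit the following scenario: a Flink on YARN job with a parallelism of 3000 that contains multiple aggregation (agg) operations. Every time the job recovers from a checkpoint or savepoint, it fails to recover (the failure is reproducible every time) and keeps restarting.
The JobManager (JM) log shows:
org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.

Does anyone have good ideas on how to troubleshoot this?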





Re: Flink on YARN job crashes and restarts repeatedly

Posted by Weihua Hu <hu...@gmail.com>.
Check whether the JobManager ran out of memory and was OOM-killed. If you have more logs, please post them as well.
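A "RECEIVED SIGNAL 15: SIGTERM" line only says that something outside the JobManager asked it to stop; it does not say why. By the usual shell convention, a container exit status above 128 means "terminated by signal (status - 128)", so 143 is SIGTERM and 137 is SIGKILL. The following is a minimal sketch, assuming you have collected the JM log and the YARN container diagnostics (for example via `yarn logs -applicationId <appId>`) as plain text lines. The marker patterns in `OOM_MARKERS` are assumptions, not an official or exhaustive list, and may need adjusting to the exact wording of your Flink/YARN/kernel versions.

```python
import re
import signal

# Assumed (hypothetical) patterns that often accompany memory-related kills.
# Verify them against the actual wording in your own logs.
OOM_MARKERS = [
    re.compile(r"java\.lang\.OutOfMemoryError"),
    # YARN NodeManager memory-limit kill; wording differs between versions.
    re.compile(r"beyond (the '?PHYSICAL'? memory limit|physical memory limits)", re.IGNORECASE),
    # Linux kernel OOM killer message (as seen in dmesg).
    re.compile(r"Out of memory: Kill", re.IGNORECASE),
]
SIGTERM_MARKER = re.compile(r"RECEIVED SIGNAL 15: SIGTERM")


def describe_exit_status(code: int) -> str:
    """Turn a shell-style exit status into a readable description.

    Statuses above 128 conventionally mean "killed by signal (code - 128)",
    e.g. 143 -> SIGTERM, 137 -> SIGKILL.
    """
    if code > 128:
        try:
            return f"killed by {signal.Signals(code - 128).name}"
        except ValueError:
            return f"killed by unknown signal {code - 128}"
    return f"exited with status {code}"


def scan_log(lines):
    """Return (line_number, category, text) for each suspicious log line."""
    hits = []
    for no, line in enumerate(lines, start=1):
        if SIGTERM_MARKER.search(line):
            hits.append((no, "sigterm", line.strip()))
        elif any(p.search(line) for p in OOM_MARKERS):
            hits.append((no, "memory", line.strip()))
    return hits


if __name__ == "__main__":
    sample = [
        "ClusterEntrypoint [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.",
        "java.lang.OutOfMemoryError: Java heap space",
        "Checkpoint 42 completed.",
    ]
    for hit in scan_log(sample):
        print(hit)
    print(describe_exit_status(143))
```

If a SIGTERM on the JM lines up with a memory marker or a 137/143 container exit status around the same time, raising the JobManager memory is a reasonable next step to try. At parallelism 3000, restoring many aggregation states puts extra metadata load on the JM.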

Best,
Weihua

