You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Xiaolong Wang <xi...@smartnews.com> on 2022/05/11 11:17:43 UTC

Flink on Native K8s jobs turn in to `SUSPENDED` status unexpectedly.

Hello,

Recently our Flink jobs on Native K8s encountered failing in the
`SUSPENDED` status and got restarted for no reason.

Flink version: 1.13.2

Logs:
```
2022-05-11 05:01:41

2022-05-10 21:01:41,771 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering
checkpoint 17921 (type=CHECKPOINT) @ 1652216501302 for job
00000000000000000000000000000000.\n
2022-05-11 05:01:43

2022-05-10 21:01:42,860 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Completed
checkpoint 17921 for job 00000000000000000000000000000000 (11840 bytes in
866 ms).\n
2022-05-11 05:04:34

2022-05-10 21:04:34,550 INFO
org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Creating a
new watch on TaskManager pods.\n
2022-05-11 05:06:43

2022-05-10 21:06:43,512 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering
checkpoint 17922 (type=CHECKPOINT) @ 1652216802860 for job
00000000000000000000000000000000.\n
2022-05-11 05:06:44

2022-05-10 21:06:44,441 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Completed
checkpoint 17922 for job 00000000000000000000000000000000 (11840 bytes in
977 ms).\n
2022-05-11 05:11:45

2022-05-10 21:11:44,826 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering
checkpoint 17923 (type=CHECKPOINT) @ 1652217104441 for job
00000000000000000000000000000000.\n
2022-05-11 05:11:45

2022-05-10 21:11:45,537 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Completed
checkpoint 17923 for job 00000000000000000000000000000000 (11840 bytes in
646 ms).\n
2022-05-11 05:12:36

2022-05-10 21:12:36,746 INFO
org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess
[] - Stopping SessionDispatcherLeaderProcess.\n
2022-05-11 05:12:36

2022-05-10 21:12:36,747 INFO
org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Stopping
dispatcher akka.tcp://flink@10.2.70.34:6123/user/rpc/dispatcher_1.\n
2022-05-11 05:12:36

2022-05-10 21:12:36,747 INFO
org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Stopping all
currently running jobs of dispatcher akka.tcp://
flink@10.2.70.34:6123/user/rpc/dispatcher_1.\n
2022-05-11 05:12:36

2022-05-10 21:12:36,749 INFO org.apache.flink.runtime.jobmaster.JobMaster
[] - Stopping the JobMaster for job
insert-into_default_catalog.default_database.sn_fstore_location_cluster_raw_scylla_sink(00000000000000000000000000000000).\n
2022-05-11 05:12:36

2022-05-10 21:12:36,752 INFO
org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Job
00000000000000000000000000000000 reached terminal state SUSPENDED.\n
2022-05-11 05:12:36

2022-05-10 21:12:36,752 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job
insert-xxx_sink (00000000000000000000000000000000) switched from state
RUNNING to SUSPENDED.\n
2022-05-11 05:12:36

org.apache.flink.util.FlinkException: Scheduler is being stopped.\n
2022-05-11 05:12:36

at
org.apache.flink.runtime.scheduler.SchedulerBase.closeAsync(SchedulerBase.java:607)
~[flink-dist_2.11-1.13.2.jar:1.13.2]\n
2022-05-11 05:12:36

at
org.apache.flink.runtime.jobmaster.JobMaster.stopScheduling(JobMaster.java:962)
~[flink-dist_2.11-1.13.2.jar:1.13.2]\n
2022-05-11 05:12:36

at
org.apache.flink.runtime.jobmaster.JobMaster.stopJobExecution(JobMaster.java:926)
~[flink-dist_2.11-1.13.2.jar:1.13.2]\n
2022-05-11 05:12:36

at org.apache.flink.runtime.jobmaster.JobMaster.onStop(JobMaster.java:398)
~[flink-dist_2.11-1.13.2.jar:1.13.2]\n
2022-05-11 05:12:36

at
org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStop(RpcEndpoint.java:214)
~[flink-dist_2.11-1.13.2.jar:1.13.2]\n
2022-05-11 05:12:36

at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StartedState.terminate(AkkaRpcActor.java:563)
~[flink-dist_2.11-1.13.2.jar:1.13.2]\n
2022-05-11 05:12:36

at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:186)
~[flink-dist_2.11-1.13.2.jar:1.13.2]\n
2022-05-11 05:12:36

at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
[flink-dist_2.11-1.13.2.jar:1.13.2]\n
2022-05-11 05:12:36

at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
[flink-dist_2.11-1.13.2.jar:1.13.2]\n
2022-05-11 05:12:36

at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
[flink-dist_2.11-1.13.2.jar:1.13.2]\n
2022-05-11 05:12:36

at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
[flink-dist_2.11-1.13.2.jar:1.13.2]\n
2022-05-11 05:12:36

at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
[flink-dist_2.11-1.13.2.jar:1.13.2]\n
2022-05-11 05:12:36

at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
[flink-dist_2.11-1.13.2.jar:1.13.2]\n
2022-05-11 05:12:36

at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
[flink-dist_2.11-1.13.2.jar:1.13.2]\n
2022-05-11 05:12:36

at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
[flink-dist_2.11-1.13.2.jar:1.13.2]\n
2022-05-11 05:12:36

at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
[flink-dist_2.11-1.13.2.jar:1.13.2]\n
2022-05-11 05:12:36

at akka.actor.ActorCell.invoke(ActorCell.scala:561)
[flink-dist_2.11-1.13.2.jar:1.13.2]\n
2022-05-11 05:12:36

at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
[flink-dist_2.11-1.13.2.jar:1.13.2]\n
2022-05-11 05:12:36

at akka.dispatch.Mailbox.run(Mailbox.scala:225)
[flink-dist_2.11-1.13.2.jar:1.13.2]\n
2022-05-11 05:12:36

at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
[flink-dist_2.11-1.13.2.jar:1.13.2]\n
2022-05-11 05:12:36

at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
[flink-dist_2.11-1.13.2.jar:1.13.2]\n"
...
```

What does the  state `SUSPENDED` mean ? And what may possibly cause this
issue ?

Moreover, I described the jobmanager pod, and got this:
```

    Last State:     Terminated

      Reason:       Error

      Exit Code:    239

      Started:      Tue, 10 May 2022 22:59:42 +0800

      Finished:     Wed, 11 May 2022 05:12:42 +0800

```


Here, what does `Exit Code: 239` mean ?

Thanks in advanced,

Yours.

Re: Flink on Native K8s jobs turn in to `SUSPENDED` status unexpectedly.

Posted by Yang Wang <da...@gmail.com>.
It will help a lot if you could share the logs of JobManager and
TaskManager for the unexpected `SUSPENDED` job.

Best,
Yang


Xiaolong Wang <xi...@smartnews.com> 于2022年5月16日周一 13:30写道:

> Sorry for the late reply.
>
> I checked the logs in both jobmanager & taskmanager.
>
> During that time, there were no more logs there.
>
> How can I reproduce the issue ?
>
> On Thu, May 12, 2022 at 10:35 AM Yang Wang <da...@gmail.com> wrote:
>
>> The SUSPENDED state is usually caused by lost leadership. Maybe you could
>> find more information about leader in the JobManager and TaskManager logs.
>>
>> Best,
>> Yang
>>
>> Xiaolong Wang <xi...@smartnews.com> 于2022年5月11日周三 19:18写道:
>>
>>> Hello,
>>>
>>> Recently our Flink jobs on Native K8s encountered failing in the
>>> `SUSPENDED` status and got restarted for no reason.
>>>
>>> Flink version: 1.13.2
>>>
>>> Logs:
>>> ```
>>> 2022-05-11 05:01:41
>>>
>>> 2022-05-10 21:01:41,771 INFO
>>> org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering
>>> checkpoint 17921 (type=CHECKPOINT) @ 1652216501302 for job
>>> 00000000000000000000000000000000.\n
>>> 2022-05-11 05:01:43
>>>
>>> 2022-05-10 21:01:42,860 INFO
>>> org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Completed
>>> checkpoint 17921 for job 00000000000000000000000000000000 (11840 bytes in
>>> 866 ms).\n
>>> 2022-05-11 05:04:34
>>>
>>> 2022-05-10 21:04:34,550 INFO
>>> org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Creating a
>>> new watch on TaskManager pods.\n
>>> 2022-05-11 05:06:43
>>>
>>> 2022-05-10 21:06:43,512 INFO
>>> org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering
>>> checkpoint 17922 (type=CHECKPOINT) @ 1652216802860 for job
>>> 00000000000000000000000000000000.\n
>>> 2022-05-11 05:06:44
>>>
>>> 2022-05-10 21:06:44,441 INFO
>>> org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Completed
>>> checkpoint 17922 for job 00000000000000000000000000000000 (11840 bytes in
>>> 977 ms).\n
>>> 2022-05-11 05:11:45
>>>
>>> 2022-05-10 21:11:44,826 INFO
>>> org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering
>>> checkpoint 17923 (type=CHECKPOINT) @ 1652217104441 for job
>>> 00000000000000000000000000000000.\n
>>> 2022-05-11 05:11:45
>>>
>>> 2022-05-10 21:11:45,537 INFO
>>> org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Completed
>>> checkpoint 17923 for job 00000000000000000000000000000000 (11840 bytes in
>>> 646 ms).\n
>>> 2022-05-11 05:12:36
>>>
>>> 2022-05-10 21:12:36,746 INFO
>>> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess
>>> [] - Stopping SessionDispatcherLeaderProcess.\n
>>> 2022-05-11 05:12:36
>>>
>>> 2022-05-10 21:12:36,747 INFO
>>> org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Stopping
>>> dispatcher akka.tcp://flink@10.2.70.34:6123/user/rpc/dispatcher_1.\n
>>> 2022-05-11 05:12:36
>>>
>>> 2022-05-10 21:12:36,747 INFO
>>> org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Stopping all
>>> currently running jobs of dispatcher akka.tcp://
>>> flink@10.2.70.34:6123/user/rpc/dispatcher_1.\n
>>> 2022-05-11 05:12:36
>>>
>>> 2022-05-10 21:12:36,749 INFO
>>> org.apache.flink.runtime.jobmaster.JobMaster [] - Stopping the JobMaster
>>> for job
>>> insert-into_default_catalog.default_database.sn_fstore_location_cluster_raw_scylla_sink(00000000000000000000000000000000).\n
>>> 2022-05-11 05:12:36
>>>
>>> 2022-05-10 21:12:36,752 INFO
>>> org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Job
>>> 00000000000000000000000000000000 reached terminal state SUSPENDED.\n
>>> 2022-05-11 05:12:36
>>>
>>> 2022-05-10 21:12:36,752 INFO
>>> org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job
>>> insert-xxx_sink (00000000000000000000000000000000) switched from state
>>> RUNNING to SUSPENDED.\n
>>> 2022-05-11 05:12:36
>>>
>>> org.apache.flink.util.FlinkException: Scheduler is being stopped.\n
>>> 2022-05-11 05:12:36
>>>
>>> at
>>> org.apache.flink.runtime.scheduler.SchedulerBase.closeAsync(SchedulerBase.java:607)
>>> ~[flink-dist_2.11-1.13.2.jar:1.13.2]\n
>>> 2022-05-11 05:12:36
>>>
>>> at
>>> org.apache.flink.runtime.jobmaster.JobMaster.stopScheduling(JobMaster.java:962)
>>> ~[flink-dist_2.11-1.13.2.jar:1.13.2]\n
>>> 2022-05-11 05:12:36
>>>
>>> at
>>> org.apache.flink.runtime.jobmaster.JobMaster.stopJobExecution(JobMaster.java:926)
>>> ~[flink-dist_2.11-1.13.2.jar:1.13.2]\n
>>> 2022-05-11 05:12:36
>>>
>>> at
>>> org.apache.flink.runtime.jobmaster.JobMaster.onStop(JobMaster.java:398)
>>> ~[flink-dist_2.11-1.13.2.jar:1.13.2]\n
>>> 2022-05-11 05:12:36
>>>
>>> at
>>> org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStop(RpcEndpoint.java:214)
>>> ~[flink-dist_2.11-1.13.2.jar:1.13.2]\n
>>> 2022-05-11 05:12:36
>>>
>>> at
>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StartedState.terminate(AkkaRpcActor.java:563)
>>> ~[flink-dist_2.11-1.13.2.jar:1.13.2]\n
>>> 2022-05-11 05:12:36
>>>
>>> at
>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:186)
>>> ~[flink-dist_2.11-1.13.2.jar:1.13.2]\n
>>> 2022-05-11 05:12:36
>>>
>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
>>> [flink-dist_2.11-1.13.2.jar:1.13.2]\n
>>> 2022-05-11 05:12:36
>>>
>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
>>> [flink-dist_2.11-1.13.2.jar:1.13.2]\n
>>> 2022-05-11 05:12:36
>>>
>>> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>> [flink-dist_2.11-1.13.2.jar:1.13.2]\n
>>> 2022-05-11 05:12:36
>>>
>>> at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
>>> [flink-dist_2.11-1.13.2.jar:1.13.2]\n
>>> 2022-05-11 05:12:36
>>>
>>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
>>> [flink-dist_2.11-1.13.2.jar:1.13.2]\n
>>> 2022-05-11 05:12:36
>>>
>>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>>> [flink-dist_2.11-1.13.2.jar:1.13.2]\n
>>> 2022-05-11 05:12:36
>>>
>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
>>> [flink-dist_2.11-1.13.2.jar:1.13.2]\n
>>> 2022-05-11 05:12:36
>>>
>>> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
>>> [flink-dist_2.11-1.13.2.jar:1.13.2]\n
>>> 2022-05-11 05:12:36
>>>
>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
>>> [flink-dist_2.11-1.13.2.jar:1.13.2]\n
>>> 2022-05-11 05:12:36
>>>
>>> at akka.actor.ActorCell.invoke(ActorCell.scala:561)
>>> [flink-dist_2.11-1.13.2.jar:1.13.2]\n
>>> 2022-05-11 05:12:36
>>>
>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
>>> [flink-dist_2.11-1.13.2.jar:1.13.2]\n
>>> 2022-05-11 05:12:36
>>>
>>> at akka.dispatch.Mailbox.run(Mailbox.scala:225)
>>> [flink-dist_2.11-1.13.2.jar:1.13.2]\n
>>> 2022-05-11 05:12:36
>>>
>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
>>> [flink-dist_2.11-1.13.2.jar:1.13.2]\n
>>> 2022-05-11 05:12:36
>>>
>>> at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>> [flink-dist_2.11-1.13.2.jar:1.13.2]\n"
>>> ...
>>> ```
>>>
>>> What does the  state `SUSPENDED` mean ? And what may possibly cause this
>>> issue ?
>>>
>>> Moreover, I described the jobmanager pod, and got this:
>>> ```
>>>
>>>     Last State:     Terminated
>>>
>>>       Reason:       Error
>>>
>>>       Exit Code:    239
>>>
>>>       Started:      Tue, 10 May 2022 22:59:42 +0800
>>>
>>>       Finished:     Wed, 11 May 2022 05:12:42 +0800
>>>
>>> ```
>>>
>>>
>>> Here, what does `Exit Code: 239` mean ?
>>>
>>> Thanks in advanced,
>>>
>>> Yours.
>>>
>>

Re: Flink on Native K8s jobs turn in to `SUSPENDED` status unexpectedly.

Posted by Yang Wang <da...@gmail.com>.
The SUSPENDED state is usually caused by lost leadership. Maybe you could
find more information about leader in the JobManager and TaskManager logs.

Best,
Yang

Xiaolong Wang <xi...@smartnews.com> 于2022年5月11日周三 19:18写道:

> Hello,
>
> Recently our Flink jobs on Native K8s encountered failing in the
> `SUSPENDED` status and got restarted for no reason.
>
> Flink version: 1.13.2
>
> Logs:
> ```
> 2022-05-11 05:01:41
>
> 2022-05-10 21:01:41,771 INFO
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering
> checkpoint 17921 (type=CHECKPOINT) @ 1652216501302 for job
> 00000000000000000000000000000000.\n
> 2022-05-11 05:01:43
>
> 2022-05-10 21:01:42,860 INFO
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Completed
> checkpoint 17921 for job 00000000000000000000000000000000 (11840 bytes in
> 866 ms).\n
> 2022-05-11 05:04:34
>
> 2022-05-10 21:04:34,550 INFO
> org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Creating a
> new watch on TaskManager pods.\n
> 2022-05-11 05:06:43
>
> 2022-05-10 21:06:43,512 INFO
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering
> checkpoint 17922 (type=CHECKPOINT) @ 1652216802860 for job
> 00000000000000000000000000000000.\n
> 2022-05-11 05:06:44
>
> 2022-05-10 21:06:44,441 INFO
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Completed
> checkpoint 17922 for job 00000000000000000000000000000000 (11840 bytes in
> 977 ms).\n
> 2022-05-11 05:11:45
>
> 2022-05-10 21:11:44,826 INFO
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering
> checkpoint 17923 (type=CHECKPOINT) @ 1652217104441 for job
> 00000000000000000000000000000000.\n
> 2022-05-11 05:11:45
>
> 2022-05-10 21:11:45,537 INFO
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Completed
> checkpoint 17923 for job 00000000000000000000000000000000 (11840 bytes in
> 646 ms).\n
> 2022-05-11 05:12:36
>
> 2022-05-10 21:12:36,746 INFO
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess
> [] - Stopping SessionDispatcherLeaderProcess.\n
> 2022-05-11 05:12:36
>
> 2022-05-10 21:12:36,747 INFO
> org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Stopping
> dispatcher akka.tcp://flink@10.2.70.34:6123/user/rpc/dispatcher_1.\n
> 2022-05-11 05:12:36
>
> 2022-05-10 21:12:36,747 INFO
> org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Stopping all
> currently running jobs of dispatcher akka.tcp://
> flink@10.2.70.34:6123/user/rpc/dispatcher_1.\n
> 2022-05-11 05:12:36
>
> 2022-05-10 21:12:36,749 INFO org.apache.flink.runtime.jobmaster.JobMaster
> [] - Stopping the JobMaster for job
> insert-into_default_catalog.default_database.sn_fstore_location_cluster_raw_scylla_sink(00000000000000000000000000000000).\n
> 2022-05-11 05:12:36
>
> 2022-05-10 21:12:36,752 INFO
> org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Job
> 00000000000000000000000000000000 reached terminal state SUSPENDED.\n
> 2022-05-11 05:12:36
>
> 2022-05-10 21:12:36,752 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job
> insert-xxx_sink (00000000000000000000000000000000) switched from state
> RUNNING to SUSPENDED.\n
> 2022-05-11 05:12:36
>
> org.apache.flink.util.FlinkException: Scheduler is being stopped.\n
> 2022-05-11 05:12:36
>
> at
> org.apache.flink.runtime.scheduler.SchedulerBase.closeAsync(SchedulerBase.java:607)
> ~[flink-dist_2.11-1.13.2.jar:1.13.2]\n
> 2022-05-11 05:12:36
>
> at
> org.apache.flink.runtime.jobmaster.JobMaster.stopScheduling(JobMaster.java:962)
> ~[flink-dist_2.11-1.13.2.jar:1.13.2]\n
> 2022-05-11 05:12:36
>
> at
> org.apache.flink.runtime.jobmaster.JobMaster.stopJobExecution(JobMaster.java:926)
> ~[flink-dist_2.11-1.13.2.jar:1.13.2]\n
> 2022-05-11 05:12:36
>
> at org.apache.flink.runtime.jobmaster.JobMaster.onStop(JobMaster.java:398)
> ~[flink-dist_2.11-1.13.2.jar:1.13.2]\n
> 2022-05-11 05:12:36
>
> at
> org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStop(RpcEndpoint.java:214)
> ~[flink-dist_2.11-1.13.2.jar:1.13.2]\n
> 2022-05-11 05:12:36
>
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StartedState.terminate(AkkaRpcActor.java:563)
> ~[flink-dist_2.11-1.13.2.jar:1.13.2]\n
> 2022-05-11 05:12:36
>
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:186)
> ~[flink-dist_2.11-1.13.2.jar:1.13.2]\n
> 2022-05-11 05:12:36
>
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
> [flink-dist_2.11-1.13.2.jar:1.13.2]\n
> 2022-05-11 05:12:36
>
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
> [flink-dist_2.11-1.13.2.jar:1.13.2]\n
> 2022-05-11 05:12:36
>
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
> [flink-dist_2.11-1.13.2.jar:1.13.2]\n
> 2022-05-11 05:12:36
>
> at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
> [flink-dist_2.11-1.13.2.jar:1.13.2]\n
> 2022-05-11 05:12:36
>
> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
> [flink-dist_2.11-1.13.2.jar:1.13.2]\n
> 2022-05-11 05:12:36
>
> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> [flink-dist_2.11-1.13.2.jar:1.13.2]\n
> 2022-05-11 05:12:36
>
> at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
> [flink-dist_2.11-1.13.2.jar:1.13.2]\n
> 2022-05-11 05:12:36
>
> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
> [flink-dist_2.11-1.13.2.jar:1.13.2]\n
> 2022-05-11 05:12:36
>
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
> [flink-dist_2.11-1.13.2.jar:1.13.2]\n
> 2022-05-11 05:12:36
>
> at akka.actor.ActorCell.invoke(ActorCell.scala:561)
> [flink-dist_2.11-1.13.2.jar:1.13.2]\n
> 2022-05-11 05:12:36
>
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
> [flink-dist_2.11-1.13.2.jar:1.13.2]\n
> 2022-05-11 05:12:36
>
> at akka.dispatch.Mailbox.run(Mailbox.scala:225)
> [flink-dist_2.11-1.13.2.jar:1.13.2]\n
> 2022-05-11 05:12:36
>
> at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
> [flink-dist_2.11-1.13.2.jar:1.13.2]\n
> 2022-05-11 05:12:36
>
> at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> [flink-dist_2.11-1.13.2.jar:1.13.2]\n"
> ...
> ```
>
> What does the  state `SUSPENDED` mean ? And what may possibly cause this
> issue ?
>
> Moreover, I described the jobmanager pod, and got this:
> ```
>
>     Last State:     Terminated
>
>       Reason:       Error
>
>       Exit Code:    239
>
>       Started:      Tue, 10 May 2022 22:59:42 +0800
>
>       Finished:     Wed, 11 May 2022 05:12:42 +0800
>
> ```
>
>
> Here, what does `Exit Code: 239` mean ?
>
> Thanks in advanced,
>
> Yours.
>