Posted to issues@flink.apache.org by "Dian Fu (Jira)" <ji...@apache.org> on 2020/04/17 02:23:00 UTC

[jira] [Updated] (FLINK-14316) Stuck in "Job leader ... lost leadership" error

     [ https://issues.apache.org/jira/browse/FLINK-14316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dian Fu updated FLINK-14316:
----------------------------
    Summary: Stuck in "Job leader ... lost leadership" error  (was: stuck in "Job leader ... lost leadership" error)

> Stuck in "Job leader ... lost leadership" error
> -----------------------------------------------
>
>                 Key: FLINK-14316
>                 URL: https://issues.apache.org/jira/browse/FLINK-14316
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.7.2
>            Reporter: Steven Zhen Wu
>            Assignee: Till Rohrmann
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.9.3, 1.10.1, 1.11.0
>
>         Attachments: FLINK-14316.tgz, RpcConnection.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is the first exception that caused the restart loop. Later exceptions are the same. The job seems to be stuck in this permanent failure state.
> {code}
> 2019-10-03 21:42:46,159 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: clpevents -> device_filter -> processed_imps -> ios_processed_impression -> imps_ts_assigner (449/1360) (d237f5e99b6a4a580498821473763edb) switched from SCHEDULED to FAILED.
> java.lang.Exception: Job leader for job id ecb9ad9be934edf7b1a4f7b9dd6df365 lost leadership.
>         at org.apache.flink.runtime.taskexecutor.TaskExecutor$JobLeaderListenerImpl.lambda$jobManagerLostLeadership$1(TaskExecutor.java:1526)
>         at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332)
>         at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158)
>         at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
>         at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
>         at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:495)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:224)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)