Posted to user@flink.apache.org by Xiangyu Su <xi...@smaato.com> on 2021/09/02 09:46:35 UTC

Job leader ... lost leadership with version 1.13.2

Hello Everyone,
Hello Till,
We upgraded Flink to 1.13.2 and have been randomly hitting the "Job leader ...
lost leadership" error; the job keeps restarting and failing.
It behaves like this ticket:
https://issues.apache.org/jira/browse/FLINK-14316

Has anybody had the same issue, or does anyone have suggestions?

Best Regards,

-- 
Xiangyu Su
Java Developer
xiangyu@smaato.com

Smaato Inc.
San Francisco - New York - Hamburg - Singapore
www.smaato.com

Germany:

Barcastraße 5

22087 Hamburg

Germany
M 0049(176)43330282

The information contained in this communication may be CONFIDENTIAL and is
intended only for the use of the recipient(s) named above. If you are not
the intended recipient, you are hereby notified that any dissemination,
distribution, or copying of this communication, or any of its contents, is
strictly prohibited. If you have received this communication in error,
please notify the sender and delete/destroy the original message and any
copy of it from your computer or paper files.

Re: Job leader ... lost leadership with version 1.13.2

Posted by Till Rohrmann <tr...@apache.org>.
Forwarding the discussion back to the user mailing list.

On Thu, Sep 2, 2021 at 12:25 PM Till Rohrmann <tr...@apache.org> wrote:

> The stack trace looks ok. This happens whenever the leader loses
> leadership and this can have different reasons. What's more interesting is
> what happens before and after and what's happening on the system you use
> for HA (probably ZooKeeper). Maybe the connection to ZooKeeper is unstable
> or there is some other problem.
>
> Cheers,
> Till
>
> On Thu, Sep 2, 2021 at 12:20 PM Xiangyu Su <xi...@smaato.com> wrote:
>
>> Hi Till,
>> Thank you very much for the fast reply!
>> This issue happens very randomly. I tried several times to reproduce it,
>> but that is not easy.
>> Here is the exception stack trace from the JM and TM logs:
>>
>> java.lang.Exception: Job leader for job id 6fd38dedbca7bf65bfa57cb306930fa9 lost leadership.
>> at org.apache.flink.runtime.taskexecutor.TaskExecutor$JobLeaderListenerImpl.lambda$null$2(TaskExecutor.java:2189)
>> at java.util.Optional.ifPresent(Optional.java:159)
>> at org.apache.flink.runtime.taskexecutor.TaskExecutor$JobLeaderListenerImpl.lambda$jobManagerLostLeadership$3(TaskExecutor.java:2187)
>> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440)
>> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208)
>> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158)
>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
>> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>> at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>> at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
>> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
>> at akka.actor.ActorCell.invoke(ActorCell.scala:561)
>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
>> at akka.dispatch.Mailbox.run(Mailbox.scala:225)
>> at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
>> at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>> at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>> at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>> at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>
>> On Thu, 2 Sept 2021 at 12:14, Till Rohrmann <tr...@apache.org> wrote:
>>
>>> Hi Xiangyu,
>>>
>>> Do you have the logs of the problematic test run available? Ideally, we
>>> can enable the DEBUG log level to get some more information. I think this
>>> information would be needed to figure out the problem.
>>>
>>> Cheers,
>>> Till
>>>
>>
>
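
Editor's note: Till's advice above is to look at what the logs show before and
after the leadership loss rather than at the stack trace itself. One way to do
that is sketched below, assuming the JobManager or TaskManager log is available
as a plain text file. The class name LeadershipLogContext, the method
contextAround, and the window of 20 lines are hypothetical choices made for
this illustration; none of them are part of Flink.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Flink): prints the log lines that surround
// every occurrence of a marker string such as "lost leadership".
final class LeadershipLogContext {

    // Returns the matching lines plus `before` lines ahead of and `after` lines
    // behind each match, prefixed with their 1-based line number.
    // Overlapping windows are merged so that no line is emitted twice.
    static List<String> contextAround(List<String> lines, String marker,
                                      int before, int after) {
        List<String> out = new ArrayList<>();
        int nextToEmit = 0; // index of the first line that has not been emitted yet
        for (int i = 0; i < lines.size(); i++) {
            if (!lines.get(i).contains(marker)) {
                continue;
            }
            int from = Math.max(nextToEmit, i - before);
            int to = Math.min(lines.size() - 1, i + after);
            for (int j = from; j <= to; j++) {
                out.add((j + 1) + ": " + lines.get(j));
            }
            nextToEmit = Math.max(nextToEmit, to + 1);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java LeadershipLogContext.java <log-file>");
            return;
        }
        // Assumes the log file is UTF-8 text and fits in memory.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        contextAround(lines, "lost leadership", 20, 20)
                .forEach(System.out::println);
    }
}
```

Running this against both the JM and TM logs for the same time range makes it
easier to spot connection or session messages from the HA service (probably
ZooKeeper, as noted above) that appear just before the exception.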

Re: Job leader ... lost leadership with version 1.13.2

Posted by Till Rohrmann <tr...@apache.org>.
Hi Xiangyu,

Do you have the logs of the problematic test run available? Ideally, we can
enable the DEBUG log level to get some more information. I think this
information would be needed to figure out the problem.

Cheers,
Till
