You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Gary Yao (JIRA)" <ji...@apache.org> on 2018/03/06 20:17:00 UTC

[jira] [Created] (FLINK-8887) ClusterClient.getJobStatus can throw FencingTokenException

Gary Yao created FLINK-8887:
-------------------------------

             Summary: ClusterClient.getJobStatus can throw FencingTokenException
                 Key: FLINK-8887
                 URL: https://issues.apache.org/jira/browse/FLINK-8887
             Project: Flink
          Issue Type: Bug
          Components: Distributed Coordination
    Affects Versions: 1.5.0
            Reporter: Gary Yao
             Fix For: 1.5.0


*Description*
Calling {{RestClusterClient.getJobStatus}} or {{MiniClusterClient.getJobStatus}} can result in a {{FencingTokenException}}. 

*Analysis*
{{Dispatcher.requestJobStatus}} first looks the {{JobManagerRunner}} up by job id. If a reference is found, {{requestJobStatus}} is called on the respective instance. If not, the {{ArchivedExecutionGraphStore}} is queried. However, between the lookup and the method call, the {{JobMaster}} of the respective job may have lost leadership already (job finished), and has set the fencing token to {{null}}.

*Stacktrace*

{noformat}
Caused by: org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token mismatch: Ignoring message LocalFencedMessage(null, LocalRpcInvocation(requestJobStatus(Time))) because the fencing token null did not match the expected fencing token b8423c75bc6838244b8c93c8bd4a4f51.
	at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleMessage(FencedAkkaRpcActor.java:73)
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$onReceive$1(AkkaRpcActor.java:132)
	at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)
	at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
	at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
	at akka.actor.ActorCell.invoke(ActorCell.scala:495)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
	at akka.dispatch.Mailbox.run(Mailbox.scala:224)
	at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
{noformat}

{noformat}
Caused by: org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token not set: Ignoring message LocalFencedMessage(null, LocalRpcInvocation(requestJobStatus(Time))) because the fencing token is null.
	at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleMessage(FencedAkkaRpcActor.java:56)
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$onReceive$1(AkkaRpcActor.java:132)
	at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)
	at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
	at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
	at akka.actor.ActorCell.invoke(ActorCell.scala:495)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
	at akka.dispatch.Mailbox.run(Mailbox.scala:224)
	at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)