Posted to user@flink.apache.org by Julio Biason <ju...@azion.com> on 2018/10/02 19:03:45 UTC
Cancelled job not showing its details
Hello,
I had a job that was failing -- a bug in our code -- so I decided to cancel
it and deploy the fix. Because I couldn't create a savepoint due to the job
restarting, I decided to kill it anyway and use the web interface to get
the last successful checkpoint.
The problem is: the interface is not showing anything for the job. The
details page shows nothing, not even the pipeline.
The only thing that seems related in the JobManager logs is this:
2018-10-02 19:03:14,214 [flink-akka.actor.default-dispatcher-4158] ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler - Implementation error: Unhandled exception.
java.lang.IllegalArgumentException: Negative number of in progress checkpoints
    at org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:139)
    at org.apache.flink.runtime.checkpoint.CheckpointStatsCounts.<init>(CheckpointStatsCounts.java:72)
    at org.apache.flink.runtime.checkpoint.CheckpointStatsCounts.createSnapshot(CheckpointStatsCounts.java:177)
    at org.apache.flink.runtime.checkpoint.CheckpointStatsTracker.createSnapshot(CheckpointStatsTracker.java:166)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.getCheckpointStatsSnapshot(ExecutionGraph.java:553)
    at org.apache.flink.runtime.executiongraph.ArchivedExecutionGraph.createFrom(ArchivedExecutionGraph.java:340)
    at org.apache.flink.runtime.jobmaster.JobMaster.requestJob(JobMaster.java:923)
    at sun.reflect.GeneratedMethodAccessor101.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:247)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:162)
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
    at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
    at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
    at akka.actor.ActorCell.invoke(ActorCell.scala:495)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
    at akka.dispatch.Mailbox.run(Mailbox.scala:224)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
--
*Julio Biason*, Software Engineer
*AZION* | Deliver. Accelerate. Protect.
Office: +55 51 3083 8101 | Mobile: +55 51 99907 0554
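For context on the exception at the top of the trace: the message comes from an argument check made while a snapshot of the checkpoint counters is being built. The following is a minimal sketch, not Flink's actual code, of how such a guard behaves. All names in it (CheckpointCountsSketch, Counts, CountsSnapshot, onTriggered, onCompleted, onFailed) are hypothetical, and the unbalanced decrement is only an assumed way a counter could end up negative; the thread does not establish the real cause.

```java
public class CheckpointCountsSketch {

    /** Hypothetical immutable snapshot of the counters, validated on construction. */
    static final class CountsSnapshot {
        final long inProgress;
        final long completed;
        final long failed;

        CountsSnapshot(long inProgress, long completed, long failed) {
            // Same kind of guard that produced the message in the log above.
            if (inProgress < 0) {
                throw new IllegalArgumentException("Negative number of in progress checkpoints");
            }
            this.inProgress = inProgress;
            this.completed = completed;
            this.failed = failed;
        }
    }

    /** Hypothetical mutable counters, updated as checkpoints start and finish. */
    static final class Counts {
        private long inProgress;
        private long completed;
        private long failed;

        void onTriggered() { inProgress++; }

        void onCompleted() { inProgress--; completed++; }

        void onFailed() { inProgress--; failed++; }

        CountsSnapshot snapshot() {
            return new CountsSnapshot(inProgress, completed, failed);
        }
    }

    public static void main(String[] args) {
        Counts counts = new Counts();
        counts.onTriggered();
        counts.onFailed(); // balanced: one trigger, one failure
        counts.onFailed(); // assumed bug: a failure reported without a matching trigger

        try {
            counts.snapshot();
        } catch (IllegalArgumentException e) {
            // Any caller that needs a snapshot (such as a details page) fails here.
            System.out.println("Snapshot failed: " + e.getMessage());
        }
    }
}
```

The point of the sketch is that once the counter is out of balance, every later attempt to build a snapshot fails, which would be consistent with the details page staying empty.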
Re: Cancelled job not showing its details
Posted by Andrey Zagrebin <an...@data-artisans.com>.
Hi Julio,
This might be a bug in the job stats. Could you please create an issue in Jira describing the steps you took and attach the complete logs?
Best,
Andrey
> On 2 Oct 2018, at 21:11, Julio Biason <ju...@azion.com> wrote:
>
> Oh, another piece of information:
>
> Because the job was failing and restarting, I did a cancel via the CLI tool during one of the restarts.
Re: Cancelled job not showing its details
Posted by Julio Biason <ju...@azion.com>.
Oh, another piece of information:
Because the job was failing and restarting, I cancelled it via the CLI tool
during one of the restarts.
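Since the web interface could not be used to look up the last successful checkpoint, one possible fallback is to inspect the checkpoint directory directly. The sketch below rests on several assumptions that the thread does not confirm: that checkpoints were retained after cancellation, that the checkpoint storage is reachable as a local or mounted path, and that it follows the documented layout of one chk-<n> directory per checkpoint containing a _metadata file. The class name LatestCheckpointFinder and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

public class LatestCheckpointFinder {

    /**
     * Returns the chk-<n> directory with the highest n that contains a
     * _metadata file, or an empty Optional if there is none.
     * jobCheckpointDir is assumed to be <checkpoint-root>/<job-id>.
     */
    static Optional<Path> findLatest(Path jobCheckpointDir) throws IOException {
        try (Stream<Path> children = Files.list(jobCheckpointDir)) {
            return children
                    .filter(Files::isDirectory)
                    .filter(p -> p.getFileName().toString().matches("chk-\\d+"))
                    // Skip directories without metadata (assumed incomplete).
                    .filter(p -> Files.isRegularFile(p.resolve("_metadata")))
                    .max(Comparator.comparingLong(LatestCheckpointFinder::checkpointId));
        }
    }

    /** Extracts n from a directory named chk-<n>. */
    static long checkpointId(Path chkDir) {
        String name = chkDir.getFileName().toString();
        return Long.parseLong(name.substring("chk-".length()));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: LatestCheckpointFinder <checkpoint-root>/<job-id>");
            return;
        }
        Optional<Path> latest = findLatest(Paths.get(args[0]));
        if (latest.isPresent()) {
            System.out.println(latest.get().toAbsolutePath());
        } else {
            System.out.println("No completed checkpoint found");
        }
    }
}
```

If a directory is found, its path could then be passed to the -s option of "flink run" when resubmitting the fixed job. For storage such as HDFS or S3, the same idea would need that filesystem's own listing tools instead of java.nio.file.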