Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/06/08 14:19:00 UTC

[GitHub] [flink-kubernetes-operator] Aitozi commented on pull request #260: [FLINK-27497] Track terminal job states in the observer

Aitozi commented on PR #260:
URL: https://github.com/apache/flink-kubernetes-operator/pull/260#issuecomment-1149982267

   Example status of a failed Flink application:
   
   ```
   Status:
     Cluster Info:
       Flink - Revision:  3a4c113 @ 2022-04-20T19:50:32+02:00
       Flink - Version:   1.15.0
     Error:               org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
                          at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138)
                          at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82)
                          at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:301)
                          at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:291)
                          at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:282)
                          at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:739)
                          at org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:78)
                          at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:443)
                          at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                          at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
                          at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
                          at java.base/java.lang.reflect.Method.invoke(Unknown Source)
                          at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:304)
                          at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
                          at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:302)
                          at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217)
                          at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:78)
                          at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)
                          at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
                          at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
                          at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
                          at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
                          at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
                          at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
                          at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
                          at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
                          at akka.actor.Actor.aroundReceive(Actor.scala:537)
                          at akka.actor.Actor.aroundReceive$(Actor.scala:535)
                          at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
                          at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
                          at akka.actor.ActorCell.invoke(ActorCell.scala:548)
                          at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
                          at akka.dispatch.Mailbox.run(Mailbox.scala:231)
                          at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
                          at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
                          at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source)
                          at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source)
                          at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
                          at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
   Caused by: org.apache.flink.util.FlinkRuntimeException: Failed to decode the word
     at org.apache.flink.streaming.examples.wordcount.WordCount$Tokenizer.flatMap(WordCount.java:187)
     at org.apache.flink.streaming.examples.wordcount.WordCount$Tokenizer.flatMap(WordCount.java:176)
     at org.apache.flink.streaming.api.operators.StreamFlatMap.processElement(StreamFlatMap.java:47)
     at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:233)
     at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
     at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
     at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
     at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:519)
     at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:203)
     at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:804)
     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:753)
     at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
     at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927)
     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
     at java.base/java.lang.Thread.run(Unknown Source)
   
     Job Manager Deployment Status:  READY
     Job Status:
       Job Id:    00000000000000000000000000000000
       Job Name:  WordCount
       Savepoint Info:
         Last Periodic Savepoint Timestamp:  0
         Savepoint History:
         Trigger Id:
         Trigger Timestamp:  0
         Trigger Type:       UNKNOWN
       Start Time:           1654697747801
       State:                FAILED
       Update Time:          1654697833534
     Reconciliation Status:
       Last Reconciled Spec:      {"job":{"jarURI":"local:///opt/flink/usrlib/myjob.jar","parallelism":2,"entryClass":null,"args":[],"state":"running","savepointTriggerNonce":0,"initialSavepointPath":null,"upgradeMode":"savepoint","allowNonRestoredState":null},"restartNonce":null,"flinkConfiguration":{"high-availability":"org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory","high-availability.storageDir":"file:///flink-data/ha","state.checkpoints.dir":"file:///flink-data/checkpoints","state.savepoints.dir":"file:///flink-data/savepoints","taskmanager.numberOfTaskSlots":"2"},"image":"flink:1.15","imagePullPolicy":null,"serviceAccount":"flink","flinkVersion":"v1_15","ingress":null,"podTemplate":{"apiVersion":"v1","kind":"Pod","spec":{"containers":[{"name":"flink-main-container","volumeMounts":[{"mountPath":"/flink-data","name":"flink-volume"},{"mountPath":"/opt/flink/usrlib","name":"flink-artifact"}]}],"initContainers":[{"command":["wget","http://blink-alipay.oss-cn-hangzhou-zmf.aliyuncs.com/yuli/flink-examples-streaming_2.12-1.16-SNAPSHOT-WordCount.jar","-O","/flink-artifact/myjob.jar"],"image":"busybox:latest","imagePullPolicy":"IfNotPresent","name":"artifacts-fetcher","volumeMounts":[{"mountPath":"/flink-artifact","name":"flink-artifact"}]}],"volumes":[{"hostPath":{"path":"/tmp/flink","type":"Directory"},"name":"flink-volume"},{"emptyDir":{},"name":"flink-artifact"}]}},"jobManager":{"resource":{"cpu":1.0,"memory":"2048m"},"replicas":1,"podTemplate":null},"taskManager":{"resource":{"cpu":1.0,"memory":"2048m"},"podTemplate":null},"logConfiguration":null,"apiVersion":"v1beta1"}
       Last Stable Spec:          {"job":{"jarURI":"local:///opt/flink/usrlib/myjob.jar","parallelism":2,"entryClass":null,"args":[],"state":"running","savepointTriggerNonce":0,"initialSavepointPath":null,"upgradeMode":"savepoint","allowNonRestoredState":null},"restartNonce":null,"flinkConfiguration":{"high-availability":"org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory","high-availability.storageDir":"file:///flink-data/ha","state.checkpoints.dir":"file:///flink-data/checkpoints","state.savepoints.dir":"file:///flink-data/savepoints","taskmanager.numberOfTaskSlots":"2"},"image":"flink:1.15","imagePullPolicy":null,"serviceAccount":"flink","flinkVersion":"v1_15","ingress":null,"podTemplate":{"apiVersion":"v1","kind":"Pod","spec":{"containers":[{"name":"flink-main-container","volumeMounts":[{"mountPath":"/flink-data","name":"flink-volume"},{"mountPath":"/opt/flink/usrlib","name":"flink-artifact"}]}],"initContainers":[{"command":["wget","http://blink-alipay.oss-cn-hangzhou-zmf.aliyuncs.com/yuli/flink-examples-streaming_2.12-1.16-SNAPSHOT-WordCount.jar","-O","/flink-artifact/myjob.jar"],"image":"busybox:latest","imagePullPolicy":"IfNotPresent","name":"artifacts-fetcher","volumeMounts":[{"mountPath":"/flink-artifact","name":"flink-artifact"}]}],"volumes":[{"hostPath":{"path":"/tmp/flink","type":"Directory"},"name":"flink-volume"},{"emptyDir":{},"name":"flink-artifact"}]}},"jobManager":{"resource":{"cpu":1.0,"memory":"2048m"},"replicas":1,"podTemplate":null},"taskManager":{"resource":{"cpu":1.0,"memory":"2048m"},"podTemplate":null},"logConfiguration":null,"apiVersion":"v1beta1"}
       Reconciliation Timestamp:  1654697738529
       State:                     DEPLOYED
   Events:
     Type    Reason          Age    From                  Message
     ----    ------          ----   ----                  -------
     Normal  Status Updated  2m13s  JobManagerDeployment  Job status updated from RECONCILING to RUNNING
     Normal  Status Updated  57s    JobManagerDeployment  Job status updated from RUNNING to FAILED
   ```
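
For context, the `State: FAILED` entry above is what a terminal job state looks like in the status. A minimal sketch of how such a state could be classified follows. It is an illustration only, not the operator's code: the `JobState` enum and `isTerminal` helper are hypothetical names, and the terminal set (FAILED, CANCELED, FINISHED) is assumed to mirror Flink's globally terminal job states.

```java
import java.util.EnumSet;
import java.util.Set;

public class TerminalStateCheck {
    // Hypothetical, simplified stand-in for the job states seen in the status
    // output; this is not the operator's or Flink's actual type.
    enum JobState { RECONCILING, RUNNING, FAILED, CANCELED, FINISHED }

    // Assumption: these three states are treated as terminal, i.e. the job
    // will not leave them without outside intervention.
    static final Set<JobState> TERMINAL =
            EnumSet.of(JobState.FAILED, JobState.CANCELED, JobState.FINISHED);

    // Returns true if the state name reported in the status is terminal.
    // Throws IllegalArgumentException for names not in the enum.
    static boolean isTerminal(String stateName) {
        return TERMINAL.contains(JobState.valueOf(stateName));
    }

    public static void main(String[] args) {
        System.out.println("FAILED terminal?  " + isTerminal("FAILED"));
        System.out.println("RUNNING terminal? " + isTerminal("RUNNING"));
    }
}
```

With the two events above, the first transition (RECONCILING to RUNNING) would not count as terminal, while the second (RUNNING to FAILED) would.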

