You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Ritesh Nadhani <ri...@gmail.com> on 2023/02/19 06:10:59 UTC

[Flink-Kubernetes-Operator] Beam example application getting into "State: RECONCILING" after minikube restart

Hello

This is my first setup with Apache Beam and trying to use the flink runner
for it. Following the example of
https://github.com/apache/flink-kubernetes-operator/tree/main/examples/flink-beam-example
I got my app running inside my local minikube. The app is pretty stateless
- minimal transformation between two IO.

Now, while I was testing things, my Mac decided to upgrade and thus
minikube was also restarted. After the restart, it seems like the
Flinkdeployment was still installed but no jobmanager or task managers pods
were running and doing get flinkdeployment shows it to be:

Job Status: Reconciling and Reconciliation: Deployed.

Description of the job is as follows:


Name:         myapp-beam-example
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  flink.apache.org/v1beta1
Kind:         FlinkDeployment
Metadata:
  Creation Timestamp:  2023-02-17T05:38:33Z
  Finalizers:
    flinkdeployments.flink.apache.org/finalizer
  Generation:  3
  Managed Fields:
    API Version:  flink.apache.org/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"flinkdeployments.flink.apache.org/finalizer":
      f:spec:
        f:job:
          f:state:
        f:jobManager:
          f:replicas:
    Manager:      fabric8-kubernetes-client
    Operation:    Update
    Time:         2023-02-17T05:38:33Z
    API Version:  flink.apache.org/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        .:
        f:flinkConfiguration:
          .:
          f:taskmanager.numberOfTaskSlots:
        f:flinkVersion:
        f:image:
        f:job:
          .:
          f:args:
          f:entryClass:
          f:jarURI:
          f:upgradeMode:
        f:jobManager:
          .:
          f:resource:
            .:
            f:cpu:
            f:memory:
        f:serviceAccount:
        f:taskManager:
          .:
          f:resource:
            .:
            f:cpu:
            f:memory:
    Manager:      kubectl-create
    Operation:    Update
    Time:         2023-02-17T05:38:33Z
    API Version:  flink.apache.org/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        f:job:
          f:parallelism:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2023-02-17T05:45:11Z
    API Version:  flink.apache.org/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:clusterInfo:
          .:
          f:flink-revision:
          f:flink-version:
        f:jobManagerDeploymentStatus:
        f:jobStatus:
          .:
          f:jobId:
          f:jobName:
          f:savepointInfo:
            .:
            f:lastPeriodicSavepointTimestamp:
            f:savepointHistory:
          f:startTime:
          f:state:
          f:updateTime:
        f:reconciliationStatus:
          .:
          f:lastReconciledSpec:
          f:lastStableSpec:
          f:reconciliationTimestamp:
          f:state:
        f:taskManager:
          .:
          f:labelSelector:
          f:replicas:
    Manager:         fabric8-kubernetes-client
    Operation:       Update
    Subresource:     status
    Time:            2023-02-18T10:15:42Z
  Resource Version:  192678
  UID:               71b851e5-4b99-434a-9027-d528eb8b1180
Spec:
  Flink Configuration:
    taskmanager.numberOfTaskSlots:  1
  Flink Version:                    v1_16
  Image:                            myapp-beam:latest
  Job:
    Args:
      --runner=FlinkRunner
      --bootstrapServers=kafka-service:9092
    Entry Class:   com.netskope.myapp.myapp
    Jar URI:       local:///opt/flink/usrlib/beam-runner.jar
    Parallelism:   2
    State:         running
    Upgrade Mode:  stateless
  Job Manager:
    Replicas:  1
    Resource:
      Cpu:          1
      Memory:       2048m
  Service Account:  flink
  Task Manager:
    Resource:
      Cpu:     1
      Memory:  2048m
Status:
  Cluster Info:
    Flink - Revision:             DeadD0d0 @ 1970-01-01T01:00:00+01:00
    Flink - Version:              1.16.1
  Job Manager Deployment Status:  MISSING
  Job Status:
    Job Id:    adb5186e96f6d36637f919fb5d9aad70
    Job Name:  myapp-flink-0217055545-790af232
    Savepoint Info:
      Last Periodic Savepoint Timestamp:  0
      Savepoint History:
    Start Time:   1676613348905
    State:        RECONCILING
    Update Time:  1676628937180
  Reconciliation Status:
    Last Reconciled Spec:
 {"spec":{"job":{"jarURI":"local:///opt/flink/usrlib/beam-runner.jar","parallelism":2,"entryClass":"com.netskope.myapp.myapp","args":["--runner=FlinkRunner","--bootstrapServers=kafka-service:9092"],"state":"running","savepointTriggerNonce":null,"initialSavepointPath":null,"upgradeMode":"stateless","allowNonRestoredState":null},"restartNonce":null,"flinkConfiguration":{"taskmanager.numberOfTaskSlots":"1"},"image":"myapp-beam:latest","imagePullPolicy":null,"serviceAccount":"flink","flinkVersion":"v1_16","ingress":null,"podTemplate":null,"jobManager":{"resource":{"cpu":1.0,"memory":"2048m"},"replicas":1,"podTemplate":null},"taskManager":{"resource":{"cpu":1.0,"memory":"2048m"},"replicas":null,"podTemplate":null},"logConfiguration":null,"mode":null},"resource_metadata":{"apiVersion":"
flink.apache.org/v1beta1
","metadata":{"generation":3},"firstDeployment":false}}
    Last Stable Spec:
 {"spec":{"job":{"jarURI":"local:///opt/flink/usrlib/beam-runner.jar","parallelism":2,"entryClass":"com.netskope.myapp.myapp","args":["--runner=FlinkRunner","--bootstrapServers=kafka-service:9092"],"state":"running","savepointTriggerNonce":null,"initialSavepointPath":null,"upgradeMode":"stateless","allowNonRestoredState":null},"restartNonce":null,"flinkConfiguration":{"taskmanager.numberOfTaskSlots":"1"},"image":"myapp-beam:latest","imagePullPolicy":null,"serviceAccount":"flink","flinkVersion":"v1_16","ingress":null,"podTemplate":null,"jobManager":{"resource":{"cpu":1.0,"memory":"2048m"},"replicas":1,"podTemplate":null},"taskManager":{"resource":{"cpu":1.0,"memory":"2048m"},"replicas":null,"podTemplate":null},"logConfiguration":null,"mode":null},"resource_metadata":{"apiVersion":"
flink.apache.org/v1beta1
","metadata":{"generation":3},"firstDeployment":false}}
    Reconciliation Timestamp:  1676612715537
    State:                     DEPLOYED
  Task Manager:
    Label Selector:  component=taskmanager,app=myapp-beam-example
    Replicas:        2
Events:              <none>

tried to search for this on Github and in the mailing list archives,
without luck.

Just trying to understand what this means and how will it affect things if
some failures like this happen in production.. If somebody can point me to
the relevant document about this, that would be helpful too.
-- 
Ritesh