Posted to user@flink.apache.org by Pierre Bedoucha <Pi...@tv2.no> on 2023/03/31 12:38:34 UTC

[Kubernetes Operator] NullPointerException from KubernetesApplicationClusterEntrypoint

Hi,

We are trying to use Flink Kubernetes Operator 1.4.0 with Flink 1.16.

However, at the job-manager deployment step we get the following error:
```
Exception in thread "main" java.lang.NullPointerException
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.shutDownAsync(ClusterEntrypoint.java:585)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:242)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:729)
        at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.main(KubernetesApplicationClusterEntrypoint.java:86)

```
It seems to be related to the following line:
```

this.clusterId = checkNotNull(flinkConfig.getString(KubernetesConfigOptions.CLUSTER_ID), "ClusterId must be specified!");

```
We specified the CLUSTER_ID but it seems that the flinkConfig object is not handled correctly.

We have the following flinkConfiguration defined in deployment.yaml:
```
spec:
  flinkConfiguration:
    execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION
    execution.checkpointing.interval: 120s
    execution.checkpointing.min-pause: 120s
    execution.checkpointing.mode: AT_LEAST_ONCE
    execution.checkpointing.snapshot-compression: "false"
    execution.checkpointing.timeout: 3000s
    execution.checkpointing.tolerable-failed-checkpoints: "5"
    execution.checkpointing.unaligned: "false"
    fs.hdfs.hadoopconf: /opt/hadoop-conf/
    high-availability.storageDir: gs://<path/to/environment>/ha
    high-availability: kubernetes
    high-availability.cluster-id: <cluster-id>
    kubernetes.operator.periodic.savepoint.interval: 6h
    kubernetes.operator.savepoint.history.max.age: 72h
    kubernetes.operator.savepoint.history.max.count: "15"
    metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
    metrics.reporter.prom.port: "2112"
    metrics.reporters: prom
    rest.flamegraph.enabled: "false"
    state.backend: rocksdb
    state.backend.incremental: "false"
    state.backend.rocksdb.localdir: /rocksdb
    state.checkpoint-storage: filesystem
    state.checkpoints.dir: gs://<path/to/environment>/checkpoints
    state.savepoints.dir: gs://<path/to/environment>/savepoints
    taskmanager.memory.managed.fraction: "0"
    taskmanager.network.memory.buffer-debloat.enabled: "false"
    taskmanager.network.memory.buffer-debloat.period: "200"
    taskmanager.network.memory.buffers-per-channel: "2"
    taskmanager.network.memory.floating-buffers-per-gate: "8"
    taskmanager.network.memory.max-buffers-per-channel: "10"
    taskmanager.network.sort-shuffle.min-buffers: "512"
    taskmanager.numberOfTaskSlots: "1"
    kubernetes.taskmanager.cpu.limit-factor: "4"
    kubernetes.taskmanager.cpu: "0.5"
    kubernetes.cluster-id: <cluster-id>
```
Has anyone encountered this issue before?

Thanks,
PB

Re: [Kubernetes Operator] NullPointerException from KubernetesApplicationClusterEntrypoint

Posted by Pierre Bedoucha <Pi...@tv2.no>.

Hi Gyula, and thanks for your answer,

We tried without any cluster-id reference and still got the same error message. It seems to be related to Flink 1.16, as we have other jobs running with the same flinkConfig on Flink 1.15.

PB

From: Gyula Fóra <gy...@gmail.com>
Date: Friday, 31 March 2023 at 14:41
To: Pierre Bedoucha <Pi...@tv2.no>
Cc: user@flink.apache.org <us...@flink.apache.org>
Subject: Re: [Kubernetes Operator] NullPointerException from KubernetesApplicationClusterEntrypoint
Never seen this before but also you should not set the cluster-id in your config as that should be controlled by the operator itself.

Gyula

Re: [Kubernetes Operator] NullPointerException from KubernetesApplicationClusterEntrypoint

Posted by Gyula Fóra <gy...@gmail.com>.

Never seen this before but also you should not set the cluster-id in your
config as that should be controlled by the operator itself.

Gyula
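
For illustration, the advice above could look roughly like the fragment below: a FlinkDeployment that leaves both `kubernetes.cluster-id` and `high-availability.cluster-id` out of `flinkConfiguration`. This is only a sketch. It assumes the operator derives the cluster ID from the resource's `metadata.name`; the name shown is a placeholder, `flinkVersion: v1_16` is inferred from the Flink version mentioned in the thread, and the other keys are copied from the original post. Pierre reports elsewhere in this thread that the error persisted without the cluster-id keys, so this is not presented as a confirmed fix for the NullPointerException.

```yaml
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  # Placeholder name (assumption): the operator is expected to use this
  # value as the cluster ID, so it is not repeated in flinkConfiguration.
  name: <deployment-name>
spec:
  # Assumption: matches the Flink 1.16 image mentioned in the thread.
  flinkVersion: v1_16
  flinkConfiguration:
    high-availability: kubernetes
    high-availability.storageDir: gs://<path/to/environment>/ha
    state.checkpoints.dir: gs://<path/to/environment>/checkpoints
    state.savepoints.dir: gs://<path/to/environment>/savepoints
    # kubernetes.cluster-id and high-availability.cluster-id are
    # intentionally omitted here.
  # Other required fields (image, serviceAccount, jobManager, taskManager,
  # job) are left out of this sketch.
```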
