Posted to user@flink.apache.org by "ChangZhuo Chen (陳昌倬)" <cz...@czchen.org> on 2020/12/30 01:35:23 UTC

Cannot start from savepoint using Flink 1.12 in standalone Kubernetes + Kubernetes HA

Hi,

We cannot start a job from a savepoint (created with Flink 1.12, standalone
Kubernetes + ZooKeeper HA) in Flink 1.12, standalone Kubernetes +
Kubernetes HA. The following is the exception that stops the job:

    Caused by: java.util.concurrent.CompletionException: org.apache.flink.kubernetes.kubeclient.resources.KubernetesException: Cannot retry checkAndUpdateConfigMap with configMap name-51e5afd90227d537ff442403d1b279da-jobmanager-leader because it does not exist.


The cluster can start a new job from scratch, so we think the cluster
configuration is good.
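
One way to check whether the leader ConfigMap named in the exception actually
exists is to query it directly. A sketch; the namespace comes from our config
below and the ConfigMap name from the exception above, so adjust both for your
setup:

    kubectl get configmaps -n kubernetes-namespace | grep jobmanager-leader
    kubectl get configmap name-51e5afd90227d537ff442403d1b279da-jobmanager-leader \
        -n kubernetes-namespace -o yaml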

The following is the HA-related config:

    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    high-availability.storageDir: gs://some/path/recovery
    kubernetes.cluster-id: cluster-name
    kubernetes.context: kubernetes-context
    kubernetes.namespace: kubernetes-namespace


-- 
ChangZhuo Chen (陳昌倬) czchen@{czchen,debconf,debian}.org
http://czchen.info/
Key fingerprint = BA04 346D C2E1 FE63 C790  8793 CC65 B0CD EC27 5D5B

Re: Cannot start from savepoint using Flink 1.12 in standalone Kubernetes + Kubernetes HA

Posted by Yang Wang <da...@gmail.com>.
This is a known issue; please refer to [1] for more information. It has
already been fixed on the master and release-1.12 branches, and the next
minor Flink release (1.12.1) will include the fix. Maybe you could help to
verify it.


[1] https://issues.apache.org/jira/browse/FLINK-20648
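
If you want to try the fix before 1.12.1 is out, a minimal sketch of building
the patched branch yourself, assuming a standard Maven build:

    git clone https://github.com/apache/flink.git
    cd flink
    git checkout release-1.12
    # skip tests to speed up the build; the distribution is linked under build-target/
    mvn clean install -DskipTests

The image used by the standalone Kubernetes deployment would then need to be
rebuilt from that distribution before retrying the restore.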

Best,
Yang
