Posted to user@flink.apache.org by mejri houssem <me...@gmail.com> on 2021/08/27 12:31:30 UTC

k8S HA mode

Hello, I am deploying a Flink application cluster with Kubernetes HA mode, but I keep hitting the recurring problem below and I don't know how to solve it.

Any help would be appreciated.



This is from the JobManager log:

{"@timestamp":"2021-08-27T14:19:42.447+02:00","@version":"1","message":"Exception occurred while renewing lock: Unable to update ConfigMapLock","logger_name":"io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector","thread_name":"pool-4092-thread-1","level":"DEBUG","level_value":10000,"stack_trace":"io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.LockException: Unable to update ConfigMapLock\n\tat io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.ConfigMapLock.update(ConfigMapLock.java:108)\n\tat io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.tryAcquireOrRenew(LeaderElector.java:156)\n\tat io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.renew(LeaderElector.java:120)\n\tat io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$null$1(LeaderElector.java:104)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)\n\tat java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\nCaused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: PUT at: https://172.31.64.1/api/v1/namespaces/flink-pushavoo-flink-rec/configmaps/elifibre-00000000000000000000000000000000-jobmanager-leader. Message: Operation cannot be fulfilled on configmaps \"elifibre-00000000000000000000000000000000-jobmanager-leader\": the object has been modified; please apply your changes to the latest version and try again. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, kind=configmaps, name=elifibre-00000000000000000000000000000000-jobmanager-leader, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Operation cannot be fulfilled on configmaps \"elifibre-00000000000000000000000000000000-jobmanager-leader\": the object has been modified; please apply your changes to the latest version and try again, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Conflict, status=Failure, additionalProperties={}).\n\tat io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:568)\n\tat io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:507)\n\tat io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:471)\n\tat io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:430)\n\tat io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleReplace(OperationSupport.java:289)\n\tat io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleReplace(OperationSupport.java:269)\n\tat io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleReplace(BaseOperation.java:820)\n\tat io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.lambda$replace$1(HasMetadataOperation.java:86)\n\tat io.fabric8.kubernetes.api.model.DoneableConfigMap.done(DoneableConfigMap.java:26)\n\tat io.fabric8.kubernetes.api.model.DoneableConfigMap.done(DoneableConfigMap.java:5)\n\tat io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:92)\n\tat io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:36)\n\tat io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.ConfigMapLock.update(ConfigMapLock.java:106)\n\t... 10 common frames omitted\n"}


Re: k8S HA mode

Posted by Yang Wang <da...@gmail.com>.
Could you please share the full JobManager logs?

AFAIK, the exceptions you attached are normal log entries that appear while
the JobManager is trying to acquire or renew the ConfigMap lock.
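
As an illustration of why a 409 "the object has been modified" response on a lock renewal can be harmless, here is a minimal sketch of optimistic concurrency on a versioned object. It is a toy in-memory simulation, not the fabric8 or Flink implementation: ConfigMapStore, Conflict, and try_renew are hypothetical names, and the "other writer" stands for any component that updates the same ConfigMap between the elector's read and its write.

```python
from dataclasses import dataclass, field


class Conflict(Exception):
    """Stands in for an HTTP 409 response from the API server."""


@dataclass
class ConfigMapStore:
    """Toy in-memory stand-in for a single ConfigMap (hypothetical)."""
    data: dict = field(default_factory=dict)
    resource_version: int = 1

    def get(self):
        # Return a snapshot: a copy of the data and the version it was read at.
        return dict(self.data), self.resource_version

    def replace(self, new_data, expected_version):
        # Optimistic concurrency: reject the write if the object has
        # changed since the caller read it.
        if expected_version != self.resource_version:
            raise Conflict("the object has been modified")
        self.data = dict(new_data)
        self.resource_version += 1


def try_renew(store, holder, snapshot, now):
    """Attempt one lock renewal based on a previously read snapshot."""
    data, version = snapshot
    updated = dict(data, holder=holder, renew_time=now)
    try:
        store.replace(updated, version)
        return True
    except Conflict:
        # Not fatal: the next renewal round re-reads the object and retries.
        return False


store = ConfigMapStore({"holder": "jobmanager-1", "renew_time": 0})

stale = store.get()                           # the elector reads the lock
data, version = store.get()                   # another writer reads it too
store.replace(dict(data, note="other-update"), version)  # and writes first

first = try_renew(store, "jobmanager-1", stale, now=10)         # stale version
second = try_renew(store, "jobmanager-1", store.get(), now=15)  # fresh read
print(first, second)
```

In this sketch the first renewal fails only because it was based on an outdated version, and the following attempt succeeds after a fresh read. A renewal that keeps failing for longer than the lease duration would be a different situation.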

Best,
Yang

On Tue, Aug 31, 2021 at 4:36 AM houssem <me...@gmail.com> wrote:

> Hello, thanks for the response
>
> I am using kubernetes standalone application mode not the native one.
>
> and this error happens randomly at some point while running the job.
>
> Also i am using just one replicas of the jobmanager
>

Re: k8S HA mode

Posted by houssem <me...@gmail.com>.
Hello, thanks for the response.

I am using Kubernetes standalone application mode, not the native one.

This error happens randomly at some point while the job is running.

Also, I am using just one replica of the JobManager.

Here are some more logs:


{"@timestamp":"2021-08-30T15:43:44.970+02:00","@version":"1","message":"Exception occurred while renewing lock: Unable to update ConfigMapLock","logger_name":"io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector",
"thread_name":"pool-685-thread-1","level":"DEBUG","level_value":10000,"stack_trace":"io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.LockException: Unable to update ConfigMapLock
io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.ConfigMapLock.update(ConfigMapLock.java:108)
 io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.tryAcquireOrRenew(LeaderElector.java:156)
 io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.renew(LeaderElector.java:120)
 io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$null$1(LeaderElector.java:104)
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 java.util.concurrent.FutureTask.run(FutureTask.java:266)
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 java.lang.Thread.run(Thread.java:748)
 Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: PUT at:
 https://172.31.64.1/api/v1/namespaces/flink-pushavoo-flink-rec/configmaps/elifibre-00000000000000000000000000000000-jobmanager-leader.
 Message: Operation cannot be fulfilled on configmaps \"elifibre-00000000000000000000000000000000-jobmanager-leader\": the object has been modified; please apply your changes to the latest version and try again.
 Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, kind=configmaps, name=elifibre-00000000000000000000000000000000-jobmanager-leader, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Operation cannot be fulfilled on configmaps
 \"elifibre-00000000000000000000000000000000-jobmanager-leader\": the object has been modified;
 please apply your changes to the latest version and try again, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Conflict, status=Failure, additionalProperties={}).
 io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:568)
 io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:507)
 io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:471)
 io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:430)
 io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleReplace(OperationSupport.java:289)
 io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleReplace(OperationSupport.java:269)
 io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleReplace(BaseOperation.java:820)
 io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.lambda$replace$1(HasMetadataOperation.java:86)
 io.fabric8.kubernetes.api.model.DoneableConfigMap.done(DoneableConfigMap.java:26)
 io.fabric8.kubernetes.api.model.DoneableConfigMap.done(DoneableConfigMap.java:5)
 io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:92)
 io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:36)
 io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.ConfigMapLock.update(ConfigMapLock.java:106)
 ... 10 common frames omitted\n"}

**********************************************************************************************************





On 2021/08/30 10:53:10, Roman Khachatryan <ro...@apache.org> wrote: 
> Hello,
> 
> Do I understand correctly that you are using native Kubernetes
> deployment in application mode;
> and the issue *only* happens if you set kubernetes-jobmanager-replicas
> [1] to a value greater than 1?
> 
> Does it happen during deployment or at some point while running the job?
> 
> Could you share Flink and Kubernetes versions and HA configuration
> [2]? (I'm assuming you're using Kubernetes for HA, not ZK).
> 
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/config/#kubernetes-jobmanager-replicas
> [2]
> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/
> 
> Regards,
> Roman
> 

Re: k8S HA mode

Posted by Roman Khachatryan <ro...@apache.org>.
Hello,

Do I understand correctly that you are using native Kubernetes
deployment in application mode;
and the issue *only* happens if you set kubernetes-jobmanager-replicas
[1] to a value greater than 1?

Does it happen during deployment or at some point while running the job?

Could you share Flink and Kubernetes versions and HA configuration
[2]? (I'm assuming you're using Kubernetes for HA, not ZK).

[1]
https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/config/#kubernetes-jobmanager-replicas
[2]
https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/

Regards,
Roman
