Posted to user@flink.apache.org by Evgeniy Lyutikov <eb...@avito.ru> on 2023/03/21 08:00:59 UTC

Kubernetes skip sidecar failure

Hello everybody!
We're using Flink 1.14 and the Flink Kubernetes Operator 1.2.0. Our pod template adds an haproxy sidecar container that load-balances traffic to the S3 storage where checkpoints are persisted.
Sometimes this haproxy sidecar exits, and Flink then restarts the entire TaskManager along with the running job.

2023-03-20 04:59:59,526 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Worker my-job-taskmanager-5-15 is terminated. Diagnostics: Pod terminated, container termination statuses: [haproxy(exitCode=139, reason=Error, message=null)]
2023-03-20 04:59:59,526 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Closing TaskExecutor connection my-job-taskmanager-5-15 because: Pod terminated, container termination statuses: [haproxy(exitCode=139, reason=Error, message=null)]
2023-03-20 04:59:59,527 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Requesting new worker with resource spec WorkerResourceSpec {cpuCores=10.0, taskHeapSize=5.400gb (5798205768 bytes), taskOffHeapSize=1024.000mb (1073741824 bytes), networkMemSize=1024.000mb (1073741824 bytes), managedMemSize=5.100gb (5476083384 bytes), numSlots=3}, current pending count: 1.
2023-03-20 04:59:59,527 INFO  org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled external resources: []
2023-03-20 04:59:59,529 INFO  org.apache.flink.configuration.Configuration                 [] - Config uses fallback configuration key 'kubernetes.service-account' instead of key 'kubernetes.taskmanager.service-account'
2023-03-20 04:59:59,529 INFO  org.apache.flink.configuration.Configuration                 [] - Config uses fallback configuration key 'kubernetes.service-account' instead of key 'kubernetes.taskmanager.service-account'
2023-03-20 04:59:59,529 INFO  org.apache.flink.kubernetes.utils.KubernetesUtils            [] - The service account configured in pod template will be overwritten to 'flink' because of explicitly configured options.
2023-03-20 04:59:59,531 INFO  org.apache.flink.kubernetes.KubernetesResourceManagerDriver  [] - Creating new TaskManager pod with name my-job-taskmanager-5-45 and resource <14336,10.0>.
2023-03-20 04:59:59,560 WARN  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Discard registration from TaskExecutor my-job-taskmanager-5-15 at (akka.tcp://flink@10.68.15.205:6122/user/rpc/taskmanager_0) because the framework did not recognize it
2023-03-20 04:59:59,607 INFO  org.apache.flink.kubernetes.KubernetesResourceManagerDriver  [] - Pod my-job-taskmanager-5-45 is created.
2023-03-20 04:59:59,617 INFO  org.apache.flink.kubernetes.KubernetesResourceManagerDriver  [] - Received new TaskManager pod: my-job-taskmanager-5-45
2023-03-20 04:59:59,617 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Requested worker my-job-taskmanager-5-45 with resource spec WorkerResourceSpec {cpuCores=10.0, taskHeapSize=5.400gb (5798205768 bytes), taskOffHeapSize=1024.000mb (1073741824 bytes), networkMemSize=1024.000mb (1073741824 bytes), managedMemSize=5.100gb (5476083384 bytes), numSlots=3}.
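For reference, the `exitCode=139` reported for haproxy can be read with the usual Unix convention: an exit code above 128 means the process was killed by signal (code - 128). So 139 corresponds to signal 11 (SIGSEGV), which suggests the sidecar itself crashed. A minimal Python sketch, assuming the container runtime follows this convention (`describe_exit_code` is a hypothetical helper name):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a container exit code using the shell convention that
    codes above 128 mean the process was killed by signal (code - 128)."""
    if code > 128:
        try:
            sig = signal.Signals(code - 128)
        except ValueError:
            # Not a known signal number; report the raw status instead.
            return f"exited with status {code}"
        return f"killed by signal {sig.value} ({sig.name})"
    return f"exited with status {code}"


print(describe_exit_code(139))  # haproxy's exit code from the log above
```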

Is there a way to make Flink monitor only the status of flink-main-container and not react to sidecar crashes?
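The distinction in the question can be illustrated with a simplified model. The diagnostic "Pod terminated, container termination statuses: [haproxy(...)]" suggests that one terminated container is enough for Flink to treat the whole TaskManager pod as terminated. The Python sketch below contrasts that rule with the desired one. It is not Flink's actual code, and `ContainerStatus`, `any_container_terminated` and `main_container_terminated` are hypothetical names:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContainerStatus:
    name: str
    exit_code: Optional[int]  # None while the container is still running


def any_container_terminated(statuses: List[ContainerStatus]) -> bool:
    # Rule suggested by the log: any terminated container (here the
    # haproxy sidecar) marks the whole pod as terminated.
    return any(s.exit_code is not None for s in statuses)


def main_container_terminated(
    statuses: List[ContainerStatus], main: str = "flink-main-container"
) -> bool:
    # Rule asked about: only the Flink main container's state counts.
    return any(s.name == main and s.exit_code is not None for s in statuses)


statuses = [
    ContainerStatus("flink-main-container", None),
    ContainerStatus("haproxy", 139),
]
print(any_container_terminated(statuses))   # True
print(main_container_terminated(statuses))  # False
```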
