Posted to issues@flink.apache.org by "Robert Metzger (Jira)" <ji...@apache.org> on 2020/07/03 09:29:00 UTC

[jira] [Commented] (FLINK-17177) Handle ERROR event correctly in KubernetesResourceManager#onError

    [ https://issues.apache.org/jira/browse/FLINK-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17150863#comment-17150863 ] 

Robert Metzger commented on FLINK-17177:
----------------------------------------

Looking at the code, it seems that we log every event (of any type) only at DEBUG level.

Maybe as an intermediate step, we could log at WARN level that we've received an error from K8s?
Otherwise, we might get error reports from users that will be hard to debug.
Also, this might help us understand, in the long run, which types of errors K8s reports here.
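
A rough sketch of that intermediate step, not the actual Flink code: the class name PodEventLogging, the Action enum and both method names are hypothetical stand-ins, and java.util.logging is used only to keep the example self-contained (its WARNING and FINE levels play the role of WARN and DEBUG here).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the watcher callback; this is not Flink's class.
final class PodEventLogging {

    // Assumed set of event types; only ERROR is named in the issue.
    enum Action { ADDED, MODIFIED, DELETED, ERROR }

    private static final Logger LOG =
            Logger.getLogger(PodEventLogging.class.getName());

    // ERROR events are raised to WARNING; all other events stay at FINE,
    // which is the java.util.logging counterpart of DEBUG.
    static Level levelFor(Action action) {
        return action == Action.ERROR ? Level.WARNING : Level.FINE;
    }

    // Logs the event at the level chosen above.
    static void logEvent(Action action, String podName) {
        LOG.log(levelFor(action),
                "Received {0} event for pod {1}",
                new Object[] {action, podName});
    }
}
```

With something like this, an ERROR event would show up in the logs under the default configuration, while the other events would stay as quiet as they are today.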

> Handle ERROR event correctly in KubernetesResourceManager#onError
> -----------------------------------------------------------------
>
>                 Key: FLINK-17177
>                 URL: https://issues.apache.org/jira/browse/FLINK-17177
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.10.0, 1.10.1
>            Reporter: Canbin Zheng
>            Priority: Major
>             Fix For: 1.11.0
>
>
> Currently, when we receive an *ERROR* event sent from the K8s API server via the K8s {{Watcher}}, {{KubernetesResourceManager#onError}} handles it by calling {{KubernetesResourceManager#removePodIfTerminated}}. This may be incorrect, since the *ERROR* event may indicate an exception in the HTTP layer, in which case the previously created {{Watcher}} may no longer be available and we should re-create the {{Watcher}} immediately.
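
The re-creation idea in the description above could look roughly like the following. This is only a sketch under assumptions: WatchRecovery, its nested Watch interface and the watchFactory supplier are hypothetical names that stand in for the real Kubernetes client types, and no retry or back-off handling is shown.

```java
import java.util.function.Supplier;

// Hypothetical helper; it stands in for the logic that onError would need.
final class WatchRecovery {

    // Hypothetical minimal handle for an open watch.
    interface Watch extends AutoCloseable {
        @Override
        void close();
    }

    private final Supplier<Watch> watchFactory;
    private Watch current;
    private int recreations;

    // Opens the initial watch through the supplied factory.
    WatchRecovery(Supplier<Watch> watchFactory) {
        this.watchFactory = watchFactory;
        this.current = watchFactory.get();
    }

    // Called for an ERROR event: close the possibly broken watch
    // and open a fresh one right away.
    synchronized void onError() {
        current.close();
        current = watchFactory.get();
        recreations++;
    }

    // Number of times the watch has been re-created so far.
    synchronized int recreations() {
        return recreations;
    }
}
```

Passing the factory in as a Supplier keeps the sketch independent of any particular client library; a real fix would also have to decide what to do when opening the new watch itself fails.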


