Posted to user@flink.apache.org by Andrew Otto <ot...@wikimedia.org> on 2023/05/22 18:47:37 UTC
Flink Kubernetes Operator lifecycle state count metrics question
Hello!
I'm doing some grafana+prometheus dashboarding for
flink-kubernetes-operator. Reading metrics docs
<https://stackoverflow.com/a/61795256>, I see that I have nice per k8s
namespace lifecycle current count gauge metrics in Prometheus.
Using kubectl, I can see that I have one FlinkDeployment in my namespace:
# kubectl -n stream-enrichment-poc get flinkdeployments
NAME             JOB STATUS   LIFECYCLE STATE
flink-app-main   RUNNING      STABLE
But Prometheus is reporting that I have 2 FlinkDeployments in the STABLE
state.
# curl -s <pod_ip>:<prom_port> | grep flink_k8soperator_namespace_Lifecycle_State_STABLE_Count
flink_k8soperator_namespace_Lifecycle_State_STABLE_Count{resourcetype="FlinkDeployment",resourcens="stream_enrichment_poc",name="flink_kubernetes_operator",host="flink_kubernetes_operator_86b888d6b6_gbrt4",namespace="flink_operator",} 2.0
I'm not sure why I see 2.0 reported.
flink_k8soperator_namespace_JmDeploymentStatus_READY_Count reports only one
FlinkDeployment.
# curl <pod_ip>:<prom_port>/metrics | grep flink_k8soperator_namespace_JmDeploymentStatus_READY_Count
flink_k8soperator_namespace_JmDeploymentStatus_READY_Count{resourcetype="FlinkDeployment",resourcens="stream_enrichment_poc",name="flink_kubernetes_operator",host="flink_kubernetes_operator_86b888d6b6_gbrt4",namespace="flink_operator",} 1.0
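For anyone who wants to compare these values programmatically rather than by eye, here is a minimal Python sketch that pulls the value per `resourcens` label out of the scraped text. It assumes each sample sits on a single line in the form `name{labels} value`, as in the raw exporter output, and that label values contain no escaped quotes or braces. The function name `values_by_label` is made up for this example and is not part of any Flink or Prometheus tooling.

```python
import re

# One sample line of the Prometheus text format: name{labels} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
# One key="value" pair inside the braces (simplified: no escaped quotes).
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def values_by_label(text, metric, label="resourcens"):
    """Return {label value: sample value} for every sample of `metric`."""
    result = {}
    for line in text.splitlines():
        m = SAMPLE_RE.match(line.strip())
        if m is None or m.group("name") != metric:
            continue  # comments, blank lines, and other metrics are skipped
        labels = dict(LABEL_RE.findall(m.group("labels")))
        result[labels.get(label, "")] = float(m.group("value"))
    return result


# The STABLE_Count sample from this message, joined onto one line.
sample = (
    'flink_k8soperator_namespace_Lifecycle_State_STABLE_Count'
    '{resourcetype="FlinkDeployment",resourcens="stream_enrichment_poc",'
    'name="flink_kubernetes_operator",'
    'host="flink_kubernetes_operator_86b888d6b6_gbrt4",'
    'namespace="flink_operator",} 2.0'
)
print(values_by_label(
    sample, "flink_k8soperator_namespace_Lifecycle_State_STABLE_Count"
))
```

Feeding it the full scrape output instead of `sample` would give one entry per namespace, which can then be compared with what kubectl shows.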
Is it possible that flink_k8soperator_namespace_Lifecycle_State_STABLE_Count
is being reported as an incrementing counter instead of a gauge?
Thanks
-Andrew Otto
Wikimedia Foundation
Re: Flink Kubernetes Operator lifecycle state count metrics question
Posted by Gyula Fóra <gy...@gmail.com>.
Hi Andrew!
I think you are completely right, this is a bug. The per-namespace metrics
do not seem to filter by namespace, and instead show the aggregated global
count for each namespace.
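To make the symptom concrete, here is a small Python sketch of the difference between a count that is filtered by namespace and one that is not. This is only an illustration of the behaviour described in this thread, not the operator's actual code. The `resources` list and both function names are hypothetical, and the list simply mirrors the setup reported here (two STABLE FlinkDeployments, one per namespace).

```python
from collections import Counter

# Hypothetical resource list mirroring the thread: (namespace, lifecycle state).
resources = [
    ("stream_enrichment_poc", "STABLE"),
    ("rdf_streaming_updater", "STABLE"),
]


def per_namespace_count(resources, state):
    """Expected behaviour: count only the resources inside each namespace."""
    return dict(Counter(ns for ns, s in resources if s == state))


def unfiltered_count(resources, state):
    """Reported symptom: every namespace shows the global total."""
    total = sum(1 for _, s in resources if s == state)
    return {ns: total for ns, _ in resources}


print(per_namespace_count(resources, "STABLE"))
print(unfiltered_count(resources, "STABLE"))
```

The first result has 1 for each namespace, matching kubectl. The second has 2 for each namespace, matching what the metrics endpoint returned.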
I opened a ticket:
https://issues.apache.org/jira/browse/FLINK-32164
Thanks for reporting this!
Gyula
Re: Flink Kubernetes Operator lifecycle state count metrics question
Posted by Andrew Otto <ot...@wikimedia.org>.
Also! I do have 2 FlinkDeployments deployed with this operator, but they
are in different namespaces, and each of the per-namespace metrics reports
2 FlinkDeployments for its namespace, even though there is only one in each
according to kubectl.
Actually...we just tried to deploy a change (enabling some checkpointing)
that caused one of our FlinkDeployments to fail. Now, both namespace
STABLE_Counts each report 1.
# curl -s <pod_ip>:<prom_port> | grep flink_k8soperator_namespace_Lifecycle_State_STABLE_Count
flink_k8soperator_namespace_Lifecycle_State_STABLE_Count{resourcetype="FlinkDeployment",resourcens="stream_enrichment_poc",name="flink_kubernetes_operator",host="flink_kubernetes_operator_86b888d6b6_gbrt4",namespace="flink_operator",} 1.0
flink_k8soperator_namespace_Lifecycle_State_STABLE_Count{resourcetype="FlinkDeployment",resourcens="rdf_streaming_updater",name="flink_kubernetes_operator",host="flink_kubernetes_operator_86b888d6b6_gbrt4",namespace="flink_operator",} 1.0
It looks like maybe this metric is not reporting per namespace, but a
global count.
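The drop from 2.0 in the earlier scrape to 1.0 in this one also speaks against the incrementing-counter idea, since a counter should only go up while the process keeps running. Here is a minimal Python sketch of that check. The helper name `is_non_decreasing` is made up for this example, and the sample list just holds the two values seen for `resourcens="stream_enrichment_poc"`. A counter can also drop when the exporting process restarts, so this is a hint rather than proof; the `host` label is the same pod in both scrapes, which makes a restart less likely here.

```python
def is_non_decreasing(samples):
    """True if each sample is >= the previous one, as a counter would be."""
    return all(b >= a for a, b in zip(samples, samples[1:]))


# Values reported for resourcens="stream_enrichment_poc" in the two scrapes.
stable_count = [2.0, 1.0]
print(is_non_decreasing(stable_count))  # False: the value went down
```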
Re: Flink Kubernetes Operator lifecycle state count metrics question
Posted by Andrew Otto <ot...@wikimedia.org>.
Oh, FWIW, I do have operator HA enabled with 2 replicas running, but in my
examples there, I am curl-ing the leader flink operator pod.