Posted to user@flink.apache.org by Helmut Zechmann <he...@adeven.com> on 2018/08/17 15:14:29 UTC
Operator metrics do not get unregistered after job finishes
Hi all,
We are using Flink 1.5.2 in batch mode with Prometheus monitoring.
We noticed that a few metrics do not get unregistered after a job is finished:
flink_taskmanager_job_task_operator_numRecordsIn
flink_taskmanager_job_task_operator_numRecordsInPerSecond
flink_taskmanager_job_task_operator_numRecordsOut
flink_taskmanager_job_task_operator_numRecordsOutPerSecond
Those metrics stay in the taskmanager metrics list until the task manager gets restarted.
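One way to confirm which series are left over is to read the reporter's output after a job has finished and filter it for the four metric names above. The following is only a minimal sketch: it assumes the endpoint serves the standard Prometheus text exposition format, and the helper name `leftover_series` is hypothetical, not part of Flink or any Prometheus client.

```python
import re

# The four operator metrics named in the report above.
SUSPECT_METRICS = (
    "flink_taskmanager_job_task_operator_numRecordsIn",
    "flink_taskmanager_job_task_operator_numRecordsInPerSecond",
    "flink_taskmanager_job_task_operator_numRecordsOut",
    "flink_taskmanager_job_task_operator_numRecordsOutPerSecond",
)

# A sample line is: metric name, optional {labels}, whitespace, value.
_SAMPLE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(\{.*\})?\s+(\S+)')
# A label is: name="value", where the value may contain escaped characters.
_LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def leftover_series(exposition_text, names=SUSPECT_METRICS):
    """Return (metric_name, labels_dict) for every sample of the given metrics."""
    found = []
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match and match.group(1) in names:
            labels = dict(_LABEL.findall(match.group(2) or ""))
            found.append((match.group(1), labels))
    return found
```

The input text could be fetched from the task manager's reporter port (7000 or 7001 with the config in this message; host and URL path depend on the setup). If the returned labels still name a job that has already finished, that series was not unregistered.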
Our metrics config is:
metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 7000-7001
metrics.scope.jm: flink.<host>.jobmanager
metrics.scope.tm: flink.<host>.taskmanager.<tm_id>
metrics.scope.jm.job: flink.<host>.jobmanager.<job_name>
metrics.scope.tm.job: flink.<host>.taskmanager.<tm_id>.<job_name>
metrics.scope.task: flink.<host>.taskmanager.<tm_id>.<job_name>.<task_name>.<subtask_index>
metrics.scope.operator: flink.<host>.taskmanager.<tm_id>.<job_name>.<operator_name>.<subtask_index>
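To illustrate what the scope formats above describe, here is a small sketch that expands the `<...>` placeholders into a dotted identifier. It is an illustration only, not Flink's implementation; the function name `expand_scope` and all variable values are hypothetical. Note that the metric names listed earlier (flink_taskmanager_job_task_operator_...) do not follow these formats, which suggests the Prometheus reporter builds its names differently and carries values such as the job name as labels; treat that as an assumption to verify.

```python
import re

# Matches placeholders such as <host> or <subtask_index>.
_PLACEHOLDER = re.compile(r"<([A-Za-z_]+)>")


def expand_scope(scope_format, variables):
    """Replace each <name> placeholder with its value; unknown names stay as-is."""
    def substitute(match):
        return str(variables.get(match.group(1), match.group(0)))
    return _PLACEHOLDER.sub(substitute, scope_format)


# The operator scope format from the config in this message.
OPERATOR_SCOPE = (
    "flink.<host>.taskmanager.<tm_id>.<job_name>.<operator_name>.<subtask_index>"
)

# Hypothetical values, chosen only for illustration.
example_vars = {
    "host": "worker-1",
    "tm_id": "tm-0001",
    "job_name": "nightly_batch",
    "operator_name": "Map",
    "subtask_index": 0,
}

print(expand_scope(OPERATOR_SCOPE, example_vars))
```

Because `<job_name>` is part of the identifier, every distinct batch job yields a distinct set of operator metrics, which is why metrics that are never unregistered accumulate when many jobs run.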
Since we run many batch jobs, this makes Prometheus monitoring unusable for us. Is this a known issue?
Best,
Helmut
Re: Operator metrics do not get unregistered after job finishes
Posted by Helmut Zechmann <he...@adeven.com>.
Hi Vino,
The log shows no problems. The problem can be reproduced easily. I created https://issues.apache.org/jira/browse/FLINK-10300.
Best,
Helmut
> On 18. Aug 2018, at 04:53, vino yang <ya...@gmail.com> wrote:
>
> Hi Helmut,
>
> Do the metrics of all subtask instances of a job stay registered, or only those of some subtasks? Is there any exception information in the logs?
>
> Please feel free to create a JIRA issue and clearly describe your problem.
>
> Thanks, vino.
Re: Operator metrics do not get unregistered after job finishes
Posted by vino yang <ya...@gmail.com>.
Hi Helmut,
Do the metrics of all subtask instances of a job stay registered, or only
those of some subtasks? Is there any exception information in the logs?
Please feel free to create a JIRA issue and clearly describe your problem.
Thanks, vino.