Posted to user@flink.apache.org by Piper Piper <pi...@gmail.com> on 2020/09/10 21:07:45 UTC

Measure CPU utilization

Hello,

What is the best way to measure the CPU utilization of a TaskManager in
Flink, as opposed to using Linux's "top" command? Is querying the REST
endpoint http://<IP>:<port>/taskmanagers/<TM_ID>/metrics?get=Status.JVM.CPU.Load
the best option? Roman's reply (copied below) from the archives suggests
that it returns the CPU usage for the whole system, including
other processes currently running on it, and would not give the CPU
utilization of that TaskManager alone.

Based on Roman's reply that JVM.CPU.Time is a clearer indicator of CPU
usage, can you suggest how I would use it to calculate CPU utilization? Is
there any way I can get the CPU utilization for a job that is distributed
over several nodes in the cluster?

Also, what is the difference between the two REST API endpoints below:

1. http://<IP>:<port>/taskmanagers/<TM_ID>/metrics?get=Status.JVM.CPU.Load
2. http://<IP>:<port>/taskmanagers/<TM_ID>/metrics?get=System.CPU.Usage

Thanks,

Piper

Hi,

JVM.CPU.Load is just a wrapper (MetricUtils.instantiateCPUMetrics) on
top of OperatingSystemMXBean.getProcessCpuLoad(); see
https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getProcessCpuLoad()

It usually looks odd if you have multiple CPU cores. For example, if
you have a job with a single slot fully utilizing one CPU core on
an 8-core machine, JVM.CPU.Load will be 1.0/8.0 = 0.125. It's also
a point-in-time snapshot of current CPU usage, so if you're collecting
your metrics every minute and the job has a spiky workload within that
minute (say it's idle almost all the time and once a minute it consumes
100% CPU for one second), you can completely miss the spike in the
metrics.
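
The 1.0/8.0 arithmetic can be inverted to read the metric as "cores kept busy". This is only a minimal sketch: the function name is hypothetical, and it assumes the load value is a fraction of all cores on the machine, as in the example above.

```python
def cores_in_use(process_cpu_load: float, num_cores: int) -> float:
    """Convert a process CPU load (fraction of the whole machine)
    into the equivalent number of fully busy cores."""
    if num_cores <= 0:
        raise ValueError("num_cores must be positive")
    return process_cpu_load * num_cores


# One slot saturating one core on an 8-core machine reports 0.125.
print(cores_in_use(0.125, 8))  # 1.0
```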

Personally, I find JVM.CPU.Time a clearer indicator of CPU
usage: it is an ever-increasing counter of the CPU time spent
executing your code (OperatingSystemMXBean.getProcessCpuTime reports it
in nanoseconds). Because it is cumulative, it also catches CPU usage spikes.
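
One way to turn a cumulative CPU-time counter into a utilization figure is to sample it twice and divide the increase by the wall-clock time that passed. The sketch below is an illustration, not part of Flink: the function name and parameters are hypothetical, and it assumes the counter is in nanoseconds (the unit used by OperatingSystemMXBean.getProcessCpuTime); adjust the conversion if your values use another unit.

```python
def cpu_utilization(cpu_time_start_ns: int, cpu_time_end_ns: int,
                    wall_start_s: float, wall_end_s: float,
                    num_cores: int = 1) -> float:
    """Average CPU utilization between two samples of a cumulative
    CPU-time counter, as a fraction of num_cores cores."""
    elapsed_ns = (wall_end_s - wall_start_s) * 1e9
    if elapsed_ns <= 0:
        raise ValueError("second sample must be taken after the first")
    if num_cores <= 0:
        raise ValueError("num_cores must be positive")
    used_ns = cpu_time_end_ns - cpu_time_start_ns
    return used_ns / (elapsed_ns * num_cores)


# 30 s of CPU time consumed during a 60 s window:
print(cpu_utilization(0, 30_000_000_000, 0.0, 60.0))     # 0.5 of one core
print(cpu_utilization(0, 30_000_000_000, 0.0, 60.0, 8))  # 0.0625 of 8 cores
```

Because the counter accumulates, a one-second burst inside the window still contributes to the difference between the two samples, which a point-in-time load reading can miss.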

Roman Grebennikov | grv@dfdx.me

Re: Measure CPU utilization

Posted by Robert Metzger <rm...@apache.org>.
Hi Piper,

I personally like looking at the system load (if Flink is the only major
process on the system). It nicely captures the "stress" Flink puts on the
system; this would be the System.CPU.Load5min class of metrics. There
are a lot of articles about understanding Linux load averages.

I don't think there's anything built into Flink for getting the CPU
utilization across the cluster.
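
As a rough workaround, per-TaskManager CPU-time increases could be summed by hand. The sketch below assumes you have already collected, for each TaskManager, the increase of its CPU-time counter (in nanoseconds) over the same wall-clock window; the function name and the TaskManager ids are hypothetical. Note that it measures whole TaskManager processes, so it only approximates a single job if that job is the only one running on them.

```python
def cluster_cores_in_use(cpu_time_deltas_ns: dict, elapsed_s: float) -> float:
    """Average number of cores kept busy across all TaskManagers,
    given each one's CPU-time increase over the same time window."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return sum(cpu_time_deltas_ns.values()) / (elapsed_s * 1e9)


# Hypothetical ids; deltas observed over a 60 s window.
deltas = {"tm-1": 30_000_000_000, "tm-2": 15_000_000_000}
print(cluster_cores_in_use(deltas, 60.0))  # 0.75
```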

For the difference between the REST endpoints:
According to the Flink documentation, (1) captures the process CPU usage
(with the issue Roman described) and (2) captures the overall system CPU usage:
https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#cpu
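
To compare the two metrics side by side, both can be fetched with a small script. This is a sketch under assumptions: the helper names are hypothetical, the response is assumed to be a JSON list of objects with "id" and "value" fields, and a comma-separated `get=` list is assumed to be accepted; check both against your Flink version.

```python
import json
from urllib.request import urlopen


def parse_metrics(body: str) -> dict:
    """Turn a metrics response body (assumed shape:
    [{"id": "...", "value": "..."}]) into {id: float}."""
    return {entry["id"]: float(entry["value"]) for entry in json.loads(body)}


def fetch_metrics(base_url: str, tm_id: str, names: list) -> dict:
    """Query one TaskManager for the given metric names.
    base_url is e.g. "http://<IP>:<port>" of the REST endpoint."""
    url = f"{base_url}/taskmanagers/{tm_id}/metrics?get={','.join(names)}"
    with urlopen(url) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Parsing a sample body in the assumed shape:
sample = '[{"id": "Status.JVM.CPU.Load", "value": "0.125"}]'
print(parse_metrics(sample))  # {'Status.JVM.CPU.Load': 0.125}
```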

Best,
Robert

