Posted to user@flink.apache.org by Lydia Ickler <ic...@googlemail.com> on 2016/12/21 18:55:03 UTC

Monitoring REST API

Hi all,

I have a question regarding the Monitoring REST API:

I want to analyze the behavior of my program with regard to I/O MiB/s, network MiB/s and CPU %, as the authors of this paper did (https://hal.inria.fr/hal-01347638v2/document).
From the JSON returned by http://master:8081/jobs/<jobid>/ I get a summary that includes read/write records and read/write bytes.
Unfortunately, the entries for network and CPU are either (unknown) or 0.0. I am running my program on a cluster with up to 32 nodes.
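
For reference, a minimal sketch of reading that summary programmatically. The host name, the job ID placeholder, and the JSON key names ("vertices", "metrics", "read-bytes" and so on) are assumptions about the response layout and may differ between Flink versions.

```python
import json
from urllib.request import urlopen


def fetch_job(base_url, job_id):
    """Fetch the job summary JSON from the monitoring REST API."""
    with urlopen(f"{base_url}/jobs/{job_id}") as response:
        return json.load(response)


def vertex_io_summary(job):
    """Collect the per-vertex record and byte counters from a job summary.

    The key names used here are assumptions about the JSON layout;
    adjust them to what your Flink version actually returns.
    """
    counters = ("read-bytes", "write-bytes", "read-records", "write-records")
    summary = []
    for vertex in job.get("vertices", []):
        metrics = vertex.get("metrics", {})
        row = {"name": vertex.get("name", "(unknown)")}
        for key in counters:
            # Missing counters are reported as 0 rather than raising.
            row[key] = metrics.get(key, 0)
        summary.append(row)
    return summary


# Usage (hypothetical host and job ID, replace with your own):
#   job = fetch_job("http://master:8081", "<jobid>")
#   for row in vertex_io_summary(job):
#       print(row)
```

These counters only cover records and bytes per vertex. They say nothing about CPU or network utilization of the machines.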

Where can I find the values for e.g. CPU or Network?

Thanks in advance!
Lydia


Re: Monitoring REST API

Posted by Shannon Carey <sc...@expedia.com>.
Although Flink exposes some metrics in the API/UI, it probably only does that because it was easy to do and convenient for users. However, I don't think Flink is intended to be a complete monitoring solution for your cluster.

Instead, you should take a look at collectd (https://collectd.org/), which is meant for monitoring OS-level metrics. It has, for example, a Graphite plugin that you can use to write to a Graphite server or statsd instance, or you can integrate it some other way depending on what you have and what you want.

-Shannon


Re: Monitoring REST API

Posted by Ovidiu-Cristian MARCU <ov...@inria.fr>.
Hi Lydia,

I have used sar monitoring (sar -u -n DEV -p -d -r 1) and plotted the average over multiple nodes.

1) For each node, collect the sar output. You obtain, for example:

Linux 3.2.0-4-amd64 (parasilo-4.rennes.grid5000.fr) 	2016-01-27 	_x86_64_	(16 CPU)
12:54:09        CPU     %user     %nice   %system   %iowait    %steal     %idle
12:54:10        all      4.63      0.00      3.25      0.13      0.00     91.99
12:54:09    kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact
12:54:10    129538812   2525308      1.91      1292     85876   3662636      2.69   2111652     55132
12:54:09          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
12:54:10          sda     28.71   2708.91     87.13     97.38      0.03      1.10      0.97      2.77
12:54:09        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
12:54:10         eth0    632.67    587.13   3173.60     58.47      0.00      0.00      0.00
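
A minimal sketch of how the CPU section of such output could be parsed. It assumes 24-hour timestamps in the first column and the column layout shown above; the function name parse_sar_cpu is hypothetical, and other sysstat versions or locales may format the output differently.

```python
def parse_sar_cpu(text):
    """Extract the aggregate ("all") CPU samples from sar text output.

    Returns one dict per sample, keyed by the column names found in the
    CPU header line, plus a "time" key holding the timestamp.
    """
    header = None
    samples = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 3 or tokens[0].startswith("Average"):
            continue
        if tokens[1] == "CPU":
            # Header of the CPU section: %user, %nice, %system, ...
            header = tokens[2:]
        elif tokens[1] == "all" and header is not None:
            values = [float(value) for value in tokens[2:]]
            sample = dict(zip(header, values))
            sample["time"] = tokens[0]
            samples.append(sample)
    return samples
```

The memory, disk and network sections can be handled the same way by keying on their header tokens (kbmemfree, DEV, IFACE).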

2) Calculate the average over your nodes (their clocks need to be synchronized so that the timestamps line up) and obtain a final output over which you run some plot scripts:

LINE      DATE      FILENAME                 CPU_user  CPU_SYS   KBMEMFREE KBMEMUSED MEMUSED   DISK_UTIL DISK_RKBs DISK_WKBs _IO_RSTs  _IO_WSTs
1         12:54:10  res1Avg                  6.12      1.25      129554704 2509412   1.90      6.00      4253.63   87.04     3944.00   88.00     
2         12:54:11  res1Avg                  3.41      0.28      129523432 2540690   1.92      4.00      2335.82   51.62     2692.00   0.00      
3         12:54:12  res1Avg                  0.06      0.03      129522000 2542120   1.92      1.60      0.16      0.59      2048.00   32.00     
4         12:54:13  res1Avg                  0.09      0.06      129520936 2543182   1.92      0.60      0.19      0.59      2048.00   0.00      
5         12:54:14  res1Avg                  0.06      0.06      129518448 2545670   1.93      6.80      4.31      169.47    4044.00   16.00     
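
The averaging step can be sketched roughly as follows. This is not the script used to produce the table above; the input layout (a dict mapping each node name to a list of samples, each with a "time" key plus numeric metrics) and the function name are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def average_over_nodes(samples_by_node):
    """Average each metric per timestamp across all nodes.

    A timestamp reported by only some nodes is averaged over those nodes.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for samples in samples_by_node.values():
        for sample in samples:
            for metric, value in sample.items():
                if metric != "time":
                    grouped[sample["time"]][metric].append(value)
    # Sorting HH:MM:SS strings is chronological within a single day.
    return [
        {"time": time, **{metric: mean(values) for metric, values in metrics.items()}}
        for time, metrics in sorted(grouped.items())
    ]
```

Grouping by timestamp string only works if the node clocks are synchronized and the sampling interval is the same on every node.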

For metrics specific to Flink's execution, you may need to rely on the metrics Flink currently exposes.

Best,
Ovidiu
