Posted to user@flink.apache.org by Guido <gm...@gmail.com> on 2016/02/04 22:25:05 UTC

GC on TaskManagers stats

Hello,

I have a few questions regarding the garbage collector’s stats on TaskManagers, and any help or further documentation would be great.
I polled the stats of 7 TaskManagers once per second, through the corresponding request (/taskmanagers/<idtaskmanager>/) of the Monitoring REST API, while a job that took 38 seconds overall was running.

This way I got 38 records for each TaskManager. Focusing on the garbage collector’s stats, I can see, for example, in one of the 38 records:

- PS-Scavenge.Time: 2597, PS-MarkSweep.Time: 29016; 
1. Is it correct to assume they represent the total elapsed time spent in the different GCs (young and old generation, respectively)? So what I got is basically a running (cumulative) sum?
2. If yes, are the values in milliseconds, so about 29 seconds?

3. Could they be used to work out how much time was lost in total to “stop-the-world” GC pauses?

Finally, on the same record:

- PS-Scavenge.Count: 3, PS-MarkSweep.Count: 5, load: 3.73.

4. Is the “load” value closely related to these GC stats?

Sorry for the long message, and thanks a lot.

Guido

 

Re: GC on TaskManagers stats

Posted by Robert Metzger <rm...@apache.org>.
Hi Guido,

sorry for the late reply. You were collecting the stats every second.
As far as I know, Flink collects the stats internally every 5 seconds,
so you can change either your polling interval or Flink's (I think
it's taskmanager.heartbeat-interval).
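
To illustrate the polling point, here is a minimal Python sketch (not part of the original messages). It polls the /taskmanagers/<idtaskmanager>/ request at a chosen interval and keeps only the records that differ from the previous one, since polling faster than the stats are refreshed mostly returns repeats. The function names, the base URL argument and the assumption that the response body is JSON are illustrative, not confirmed by this thread.

```python
import json
import time
import urllib.request


def fetch_taskmanager_stats(base_url, taskmanager_id):
    """Fetch one stats record for a TaskManager (assumes a JSON response body)."""
    url = f"{base_url}/taskmanagers/{taskmanager_id}/"
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def poll(fetch, interval_seconds, max_polls, sleep=time.sleep):
    """Call fetch() max_polls times, interval_seconds apart.

    Records equal to the previously kept record are dropped, because
    polling faster than the source refreshes only yields repeats.
    """
    records = []
    for i in range(max_polls):
        record = fetch()
        if not records or record != records[-1]:
            records.append(record)
        if i < max_polls - 1:
            sleep(interval_seconds)
    return records
```

A call could look like poll(lambda: fetch_taskmanager_stats("http://localhost:8081", "<idtaskmanager>"), 5, 8), where the host and port are assumptions about the local setup. Note that dropping repeats also hides intervals in which nothing really changed, so keep the raw records if that distinction matters.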

Regarding the details on PS-Scavenge, MarkSweep etc.: We just use the names
the Java management beans return, so you can just google for the names and
read how to interpret them. For example:
http://www.ibm.com/developerworks/library/j-jtp11253/
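
For reference, Java's GarbageCollectorMXBean documents getCollectionCount() as the total number of collections that have occurred and getCollectionTime() as the approximate accumulated collection time in milliseconds. Assuming the REST values are passed through unchanged from those beans, they are cumulative counters (since JVM start, not since job start), so the GC time spent during a job is the difference between the last and the first sample. The Python sketch below shows that arithmetic; the function names are hypothetical, and the first two example samples are invented for illustration (only the last pair of values comes from the question).

```python
def gc_deltas(samples):
    """Turn cumulative GC counters into per-interval increases.

    samples: time-ordered list of dicts mapping counter name to its
    cumulative value; all dicts are assumed to share the same keys.
    Returns one dict per consecutive pair of samples.
    """
    deltas = []
    for previous, current in zip(samples, samples[1:]):
        deltas.append({name: current[name] - previous[name] for name in current})
    return deltas


def gc_time_during(samples, time_keys):
    """Milliseconds of GC time accumulated between the first and last sample."""
    first, last = samples[0], samples[-1]
    return sum(last[key] - first[key] for key in time_keys)


# Only the last sample uses values from the question; the others are made up.
EXAMPLE_SAMPLES = [
    {"PS-Scavenge.Time": 2100, "PS-MarkSweep.Time": 21000},
    {"PS-Scavenge.Time": 2350, "PS-MarkSweep.Time": 25000},
    {"PS-Scavenge.Time": 2597, "PS-MarkSweep.Time": 29016},
]
```

With these example samples, gc_time_during(EXAMPLE_SAMPLES, ["PS-Scavenge.Time", "PS-MarkSweep.Time"]) gives 8513 ms. With the parallel collector both PS Scavenge and PS MarkSweep pause the application while they run, so this figure is a reasonable approximation of stop-the-world time, though the bean only promises an approximate value.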

The load is the operating system load.
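
An operating system load average counts runnable processes averaged over a recent window, so it is not a GC metric, although heavy GC activity can raise it. One common way to read a value such as 3.73 is to divide it by the number of CPU cores. The Python sketch below does that; the helper name and the core count of 4 in the usage note are assumptions, since the thread does not say how many cores the machines had.

```python
import os


def load_per_core(load_average, cpu_count=None):
    """Return the load average divided by the number of CPU cores.

    Values near or above 1.0 suggest the cores are saturated.
    If cpu_count is omitted, the local machine's core count is used.
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    return load_average / cpu_count
```

For example, load_per_core(3.73, 4) is about 0.93, which would mean a hypothetical 4-core machine was close to fully busy.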


