Posted to dev@flink.apache.org by "Kruse, Sebastian" <Se...@hpi.de> on 2014/08/19 11:08:51 UTC

Job Profiling

Hi everyone,

I want to profile my Flink jobs to find bottlenecks. I read the issue https://issues.apache.org/jira/browse/FLINK-964, and my question is whether there are currently any ongoing efforts to bring the profiling data to the web frontend.

Additionally, I was thinking of some kind of logical profiling that measures the elements (e.g., tuples) passed between the operators. That would help to understand the properties of intermediate data, such as join cardinalities. Plotting these measurements against a time axis would yield something like a data flow profile of the job. However, before building such profiles, I wanted to ask whether the system already keeps track of such data. For instance, the job history graphs provide something similar, but the scheduling states of tasks do not necessarily reflect the data flow through them.
I would be glad to hear any comments!
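To make the idea more concrete, here is a minimal sketch in plain Java. It does not use any Flink API: the class CountingCollector, its methods acceptAt and profile, and the parameter bucketMillis are hypothetical names chosen for illustration. The assumption is that one could wrap whatever consumer receives an operator's output; the wrapper forwards every record unchanged and counts how many records passed in each fixed-size time bucket, which is roughly the per-operator data flow profile described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

/**
 * Hypothetical sketch, not a Flink API: forwards every record to a downstream
 * consumer and counts how many records passed per fixed-size time bucket.
 * Not thread-safe; use one instance per parallel task.
 */
public class CountingCollector<T> implements Consumer<T> {
    private final Consumer<T> downstream; // receives every record unchanged
    private final long bucketMillis;      // width of one time bucket in ms
    private final long startMillis;       // timestamp at which bucket 0 starts
    private final TreeMap<Long, Long> countsPerBucket = new TreeMap<>();
    private long total;

    public CountingCollector(Consumer<T> downstream, long bucketMillis, long startMillis) {
        if (bucketMillis <= 0) {
            throw new IllegalArgumentException("bucketMillis must be positive");
        }
        this.downstream = downstream;
        this.bucketMillis = bucketMillis;
        this.startMillis = startMillis;
    }

    /** Counts the record using the current wall-clock time, then forwards it. */
    @Override
    public void accept(T record) {
        acceptAt(record, System.currentTimeMillis());
    }

    /** Same as accept, but with an explicit timestamp (useful for testing). */
    public void acceptAt(T record, long nowMillis) {
        // floorDiv puts timestamps before startMillis into negative buckets
        long bucket = Math.floorDiv(nowMillis - startMillis, bucketMillis);
        countsPerBucket.merge(bucket, 1L, Long::sum);
        total++;
        downstream.accept(record);
    }

    /** Total number of records seen so far. */
    public long total() {
        return total;
    }

    /** Bucket index mapped to record count, ordered by time. */
    public Map<Long, Long> profile() {
        return Collections.unmodifiableMap(countsPerBucket);
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();
        // 1-second buckets starting at t = 0 ms
        CountingCollector<String> counter = new CountingCollector<>(sink::add, 1000L, 0L);
        counter.acceptAt("a", 100L);
        counter.acceptAt("b", 900L);
        counter.acceptAt("c", 1500L);
        System.out.println(counter.total());   // prints 3
        System.out.println(counter.profile()); // prints {0=2, 1=1}
        System.out.println(sink);              // prints [a, b, c]
    }
}
```

Under these assumptions, wrapping both the inputs and the output of a join this way would show how many records the join emits relative to what it consumes, bucket by bucket over time.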

Cheers,
Sebastian

RE: Job Profiling

Posted by "Kruse, Sebastian" <Se...@hpi.de>.
Thanks, I will have a look at it and see if I can figure out how to do that.

-----Original Message-----
From: ewenstephan@gmail.com [mailto:ewenstephan@gmail.com] On Behalf Of Stephan Ewen
Sent: Tuesday, 19 August 2014 16:07
To: dev@flink.incubator.apache.org
Subject: Re: Job Profiling

Re: Job Profiling

Posted by Stephan Ewen <se...@apache.org>.
Hi Sebastian!

There is some profiling code that was used by previous versions of Flink
(Stratosphere). The profiling works, but there is currently nothing that
displays the profiling data.

It would be a great addition to display the profiling data in the web
frontend, or to make it available for download.

Have a look at these classes:
 - JobManager side:
https://github.com/apache/incubator-flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/profiling/impl/JobManagerProfilerImpl.java
 - TaskManager side:
https://github.com/apache/incubator-flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/profiling/impl/TaskManagerProfilerImpl.java

Daniel Warneke authored those; maybe he can chime in and give a few pointers.

Greetings,
Stephan



