You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Chesnay Schepler (JIRA)" <ji...@apache.org> on 2017/04/04 10:02:41 UTC

[jira] [Commented] (FLINK-4840) Measure latency of record processing and expose it as a metric

    [ https://issues.apache.org/jira/browse/FLINK-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954938#comment-15954938 ] 

Chesnay Schepler commented on FLINK-4840:
-----------------------------------------

I may have found a suitable implementation alternative:

The key problem in the existing approach is that it calculates the time taken for every invocation of the method, which is just to expensive since this requires 2 time measurements (which should also use nanoTime which is even more expensive), as well as using a histogram.

My idea would be to
* no longer create a histogram since this can be done easily outside of Flink and only provide raw time measurements
* not measure the time for every call, but instead only a fixed number of times over a period of time. We already have all tools that we require for this, the View interface.

We can generalize the details in a new Timer interface:
{code}
public interface Timer implements Metric {
	void start();
	void end();
	long getTime(); // last measure time
}
{code}

The following TimerView implementation relies on the View interface to be regularly (every 5 seconds) enabled using the update() method.
If the TimerView is not enabled start() and stop() are no-ops. If it is enabled it will take a single measurement.

The implementation could look like this:
{code}
public class TimerView implements Timer, View {
	private boolean enabled = false;
	private long startTime = 0;
	private long lastMeasurement = -1;

	public void update() {
		enabled = true;
	}

	public void start() {
		if (enabled) {
 			startTime = System.nanoTime();
		}
	}

	public void stop() {
		if (enabled) {
			lastMeasurement = System.nanoTime() - startTime; // convert to millis or smth
			enabled = false;
		}
	}

	public long getTime() {
		return lastMeasurement;
	}
}
{code}

I quickly threw this together so here are of course some details missing, like what happens when stop() is never called and such.

But the general approach seems reasonable to me; tell me what you think.

> Measure latency of record processing and expose it as a metric
> --------------------------------------------------------------
>
>                 Key: FLINK-4840
>                 URL: https://issues.apache.org/jira/browse/FLINK-4840
>             Project: Flink
>          Issue Type: Improvement
>          Components: Metrics
>            Reporter: zhuhaifeng
>            Assignee: zhuhaifeng
>            Priority: Minor
>
> We should expose the following Metrics on the TaskIOMetricGroup:
> 1. recordProcessLatency(ms): Histogram measuring the processing time per record of a task. It is the processing time of chain if a chained task.  
> 2. recordProcTimeProportion(ms): Meter measuring the proportion of record processing time for infor whether the main cost



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)