Posted to issues@flink.apache.org by "Chesnay Schepler (JIRA)" <ji...@apache.org> on 2017/04/04 10:02:41 UTC
[jira] [Commented] (FLINK-4840) Measure latency of record
processing and expose it as a metric
[ https://issues.apache.org/jira/browse/FLINK-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954938#comment-15954938 ]
Chesnay Schepler commented on FLINK-4840:
-----------------------------------------
I may have found a suitable implementation alternative:
The key problem with the existing approach is that it measures the time taken for every invocation of the method, which is just too expensive: each call requires 2 time measurements (which should also use nanoTime, which is even more expensive) as well as a histogram update.
My idea would be to
* no longer create a histogram and only provide raw time measurements, since a histogram can easily be built outside of Flink
* not measure the time for every call, but only a fixed number of times over a period of time. We already have the tool we need for this: the View interface.
We can generalize the details in a new Timer interface:
{code}
public interface Timer extends Metric {
    void start();
    void stop();
    long getTime(); // last measured time
}
{code}
The following TimerView implementation relies on the View interface: its update() method is called regularly (every 5 seconds) and enables the timer.
If the TimerView is not enabled, start() and stop() are no-ops. If it is enabled, it takes a single measurement and then disables itself again.
The implementation could look like this:
{code}
public class TimerView implements Timer, View {
    private boolean enabled = false;
    private long startTime = 0;
    private long lastMeasurement = -1; // -1 until the first measurement completes

    // called periodically through the View interface; enables the next measurement
    public void update() {
        enabled = true;
    }

    public void start() {
        if (enabled) {
            startTime = System.nanoTime();
        }
    }

    public void stop() {
        if (enabled) {
            lastMeasurement = System.nanoTime() - startTime; // convert to millis or smth
            enabled = false; // take only one measurement per update()
        }
    }

    public long getTime() {
        return lastMeasurement;
    }
}
{code}
I quickly threw this together, so there are of course some details missing, like what happens when stop() is never called and such.
But the general approach seems reasonable to me; tell me what you think.
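One possible way to fill in the missing details is sketched below. It is a stand-alone illustration, not Flink code: the class name SamplingTimerSketch, the injectable clock, and the extra "measuring" flag are assumptions made for this example. The sketch assumes update() may be called from a different thread than start()/stop(). The extra flag keeps a stop() from recording a value when update() fired after the matching start(), and an unmeasured start() discards a measurement whose stop() never arrived.

```java
import java.util.function.LongSupplier;

// Stand-alone sketch, not Flink code. The class name and the injectable
// clock are hypothetical and exist only for illustration.
final class SamplingTimerSketch {
    private final LongSupplier clock;            // e.g. System::nanoTime
    private volatile boolean armed = false;      // set by update(), possibly from another thread
    private boolean measuring = false;           // true only between a measured start() and its stop()
    private long startTime = 0L;
    private volatile long lastMeasurement = -1L; // -1 until a measurement completes

    SamplingTimerSketch(LongSupplier clock) {
        this.clock = clock;
    }

    // Requests that the next start()/stop() pair be measured.
    void update() {
        armed = true;
    }

    void start() {
        if (armed) {
            armed = false;
            startTime = clock.getAsLong();
            measuring = true;
        } else {
            // Not sampled. This also drops a pending measurement whose stop() was never called.
            measuring = false;
        }
    }

    void stop() {
        // Only record a value if the matching start() actually took a timestamp.
        if (measuring) {
            lastMeasurement = clock.getAsLong() - startTime;
            measuring = false;
        }
    }

    long getTime() {
        return lastMeasurement;
    }
}
```

In real use the clock would be System::nanoTime and something like a scheduled task would call update() every few seconds; the injectable clock is only there so the behavior can be checked deterministically.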
> Measure latency of record processing and expose it as a metric
> --------------------------------------------------------------
>
> Key: FLINK-4840
> URL: https://issues.apache.org/jira/browse/FLINK-4840
> Project: Flink
> Issue Type: Improvement
> Components: Metrics
> Reporter: zhuhaifeng
> Assignee: zhuhaifeng
> Priority: Minor
>
> We should expose the following Metrics on the TaskIOMetricGroup:
> 1. recordProcessLatency(ms): Histogram measuring the processing time per record of a task. For a chained task, it is the processing time of the whole chain.
> 2. recordProcTimeProportion(ms): Meter measuring the proportion of record processing time for infor whether the main cost