You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/09/08 12:04:20 UTC
[jira] [Commented] (FLINK-3660) Measure latency of elements and
expose it through web interface
[ https://issues.apache.org/jira/browse/FLINK-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15473681#comment-15473681 ]
ASF GitHub Bot commented on FLINK-3660:
---------------------------------------
Github user rmetzger commented on the issue:
https://github.com/apache/flink/pull/2386
@aljoscha I updated the pull request, which now builds green (just rebasing fixed the issues).
I also keep metrics now per operator, not only for the sinks.
> Measure latency of elements and expose it through web interface
> ---------------------------------------------------------------
>
> Key: FLINK-3660
> URL: https://issues.apache.org/jira/browse/FLINK-3660
> Project: Flink
> Issue Type: Sub-task
> Components: Streaming
> Reporter: Robert Metzger
> Assignee: Robert Metzger
> Fix For: pre-apache
>
>
> It would be nice to expose the end-to-end latency of a streaming job in the webinterface.
> To achieve this, my initial thought was to attach an ingestion-time timestamp at the sources to each record.
> However, this introduces overhead for a monitoring feature users might not even use (8 bytes for each element + System.currentTimeMilis() on each element).
> Therefore, I suggest to implement this feature by periodically sending special events, similar to watermarks through the topology.
> Those {{LatencyMarks}} are emitted at a configurable interval at the sources and forwarded by the tasks. The sinks will compare the timestamp of the latency marks with their current system time to determine the latency.
> The latency marks will not add to the latency of a job, but the marks will be delayed similarly than regular records, so their latency will approximate the record latency.
> Above suggestion expects the clocks on all taskmanagers to be in sync. Otherwise, the measured latencies would also include the offsets between the taskmanager's clocks.
> In a second step, we can try to mitigate the issue by using the JobManager as a central timing service. The TaskManagers will periodically query the JM for the current time in order to determine the offset with their clock.
> This offset would still include the network latency between TM and JM but it would still lead to reasonably good estimations.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)