Posted to issues@flink.apache.org by "Robert Metzger (JIRA)" <ji...@apache.org> on 2014/12/05 20:09:12 UTC

[jira] [Updated] (FLINK-456) Optional runtime statistics collection

     [ https://issues.apache.org/jira/browse/FLINK-456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger updated FLINK-456:
---------------------------------
    Reporter: Fabian Hueske  (was: GitHub Import)

> Optional runtime statistics collection
> --------------------------------------
>
>                 Key: FLINK-456
>                 URL: https://issues.apache.org/jira/browse/FLINK-456
>             Project: Flink
>          Issue Type: New Feature
>            Reporter: Fabian Hueske
>              Labels: github-import
>             Fix For: pre-apache
>
>
> The engine should collect job execution statistics (e.g., via accumulators) such as:
> - total number of input / output records per operator
> - histogram of input/output ratio of UDF calls
> - histogram of number of input records per reduce / cogroup UDF call
> - histogram of number of output records per UDF call
> - histogram of time spent in UDF calls
> - number of local and remote bytes read (not via accumulators)
> - ...
> These stats should be made available to the user after execution (via the web frontend). The purpose of this feature is to ease performance debugging of parallel jobs (e.g., to detect data skew).
> It should be possible to deactivate (or activate) the gathering of these statistics.
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/456
> Created by: [fhueske|https://github.com/fhueske]
> Labels: enhancement, runtime, user satisfaction
> Created at: Tue Feb 04 20:32:49 CET 2014
> State: open
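The issue above is a feature request, not a design. As a rough illustration only, here is one way a per-operator collector with an on/off switch could gather a subset of the listed statistics: record totals, plus histograms of input records, output records and milliseconds per UDF call. It also merges results from parallel instances of the same operator. The class `UdfCallStats` and all of its methods are hypothetical and are not part of Flink's accumulator API. Bucketing call time by whole milliseconds is likewise an assumption.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not a Flink API: statistics for the UDF calls of one operator.
final class UdfCallStats {
    private final boolean enabled;  // gathering can be switched off entirely
    private long totalIn;
    private long totalOut;
    // Histograms: observed value -> number of UDF calls with that value.
    private final TreeMap<Long, Long> inputsPerCall = new TreeMap<>();
    private final TreeMap<Long, Long> outputsPerCall = new TreeMap<>();
    private final TreeMap<Long, Long> millisPerCall = new TreeMap<>();

    UdfCallStats(boolean enabled) {
        this.enabled = enabled;
    }

    // Records one UDF call; does nothing when collection is disabled.
    void recordCall(long inputRecords, long outputRecords, long elapsedNanos) {
        if (!enabled) {
            return;
        }
        totalIn += inputRecords;
        totalOut += outputRecords;
        inputsPerCall.merge(inputRecords, 1L, Long::sum);
        outputsPerCall.merge(outputRecords, 1L, Long::sum);
        // Assumption: time is bucketed by whole milliseconds (truncated).
        millisPerCall.merge(elapsedNanos / 1_000_000L, 1L, Long::sum);
    }

    // Adds the statistics of another parallel instance of the same operator,
    // similar in spirit to merging accumulators after execution.
    void merge(UdfCallStats other) {
        if (!enabled) {
            return;
        }
        totalIn += other.totalIn;
        totalOut += other.totalOut;
        other.inputsPerCall.forEach((k, v) -> inputsPerCall.merge(k, v, Long::sum));
        other.outputsPerCall.forEach((k, v) -> outputsPerCall.merge(k, v, Long::sum));
        other.millisPerCall.forEach((k, v) -> millisPerCall.merge(k, v, Long::sum));
    }

    long totalInputRecords() { return totalIn; }

    long totalOutputRecords() { return totalOut; }

    Map<Long, Long> inputsPerCallHistogram() { return Collections.unmodifiableMap(inputsPerCall); }

    Map<Long, Long> outputsPerCallHistogram() { return Collections.unmodifiableMap(outputsPerCall); }

    Map<Long, Long> millisPerCallHistogram() { return Collections.unmodifiableMap(millisPerCall); }

    // Largest number of input records seen in a single call. A value far above
    // the common buckets hints at data skew (e.g. one very large reduce group).
    long maxInputsPerCall() {
        return inputsPerCall.isEmpty() ? 0L : inputsPerCall.lastKey();
    }
}
```

In this sketch, a job would create one instance per parallel operator instance, call `recordCall` around each UDF invocation, and merge the instances once execution finishes. Keeping histograms rather than only totals is what makes skew visible: two reduce calls with 3 input records and one with 1000 have the same total as three calls of about 335 each, but the histograms look very different.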



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)