Posted to dev@flume.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/05/15 17:01:04 UTC

[jira] [Commented] (FLUME-3092) Extend the FileChannel's monitoring metrics

    [ https://issues.apache.org/jira/browse/FLUME-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010907#comment-16010907 ] 

ASF GitHub Bot commented on FLUME-3092:
---------------------------------------

GitHub user adenes opened a pull request:

    https://github.com/apache/flume/pull/131

    FLUME-3092. Extend the FileChannel's monitoring metrics

    This patch adds the following new metrics to the FileChannel's counters:
    - eventPutErrorCount: incremented if an IOException occurs during a put operation.
    - eventTakeErrorCount: incremented if an IOException or CorruptEventException occurs
      during a take operation.
    - checkpointWriteErrorCount: incremented if an exception occurs during a checkpoint write.
    - unhealthy: this flag is set if the channel failed to start successfully
      (i.e. the replay ran into a problem), so the channel is not capable of normal operation.
    - closed flag: the numeric representation (1: closed, 0: open) of the negated open flag.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/adenes/flume FLUME-3092

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flume/pull/131.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #131
    
----
commit 7c5957e4692817482519e6b9da20d29324a7f332
Author: Denes Arvay <de...@cloudera.com>
Date:   2017-05-09T14:23:31Z

    FLUME-3092. Extend the FileChannel's monitoring metrics
    
    This patch adds the following new metrics to the FileChannel's counters:
    - eventPutErrorCount: incremented if an IOException occurs during a put operation.
    - eventTakeErrorCount: incremented if an IOException or CorruptEventException occurs
      during a take operation.
    - checkpointWriteErrorCount: incremented if an exception occurs during a checkpoint write.
    - unhealthy: this flag is set if the channel failed to start successfully
      (i.e. the replay ran into a problem), so the channel is not capable of normal operation.
    - closed flag: the numeric representation (1: closed, 0: open) of the negated open flag.

----


> Extend the FileChannel's monitoring metrics
> -------------------------------------------
>
>                 Key: FLUME-3092
>                 URL: https://issues.apache.org/jira/browse/FLUME-3092
>             Project: Flume
>          Issue Type: Improvement
>          Components: File Channel
>    Affects Versions: 1.7.0
>            Reporter: Denes Arvay
>            Assignee: Denes Arvay
>
> There are already several generic metrics (e.g. {{eventPutAttemptCount}} and {{eventPutSuccessCount}}) which can be used to create compound metrics for monitoring the FileChannel's health.
> Some monitoring systems aren't capable of calculating such derived metrics, though, so I recommend adding the following extra counters to indicate whether a channel operation failed or the channel is in an unhealthy state.
> - {{eventPutErrorCount}}: incremented if an {{IOException}} occurs during a {{put}} operation.
> - {{eventTakeErrorCount}}: incremented if an {{IOException}} or {{CorruptEventException}} occurs during a {{take}} operation.
> - {{checkpointWriteErrorCount}}: incremented if an exception occurs during a checkpoint write.
> - {{unhealthy}}: this flag indicates whether the channel failed to start successfully (i.e. the replay ran into a problem). This is similar to the already existing {{open}} flag, except that the latter is initially {{false}} and is set to {{true}} once the initialization (including the log replay) has completed successfully. The {{unhealthy}} flag, in contrast, is {{false}} by default and is set to {{true}} if there is an error during startup.
> Besides these flags I'd also introduce a {{closed}} flag, which is the numeric representation (1: closed, 0: open) of the negated (already existing) {{open}} flag.
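
For illustration only, the relationship between the existing attempt/success counters, the proposed explicit error counter, and the two flags could be sketched roughly as below. This is not the code from the pull request: the class name FileChannelMetricsSketch, the IoAction and StartupAction callback types, and every method name are hypothetical, and the real FileChannel counter classes in Flume may be structured quite differently.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a channel counter; NOT Flume's actual class.
class FileChannelMetricsSketch {
    // Hypothetical callback types standing in for the real channel work.
    interface IoAction { void run() throws IOException; }
    interface StartupAction { void run() throws Exception; }

    private final AtomicLong eventPutAttemptCount = new AtomicLong();
    private final AtomicLong eventPutSuccessCount = new AtomicLong();
    private final AtomicLong eventPutErrorCount = new AtomicLong();
    private volatile boolean open = false;      // true only after a successful start
    private volatile boolean unhealthy = false; // true only after a failed start

    // Runs the startup work (e.g. the log replay) and records the outcome.
    void start(StartupAction replay) {
        try {
            replay.run();
            open = true;
        } catch (Exception e) {
            unhealthy = true;
        }
    }

    // Wraps a put: counts the attempt, then either the success or the IOException.
    void put(IoAction write) throws IOException {
        eventPutAttemptCount.incrementAndGet();
        try {
            write.run();
            eventPutSuccessCount.incrementAndGet();
        } catch (IOException e) {
            eventPutErrorCount.incrementAndGet();
            throw e;
        }
    }

    // The compound metric a monitoring system would otherwise have to compute.
    long getDerivedPutFailureCount() {
        return eventPutAttemptCount.get() - eventPutSuccessCount.get();
    }

    long getEventPutErrorCount() { return eventPutErrorCount.get(); }

    boolean isUnhealthy() { return unhealthy; }

    // Numeric form of the negated open flag: 1 = closed, 0 = open.
    int getClosed() { return open ? 0 : 1; }

    public static void main(String[] args) {
        FileChannelMetricsSketch metrics = new FileChannelMetricsSketch();
        metrics.start(() -> { });
        try {
            metrics.put(() -> { throw new IOException("simulated disk error"); });
        } catch (IOException expected) {
            // Swallowed for the demo; a real channel would surface the failure.
        }
        System.out.println("eventPutErrorCount=" + metrics.getEventPutErrorCount());
        System.out.println("closed=" + metrics.getClosed());
        System.out.println("unhealthy=" + metrics.isUnhealthy());
    }
}
```

In this sketch the derived value (attempts minus successes) and the explicit error counter agree only while every failed put is an IOException; exposing the error counter directly spares the monitoring system that subtraction.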



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)