Posted to issues@eagle.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/05/19 05:42:04 UTC

[jira] [Commented] (EAGLE-1024) Monitor jobs with high RPC throughput

    [ https://issues.apache.org/jira/browse/EAGLE-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016919#comment-16016919 ] 

ASF GitHub Bot commented on EAGLE-1024:
---------------------------------------

GitHub user qingwen220 opened a pull request:

    https://github.com/apache/eagle/pull/938

    EAGLE-1024: Monitor jobs with high RPC throughput

    https://issues.apache.org/jira/browse/EAGLE-1024
    
    * add job RPC data in MAP_REDUCE_JOB_STREAM 
    * refactor org.apache.eagle.jpm.analyzer.publisher.EmailPublisher 
    * support new config 'application.analyzerReport.alertLevel' to define alert level

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/qingwen220/eagle EAGLE-1024

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/eagle/pull/938.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #938
    
----
commit 5beddd6c55512570f20532f15d42d961b442cecf
Author: Zhao, Qingwen <qi...@apache.org>
Date:   2017-05-18T15:02:57Z

    add JobRpcAnalysisEvaluator

commit 9011e49fbd7d82b83e455dfdceebf1b4d1359eef
Author: Zhao, Qingwen <qi...@apache.org>
Date:   2017-05-19T05:29:20Z

    refactor EmailPublisher.java

----


> Monitor jobs with high RPC throughput 
> --------------------------------------
>
>                 Key: EAGLE-1024
>                 URL: https://issues.apache.org/jira/browse/EAGLE-1024
>             Project: Eagle
>          Issue Type: Improvement
>    Affects Versions: v0.5.0
>            Reporter: Zhao, Qingwen
>            Assignee: Zhao, Qingwen
>
> We've identified some jobs with high RPC throughput, which puts a heavy RPC load on the NameNode (NN). These jobs issued an extremely large number of HDFS operations in a very short window (2 mins).
> So we intend to capture jobs that meet both of the following conditions:
> a) the job has very large RPC throughput, computed as the job's total HDFS ops divided by the job duration, with the throughput larger than 1000
> b) the HDFS ops per task is larger than 25
> Then send out an alert. Later, we will notify the users to optimize their jobs.
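
The two conditions in the description can be sketched roughly as follows. This is only an illustration, not the code in the pull request: the class and method names are hypothetical, and the description does not give a unit for the throughput threshold, so ops per second is assumed here.

```java
// Hypothetical sketch of the rule described in EAGLE-1024.
// Not the actual JobRpcAnalysisEvaluator from the pull request.
final class JobRpcRule {
    // Thresholds taken from the issue description.
    // Assumption: throughput is measured in HDFS ops per second.
    static final double MAX_OPS_PER_SECOND = 1000.0;
    static final double MAX_OPS_PER_TASK = 25.0;

    private JobRpcRule() {
    }

    /**
     * Returns true when both conditions hold:
     * a) total HDFS ops / job duration is larger than 1000, and
     * b) HDFS ops per task is larger than 25.
     */
    static boolean shouldAlert(long totalHdfsOps, long durationMillis, long taskCount) {
        if (durationMillis <= 0 || taskCount <= 0) {
            // Neither ratio can be computed, so do not alert.
            return false;
        }
        double opsPerSecond = totalHdfsOps * 1000.0 / durationMillis;
        double opsPerTask = (double) totalHdfsOps / taskCount;
        return opsPerSecond > MAX_OPS_PER_SECOND && opsPerTask > MAX_OPS_PER_TASK;
    }
}
```

Under these assumptions, a job that issues 240000 HDFS ops in 2 minutes across 1000 tasks would be flagged, since it averages 2000 ops per second and 240 ops per task. The same job spread across 20000 tasks would not be flagged, because it averages only 12 ops per task.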



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)