Posted to dev@eagle.apache.org by "Zhao, Qingwen (JIRA)" <ji...@apache.org> on 2017/05/17 11:04:04 UTC

[jira] [Created] (EAGLE-1024) Monitor jobs with high RPC throughput

Zhao, Qingwen created EAGLE-1024:
------------------------------------

             Summary: Monitor jobs with high RPC throughput 
                 Key: EAGLE-1024
                 URL: https://issues.apache.org/jira/browse/EAGLE-1024
             Project: Eagle
          Issue Type: Improvement
    Affects Versions: v0.5.0
            Reporter: Zhao, Qingwen


We've identified some jobs with high RPC throughput, which puts heavy RPC overhead on the NameNode (NN). These jobs issued an extremely large number of HDFS operations in a very short window (2 mins).

So we intend to capture those jobs when both of the following conditions hold:
a) the job has very large RPC throughput, computed as the job's total HDFS ops / the job duration, and that throughput is larger than 1000
b) and the HDFS ops per task is larger than 25
Then send out an alert. Later, we will notify the users to optimize their jobs.
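
The two conditions above could be sketched roughly as follows. This is only an illustration, not Eagle's actual policy definition: the JobStats class, its field names, and the should_alert function are hypothetical, and the issue does not state the time unit behind the 1000 threshold, so ops per second is assumed here.

```python
from dataclasses import dataclass

# Thresholds taken from the issue text. The unit of the throughput
# threshold is not given there; ops per second is an assumption.
THROUGHPUT_THRESHOLD = 1000.0
OPS_PER_TASK_THRESHOLD = 25.0


@dataclass
class JobStats:
    """Hypothetical per-job counters; field names are illustrative only."""
    job_id: str
    total_hdfs_ops: int    # total HDFS operations issued by the job
    duration_secs: float   # job duration in seconds (assumed unit)
    num_tasks: int         # number of tasks in the job


def should_alert(job: JobStats) -> bool:
    """Return True only when both condition a) and condition b) hold."""
    # Guard against division by zero for jobs with no duration or no tasks.
    if job.duration_secs <= 0 or job.num_tasks <= 0:
        return False
    throughput = job.total_hdfs_ops / job.duration_secs   # condition a)
    ops_per_task = job.total_hdfs_ops / job.num_tasks     # condition b)
    return (throughput > THROUGHPUT_THRESHOLD
            and ops_per_task > OPS_PER_TASK_THRESHOLD)
```

For example, under these assumptions a job that issues 240000 HDFS ops in 120 seconds across 4000 tasks has a throughput of 2000 and 60 ops per task, so it would trigger the alert. The same job spread over 20000 tasks has only 12 ops per task and would not.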



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)