Posted to commits@oozie.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2011/09/08 06:29:09 UTC

[jira] [Commented] (OOZIE-103) GH-68: Better reporting/handling of problems in Hadoop

    [ https://issues.apache.org/jira/browse/OOZIE-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099801#comment-13099801 ] 

Hadoop QA commented on OOZIE-103:
---------------------------------

mislam77 remarked:
Objective :
---------------
1. The long-term objective: when Hadoop is slow, Oozie should be able to throttle the load on the JT/NN, for example by submitting fewer jobs.

2. In the short term, we want to instrument Oozie so that it can report the response time of the JT/NN at any time. How the value will be used or presented is outside the scope of this short-term goal.

3. The design for the short-term objective is expected to be extensible and reusable for the long-term objective.

Solution:
------------
The following ideas were discussed internally at Y!.
Approach 1:
Use a separate monitoring thread that periodically pings the Hadoop server with a representative command. For example, for the namenode, the thread would invoke a command like "ls /tmp".

Pros & Cons :
* This thread adds extra overhead to Hadoop as well as to Oozie.
* Finding a representative command that reflects the actual health of Hadoop might not be trivial.
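
As a rough illustration of Approach 1, a periodic probe could look like the sketch below. All names here (HadoopProbeMonitor, probeOnce, the "hadoop-probe" thread name) are hypothetical and not part of Oozie. The representative command is passed in as a plain Runnable, so the sketch makes no assumption about which Hadoop call would be used for it.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of Approach 1; none of these names exist in Oozie.
class HadoopProbeMonitor {
    // The representative command (for example, a listing of /tmp on the NN).
    private final Runnable probe;
    // Latency of the most recent successful probe; -1 means no valid sample.
    private final AtomicLong lastNanos = new AtomicLong(-1);
    // Single daemon thread so the monitor never blocks JVM shutdown.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "hadoop-probe");
                t.setDaemon(true);
                return t;
            });

    HadoopProbeMonitor(Runnable probe) {
        this.probe = probe;
    }

    // Runs the probe once and records how long it took.
    void probeOnce() {
        long start = System.nanoTime();
        try {
            probe.run();
        } catch (RuntimeException e) {
            // A failed probe leaves no valid latency sample.
            lastNanos.set(-1);
            return;
        }
        lastNanos.set(System.nanoTime() - start);
    }

    // Starts probing; each run begins periodMillis after the previous one ends.
    void start(long periodMillis) {
        scheduler.scheduleWithFixedDelay(this::probeOnce, 0, periodMillis,
                TimeUnit.MILLISECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }

    long lastLatencyNanos() {
        return lastNanos.get();
    }
}
```

The fixed delay between probes is a deliberate choice in this sketch: if Hadoop is slow, probes do not pile up behind each other, which limits the extra overhead noted above.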

Approach 2:
When Oozie calls the NN or JT, it could measure the turn-around time of that call. The benefit is that no extra command is sent.

Pros  & Cons :
* There are different types of commands, and their normal response times vary. In this case, Oozie could restrict the instrumentation to a subset of commonly used commands. Each command type would have its own instrumented value.

* When Oozie is idle, it makes no calls, so it would miss the data for that period.
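
A minimal sketch of Approach 2, assuming each NN/JT call can be wrapped at its call site. The class CallTimer and the command-type labels (such as "nn.list") are hypothetical and do not come from Oozie. The sketch keeps a separate running average per command type and returns -1 when a type has no samples, which is what an idle period would look like.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical sketch of Approach 2; none of these names exist in Oozie.
class CallTimer {
    // Running totals for one command type.
    private static final class Stat {
        final AtomicLong count = new AtomicLong();
        final AtomicLong totalNanos = new AtomicLong();
    }

    private final ConcurrentMap<String, Stat> stats = new ConcurrentHashMap<>();

    // Wraps an existing call and records its turn-around time, even if it throws.
    // Calls that throw checked exceptions would need a Callable-based variant.
    <T> T timed(String commandType, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            record(commandType, System.nanoTime() - start);
        }
    }

    // Adds one sample for the given command type.
    void record(String commandType, long elapsedNanos) {
        Stat s = stats.computeIfAbsent(commandType, k -> new Stat());
        s.count.incrementAndGet();
        s.totalNanos.addAndGet(elapsedNanos);
    }

    // Average turn-around time, or -1 if no call of this type was observed.
    // Under concurrent updates the value is approximate, which is
    // acceptable for monitoring purposes.
    double averageNanos(String commandType) {
        Stat s = stats.get(commandType);
        if (s == null) {
            return -1;
        }
        long c = s.count.get();
        return c == 0 ? -1 : (double) s.totalNanos.get() / c;
    }
}
```

A call site would then read something like timer.timed("nn.list", () -> listTmp()), where listTmp stands in for whatever existing call Oozie already makes.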

Comments please.

> GH-68: Better reporting/handling of problems in Hadoop
> ------------------------------------------------------
>
>                 Key: OOZIE-103
>                 URL: https://issues.apache.org/jira/browse/OOZIE-103
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Hadoop QA
>
> Add instrumentation to track performance stats of NN and JT (how long to get directory listing on hdfs; how long to submit a job or query JT queue)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira