Posted to commits@oozie.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2011/09/08 06:41:09 UTC
[jira] [Created] (OOZIE-144) GH-118: JT/NN backoff if response time over threshold
GH-118: JT/NN backoff if response time over threshold
-----------------------------------------------------
Key: OOZIE-144
URL: https://issues.apache.org/jira/browse/OOZIE-144
Project: Oozie
Issue Type: Bug
Reporter: Hadoop QA
If the JT/NN are overloaded, Oozie should back off temporarily.
This can be done in the HadoopAccessorService.
Because the JT/NN do not provide an API to report their current health, it has to be determined using API calls that do a known, fixed amount of work: for example, asking the JT for the queue names, or asking the NN for the contents of the root directory.
A tool that queries these values should be run against the cluster to find the normal values and the values under stress. This would help determine the threshold value for Oozie.
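A minimal sketch of such a measurement tool, in Java. The class name ProbeTimer and its methods are hypothetical and not part of Oozie. The probe is passed in as a Callable so the sketch stays independent of Hadoop; against a real cluster it could wrap a fixed-work call such as JobClient.getQueues() or FileSystem.listStatus(new Path("/")).

```java
import java.util.Arrays;
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of Oozie): times a fixed-work probe call. */
final class ProbeTimer {
    private ProbeTimer() {}

    /** Runs the probe the given number of times; returns latencies in ms, sorted ascending. */
    static long[] sampleMillis(Callable<?> probe, int runs) throws Exception {
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            probe.call(); // e.g. list the root directory or fetch the queue names
            samples[i] = (System.nanoTime() - start) / 1_000_000L;
        }
        Arrays.sort(samples);
        return samples;
    }

    /** Nearest-rank percentile (p in 0..100) of an ascending-sorted sample array. */
    static long percentile(long[] sorted, double p) {
        if (sorted.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        int index = Math.min(Math.max(rank - 1, 0), sorted.length - 1);
        return sorted[index];
    }
}
```

Running the sampler once on an idle cluster and once under load, then comparing a high percentile of each run, would give a starting point for choosing the threshold.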
Before using a JT/NN handle (JobClient/FileSystem), Oozie will test the response time. If the response time is above the threshold, Oozie will back off for # seconds and will not attempt any call to the cluster during that period.
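The check described above could look roughly like the following Java sketch. BackoffGate and all of its members are hypothetical names, not the actual HadoopAccessorService code. The clock is injected so the behaviour can be exercised without waiting, and treating a failed probe as "over threshold" is an assumption of this sketch, not something the issue specifies.

```java
import java.util.concurrent.Callable;
import java.util.function.LongSupplier;

/** Hypothetical gate (not Oozie code): probe first, back off if the probe is slow. */
final class BackoffGate {
    private final long thresholdMillis;
    private final long backoffMillis;
    private final LongSupplier clockMillis;
    private long backoffUntil = Long.MIN_VALUE; // not backing off initially

    BackoffGate(long thresholdMillis, long backoffMillis, LongSupplier clockMillis) {
        this.thresholdMillis = thresholdMillis;
        this.backoffMillis = backoffMillis;
        this.clockMillis = clockMillis;
    }

    /** Returns true if the caller may use the JT/NN handle now. */
    synchronized boolean tryAcquire(Callable<?> probe) {
        long start = clockMillis.getAsLong();
        if (start < backoffUntil) {
            return false; // still backing off: make no call to the cluster at all
        }
        long elapsed;
        try {
            probe.call(); // fixed-work call, e.g. queue names or root listing
            elapsed = clockMillis.getAsLong() - start;
        } catch (Exception e) {
            elapsed = Long.MAX_VALUE; // assumption: a failing probe counts as overloaded
        }
        if (elapsed > thresholdMillis) {
            backoffUntil = clockMillis.getAsLong() + backoffMillis;
            return false;
        }
        return true;
    }
}
```

In production the gate would be constructed with System::currentTimeMillis as the clock. Because tryAcquire is synchronized, concurrent callers wait while one probe runs; a real implementation might prefer to cache the last probe result for a short time instead of probing before every call.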
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (OOZIE-144) GH-118: JT/NN backoff if response time over threshold
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OOZIE-144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hadoop QA resolved OOZIE-144.
-----------------------------
Resolution: Fixed
> GH-118: JT/NN backoff if response time over threshold
> -----------------------------------------------------
>
> Key: OOZIE-144
> URL: https://issues.apache.org/jira/browse/OOZIE-144
> Project: Oozie
> Issue Type: Bug
> Reporter: Hadoop QA
>
> If the JT/NN are overloaded, Oozie should back off temporarily.
> This can be done in the HadoopAccessorService.
> Because the JT/NN do not provide an API to report their current health, it has to be determined using API calls that do a known, fixed amount of work: for example, asking the JT for the queue names, or asking the NN for the contents of the root directory.
> A tool that queries these values should be run against the cluster to find the normal values and the values under stress. This would help determine the threshold value for Oozie.
> Before using a JT/NN handle (JobClient/FileSystem), Oozie will test the response time. If the response time is above the threshold, Oozie will back off for # seconds and will not attempt any call to the cluster during that period.
[jira] [Reopened] (OOZIE-144) GH-118: JT/NN backoff if response time over threshold
Posted by "Roman Shaposhnik (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OOZIE-144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Roman Shaposhnik reopened OOZIE-144:
------------------------------------
> GH-118: JT/NN backoff if response time over threshold
> -----------------------------------------------------
>
> Key: OOZIE-144
> URL: https://issues.apache.org/jira/browse/OOZIE-144
> Project: Oozie
> Issue Type: Bug
> Reporter: Hadoop QA
>
> If the JT/NN are overloaded, Oozie should back off temporarily.
> This can be done in the HadoopAccessorService.
> Because the JT/NN do not provide an API to report their current health, it has to be determined using API calls that do a known, fixed amount of work: for example, asking the JT for the queue names, or asking the NN for the contents of the root directory.
> A tool that queries these values should be run against the cluster to find the normal values and the values under stress. This would help determine the threshold value for Oozie.
> Before using a JT/NN handle (JobClient/FileSystem), Oozie will test the response time. If the response time is above the threshold, Oozie will back off for # seconds and will not attempt any call to the cluster during that period.