
Posted to common-dev@hadoop.apache.org by "Jerome Boulon (JIRA)" <ji...@apache.org> on 2009/01/23 19:07:59 UTC

[jira] Created: (HADOOP-5118) ChukwaAgent controller should retry to register for a longer period but not as frequent as now (every 15 secs)

ChukwaAgent controller should retry to register for a longer period but not as frequent as now (every 15 secs)
--------------------------------------------------------------------------------------------------------------

                 Key: HADOOP-5118
                 URL: https://issues.apache.org/jira/browse/HADOOP-5118
             Project: Hadoop Core
          Issue Type: Improvement
          Components: contrib/chukwa
            Reporter: Jerome Boulon
            Assignee: Jerome Boulon


If the agent is down, chances are that either it will not be back up for at least 1 minute (watchdog) or it will take longer.
So it's better to retry after 1 minute the first time, then every 30 minutes for the next 24 hours.
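A minimal sketch of the schedule proposed in this description, assuming "for the next 24 hours" means no retry is attempted more than 24 hours after the failure. The class and method names are hypothetical; this is not the Chukwa code, and the patch that was eventually committed used a plain 30-minute interval instead of a 1-minute first retry.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the proposed retry schedule: wait 1 minute before
 * the first retry, then retry every 30 minutes until 24 hours have elapsed.
 * Illustrative only; not taken from the Chukwa sources.
 */
public class ProposedRetrySchedule {
    static final long MINUTE_MS = 60L * 1000L;
    static final long FIRST_DELAY_MS = 1 * MINUTE_MS;      // give the watchdog time to restart the agent
    static final long RETRY_INTERVAL_MS = 30 * MINUTE_MS;  // spacing of all later retries
    static final long WINDOW_MS = 24 * 60 * MINUTE_MS;     // stop retrying after 24 hours

    /** Returns the time of each retry, in ms since the failure was detected. */
    static List<Long> retryTimes(long firstDelay, long interval, long window) {
        List<Long> times = new ArrayList<>();
        for (long t = firstDelay; t <= window; t += interval) {
            times.add(t);
        }
        return times;
    }

    public static void main(String[] args) {
        List<Long> times = retryTimes(FIRST_DELAY_MS, RETRY_INTERVAL_MS, WINDOW_MS);
        System.out.println("attempts: " + times.size());                          // attempts: 48
        System.out.println("first: " + times.get(0) / MINUTE_MS + " min");        // first: 1 min
        System.out.println("second: " + times.get(1) / MINUTE_MS + " min");       // second: 31 min
        System.out.println("last: " + times.get(times.size() - 1) / MINUTE_MS + " min"); // last: 1411 min
    }
}
```

Under these assumptions the schedule yields 48 retries in the 24-hour window, which happens to match the numRetries = 48 default chosen later in this thread.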


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-5118) ChukwaAgent controller should retry to register for a longer period but not as frequent as now

Posted by "Eric Yang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Yang updated HADOOP-5118:
------------------------------

      Resolution: Fixed
    Release Note: 
What is new in HADOOP-5118:

   - Reduced communication from the log4j appender to the Chukwa agent: the appender now retries up to 48 times, 30 minutes apart.
          Status: Resolved  (was: Patch Available)

I just committed this.  Thanks Jerome.


[jira] Updated: (HADOOP-5118) ChukwaAgent controller should retry to register for a longer period but not as frequent as now

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerome Boulon updated HADOOP-5118:
----------------------------------

    Status: Patch Available  (was: Open)


[jira] Updated: (HADOOP-5118) ChukwaAgent controller should retry to register for a longer period but not as frequent as now

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerome Boulon updated HADOOP-5118:
----------------------------------

    Status: Patch Available  (was: Open)

Change default values for Chukwa Log4j Appender:
retryInterval = 1000 * 60 * 30; // 30 minutes, in milliseconds
numRetries = 48;                // 48 x 30 minutes = 24 hours
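A minimal sketch of how an appender could apply these two defaults. Only the field names retryInterval and numRetries come from the patch note; the class, the register() method and the injected sleeper are hypothetical, and treating numRetries as "retries after the first attempt" is an assumption (the real appender may count differently).

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;

/**
 * Hypothetical sketch, not the actual Chukwa Log4j appender code.
 * Uses the new defaults: 30 minutes between attempts, 48 retries.
 */
public class RegisterWithRetry {
    // 30 minutes between attempts, expressed in milliseconds.
    long retryInterval = 1000 * 60 * 30;
    // 48 retries x 30 minutes = 24 hours of retrying in total.
    int numRetries = 48;

    /**
     * Calls attempt once, then retries up to numRetries times, waiting
     * retryInterval before each retry. The sleeper receives the wait time
     * so tests can run without actually sleeping.
     * Returns true as soon as an attempt succeeds, false after giving up.
     */
    boolean register(BooleanSupplier attempt, LongConsumer sleeper) {
        if (attempt.getAsBoolean()) {
            return true; // agent reachable on the first try, no wait
        }
        for (int retry = 1; retry <= numRetries; retry++) {
            sleeper.accept(retryInterval);
            if (attempt.getAsBoolean()) {
                return true;
            }
        }
        return false; // agent still down after 24 hours of retries
    }
}
```

In production the sleeper would be a small wrapper around Thread.sleep that handles InterruptedException, since LongConsumer cannot throw checked exceptions; injecting it keeps the retry logic testable.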


[jira] Commented: (HADOOP-5118) ChukwaAgent controller should retry to register for a longer period but not as frequent as now

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672093#action_12672093 ] 

Ari Rabkin commented on HADOOP-5118:
------------------------------------

Ack.  I meant a comment in the source.


[jira] Commented: (HADOOP-5118) ChukwaAgent controller should retry to register for a longer period but not as frequent as now

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672090#action_12672090 ] 

Jerome Boulon commented on HADOOP-5118:
---------------------------------------

Sure:-)
The watchdog checks the ChukwaAgent only once every 5 minutes, so there's no point in retrying more than once every 5 minutes.

In practice, if the watchdog cannot restart the agent automatically, it takes more than 20 minutes for Ops to restart it.
Ops also want us to limit communication between Hadoop and Chukwa, hence the 30-minute interval.
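The reasoning in this comment can be checked with a little arithmetic. The sketch below assumes an agent that stays down for a whole day and a client that keeps retrying at a fixed interval; the 15-second figure is the old interval from the original issue title, and all names are hypothetical.

```java
/**
 * Illustrative arithmetic only: registration attempts per day at the old
 * 15-second interval versus the new 30-minute interval, and a check that
 * the new interval is not shorter than the 5-minute watchdog period.
 */
public class RetryBudget {
    static final long SECOND_MS = 1000L;
    static final long DAY_MS = 24L * 60 * 60 * SECOND_MS;
    static final long OLD_INTERVAL_MS = 15 * SECOND_MS;        // "every 15 secs"
    static final long NEW_INTERVAL_MS = 30 * 60 * SECOND_MS;   // 30 minutes
    static final long WATCHDOG_PERIOD_MS = 5 * 60 * SECOND_MS; // watchdog runs every 5 minutes

    /** Attempts made in one day if the agent stays down the whole time. */
    static long attemptsPerDay(long intervalMs) {
        return DAY_MS / intervalMs;
    }

    public static void main(String[] args) {
        System.out.println("old: " + attemptsPerDay(OLD_INTERVAL_MS)); // old: 5760
        System.out.println("new: " + attemptsPerDay(NEW_INTERVAL_MS)); // new: 48
        System.out.println("interval >= watchdog period: "
                + (NEW_INTERVAL_MS >= WATCHDOG_PERIOD_MS));            // true
    }
}
```

Under these assumptions the new interval cuts daily attempts by a factor of 120 and never retries faster than the watchdog can bring the agent back.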



[jira] Updated: (HADOOP-5118) ChukwaAgent controller should retry to register for a longer period but not as frequent as now

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerome Boulon updated HADOOP-5118:
----------------------------------

    Summary: ChukwaAgent controller should retry to register for a longer period but not as frequent as now   (was: ChukwaAgent controller should retry to register for a longer period but not as frequent as now (every 15 secs))


[jira] Commented: (HADOOP-5118) ChukwaAgent controller should retry to register for a longer period but not as frequent as now

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676068#action_12676068 ] 

Ari Rabkin commented on HADOOP-5118:
------------------------------------

Looks okay.  +1


[jira] Updated: (HADOOP-5118) ChukwaAgent controller should retry to register for a longer period but not as frequent as now

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-5118:
----------------------------------

    Status: Open  (was: Patch Available)

I agree with Ari; this should at least correct the comment above the fix. The indentation should also be 4 spaces/tab, not 8.


[jira] Commented: (HADOOP-5118) ChukwaAgent controller should retry to register for a longer period but not as frequent as now

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672085#action_12672085 ] 

Ari Rabkin commented on HADOOP-5118:
------------------------------------

OK.  +1  In the future, it would be good to have comments explaining the logic behind the "magic numbers" in the code.


[jira] Commented: (HADOOP-5118) ChukwaAgent controller should retry to register for a longer period but not as frequent as now

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667604#action_12667604 ] 

Hadoop QA commented on HADOOP-5118:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12398752/HADOOP-5118.patch
  against trunk revision 737944.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3759/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3759/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3759/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3759/console

This message is automatically generated.


[jira] Updated: (HADOOP-5118) ChukwaAgent controller should retry to register for a longer period but not as frequent as now

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerome Boulon updated HADOOP-5118:
----------------------------------

    Attachment:     (was: HADOOP-5118.patch)


[jira] Updated: (HADOOP-5118) ChukwaAgent controller should retry to register for a longer period but not as frequent as now

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerome Boulon updated HADOOP-5118:
----------------------------------

    Attachment: HADOOP-5118.patch


[jira] Updated: (HADOOP-5118) ChukwaAgent controller should retry to register for a longer period but not as frequent as now

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerome Boulon updated HADOOP-5118:
----------------------------------

    Attachment:     (was: HADOOP-5118.patch)


[jira] Commented: (HADOOP-5118) ChukwaAgent controller should retry to register for a longer period but not as frequent as now

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677397#action_12677397 ] 

Hudson commented on HADOOP-5118:
--------------------------------

Integrated in Hadoop-trunk #767 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/767/])
    .  Reduced the retries to 30 minutes, and 48 retries.



[jira] Updated: (HADOOP-5118) ChukwaAgent controller should retry to register for a longer period but not as frequent as now

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerome Boulon updated HADOOP-5118:
----------------------------------

    Attachment: HADOOP-5118-2.patch

- Add comment to the source code
- Fix indentation



[jira] Updated: (HADOOP-5118) ChukwaAgent controller should retry to register for a longer period but not as frequent as now

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerome Boulon updated HADOOP-5118:
----------------------------------

    Attachment: HADOOP-5118.patch

Uploading the patch again so that it can be picked up by Hudson.


[jira] Updated: (HADOOP-5118) ChukwaAgent controller should retry to register for a longer period but not as frequent as now

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerome Boulon updated HADOOP-5118:
----------------------------------

    Attachment:     (was: HADOOP-5118.patch)


[jira] Commented: (HADOOP-5118) ChukwaAgent controller should retry to register for a longer period but not as frequent as now

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675625#action_12675625 ] 

Hadoop QA commented on HADOOP-5118:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12400630/HADOOP-5118-2.patch
  against trunk revision 746340.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3897/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3897/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3897/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3897/console

This message is automatically generated.


[jira] Updated: (HADOOP-5118) ChukwaAgent controller should retry to register for a longer period but not as frequent as now

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerome Boulon updated HADOOP-5118:
----------------------------------

    Description: 
The watchdog checks the ChukwaAgent only once every 5 minutes, so there's no point in retrying more than once every 5 minutes.

In practice, if the watchdog cannot restart the agent automatically, it takes more than 20 minutes for Ops to restart it.
Ops also want us to limit communication between Hadoop and Chukwa, hence the 30-minute interval.

  was:
If the agent is down, chances are that either it will not be back up for at least 1 minute (watchdog) or it will take longer.
So it's better to retry after 1 minute the first time, then every 30 minutes for the next 24 hours.




[jira] Updated: (HADOOP-5118) ChukwaAgent controller should retry to register for a longer period but not as frequent as now

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerome Boulon updated HADOOP-5118:
----------------------------------

    Attachment: HADOOP-5118.patch


[jira] Updated: (HADOOP-5118) ChukwaAgent controller should retry to register for a longer period but not as frequent as now

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerome Boulon updated HADOOP-5118:
----------------------------------

    Attachment: HADOOP-5118.patch

Formatting needs to be fixed; cf. HADOOP-4504.
