Posted to mapreduce-issues@hadoop.apache.org by "Robert Joseph Evans (JIRA)" <ji...@apache.org> on 2011/05/13 17:54:47 UTC

[jira] [Created] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason
-----------------------------------------------------------------------------------------------------

                 Key: MAPREDUCE-2495
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: distributed-cache
    Affects Versions: 0.21.0
            Reporter: Robert Joseph Evans
            Assignee: Robert Joseph Evans
            Priority: Minor


The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node. 
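
As a rough illustration of the idea (not the code in the attached patches), a liveness check can poll Thread.isAlive() on a schedule and run a callback once when the thread is found dead. The class name ThreadWatchdog, the onDeath callback, and the polling period are assumptions made for this sketch only.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper, not part of Hadoop: polls a thread and reports its death once.
class ThreadWatchdog {
  private final Thread watched;      // must already be started, or isAlive() is false
  private final Runnable onDeath;    // e.g. log the failure and shut the daemon down
  private final AtomicBoolean fired = new AtomicBoolean(false);
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "thread-watchdog");
        t.setDaemon(true);           // never keep the JVM alive just to watch
        return t;
      });

  ThreadWatchdog(Thread watched, Runnable onDeath) {
    this.watched = watched;
    this.onDeath = onDeath;
  }

  // Runs one check; returns true while the watched thread is alive.
  boolean checkOnce() {
    if (watched.isAlive()) {
      return true;
    }
    if (fired.compareAndSet(false, true)) {
      onDeath.run();                 // invoked at most once
    }
    return false;
  }

  // Schedules checkOnce() every periodMillis milliseconds.
  void start(long periodMillis) {
    scheduler.scheduleAtFixedRate(this::checkOnce, periodMillis, periodMillis,
        TimeUnit.MILLISECONDS);
  }

  void stop() {
    scheduler.shutdownNow();
  }
}
```

Keeping the check in checkOnce() lets a caller drive it directly, without waiting for the scheduler.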

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2495:
-------------------------------------------

    Attachment: MAPREDUCE-2495-v3.patch
                MAPREDUCE-2495-20.20X-V3.patch

Incorporated Owen's comments.

> The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Minor
>         Attachments: MAPREDUCE-2495-20.20X-V1.patch, MAPREDUCE-2495-20.20X-V2.patch, MAPREDUCE-2495-20.20X-V3.patch, MAPREDUCE-2495-v1.patch, MAPREDUCE-2495-v2.patch, MAPREDUCE-2495-v3.patch
>
>
> The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node. 


[jira] [Commented] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036373#comment-13036373 ] 

Hadoop QA commented on MAPREDUCE-2495:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12479790/MAPREDUCE-2495-v3.patch
  against trunk revision 1124553.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

    +1 system test framework.  The patch passed system test framework compile.

Test results: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/274//testReport/
Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/274//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/274//console

This message is automatically generated.


[jira] [Updated] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2495:
-------------------------------------------

    Status: Open  (was: Patch Available)

Will add new patches incorporating the new comments.

> The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Minor
>         Attachments: MAPREDUCE-2495-20.20X-V1.patch, MAPREDUCE-2495-v1.patch
>
>
> The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node. 


[jira] [Updated] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2495:
-------------------------------------------

    Status: Patch Available  (was: Open)

> The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Minor
>         Attachments: MAPREDUCE-2495-v1.patch
>
>
> The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node. 


[jira] [Commented] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035507#comment-13035507 ] 

Hadoop QA commented on MAPREDUCE-2495:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12479597/MAPREDUCE-2495-v2.patch
  against trunk revision 1104687.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

    +1 system test framework.  The patch passed system test framework compile.

Test results: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/262//testReport/
Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/262//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/262//console


> The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Minor
>         Attachments: MAPREDUCE-2495-20.20X-V1.patch, MAPREDUCE-2495-20.20X-V2.patch, MAPREDUCE-2495-v1.patch, MAPREDUCE-2495-v2.patch
>
>
> The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node. 


[jira] [Commented] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034536#comment-13034536 ] 

Hadoop QA commented on MAPREDUCE-2495:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12479365/MAPREDUCE-2495-v1.patch
  against trunk revision 1103921.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

    +1 system test framework.  The patch passed system test framework compile.

Test results: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/250//testReport/
Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/250//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/250//console



[jira] [Updated] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated MAPREDUCE-2495:
-------------------------------------

    Fix Version/s: 0.20.204.0

> The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Minor
>             Fix For: 0.20.204.0, 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2495-20.20X-V1.patch, MAPREDUCE-2495-20.20X-V2.patch, MAPREDUCE-2495-20.20X-V3.patch, MAPREDUCE-2495-20.20X-V4.patch, MAPREDUCE-2495-v1.patch, MAPREDUCE-2495-v2.patch, MAPREDUCE-2495-v3.patch, MAPREDUCE-2495-v4.patch
>
>
> The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node. 

[jira] [Updated] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2495:
-------------------------------------------

    Status: Open  (was: Patch Available)

Chris indicated as a side comment in a different conversation that the sleeps in the tests are not very good, so I am reworking the tests to avoid using sleep.
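
One common way to drop sleeps from a test like this is to have the worker signal completion through a java.util.concurrent.CountDownLatch and let the test wait on it with a timeout. The sketch below only illustrates that technique; LatchedCleanup, awaitDone, and the stand-in "work" are hypothetical and are not taken from the reworked tests.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical worker with a test hook; illustrative only.
class LatchedCleanup implements Runnable {
  private final CountDownLatch done = new CountDownLatch(1);
  private volatile int filesDeleted;

  @Override
  public void run() {
    try {
      filesDeleted = 3;              // stand-in for one real cleanup pass
    } finally {
      done.countDown();              // signal even if the pass throws
    }
  }

  // Blocks until the pass finishes or the timeout elapses; false on timeout.
  boolean awaitDone(long timeout, TimeUnit unit) {
    try {
      return done.await(timeout, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  int filesDeleted() {
    return filesDeleted;
  }
}
```

A test starts the worker on its own thread and calls awaitDone(5, TimeUnit.SECONDS). The wait returns as soon as the pass completes, so the timeout only bounds the failure case and does not slow down a passing run.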


[jira] [Updated] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2495:
-------------------------------------------

    Status: Patch Available  (was: Open)


[jira] [Commented] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035142#comment-13035142 ] 

Owen O'Malley commented on MAPREDUCE-2495:
------------------------------------------

There are lots of places where we do it wrong, but in general HDFS is better.

For an example, look at the ReplicationMonitor in
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java



[jira] [Updated] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2495:
-------------------------------------------

    Status: Patch Available  (was: Open)

> The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Minor
>         Attachments: MAPREDUCE-2495-20.20X-V1.patch, MAPREDUCE-2495-20.20X-V2.patch, MAPREDUCE-2495-20.20X-V3.patch, MAPREDUCE-2495-20.20X-V4.patch, MAPREDUCE-2495-v1.patch, MAPREDUCE-2495-v2.patch, MAPREDUCE-2495-v3.patch, MAPREDUCE-2495-v4.patch
>
>
> The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node. 


[jira] [Commented] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035568#comment-13035568 ] 

Robert Joseph Evans commented on MAPREDUCE-2495:
------------------------------------------------

The contrib test issues are with RAID, and appear to be a known issue.


[jira] [Updated] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated MAPREDUCE-2495:
-------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.23.0
                   0.20.205.0
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

+1

I committed this. Thanks, Robert!

> The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Minor
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2495-20.20X-V1.patch, MAPREDUCE-2495-20.20X-V2.patch, MAPREDUCE-2495-20.20X-V3.patch, MAPREDUCE-2495-20.20X-V4.patch, MAPREDUCE-2495-v1.patch, MAPREDUCE-2495-v2.patch, MAPREDUCE-2495-v3.patch, MAPREDUCE-2495-v4.patch
>
>
> The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node. 


[jira] [Updated] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2495:
-------------------------------------------

    Status: Open  (was: Patch Available)


[jira] [Commented] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039168#comment-13039168 ] 

Hudson commented on MAPREDUCE-2495:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #690 (See [https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk/690/])
    MAPREDUCE-2495. exit() the TaskTracker when the distributed cache cleanup
thread dies. Contributed by Robert Joseph Evans

cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1127361
Files : 
* /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/filecache/TrackerDistributedCacheManager.java
* /hadoop/mapreduce/trunk/CHANGES.txt
* /hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapreduce/filecache/TestTrackerDistributedCacheManager.java
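
The commit summary above describes exiting the process when the cleanup thread dies. A minimal sketch of that general pattern follows, assuming a periodic loop that catches Throwable and calls an overridable hook; FailFastCleanupThread, onFatal, and stopCleaning are hypothetical names and this is not the code committed to TrackerDistributedCacheManager.

```java
// Hypothetical cleanup thread; names are illustrative, not the patch's.
class FailFastCleanupThread extends Thread {
  private final Runnable cleanupPass;
  private final long intervalMillis;
  private volatile boolean running = true;

  FailFastCleanupThread(Runnable cleanupPass, long intervalMillis) {
    super("cleanup");
    setDaemon(true);
    this.cleanupPass = cleanupPass;
    this.intervalMillis = intervalMillis;
  }

  @Override
  public void run() {
    try {
      while (running) {
        cleanupPass.run();
        Thread.sleep(intervalMillis);
      }
    } catch (InterruptedException e) {
      // Interrupted while sleeping, typically by stopCleaning(): exit quietly.
    } catch (Throwable t) {
      onFatal(t);                    // anything else means cleanup has stopped
    }
  }

  void stopCleaning() {
    running = false;
    interrupt();
  }

  // Overridable so a test can record the failure instead of killing the JVM.
  protected void onFatal(Throwable t) {
    System.err.println("Cleanup thread died, exiting: " + t);
    Runtime.getRuntime().exit(-1);
  }
}
```

Routing the exit through a protected method is a design choice for testability: a test can subclass the thread, override onFatal, and assert that it was called.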



[jira] [Commented] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034905#comment-13034905 ] 

Robert Joseph Evans commented on MAPREDUCE-2495:
------------------------------------------------

Please ignore the previous comment; the patch it complains about is not for trunk but for the 20 security branch.

The following is from the 20 security branch:

     [exec] -1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec]
     [exec]     -1 javadoc.  The javadoc tool appears to have generated 1 warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     -1 Eclipse classpath. The patch causes the Eclipse classpath to differ from the contents of the lib directories.


The javadoc -1 is spurious, as both runs generated 6 warnings, and the Eclipse classpath -1 is a known issue.

> The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Minor
>         Attachments: MAPREDUCE-2495-20.20X-V1.patch, MAPREDUCE-2495-v1.patch
>
>
> The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node. 

[jira] [Updated] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2495:
-------------------------------------------

    Attachment: MAPREDUCE-2495-20.20X-V1.patch

Attaching patch for the 20.20X security line too.

> The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Minor
>         Attachments: MAPREDUCE-2495-20.20X-V1.patch, MAPREDUCE-2495-v1.patch
>
>
> The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node. 

[jira] [Commented] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036206#comment-13036206 ] 

Robert Joseph Evans commented on MAPREDUCE-2495:
------------------------------------------------

OK, I will have an updated patch shortly.  Just to clarify, you want the code to look like this:

} catch (InterruptedException e) {
    LOG.info("Cleanup...", e);
    // To force us to exit cleanly
    running = false;
}

> The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Minor
>         Attachments: MAPREDUCE-2495-20.20X-V1.patch, MAPREDUCE-2495-20.20X-V2.patch, MAPREDUCE-2495-v1.patch, MAPREDUCE-2495-v2.patch
>
>
> The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node. 

[jira] [Commented] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038938#comment-13038938 ] 

Hudson commented on MAPREDUCE-2495:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #698 (See [https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/698/])
    MAPREDUCE-2495. exit() the TaskTracker when the distributed cache cleanup thread dies. Contributed by Robert Joseph Evans

cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1127361
Files : 
* /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/filecache/TrackerDistributedCacheManager.java
* /hadoop/mapreduce/trunk/CHANGES.txt
* /hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapreduce/filecache/TestTrackerDistributedCacheManager.java


> The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Minor
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2495-20.20X-V1.patch, MAPREDUCE-2495-20.20X-V2.patch, MAPREDUCE-2495-20.20X-V3.patch, MAPREDUCE-2495-20.20X-V4.patch, MAPREDUCE-2495-v1.patch, MAPREDUCE-2495-v2.patch, MAPREDUCE-2495-v3.patch, MAPREDUCE-2495-v4.patch
>
>
> The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node. 

[jira] [Commented] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035369#comment-13035369 ] 

Robert Joseph Evans commented on MAPREDUCE-2495:
------------------------------------------------

Looks good, should have an updated patch shortly.

> The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Minor
>         Attachments: MAPREDUCE-2495-20.20X-V1.patch, MAPREDUCE-2495-v1.patch
>
>
> The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node. 

[jira] [Updated] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has dies for some reason

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2495:
-------------------------------------------

    Attachment: MAPREDUCE-2495-v1.patch

Added a simple patch that verifies the cleanup thread is still running whenever cache archives are added.
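
As an illustration of the kind of check described above, here is a minimal sketch. It is not the code from the attached patch: the class CacheWithLivenessCheck and its methods are hypothetical names, and the only library call it relies on is Thread.isAlive(). Note that isAlive() also returns false for a thread that was never started, so the sketch assumes the cleanup thread is started before any archive is added.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: refuse to grow the cache once the cleanup thread is gone. */
class CacheWithLivenessCheck {
    private final Thread cleanupThread;                      // assumed to be started by the caller
    private final List<String> archives = new ArrayList<>();

    CacheWithLivenessCheck(Thread cleanupThread) {
        this.cleanupThread = cleanupThread;
    }

    /** Adds an archive, failing fast if nothing is left to clean the cache up. */
    synchronized void addArchive(String name) {
        if (!cleanupThread.isAlive()) {
            throw new IllegalStateException("cleanup thread is not running");
        }
        archives.add(name);
    }

    /** Number of archives accepted so far. */
    synchronized int size() {
        return archives.size();
    }
}
```

Checking at insertion time means a dead cleanup thread is noticed the next time the cache would grow, which is when an unbounded cache starts to threaten the disk.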

> The distributed cache cleanup thread has no monitoring to check to see if it has dies for some reason
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Minor
>         Attachments: MAPREDUCE-2495-v1.patch
>
>
> The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node. 

[jira] [Updated] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2495:
-------------------------------------------

    Summary: The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason  (was: The distributed cache cleanup thread has no monitoring to check to see if it has dies for some reason)

> The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Minor
>         Attachments: MAPREDUCE-2495-v1.patch
>
>
> The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node. 

[jira] [Commented] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034827#comment-13034827 ] 

Hadoop QA commented on MAPREDUCE-2495:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12479465/MAPREDUCE-2495-20.20X-V1.patch
  against trunk revision 1103993.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/257//console

This message is automatically generated.

> The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Minor
>         Attachments: MAPREDUCE-2495-20.20X-V1.patch, MAPREDUCE-2495-v1.patch
>
>
> The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node. 

[jira] [Commented] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035022#comment-13035022 ] 

Owen O'Malley commented on MAPREDUCE-2495:
------------------------------------------

This doesn't match the approach we use in other places.

All threads in the servers should have catch clauses for Throwable that log and then shut down the server.
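
The convention described above could be sketched roughly as follows. This is an illustration only, not code from Hadoop: GuardedCleanupThread, cleanupPass, and shutdownServer are hypothetical names, the shutdown action is passed in as a plain Runnable because this comment does not say how the TaskTracker is actually stopped, and a real loop would presumably wait between passes.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: a server thread that logs any Throwable and then triggers shutdown. */
class GuardedCleanupThread extends Thread {
    private static final Logger LOG =
        Logger.getLogger(GuardedCleanupThread.class.getName());

    private final Runnable cleanupPass;    // hypothetical: one pass over the cache
    private final Runnable shutdownServer; // hypothetical: stands in for stopping the server
    private volatile boolean running = true;

    GuardedCleanupThread(Runnable cleanupPass, Runnable shutdownServer) {
        super("cleanup-sketch");
        this.cleanupPass = cleanupPass;
        this.shutdownServer = shutdownServer;
    }

    /** Asks the loop to finish after the current pass. */
    void stopRunning() {
        running = false;
    }

    @Override
    public void run() {
        try {
            while (running) {
                cleanupPass.run();
            }
        } catch (Throwable t) {
            // Nothing escapes silently: log first, then take the server down.
            LOG.log(Level.SEVERE, "Cleanup thread failed unexpectedly", t);
            shutdownServer.run();
        }
    }
}
```

Because the catch clause wraps the whole loop, the thread cannot die without the failure being logged and the shutdown action being invoked.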

> The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Minor
>         Attachments: MAPREDUCE-2495-20.20X-V1.patch, MAPREDUCE-2495-v1.patch
>
>
> The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node. 

[jira] [Updated] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2495:
-------------------------------------------

    Attachment: MAPREDUCE-2495-v2.patch
                MAPREDUCE-2495-20.20X-V2.patch

Changed to exit the Task Tracker when an unexpected exception is thrown.

> The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Minor
>         Attachments: MAPREDUCE-2495-20.20X-V1.patch, MAPREDUCE-2495-20.20X-V2.patch, MAPREDUCE-2495-v1.patch, MAPREDUCE-2495-v2.patch
>
>
> The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node. 

[jira] [Commented] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035844#comment-13035844 ] 

Owen O'Malley commented on MAPREDUCE-2495:
------------------------------------------

This looks good, except that you shouldn't do a shutdown for an InterruptedException. Those are *only* thrown when another thread is trying to do a clean shutdown. Just log the exception as info and exit nicely.
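
A rough sketch of the distinction drawn above, under stated assumptions: InterruptAwareCleanupThread, cleanupPass, shutdownServer, and periodMs are hypothetical names rather than anything from the patch, and the shutdown action is again just a Runnable. An interrupt is logged at info level and ends the loop quietly; any other Throwable is logged as severe and triggers shutdown.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: interrupts end the thread quietly, other failures shut the server down. */
class InterruptAwareCleanupThread extends Thread {
    private static final Logger LOG =
        Logger.getLogger(InterruptAwareCleanupThread.class.getName());

    private final Runnable cleanupPass;    // hypothetical: one pass over the cache
    private final Runnable shutdownServer; // hypothetical: stands in for stopping the server
    private final long periodMs;           // pause between passes
    private volatile boolean running = true;

    InterruptAwareCleanupThread(Runnable cleanupPass, Runnable shutdownServer, long periodMs) {
        this.cleanupPass = cleanupPass;
        this.shutdownServer = shutdownServer;
        this.periodMs = periodMs;
    }

    @Override
    public void run() {
        try {
            while (running) {
                try {
                    cleanupPass.run();
                    Thread.sleep(periodMs);
                } catch (InterruptedException e) {
                    // Treated as a request for a clean stop: log at info and leave the loop.
                    LOG.log(Level.INFO, "Cleanup thread interrupted, exiting", e);
                    running = false;
                }
            }
        } catch (Throwable t) {
            // Anything else is unexpected: log it and shut the server down.
            LOG.log(Level.SEVERE, "Cleanup thread failed unexpectedly", t);
            shutdownServer.run();
        }
    }
}
```

Calling interrupt() on such a thread makes it return from run() without invoking the shutdown action, while an exception thrown by cleanupPass still reaches the outer catch.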

> The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Minor
>         Attachments: MAPREDUCE-2495-20.20X-V1.patch, MAPREDUCE-2495-20.20X-V2.patch, MAPREDUCE-2495-v1.patch, MAPREDUCE-2495-v2.patch
>
>
> The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node. 

[jira] [Commented] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035091#comment-13035091 ] 

Robert Joseph Evans commented on MAPREDUCE-2495:
------------------------------------------------

Sorry, I should have been a bit more verbose.  I am fine with catching a Throwable and shutting down the server.  I am just not completely sure how to go about shutting down the TaskTracker appropriately.  I will look through the code for an example, but a pointer would be helpful.

> The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Minor
>         Attachments: MAPREDUCE-2495-20.20X-V1.patch, MAPREDUCE-2495-v1.patch
>
>
> The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node. 

[jira] [Commented] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037080#comment-13037080 ] 

Robert Joseph Evans commented on MAPREDUCE-2495:
------------------------------------------------

Just like before.
The contrib test failures are in RAID and appear to be a known issue.

> The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Minor
>         Attachments: MAPREDUCE-2495-20.20X-V1.patch, MAPREDUCE-2495-20.20X-V2.patch, MAPREDUCE-2495-20.20X-V3.patch, MAPREDUCE-2495-v1.patch, MAPREDUCE-2495-v2.patch, MAPREDUCE-2495-v3.patch
>
>
> The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node. 

[jira] [Commented] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035089#comment-13035089 ] 

Robert Joseph Evans commented on MAPREDUCE-2495:
------------------------------------------------

What is the proper way to shut down the server?

> The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Minor
>         Attachments: MAPREDUCE-2495-20.20X-V1.patch, MAPREDUCE-2495-v1.patch
>
>
> The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node. 

[jira] [Updated] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2495:
-------------------------------------------

    Attachment: MAPREDUCE-2495-v4.patch
                MAPREDUCE-2495-20.20X-V4.patch

Tests no longer sleep.
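
The comment does not say how the sleeps were removed. One common way to avoid fixed sleeps in a test like this, shown purely as an assumption and not as what the v4 patch does, is to have the code under test signal a latch and let the test wait on it with a timeout. ShutdownRecorder is a hypothetical helper name.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical test helper: records that shutdown was requested and lets a test wait for it. */
class ShutdownRecorder implements Runnable {
    private final CountDownLatch called = new CountDownLatch(1);

    /** Invoked by the code under test in place of a real shutdown. */
    @Override
    public void run() {
        called.countDown();
    }

    /** Blocks until run() has been invoked or the timeout elapses; returns true if it was invoked. */
    boolean awaitShutdown(long timeout, TimeUnit unit) throws InterruptedException {
        return called.await(timeout, unit);
    }
}
```

A test would pass an instance as the shutdown action and assert that awaitShutdown returns true, so it proceeds as soon as the event happens instead of waiting for a fixed interval.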

> The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Minor
>         Attachments: MAPREDUCE-2495-20.20X-V1.patch, MAPREDUCE-2495-20.20X-V2.patch, MAPREDUCE-2495-20.20X-V3.patch, MAPREDUCE-2495-20.20X-V4.patch, MAPREDUCE-2495-v1.patch, MAPREDUCE-2495-v2.patch, MAPREDUCE-2495-v3.patch, MAPREDUCE-2495-v4.patch
>
>
> The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node. 

[jira] [Updated] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated MAPREDUCE-2495:
-------------------------------------

    Fix Version/s:     (was: 0.20.205.0)

> The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Minor
>             Fix For: 0.20.204.0, 0.23.0
>
>         Attachments: MAPREDUCE-2495-20.20X-V1.patch, MAPREDUCE-2495-20.20X-V2.patch, MAPREDUCE-2495-20.20X-V3.patch, MAPREDUCE-2495-20.20X-V4.patch, MAPREDUCE-2495-v1.patch, MAPREDUCE-2495-v2.patch, MAPREDUCE-2495-v3.patch, MAPREDUCE-2495-v4.patch
>
>
> The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node. 

[jira] [Updated] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2495:
-------------------------------------------

    Status: Patch Available  (was: Open)

> The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Minor
>         Attachments: MAPREDUCE-2495-20.20X-V1.patch, MAPREDUCE-2495-20.20X-V2.patch, MAPREDUCE-2495-v1.patch, MAPREDUCE-2495-v2.patch
>
>
> The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node. 
