Posted to mapreduce-issues@hadoop.apache.org by "Ahmed Radwan (Created) (JIRA)" <ji...@apache.org> on 2011/11/04 00:43:34 UTC

[jira] [Created] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

TaskTracker Out of Memory because of distributed cache
------------------------------------------------------

                 Key: MAPREDUCE-3343
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv1
    Affects Versions: 0.20.205.0
            Reporter: Ahmed Radwan


This out-of-memory error occurs when a large number of jobs that use the distributed cache are run on a TaskTracker.

The basic issue seems to be with distributedCacheManager (the TrackerDistributedCacheManager instance in TaskTracker.java). It is created during TaskTracker.initialize(), and it keeps a reference to the TaskDistributedCacheManager of every submitted job via the jobArchives map, as well as references to CacheStatus objects via the cachedArchives map. I am not seeing these cleaned up between jobs, so this can cause out-of-memory problems after a very large number of jobs have been submitted. We have seen this issue in a number of cases.
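To make the retention pattern described above concrete, here is a minimal Java model of it. This is a hypothetical sketch, not the Hadoop source: `TrackerCacheModel`, `PerJobCache`, `onJobSetup` and `retainedJobs` are illustrative names, with `PerJobCache` standing in for TaskDistributedCacheManager and the `jobArchives` field standing in for the map of the same name. It assumes, as the report suggests, that entries are only ever added to the map, so every job's state stays reachable from the long-lived tracker object.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the reported leak (not the actual Hadoop code).
 * A long-lived, tracker-level object keeps one per-job entry in a map that
 * is only ever added to, so every job's state stays reachable forever.
 */
public class TrackerCacheModel {

    /** Stand-in for TaskDistributedCacheManager: per-job cache bookkeeping. */
    static final class PerJobCache {
        final String jobId;
        // Simulated payload so that each retained entry has a real heap cost.
        final byte[] bookkeeping = new byte[1024];

        PerJobCache(String jobId) {
            this.jobId = jobId;
        }
    }

    /** Stand-in for the jobArchives map held by TrackerDistributedCacheManager. */
    private final Map<String, PerJobCache> jobArchives = new HashMap<>();

    /** Called when a job's first task is localized on this tracker. */
    PerJobCache onJobSetup(String jobId) {
        return jobArchives.computeIfAbsent(jobId, PerJobCache::new);
    }

    /** Number of per-job entries still reachable from the tracker. */
    int retainedJobs() {
        return jobArchives.size();
    }

    public static void main(String[] args) {
        TrackerCacheModel tracker = new TrackerCacheModel();
        for (int i = 0; i < 10_000; i++) {
            tracker.onJobSetup("job_" + i);
            // The job finishes here, but nothing removes its entry.
        }
        System.out.println("Entries retained after 10000 jobs: " + tracker.retainedJobs());
    }
}
```

Running `main` prints a count of 10000 retained entries. In a real TaskTracker, each retained entry also keeps whatever the per-job manager references reachable, which is how heap usage can grow with the total number of jobs rather than the number of running jobs.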

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "Robert Joseph Evans (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144082#comment-13144082 ] 

Robert Joseph Evans commented on MAPREDUCE-3343:
------------------------------------------------

If the analysis is correct, then it looks like the issue has been around for a long time. In SVN revision 1077679 the jobArchives map was added, along with a releaseJob method that would remove the entries from jobArchives and release the resources held by the TaskDistributedCacheManager. However, this method appears never to have been called. The very next revision to TrackerDistributedCacheManager.java, 1077687, removed that method and had TaskTracker.java release the resources for the TaskDistributedCacheManager directly, without removing it from the jobArchives map.

It looks like this bug has been in the code since security was introduced.
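The distinction drawn in this comment, between releasing a job's cache resources and removing the job's entry from jobArchives, can be sketched as follows. The classes and methods here are hypothetical stand-ins modelled on the description above, not the code from either SVN revision: `releaseOnly` mirrors the path where TaskTracker releases the resources directly, and `releaseJob` mirrors the removed method that would both remove the entry and release its resources.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model (not the Hadoop source) contrasting two cleanup paths:
 * releasing a job's cache references versus also removing the job's entry.
 */
public class ReleaseVersusRemove {

    /** Stand-in for CacheStatus: one localized cache file with a reference count. */
    static final class CacheStatus {
        int refCount;
    }

    /** Stand-in for TaskDistributedCacheManager: the cache files one job uses. */
    static final class PerJobCache {
        final List<CacheStatus> used = new ArrayList<>();

        /** Drops this job's references; the object itself stays wherever it is stored. */
        void release() {
            for (CacheStatus status : used) {
                status.refCount--;
            }
        }
    }

    final Map<String, CacheStatus> cachedArchives = new HashMap<>();
    final Map<String, PerJobCache> jobArchives = new HashMap<>();

    PerJobCache setupJob(String jobId, String... files) {
        PerJobCache job = new PerJobCache();
        for (String file : files) {
            CacheStatus status = cachedArchives.computeIfAbsent(file, f -> new CacheStatus());
            status.refCount++;
            job.used.add(status);
        }
        jobArchives.put(jobId, job);
        return job;
    }

    /** Models the behaviour after r1077687 as described: release only, entry stays mapped. */
    void releaseOnly(String jobId) {
        jobArchives.get(jobId).release();
    }

    /** Models the releaseJob method from r1077679 as described: remove, then release. */
    void releaseJob(String jobId) {
        PerJobCache job = jobArchives.remove(jobId);
        if (job != null) {
            job.release();
        }
    }

    public static void main(String[] args) {
        ReleaseVersusRemove leaky = new ReleaseVersusRemove();
        leaky.setupJob("job_1", "lib.jar");
        leaky.releaseOnly("job_1");
        // refCount drops to 0, but job_1's entry is still held by jobArchives.
        System.out.println("release only: jobArchives size = " + leaky.jobArchives.size());

        ReleaseVersusRemove fixed = new ReleaseVersusRemove();
        fixed.setupJob("job_1", "lib.jar");
        fixed.releaseJob("job_1");
        System.out.println("remove + release: jobArchives size = " + fixed.jobArchives.size());
    }
}
```

With `releaseOnly`, the CacheStatus reference count drops to zero, yet the per-job object (and the list of CacheStatus objects it points to) remains reachable through `jobArchives`. Only the remove-then-release path lets it be garbage collected.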
                
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan

[jira] [Commented] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "Ahmed Radwan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146860#comment-13146860 ] 

Ahmed Radwan commented on MAPREDUCE-3343:
-----------------------------------------

+1, LGTM. Thanks, zhaoyunjiong.
                
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>              Labels: mapreduce, patch
>         Attachments: mapreduce-3343-release-0.20.205.0.patch

[jira] [Commented] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "Robert Joseph Evans (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145525#comment-13145525 ] 

Robert Joseph Evans commented on MAPREDUCE-3343:
------------------------------------------------

The patch itself looks good to me, but I would like to see some tests added, or a justification of why no tests are needed.
                
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>              Labels: mapreduce, patch
>         Attachments: bug-fix-avoid-memory-leak-in-TrackerDistributedCacheManager.patch

[jira] [Updated] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "zhaoyunjiong (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhaoyunjiong updated MAPREDUCE-3343:
------------------------------------

    Attachment: bug-fix-avoid-memory-leak-in-TrackerDistributedCacheManager.patch

Patch to avoid the memory leak.
                
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>              Labels: mapreduce, patch
>         Attachments: bug-fix-avoid-memory-leak-in-TrackerDistributedCacheManager.patch

[jira] [Updated] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "Eli Collins (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated MAPREDUCE-3343:
-----------------------------------

    Fix Version/s: 0.20.206.0
    
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>            Assignee: zhaoyunjiong
>              Labels: mapreduce, patch
>             Fix For: 0.20.206.0
>
>         Attachments: MAPREDUCE-3343_rev2.patch, mapreduce-3343-release-0.20.205.0.patch

[jira] [Updated] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "Matt Foley (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Foley updated MAPREDUCE-3343:
----------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Resolving, since the patch has been committed to the stated version.
                
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>            Assignee: zhaoyunjiong
>              Labels: mapreduce, patch
>             Fix For: 1.1.0
>
>         Attachments: MAPREDUCE-3343_rev2.patch, mapreduce-3343-release-0.20.205.0.patch

[jira] [Assigned] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "Eli Collins (Assigned) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins reassigned MAPREDUCE-3343:
--------------------------------------

    Assignee: zhaoyunjiong
    
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>            Assignee: zhaoyunjiong
>              Labels: mapreduce, patch
>         Attachments: mapreduce-3343-release-0.20.205.0.patch

[jira] [Commented] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "zhaoyunjiong (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149419#comment-13149419 ] 

zhaoyunjiong commented on MAPREDUCE-3343:
-----------------------------------------

Eli Collins is right: there is no need to catch an exception in removeTaskDistributedCacheManager.
Thanks for your comments.
Thanks also to Ahmed Radwan for kindly updating my patch.

I notice the assignee is now me. What else should I do to get this patch committed?


                
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>            Assignee: zhaoyunjiong
>              Labels: mapreduce, patch
>         Attachments: MAPREDUCE-3343_rev2.patch, mapreduce-3343-release-0.20.205.0.patch

[jira] [Commented] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "zhaoyunjiong (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153331#comment-13153331 ] 

zhaoyunjiong commented on MAPREDUCE-3343:
-----------------------------------------

It's my pleasure. Thanks to all of you.
                
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>            Assignee: zhaoyunjiong
>              Labels: mapreduce, patch
>             Fix For: 0.20.206.0
>
>         Attachments: MAPREDUCE-3343_rev2.patch, mapreduce-3343-release-0.20.205.0.patch

[jira] [Updated] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "Matt Foley (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Foley updated MAPREDUCE-3343:
----------------------------------

    Target Version/s: 1.1.0  (was: 1.0.0)
    
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>            Assignee: zhaoyunjiong
>              Labels: mapreduce, patch
>             Fix For: 1.1.0
>
>         Attachments: MAPREDUCE-3343_rev2.patch, mapreduce-3343-release-0.20.205.0.patch

[jira] [Updated] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "zhaoyunjiong (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhaoyunjiong updated MAPREDUCE-3343:
------------------------------------

    Attachment:     (was: mapreduce-3343-release-0.20.205.0.patch)
    
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>              Labels: mapreduce, patch
>         Attachments: mapreduce-3343-release-0.20.205.0.patch

[jira] [Commented] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "Eli Collins (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149761#comment-13149761 ] 

Eli Collins commented on MAPREDUCE-3343:
----------------------------------------

How does the test verify that the job is removed from the archives? It looks like it would pass even if we removed the call to removeTaskDistributedCacheManager in both the TT and the test.
                
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>            Assignee: zhaoyunjiong
>              Labels: mapreduce, patch
>         Attachments: MAPREDUCE-3343_rev2.patch, mapreduce-3343-release-0.20.205.0.patch

[jira] [Updated] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "zhaoyunjiong (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhaoyunjiong updated MAPREDUCE-3343:
------------------------------------

              Labels: mapreduce patch  (was: )
    Target Version/s: 0.20.205.0, 0.20.205.1
              Status: Patch Available  (was: Open)

Remove the job's TaskDistributedCacheManager from the TrackerDistributedCacheManager when the job is done, to avoid the memory leak.
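A minimal sketch of this fix, under assumed names and types: the real patch works with Hadoop's own classes, whereas here a job ID is a plain `String`, `PerJobCache` stands in for TaskDistributedCacheManager, and `getOrCreate` and `onJobDone` are hypothetical hooks. Only `removeTaskDistributedCacheManager` takes its name from this thread, and its signature is assumed. As suggested in the review comments, the removal is not wrapped in a try/catch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of the fix, under assumed names: when a job is done on this
 * tracker, its per-job cache manager is dropped from the tracker-level
 * map so that it can be garbage collected.
 */
public class RemoveOnJobDone {

    /** Stand-in for TaskDistributedCacheManager. */
    static final class PerJobCache {
        boolean released;

        void release() {
            released = true;
        }
    }

    // ConcurrentHashMap is an assumption: setup and cleanup may run on different threads.
    private final Map<String, PerJobCache> jobArchives = new ConcurrentHashMap<>();

    /** Hypothetical setup hook: one per-job manager per job ID. */
    PerJobCache getOrCreate(String jobId) {
        return jobArchives.computeIfAbsent(jobId, id -> new PerJobCache());
    }

    /** Name borrowed from this thread; the String signature is an assumption. */
    void removeTaskDistributedCacheManager(String jobId) {
        jobArchives.remove(jobId);
    }

    /** Hypothetical job-cleanup hook: release cache references, then forget the job. */
    void onJobDone(String jobId) {
        PerJobCache job = jobArchives.get(jobId);
        if (job != null) {
            job.release();
        }
        removeTaskDistributedCacheManager(jobId);
    }

    int retainedJobs() {
        return jobArchives.size();
    }

    public static void main(String[] args) {
        RemoveOnJobDone tracker = new RemoveOnJobDone();
        for (int i = 0; i < 10_000; i++) {
            String jobId = "job_" + i;
            tracker.getOrCreate(jobId);
            tracker.onJobDone(jobId);
        }
        System.out.println("Entries retained after 10000 jobs: " + tracker.retainedJobs());
    }
}
```

Here `main` reports 0 retained entries. The essential change is that cleanup removes the map entry, so the map's size tracks jobs still running on the tracker rather than every job it has ever seen.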
                
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>              Labels: mapreduce, patch

[jira] [Commented] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "zhaoyunjiong (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152672#comment-13152672 ] 

zhaoyunjiong commented on MAPREDUCE-3343:
-----------------------------------------

The unit test already covers removal of the job from the archives.
If we remove the call to removeTaskDistributedCacheManager in the test, the last line of testRemoveTaskDistributedCacheManager will fail.
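The property being discussed, that a meaningful test must look the job up again after cleanup so that it fails when the removal call is skipped, can be illustrated with a small JUnit-free Java sketch. It is hypothetical and is not the committed testRemoveTaskDistributedCacheManager: `Tracker`, `getOrCreate`, `lookup` and `runScenario` are illustrative names, and `removeTaskDistributedCacheManager` only borrows its name from this thread.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, JUnit-free sketch of the test shape discussed here (not the
 * committed testRemoveTaskDistributedCacheManager). The final check looks the
 * job up again after cleanup; it can only pass if the entry was really removed.
 */
public class RemovalTestSketch {

    static final class Tracker {
        private final Map<String, Object> jobArchives = new HashMap<>();

        Object getOrCreate(String jobId) {
            return jobArchives.computeIfAbsent(jobId, id -> new Object());
        }

        Object lookup(String jobId) {
            return jobArchives.get(jobId);
        }

        void removeTaskDistributedCacheManager(String jobId) {
            jobArchives.remove(jobId);
        }
    }

    /** Runs the scenario; returns true when the last check (entry gone) holds. */
    static boolean runScenario(boolean callRemove) {
        Tracker tracker = new Tracker();
        String jobId = "job_1";
        tracker.getOrCreate(jobId);
        if (tracker.lookup(jobId) == null) {
            throw new AssertionError("entry should exist while the job runs");
        }
        if (callRemove) {
            tracker.removeTaskDistributedCacheManager(jobId);
        }
        // Equivalent of the test's last line: the job must no longer be tracked.
        return tracker.lookup(jobId) == null;
    }

    public static void main(String[] args) {
        System.out.println("with removal call:    last check passes = " + runScenario(true));
        System.out.println("without removal call: last check passes = " + runScenario(false));
    }
}
```

Running `main` prints `true` for the scenario with the removal call and `false` without it, which is the behaviour Eli Collins asked the test to demonstrate.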

                
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>            Assignee: zhaoyunjiong
>              Labels: mapreduce, patch
>         Attachments: MAPREDUCE-3343_rev2.patch, mapreduce-3343-release-0.20.205.0.patch

[jira] [Updated] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "zhaoyunjiong (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhaoyunjiong updated MAPREDUCE-3343:
------------------------------------

    Attachment: mapreduce-3343-release-0.20.205.0.patch

Added a unit test.
This patch works fine on our production cluster.
                
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>              Labels: mapreduce, patch
>         Attachments: bug-fix-avoid-memory-leak-in-TrackerDistributedCacheManager.patch, mapreduce-3343-release-0.20.205.0.patch

[jira] [Commented] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "zhaoyunjiong (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146837#comment-13146837 ] 

zhaoyunjiong commented on MAPREDUCE-3343:
-----------------------------------------

The result of test-patch on 0.20.205:

     [exec] BUILD SUCCESSFUL
     [exec] Total time: 4 minutes 59 seconds
     [exec] 
     [exec] 
     [exec] 
     [exec] 
     [exec] +1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.
     [exec] 
     [exec] 
     [exec] 
     [exec] 
     [exec] ======================================================================
     [exec] ======================================================================
     [exec]     Finished build.
     [exec] ======================================================================
     [exec] ======================================================================

                
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>              Labels: mapreduce, patch
>         Attachments: mapreduce-3343-release-0.20.205.0.patch

[jira] [Closed] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "Matt Foley (Closed) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Foley closed MAPREDUCE-3343.
---------------------------------


Closed upon release of 1.0.1.
                
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>            Assignee: zhaoyunjiong
>              Labels: mapreduce, patch
>             Fix For: 1.0.1
>
>         Attachments: MAPREDUCE-3343_rev2.patch, mapreduce-3343-release-0.20.205.0.patch

[jira] [Assigned] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "Suresh Srinivas (Assigned) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas reassigned MAPREDUCE-3343:
------------------------------------------

    Assignee: Suresh Srinivas  (was: zhaoyunjiong)
    
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>            Assignee: Suresh Srinivas
>              Labels: mapreduce, patch
>         Attachments: MAPREDUCE-3343_rev2.patch, mapreduce-3343-release-0.20.205.0.patch

[jira] [Commented] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "Eli Collins (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152582#comment-13152582 ] 

Eli Collins commented on MAPREDUCE-3343:
----------------------------------------

Ping?
                
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>            Assignee: zhaoyunjiong
>              Labels: mapreduce, patch
>         Attachments: MAPREDUCE-3343_rev2.patch, mapreduce-3343-release-0.20.205.0.patch

[jira] [Commented] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146087#comment-13146087 ] 

Hadoop QA commented on MAPREDUCE-3343:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12502883/mapreduce-3343-release-0.20.205.0.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1268//console

This message is automatically generated.
                
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>              Labels: mapreduce, patch
>         Attachments: bug-fix-avoid-memory-leak-in-TrackerDistributedCacheManager.patch, mapreduce-3343-release-0.20.205.0.patch

[jira] [Commented] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "Matt Foley (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206393#comment-13206393 ] 

Matt Foley commented on MAPREDUCE-3343:
---------------------------------------

This patch has been tested at user sites and is believed stable. Nathan Roberts requested that I include it in 1.0.1, as its absence is causing ops problems with 1.0.0.
                
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>            Assignee: zhaoyunjiong
>              Labels: mapreduce, patch
>             Fix For: 1.0.1
>
>         Attachments: MAPREDUCE-3343_rev2.patch, mapreduce-3343-release-0.20.205.0.patch

[jira] [Updated] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "zhaoyunjiong (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhaoyunjiong updated MAPREDUCE-3343:
------------------------------------

    Attachment: mapreduce-3343-release-0.20.205.0.patch
    
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>              Labels: mapreduce, patch
>         Attachments: mapreduce-3343-release-0.20.205.0.patch

[jira] [Updated] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "Ahmed Radwan (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmed Radwan updated MAPREDUCE-3343:
------------------------------------

    Attachment: MAPREDUCE-3343_rev2.patch

Here is zhaoyunjiong's patch incorporating Eli's additional comments.
                
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>            Assignee: zhaoyunjiong
>              Labels: mapreduce, patch
>         Attachments: MAPREDUCE-3343_rev2.patch, mapreduce-3343-release-0.20.205.0.patch

[jira] [Commented] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "Robert Joseph Evans (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146325#comment-13146325 ] 

Robert Joseph Evans commented on MAPREDUCE-3343:
------------------------------------------------

All that is left is to run test-patch on the 0.20.205 line and post the results here. [Here|http://wiki.apache.org/hadoop/HowToContribute?action=recall&rev=47] are instructions on how to do that; look under the section "Testing your patch". Assuming that you get a +1 from that, I am also a +1 (non-binding).
                
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>              Labels: mapreduce, patch
>         Attachments: mapreduce-3343-release-0.20.205.0.patch

[jira] [Updated] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "Matt Foley (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Foley updated MAPREDUCE-3343:
----------------------------------

    Target Version/s: 1.0.1  (was: 1.1.0)
       Fix Version/s:     (was: 1.1.0)
                      1.0.1
    
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>            Assignee: zhaoyunjiong
>              Labels: mapreduce, patch
>             Fix For: 1.0.1
>
>         Attachments: MAPREDUCE-3343_rev2.patch, mapreduce-3343-release-0.20.205.0.patch

[jira] [Commented] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146095#comment-13146095 ] 

Hadoop QA commented on MAPREDUCE-3343:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12502884/mapreduce-3343-release-0.20.205.0.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1269//console

This message is automatically generated.
                
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>              Labels: mapreduce, patch
>         Attachments: mapreduce-3343-release-0.20.205.0.patch

[jira] [Updated] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "zhaoyunjiong (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhaoyunjiong updated MAPREDUCE-3343:
------------------------------------

    Attachment:     (was: bug-fix-avoid-memory-leak-in-TrackerDistributedCacheManager.patch)
    
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>              Labels: mapreduce, patch
>         Attachments: mapreduce-3343-release-0.20.205.0.patch

[jira] [Commented] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "Eli Collins (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153238#comment-13153238 ] 

Eli Collins commented on MAPREDUCE-3343:
----------------------------------------

Ah, I missed that.  +1 to the latest patch. I'll commit this to branch-20-security for 206. Thanks Zhao!
                
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>            Assignee: zhaoyunjiong
>              Labels: mapreduce, patch
>         Attachments: MAPREDUCE-3343_rev2.patch, mapreduce-3343-release-0.20.205.0.patch

[jira] [Commented] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145282#comment-13145282 ] 

Hadoop QA commented on MAPREDUCE-3343:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12502735/bug-fix-avoid-memory-leak-in-TrackerDistributedCacheManager.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1255//console

This message is automatically generated.
                
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>              Labels: mapreduce, patch
>         Attachments: bug-fix-avoid-memory-leak-in-TrackerDistributedCacheManager.patch

[jira] [Updated] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "Eli Collins (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated MAPREDUCE-3343:
-----------------------------------

    Target Version/s: 0.20.205.1  (was: 0.20.205.1, 0.20.205.0)
    
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>            Assignee: zhaoyunjiong
>              Labels: mapreduce, patch
>         Attachments: mapreduce-3343-release-0.20.205.0.patch

[jira] [Commented] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149009#comment-13149009 ] 

Hadoop QA commented on MAPREDUCE-3343:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12503477/MAPREDUCE-3343_rev2.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1297//console

This message is automatically generated.
                
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>            Assignee: zhaoyunjiong
>              Labels: mapreduce, patch
>         Attachments: MAPREDUCE-3343_rev2.patch, mapreduce-3343-release-0.20.205.0.patch

[jira] [Commented] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "Eli Collins (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148767#comment-13148767 ] 

Eli Collins commented on MAPREDUCE-3343:
----------------------------------------

Thanks for submitting a patch!

* Why catch Exception in removeTaskDistributedCacheManager? The key should never be null, right?
* getTaskDistributedCacheManager can be package-private instead of public, right?
* Nit: please indent with two spaces instead of tabs, per http://wiki.apache.org/hadoop/HowToContribute

Otherwise looks great.
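The report describes a leak pattern: a tracker-wide map gains one entry per job and never loses one. The Java sketch below is a simplified, hypothetical model of that pattern and of the kind of per-job cleanup the review comments refer to. It is not the Hadoop implementation. The class CacheManagerSketch, the nested PerJobCache type, registerJob, and the use of String job IDs are all assumptions made for illustration. Only the method names getTaskDistributedCacheManager and removeTaskDistributedCacheManager come from the review above, and their real signatures may differ.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of a tracker-wide distributed cache manager.
// Not Hadoop's TrackerDistributedCacheManager; names and types are assumptions.
public class CacheManagerSketch {

  // Hypothetical per-job state standing in for TaskDistributedCacheManager.
  static final class PerJobCache {
    final String jobId;

    PerJobCache(String jobId) {
      this.jobId = jobId;
    }
  }

  // One entry per job localized on this tracker. If entries are never removed,
  // the map (and everything each entry references) grows for the life of the process.
  private final Map<String, PerJobCache> jobArchives = new ConcurrentHashMap<>();

  // Hypothetical registration hook, called when a job's first task is localized.
  PerJobCache registerJob(String jobId) {
    return jobArchives.computeIfAbsent(jobId, PerJobCache::new);
  }

  // Package-private rather than public, as suggested in the review.
  PerJobCache getTaskDistributedCacheManager(String jobId) {
    return jobArchives.get(jobId);
  }

  // Called when the job finishes on this tracker. Removing an absent key just
  // returns null; a null key is a caller bug, and ConcurrentHashMap rejects it
  // with a NullPointerException, so no catch (Exception) is needed here.
  void removeTaskDistributedCacheManager(String jobId) {
    jobArchives.remove(jobId);
  }

  int size() {
    return jobArchives.size();
  }

  public static void main(String[] args) {
    CacheManagerSketch leaky = new CacheManagerSketch();
    CacheManagerSketch fixed = new CacheManagerSketch();
    for (int i = 0; i < 1000; i++) {
      String jobId = "job_" + i;
      leaky.registerJob(jobId); // never cleaned up: models the reported leak
      fixed.registerJob(jobId);
      fixed.removeTaskDistributedCacheManager(jobId); // per-job cleanup
    }
    System.out.println("without cleanup: " + leaky.size()); // prints "without cleanup: 1000"
    System.out.println("with cleanup: " + fixed.size()); // prints "with cleanup: 0"
  }
}
```

Using ConcurrentHashMap is a choice of this sketch. remove() on an absent key simply returns null, and a null key fails fast, so the removal path does not need a broad catch block. That matches the first review comment. The sketch does not model the cachedArchives/CacheStatus references that the report also mentions.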
                
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>              Labels: mapreduce, patch
>         Attachments: mapreduce-3343-release-0.20.205.0.patch

[jira] [Assigned] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

Posted by "Suresh Srinivas (Assigned) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas reassigned MAPREDUCE-3343:
------------------------------------------

    Assignee: zhaoyunjiong  (was: Suresh Srinivas)
    
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>            Assignee: zhaoyunjiong
>              Labels: mapreduce, patch
>         Attachments: MAPREDUCE-3343_rev2.patch, mapreduce-3343-release-0.20.205.0.patch
