Posted to mapreduce-issues@hadoop.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2010/07/01 06:11:50 UTC

[jira] Created: (MAPREDUCE-1904) Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator

Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator
-------------------------------------------------------------------------------

                 Key: MAPREDUCE-1904
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1904
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: tasktracker
    Affects Versions: 0.20.1
            Reporter: Rajesh Balamohan


While profiling the TaskTracker with the Sort benchmark, I observed that threads block on LocalDirAllocator.getLocalPathToRead() when looking up the index file and the temporary map output file.

Because LocalDirAllocator is tied to the ServletContext, only one instance is available per TaskTracker HTTP server. Given the jobid and mapid, LocalDirAllocator retrieves the index file path and the temporary map output file path. getLocalPathToRead() is internally synchronized, so all servlet threads serialize on this single instance.
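A simplified model of the contended lookup, for illustration only. It assumes that the shared allocator resolves a relative path by probing each configured local directory inside one synchronized method; the class name SharedDirLookup is hypothetical and this is not the Hadoop source. Because every servlet thread calls the same instance, each call holds the monitor for the whole directory scan, including the filesystem checks.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical, simplified stand-in for a shared per-context directory allocator.
// One instance is shared by all servlet threads, so every lookup
// serializes on this object's monitor.
final class SharedDirLookup {
    private final List<Path> localDirs;

    SharedDirLookup(List<Path> localDirs) {
        this.localDirs = List.copyOf(localDirs);
    }

    // The whole scan, including one existence check per configured
    // directory, runs while the monitor is held.
    public synchronized Path getLocalPathToRead(String relativePath) throws IOException {
        for (Path dir : localDirs) {
            Path candidate = dir.resolve(relativePath);
            if (Files.exists(candidate)) {
                return candidate;
            }
        }
        throw new IOException("Could not find " + relativePath + " in any local directory");
    }
}
```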

Introducing an LRUCache for this lookup (key = jobid + mapid, value = path to the file) reduces the contention heavily. The size of the LRUCache can be tuned for the environment; with the LRUCache in place I observed a throughput improvement of roughly 4-7%.
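A minimal sketch of such a cache, assuming an access-ordered LinkedHashMap as the LRU structure. MapOutputPathCache, its methods, and the "/" separator in the key are hypothetical names chosen for illustration, not taken from the attached patches; slowLookup stands in for the synchronized getLocalPathToRead() call.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical LRU cache: key = jobId + "/" + mapId, value = resolved file path.
final class MapOutputPathCache {
    private final Map<String, String> lru;

    MapOutputPathCache(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently
        // used; removeEldestEntry evicts the LRU entry once the limit is exceeded.
        this.lru = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached path, or resolves it through the slow lookup on a miss.
    String get(String jobId, String mapId, Supplier<String> slowLookup) {
        String key = jobId + "/" + mapId;
        synchronized (lru) {
            String cached = lru.get(key);
            if (cached != null) {
                return cached;
            }
        }
        // Resolve outside the cache lock so a miss does not block concurrent hits.
        String resolved = slowLookup.get();
        synchronized (lru) {
            lru.put(key, resolved);
        }
        return resolved;
    }

    int size() {
        synchronized (lru) {
            return lru.size();
        }
    }
}
```

The cache's own lock is held only for a hash-map operation, so hits no longer wait behind directory scans; misses still take the slow path. Because an access-ordered LinkedHashMap reorders entries on get(), reads need the lock too; a ConcurrentHashMap would avoid that but gives up strict LRU eviction. Caching both the index file and the map output file would need either two caches or a composite value.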

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-1904) Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator

Posted by "Rajesh Balamohan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated MAPREDUCE-1904:
----------------------------------------

    Attachment: MAPREDUCE-1904-RC10.patch

The patch for the RC10 release is attached.


[jira] Commented: (MAPREDUCE-1904) Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator

Posted by "Rajesh Balamohan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908604#action_12908604 ] 

Rajesh Balamohan commented on MAPREDUCE-1904:
---------------------------------------------

Thanks for the review comments, Arun.

1. For #1, I will post the profiler output showing which methods in getLocalPathToRead() are expensive.

2. For #2, LocalDirAllocator.confChanged() need not be called on this code path in the TaskTracker.

Reason: In this context, the TaskTracker uses LocalDirAllocator to check for configuration changes to "mapred.local.dir". Once it is read, this parameter does not change over the TaskTracker's lifetime, so the check is not needed on every invocation. Corner case: when the TaskTracker goes down and the new configuration is loaded, the LRUCache is repopulated as well.
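One way to keep the benefit of skipping the per-call check while still honouring a configuration change is sketched below, under the assumption that comparing the configured directory string on every call is cheap. ConfigScopedPathCache is a hypothetical name; currentLocalDirs stands for the current value of "mapred.local.dir" and slowLookup for the full getLocalPathToRead() call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical: a path cache that is discarded whenever the configured
// local-directory list differs from the one its entries were resolved against.
final class ConfigScopedPathCache {
    private final Map<String, String> paths = new HashMap<>();
    private String dirsAtPopulation;

    synchronized String get(String currentLocalDirs, String key, Supplier<String> slowLookup) {
        if (!Objects.equals(currentLocalDirs, dirsAtPopulation)) {
            // First use or configuration change: cached paths may point into
            // directories that are no longer configured, so drop them all.
            paths.clear();
            dirsAtPopulation = currentLocalDirs;
        }
        // On a miss, slowLookup runs while this cache's lock is held.
        return paths.computeIfAbsent(key, k -> slowLookup.get());
    }
}
```

Hits cost one string comparison and one map lookup. This sketch still runs slowLookup while holding the cache's lock, so concurrent misses serialize; that is acceptable only if misses are rare relative to hits.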




[jira] Updated: (MAPREDUCE-1904) Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator

Posted by "Rajesh Balamohan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated MAPREDUCE-1904:
----------------------------------------

    Attachment: Thread profiler output showing contention.jpg


[jira] Commented: (MAPREDUCE-1904) Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907334#action_12907334 ] 

Arun C Murthy commented on MAPREDUCE-1904:
------------------------------------------

A couple of concerns:

# I'd like to understand what part of LocalDirAllocator.getLocalPathToRead is expensive... it's fine to add a cache, but it's better to do it _after_ we understand why we really need it.
# This patch makes the code path skip the sanity checks in LocalDirAllocator.confChanged, which is called by LocalDirAllocator.getLocalPathToRead. That is a concern. Again, this might be the expensive part of LocalDirAllocator.getLocalPathToRead, but we need to confirm that.

Don't get me wrong, the focus of this jira is very useful - we just need to fix it the 'right' way.
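If skipped sanity checks are the concern, a cache can still validate each hit cheaply without entering the shared allocator's monitor. A minimal sketch, assuming that a per-hit Files.exists() check is an acceptable (and narrower) substitute for the full checks; ValidatingPathCache is a hypothetical name and slowLookup stands in for the full, checked getLocalPathToRead() path.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical: re-checks each cached path on a hit without taking the
// shared allocator's lock, and falls back to the full lookup if it is stale.
final class ValidatingPathCache {
    private final ConcurrentMap<String, Path> paths = new ConcurrentHashMap<>();

    Path get(String key, Supplier<Path> slowLookup) {
        Path cached = paths.get(key);
        if (cached != null && Files.exists(cached)) {
            return cached; // fast path: no shared monitor entered
        }
        if (cached != null) {
            // Stale entry (e.g. the file or its directory was removed).
            paths.remove(key, cached);
        }
        Path resolved = slowLookup.get(); // full, checked lookup
        paths.put(key, resolved);
        return resolved;
    }
}
```

Files.exists() still touches the filesystem on each hit, so it is cheaper than the synchronized scan but not free. This map is also unbounded; a real version would need a size limit such as LRU eviction, or removal of entries when a job completes.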


[jira] Updated: (MAPREDUCE-1904) Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator

Posted by "Rajesh Balamohan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated MAPREDUCE-1904:
----------------------------------------

    Attachment: MAPREDUCE-1904-trunk.patch

Attaching the patch for the trunk version.


[jira] Updated: (MAPREDUCE-1904) Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator

Posted by "Rajesh Balamohan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated MAPREDUCE-1904:
----------------------------------------

    Attachment: LocalDirAllocator.JPG
                LocalDirAllocator_Monitor.JPG

CPU profiler output of a TaskTracker "without" the patch. The profiler output shows that ~4% of the time is spent in the internal methods of LocalDirAllocator.*.getLocalPathToRead(); the rest of the time is spent in the method itself. Together with the monitor profiling, this shows that synchronization is the bottleneck in getLocalPathToRead().
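The figures above come from a profiler. As a rough, profiler-independent cross-check, the JDK's ThreadMXBean reports how many times a thread blocked waiting to enter a monitor. The harness below is hypothetical: a trivial synchronized block on a shared object stands in for getLocalPathToRead(), and the counts it reports vary from run to run.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical harness: N threads hammer one shared monitor, then each
// thread reports how often it blocked trying to enter a monitor.
final class MonitorContentionProbe {
    private final Object sharedLock = new Object();
    private long counter;

    // Stand-in for a synchronized lookup on a shared allocator instance.
    private void contendedCall() {
        synchronized (sharedLock) {
            counter++;
        }
    }

    // Returns the total number of times the workers blocked entering any monitor.
    long totalBlockedCount(int threads, int callsPerThread) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        AtomicLong blocked = new AtomicLong();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            workers.add(new Thread(() -> {
                for (int j = 0; j < callsPerThread; j++) {
                    contendedCall();
                }
                // Read this thread's stats before it exits: getThreadInfo
                // returns null for threads that are no longer alive.
                ThreadInfo info = mx.getThreadInfo(Thread.currentThread().getId());
                blocked.addAndGet(info.getBlockedCount());
            }));
        }
        for (Thread t : workers) t.start();
        for (Thread t : workers) t.join();
        return blocked.get();
    }

    long counter() {
        synchronized (sharedLock) {
            return counter;
        }
    }
}
```

A high blocked count relative to the number of calls points at the same kind of monitor contention the profiler screenshots show; it does not say which monitor was contended, only how often threads waited.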


[jira] Updated: (MAPREDUCE-1904) Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator

Posted by "Rajesh Balamohan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated MAPREDUCE-1904:
----------------------------------------

    Attachment: profiler output after applying the patch.jpg

Contention on LocalDirAllocator is very low, close to 0%.


[jira] Updated: (MAPREDUCE-1904) Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator

Posted by "Rajesh Balamohan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated MAPREDUCE-1904:
----------------------------------------

    Attachment: TaskTracker- yourkit profiler output .jpg

LocalDirAllocator.AllocatorPerContext is heavily contended. 
