You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Liyin Liang (JIRA)" <ji...@apache.org> on 2011/06/15 07:53:47 UTC

[jira] [Commented] (MAPREDUCE-1904) Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049621#comment-13049621 ] 

Liyin Liang commented on MAPREDUCE-1904:
----------------------------------------

This is a great patch. Here is part of the stack when work thread is blocked:
{code}
"1797055149@qtp0-98" prio=10 tid=0x0000002aa1a40000 nid=0x333 waiting for monitor entry [0x0000000049dc5000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:377)
        - waiting to lock <0x00000000a0000090> (a org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:142)
        at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3086)
{code}

I have written a job with one map which output 1M data, and 100 reduces. Each reduce spawn 10 threads to fetch data from map side 3k times just like shuffle phase. When run this job, most of work threads is blocked on AllocatorPerContext.

With LRUCache, most work threads are blocked on LOG.info() as following stack.
{code}
"1793911889@qtp0-101" prio=10 tid=0x0000002aa1530000 nid=0x34f2 waiting for monitor entry [0x0000000041d45000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.log4j.Category.callAppenders(Category.java:204)
        - waiting to lock <0x00000000a01be928> (a org.apache.log4j.spi.RootLogger)
        at org.apache.log4j.Category.forcedLog(Category.java:391)
        at org.apache.log4j.Category.log(Category.java:856)
        at org.apache.commons.logging.impl.Log4JLogger.info(Log4JLogger.java:133)
        at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3246)
{code}

With LRUCache + disable LOG.info(): This job takes 3mins, 19sec to run.
Without LRUCache + enable LOG.info(): This job takes just 37sec to run.

b.t.w LRUCache should use *mapId* as key instead of *(jobId + mapId)*. Because jobId is just part of mapId.

> Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1904
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1904
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.20.1
>            Reporter: Rajesh Balamohan
>         Attachments: LocalDirAllocator.JPG, LocalDirAllocator_Monitor.JPG, MAPREDUCE-1904-RC10.patch, MAPREDUCE-1904-trunk.patch, TaskTracker- yourkit profiler output .jpg, Thread profiler output showing contention.jpg, profiler output after applying the patch.jpg
>
>
> While profiling tasktracker with Sort benchmark, it was observed that threads block on LocalDirAllocator.getLocalPathToRead() in order to get the index file and temporary map output file.
> As LocalDirAllocator is tied up with ServetContext,  only one instance would be available per tasktracker httpserver.  Given the jobid & mapid, LocalDirAllocator retrieves index file path and temporary map output file path. getLocalPathToRead() is internally synchronized.
> Introducing a LRUCache for this lookup reduces the contention heavily (LRUCache with key =jobid +mapid and value=PATH to the file). Size of the LRUCache can be varied based on the environment and I observed a throughput improvement in the order of 4-7% with the introduction of LRUCache.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira