Posted to common-issues@hadoop.apache.org by "Yi Li (JIRA)" <ji...@apache.org> on 2016/03/14 07:28:33 UTC

[jira] [Commented] (HADOOP-10048) LocalDirAllocator should avoid holding locks while accessing the filesystem

    [ https://issues.apache.org/jira/browse/HADOOP-10048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15192814#comment-15192814 ] 

Yi Li commented on HADOOP-10048:
--------------------------------

Hi all, we have integrated this patch into our internal build (based on hadoop-2.5-cdh5.3.2) and have been running it on two clusters (tens/hundreds of nodes each) for more than a month. So far we have seen no abnormal behavior and no jobs stuck in the SHUFFLE phase (before the patch, a single fetch attempt could hang for several hours under heavy load). We have also observed ShuffleHandler throughput up to 10x higher than before (based on ShuffleMetrics).
Is the patch good to go? Thanks.

> LocalDirAllocator should avoid holding locks while accessing the filesystem
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-10048
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10048
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>              Labels: BB2015-05-TBR
>         Attachments: HADOOP-10048.patch
>
>
> As noted in MAPREDUCE-5584 and HADOOP-7016, LocalDirAllocator can be a bottleneck for multithreaded setups like the ShuffleHandler.  We should consider moving to a lockless design or minimizing the critical sections to a very small amount of time that does not involve I/O operations.
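
For illustration only, here is a minimal sketch of what "minimizing the critical sections" could look like. It is not the attached patch and does not reflect the real LocalDirAllocator API; the class SnapshotDirAllocator and its methods replaceDirs and pickDir are hypothetical names. The assumption is that the directory list changes rarely, so readers can take an immutable snapshot through an atomic reference and run the filesystem checks without holding any lock.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: not the HADOOP-10048 patch and not Hadoop's API.
final class SnapshotDirAllocator {
    // Immutable snapshot of candidate directories, swapped atomically on change.
    private final AtomicReference<List<File>> dirs = new AtomicReference<>();
    // Round-robin cursor shared by all threads; updated without a lock.
    private final AtomicInteger cursor = new AtomicInteger();

    SnapshotDirAllocator(List<File> initialDirs) {
        replaceDirs(initialDirs);
    }

    // Publishes a new directory list; no I/O happens here.
    void replaceDirs(List<File> newDirs) {
        dirs.set(Collections.unmodifiableList(new ArrayList<>(newDirs)));
    }

    // Returns a directory with at least requiredBytes usable, or null if none.
    // The filesystem calls below run on a private snapshot, so a slow disk
    // delays only the calling thread instead of every thread behind a lock.
    File pickDir(long requiredBytes) {
        List<File> snapshot = dirs.get();
        int n = snapshot.size();
        if (n == 0) {
            return null;
        }
        // floorMod keeps the index non-negative after the counter overflows.
        int start = Math.floorMod(cursor.getAndIncrement(), n);
        for (int i = 0; i < n; i++) {
            File candidate = snapshot.get((start + i) % n);
            if (candidate.isDirectory() && candidate.getUsableSpace() >= requiredBytes) {
                return candidate;
            }
        }
        return null;
    }
}
```

A caller would build the allocator once from the configured local directories and call pickDir from each worker thread. The trade-off in this sketch is that a thread may briefly use a stale snapshot after replaceDirs is called, which seems acceptable when the alternative is serializing all threads behind disk I/O.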



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)