Posted to common-issues@hadoop.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2014/06/13 00:56:02 UTC

[jira] [Commented] (HADOOP-10681) Remove synchronized blocks from SnappyCodec and ZlibCodec buffering

    [ https://issues.apache.org/jira/browse/HADOOP-10681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029965#comment-14029965 ] 

Gopal V commented on HADOOP-10681:
----------------------------------

Added a SnappyCodec implementation to Hive, which should come first in the classpath for Hive+Tez.
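Shadowing a class this way relies on the JVM loading the first matching copy it finds on the classpath. A quick way to confirm which copy won is to ask the JVM where it loaded the class from. This is a sketch under two assumptions: the Hive copy keeps the fully-qualified name org.apache.hadoop.io.compress.SnappyCodec, and the check runs with the same classpath as the Hive+Tez tasks. WhichJar is a hypothetical helper, not part of Hive or Hadoop.

```java
import java.security.CodeSource;

// Hypothetical helper: reports which jar or directory a class was loaded from.
class WhichJar {
    static String locationOf(String className) throws ClassNotFoundException {
        // initialize=false: load the class without running its static initializers.
        Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
        CodeSource src = c.getProtectionDomain().getCodeSource();
        // Bootstrap (JDK) classes normally have no CodeSource.
        if (src == null || src.getLocation() == null) {
            return "<bootstrap or unknown>";
        }
        return src.getLocation().toString();
    }

    public static void main(String[] args) throws Exception {
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.io.compress.SnappyCodec";
        System.out.println(name + " -> " + locationOf(name));
    }
}
```

If the printed location is the hadoop-common jar rather than the Hive jar, the override is not taking effect for that classpath.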

Tested out TPC-H Query5, which has a spill-merge on the JOIN.

Query time went from 539.8 seconds to 464.89 seconds, mostly from a speedup in a single reducer stage.
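As a quick check on the reported numbers (derived here from the two timings, not stated in the original comment), the end-to-end improvement works out to roughly 14%:

```latex
\frac{539.8 - 464.89}{539.8} = \frac{74.91}{539.8} \approx 0.139
\qquad \text{(speedup } 539.8 / 464.89 \approx 1.16\times\text{)}
```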

> Remove synchronized blocks from SnappyCodec and ZlibCodec buffering
> -------------------------------------------------------------------
>
>                 Key: HADOOP-10681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10681
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: performance
>    Affects Versions: 2.2.0, 2.4.0, 2.5.0
>            Reporter: Gopal V
>            Assignee: Gopal V
>              Labels: perfomance
>         Attachments: compress-cmpxchg-small.png, perf-top-spill-merge.png, snappy-perf-unsync.png
>
>
> The current SnappyCompressor implementation spends more time in the Java loop that copies from the user buffer into the direct buffer allocated to the compressor than it spends actually compressing the buffers.
> [Image: perf-top-spill-merge.png]
> The bottleneck was found to be Java monitor code inside SnappyCompressor.
> The JIT neatly inlines these methods into the parent caller (BlockCompressorStream::write), but unfortunately that does not flatten out the synchronized blocks.
> [Image: compress-cmpxchg-small.png]
> The loop writes small byte[] buffers (one per IFile key+value).
> I counted approximately 6 monitor enter/exit blocks per k-v pair written.
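A minimal, hypothetical Java sketch of the pattern the issue describes: a caller writes many small byte[] records into a compressor-style object that stages them in a direct ByteBuffer. One variant takes the instance monitor on every call; the other does not. The class names are invented for illustration and do not mirror Hadoop's SnappyCompressor or BlockCompressorStream, and reset() merely stands in for compressing and draining the buffer. This simplified version pays two monitor enter/exit pairs per record, fewer than the roughly 6 counted in the real code path.

```java
import java.nio.ByteBuffer;

// Hypothetical staging-buffer interface (not a Hadoop API).
interface StagingBuffer {
    int remaining();                           // free space left in the direct buffer
    void setInput(byte[] b, int off, int len); // copy user bytes into the direct buffer
    int buffered();                            // bytes currently staged
    void reset();                              // stands in for "compress and drain"
}

// Every method takes the instance monitor, so each small record costs
// two monitor enter/exit pairs here (remaining + setInput).
class SyncStagingBuffer implements StagingBuffer {
    private final ByteBuffer direct;
    SyncStagingBuffer(int capacity) { direct = ByteBuffer.allocateDirect(capacity); }
    public synchronized int remaining() { return direct.remaining(); }
    public synchronized void setInput(byte[] b, int off, int len) { direct.put(b, off, len); }
    public synchronized int buffered() { return direct.position(); }
    public synchronized void reset() { direct.clear(); }
}

// Identical logic with no monitors. Only safe if one thread owns the
// instance, which this sketch assumes for a per-stream buffer.
class UnsyncStagingBuffer implements StagingBuffer {
    private final ByteBuffer direct;
    UnsyncStagingBuffer(int capacity) { direct = ByteBuffer.allocateDirect(capacity); }
    public int remaining() { return direct.remaining(); }
    public void setInput(byte[] b, int off, int len) { direct.put(b, off, len); }
    public int buffered() { return direct.position(); }
    public void reset() { direct.clear(); }
}

class BufferingSketch {
    // Writes `count` copies of `record`, draining whenever the next record
    // would not fit. Returns the number of drains. Assumes record.length <= capacity.
    static int writeRecords(StagingBuffer buf, byte[] record, int count) {
        int drains = 0;
        for (int i = 0; i < count; i++) {
            if (buf.remaining() < record.length) {
                buf.reset();
                drains++;
            }
            buf.setInput(record, 0, record.length);
        }
        return drains;
    }

    // Crude single-threaded timing; a harness such as JMH gives trustworthy numbers.
    static long timeNanos(StagingBuffer buf, byte[] record, int count) {
        long start = System.nanoTime();
        writeRecords(buf, record, count);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        byte[] record = new byte[32]; // assumed small k-v sized write
        int count = 10_000_000;
        int capacity = 64 * 1024;
        // Warm up both paths so the JIT compiles them before the timed runs.
        for (int i = 0; i < 5; i++) {
            timeNanos(new SyncStagingBuffer(capacity), record, count);
            timeNanos(new UnsyncStagingBuffer(capacity), record, count);
        }
        System.out.printf("synchronized:   %d ms%n",
                timeNanos(new SyncStagingBuffer(capacity), record, count) / 1_000_000);
        System.out.printf("unsynchronized: %d ms%n",
                timeNanos(new UnsyncStagingBuffer(capacity), record, count) / 1_000_000);
    }
}
```

Dropping the monitors is only safe when a compressor instance is confined to a single thread. Depending on the JVM version, lock optimizations such as coarsening or elision may narrow the gap this toy benchmark shows, so its timings are indicative at best.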



--
This message was sent by Atlassian JIRA
(v6.2#6252)