Posted to issues@phoenix.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/09/16 04:07:00 UTC

[jira] [Commented] (PHOENIX-4902) Snappy compression benefit is lost when generate hash cache RPC

    [ https://issues.apache.org/jira/browse/PHOENIX-4902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16616582#comment-16616582 ] 

ASF GitHub Bot commented on PHOENIX-4902:
-----------------------------------------

GitHub user ortutay opened a pull request:

    https://github.com/apache/phoenix/pull/349

    PHOENIX-4902 Use only compressed portion of hash cache memory buffer

    See ticket https://issues.apache.org/jira/browse/PHOENIX-4902 for a description of the issue. The current code loses the benefit of Snappy compression; this change makes sure only the compressed portion is sent in the RPC message.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ortutay/phoenix PHOENIX-4902-snappy-compression-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/phoenix/pull/349.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #349
    
----
commit ed5f2251b6c0f10c3166c4a5480734c6741df4a4
Author: Marcell Ortutay <ma...@...>
Date:   2018-09-16T04:05:12Z

    PHOENIX-4902 Use only compressed portion of hash cache memory buffer

----
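The idea behind the pull request can be sketched as follows. This is a hedged illustration under stated assumptions, not the actual patch. `BytesPtr` is a hypothetical stand-in for the `ptr` object in the quoted code, assumed to behave like HBase's `ImmutableBytesWritable` (`set`, `get`, `getOffset`, `getLength`). Because Snappy is not part of the JDK, `java.util.zip.Deflater` is swapped in as the compressor, and Snappy's `32 + n + n/6` worst-case bound is borrowed as the buffer size. The point is that the compressed bytes occupy only `[offset, offset + length)` of a worst-case-sized array, so serializing the whole backing array also ships the unused tail.

```java
import java.util.Arrays;
import java.util.zip.Deflater;

public class HashCachePtrDemo {

    // Hypothetical stand-in for an ImmutableBytesWritable-style pointer:
    // a view over [offset, offset + length) of a possibly larger array.
    static final class BytesPtr {
        private byte[] bytes;
        private int offset;
        private int length;

        void set(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }

        byte[] get() { return bytes; }          // the WHOLE backing array
        int getOffset() { return offset; }
        int getLength() { return length; }

        // Copies only the valid window, dropping the worst-case padding.
        byte[] copyValidBytes() {
            return Arrays.copyOfRange(bytes, offset, offset + length);
        }
    }

    // Mirrors the quoted code: compress into a worst-case-sized buffer,
    // then point at the first compressedSize bytes. Deflater replaces
    // Snappy here (assumption); the bound is Snappy's, ample for Deflater.
    static BytesPtr compress(byte[] input) {
        int maxCompressedSize = 32 + input.length + input.length / 6;
        byte[] compressed = new byte[maxCompressedSize]; // size for worst case
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        int compressedSize = deflater.deflate(compressed);
        boolean done = deflater.finished();
        deflater.end();
        if (!done) {
            throw new IllegalStateException("output buffer too small");
        }
        BytesPtr ptr = new BytesPtr();
        ptr.set(compressed, 0, compressedSize); // array keeps its full size
        return ptr;
    }

    public static void main(String[] args) {
        byte[] input = new byte[10_000]; // all zeros: highly compressible
        BytesPtr ptr = compress(input);
        byte[] buggy = ptr.get();            // sent if the array is used directly
        byte[] fixed = ptr.copyValidBytes(); // only the compressed portion
        System.out.println("compressed length: " + ptr.getLength());
        System.out.println("buggy payload:     " + buggy.length);
        System.out.println("fixed payload:     " + fixed.length);
    }
}
```

In protobuf-java terms, the analogous distinction would be `ByteString.copyFrom(bytes)` versus `ByteString.copyFrom(bytes, offset, size)`; which call the real patch touches is not shown in this message.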


> Snappy compression benefit is lost when generate hash cache RPC
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-4902
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4902
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Marcell Ortutay
>            Assignee: Marcell Ortutay
>            Priority: Minor
>
> Phoenix uses Snappy compression on hash caches before it sends them to region servers:
> {code}
>                 int maxCompressedSize = Snappy.maxCompressedLength(baOut.size());
>                 byte[] compressed = new byte[maxCompressedSize]; // size for worst case
>                 int compressedSize = Snappy.compress(baOut.getBuffer(), 0, baOut.size(), compressed, 0);
>                 // Last realloc to size of compressed buffer.
>                 ptr.set(compressed,0,compressedSize);
> {code}
> However, the debug output suggests that the serialized protobuf sent to region servers does not retain the benefit of Snappy compression. Below is an excerpt of the debug output I added:
> {code}
> Building an RPC with a cache ptr of size: 39MB  // The compressed size is 39MB
> Done serializing the AddServerCacheRequest RPC, size is 206MB  // However the serialized RPC is 206MB
> And the cache ptr size is: 206MB  // And specifically, the byte array that contains the serialized hash cache is 206MB
> {code}
> I wrote a small standalone program to reproduce this bug, and it shows similar behavior:
> {code}
> bytes size: 10000 bytes
> compressed bytes size: 721 bytes
> message size: 10003 bytes
> compressed message size: 11701 bytes
> {code}
> The code for the simplified example is here: https://github.com/ortutay/snappy-bytes-buffer/blob/master/src/main/java/testprotobuf/Main.java
> I observed this behavior in Phoenix 4.14.1.
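The repro numbers are consistent with the entire worst-case buffer being serialized rather than just the compressed bytes. A hedged check: Snappy's worst-case bound is `32 + n + n/6`, which is 11,698 bytes for n = 10,000. Assuming the test message holds a single `bytes` field with a field number from 1 to 15 (one tag byte plus a varint length prefix), the serialized sizes come out to exactly the 10,003 and 11,701 bytes reported. The class and helper names below are hypothetical.

```java
public class ReproSizeArithmetic {

    // Snappy's worst-case output bound: 32 + n + n/6.
    static int snappyMaxCompressedLength(int n) {
        return 32 + n + n / 6;
    }

    // Bytes needed to encode v as a protobuf base-128 varint.
    static int varintSize(int v) {
        int size = 1;
        while ((v & ~0x7F) != 0) {
            v >>>= 7;
            size++;
        }
        return size;
    }

    // Serialized size of a message with one length-delimited bytes field
    // whose field number is 1..15 (assumption: a single tag byte).
    static int singleBytesFieldMessageSize(int payloadLength) {
        int tagSize = 1;
        return tagSize + varintSize(payloadLength) + payloadLength;
    }

    public static void main(String[] args) {
        int raw = 10_000;
        int compressed = 721;
        int backingArray = snappyMaxCompressedLength(raw); // 11698
        System.out.println(singleBytesFieldMessageSize(raw));          // 10003
        System.out.println(singleBytesFieldMessageSize(backingArray)); // 11701
        System.out.println(singleBytesFieldMessageSize(compressed));   // 724
    }
}
```

Under the same assumptions, sending only the 721 compressed bytes would yield a 724-byte message.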



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)