Posted to issues@hbase.apache.org by "Lars Hofhansl (JIRA)" <ji...@apache.org> on 2012/10/12 06:45:04 UTC

[jira] [Comment Edited] (HBASE-5355) Compressed RPC's for HBase

    [ https://issues.apache.org/jira/browse/HBASE-5355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474786#comment-13474786 ] 

Lars Hofhansl edited comment on HBASE-5355 at 10/12/12 4:44 AM:
----------------------------------------------------------------

Before we commit this or the trunk patch, I'd love to see some numbers comparing this full compression-stream approach with simply avoiding duplicate data while serializing to/from the RegionServer. On both sides we'd have to reassemble the full KVs (unless we finally make a KV interface), but we can do that efficiently if we keep track of the size of the omitted parts of each KV, preallocate the space, and copy the data into it. That way we'd do the same amount of memory copying (ignoring DMA from the network card for the moment) and still save bytes on the wire.

I raised this on the mailing list a while ago, and Andy commented on it somewhere as well.
KVs are sorted when traveling over the wire (as a set of Puts/Deletes or in a Result), so we can simply avoid copying the shared prefix multiple times.
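
To make the idea concrete, here is a minimal sketch of prefix omission over a sorted list of keys. It is only an illustration under stated assumptions, not the HBase wire format or anything in the attached patch: the class name PrefixCodec, its methods, and the layout (a count, then per key a shared-prefix length, a suffix length, and the suffix bytes) are all hypothetical, and plain byte arrays stand in for KVs.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not an HBase class and not the HBase RPC format.
final class PrefixCodec {
    private static final byte[] EMPTY = new byte[0];

    private PrefixCodec() {}

    // Length of the longest common prefix of a and b.
    static int commonPrefix(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    // Writes each key as (sharedLen, suffixLen, suffix), where sharedLen is the
    // number of leading bytes it has in common with the previous key.
    static byte[] encode(List<byte[]> sortedKeys) {
        // First pass: compute the exact output size so we allocate once.
        int size = Integer.BYTES;
        byte[] prev = EMPTY;
        for (byte[] key : sortedKeys) {
            size += 2 * Integer.BYTES + key.length - commonPrefix(prev, key);
            prev = key;
        }
        // Second pass: write the count, then only the non-shared suffixes.
        ByteBuffer out = ByteBuffer.allocate(size);
        out.putInt(sortedKeys.size());
        prev = EMPTY;
        for (byte[] key : sortedKeys) {
            int shared = commonPrefix(prev, key);
            out.putInt(shared).putInt(key.length - shared);
            out.put(key, shared, key.length - shared);
            prev = key;
        }
        return out.array();
    }

    // Rebuilds the full keys on the receiving side.
    static List<byte[]> decode(byte[] wire) {
        ByteBuffer in = ByteBuffer.wrap(wire);
        int count = in.getInt();
        List<byte[]> keys = new ArrayList<>(count);
        byte[] prev = EMPTY;
        for (int i = 0; i < count; i++) {
            int shared = in.getInt();
            int suffixLen = in.getInt();
            // The omitted size is known up front, so preallocate the full key.
            byte[] key = new byte[shared + suffixLen];
            System.arraycopy(prev, 0, key, 0, shared); // omitted prefix
            in.get(key, shared, suffixLen);            // bytes from the wire
            keys.add(key);
            prev = key;
        }
        return keys;
    }
}
```

Each reassembled key is allocated once at its final size and filled with two copies, which is the "preallocate and copy" step described above. The fixed 4-byte length fields are a simplification: for short keys they can outweigh the savings, so a real encoding would likely use variable-length integers and would need to be measured against the compression-stream approach.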

Edit: Fixed my typical spelling mistakes.
                
      was (Author: lhofhansl):
    Before we commit this or the trunk patch I'd love to see some numbers comparing this full compression stream approach with just avoiding duplicate data while serializing from/to the RegionServer. On both sides we'd have to reassemble the full KVs (unless we finally make a KV interface), but we can that efficiently if we keep track size of the omitted parts of the KV and preallocate the space and copy the data in that. That way we'd have the same amount memory copying (ignoring DMA from the network card for the moment) and can safe bytes on the wire.
I raised this on the mailing this a while ago, and Andy commented on that somewhere as well.
KV are sorted when traveling over the wire (as a set of Puts/Deletes or in a Result) we can simple avoid copying the prefix multiple times.
                  
> Compressed RPC's for HBase
> --------------------------
>
>                 Key: HBASE-5355
>                 URL: https://issues.apache.org/jira/browse/HBASE-5355
>             Project: HBase
>          Issue Type: Improvement
>          Components: IPC/RPC
>    Affects Versions: 0.89.20100924
>            Reporter: Karthik Ranganathan
>            Assignee: Karthik Ranganathan
>         Attachments: HBASE-5355-0.94.patch
>
>
> Some applications need the ability to do large batched writes and reads from a remote MR cluster. These eventually get bottlenecked on the network. The results are also sometimes quite compressible.
> The aim here is to add the ability to do compressed calls to the server on both the send and receive paths.
