Posted to issues@hbase.apache.org by "Karthick Sankarachary (JIRA)" <ji...@apache.org> on 2011/05/05 21:27:03 UTC

[jira] [Commented] (HBASE-3732) New configuration option for client-side compression

    [ https://issues.apache.org/jira/browse/HBASE-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029522#comment-13029522 ] 

Karthick Sankarachary commented on HBASE-3732:
----------------------------------------------

Does it make sense to perform the compression at the IPC layer, specifically in the {{HBaseClient}} and {{HBaseServer}} classes? Currently, they both read (write) headers and data through a {{DataInputStream}} ({{DataOutputStream}}). What if we wrap those streams so that they compress the bytes flowing through them, based on the yet-to-be-determined config option? As a matter of fact, I was working on something along these lines last year, but didn't follow through on it. Luckily, I still have the compression-based streams that I wrote, and I'm attaching those here just to get your thoughts. If this approach truly makes sense, then I can try to put together a working patch.
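
To make the stream-wrapping idea concrete, here is a minimal sketch using only java.util.zip. It is not the contents of the attached jar and not actual HBase code: the class name {{CompressedIpcStreams}}, the helpers {{wrapOut}}/{{wrapIn}}, and the boolean {{compress}} flag (standing in for the yet-to-be-determined config option) are all hypothetical, and byte-array streams stand in for the socket streams.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class CompressedIpcStreams {

    // Hypothetical helper: wrap the raw (socket) output stream.
    // "compress" stands in for the not-yet-defined config option.
    static DataOutputStream wrapOut(OutputStream raw, boolean compress) {
        if (!compress) {
            return new DataOutputStream(new BufferedOutputStream(raw));
        }
        // syncFlush = true makes flush() push out pending compressed bytes,
        // which matters on a long-lived connection carrying many calls.
        return new DataOutputStream(new DeflaterOutputStream(raw, true));
    }

    // Hypothetical helper: wrap the raw (socket) input stream symmetrically.
    static DataInputStream wrapIn(InputStream raw, boolean compress) {
        if (!compress) {
            return new DataInputStream(new BufferedInputStream(raw));
        }
        return new DataInputStream(new InflaterInputStream(raw));
    }

    public static void main(String[] args) throws IOException {
        // A byte array plays the role of the wire between client and server.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();

        DataOutputStream out = wrapOut(wire, true);
        out.writeInt(42);        // e.g. a call id in the header
        out.writeUTF("row-1");   // e.g. part of the payload
        out.close();             // finishes the deflate stream

        DataInputStream in = wrapIn(new ByteArrayInputStream(wire.toByteArray()), true);
        System.out.println(in.readInt() + " " + in.readUTF()); // prints "42 row-1"
    }
}
```

Since the callers only ever see a {{DataInputStream}} / {{DataOutputStream}}, the rest of the read/write code would not need to change under this approach; both ends would, however, have to agree on whether compression is enabled before the first compressed byte is sent.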

> New configuration option for client-side compression
> ----------------------------------------------------
>
>                 Key: HBASE-3732
>                 URL: https://issues.apache.org/jira/browse/HBASE-3732
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Jean-Daniel Cryans
>             Fix For: 0.92.0
>
>         Attachments: compressed_streams.jar
>
>
> We have a case here where we have to store very fat cells (arrays of integers), which can amount to hundreds of KBs, that we need to read often, concurrently, and possibly keep in cache. Compressing the values on the client using java.util.zip's Deflater before sending them to HBase proved, in our case, to be almost an order of magnitude faster.
> The reasons are evident: less data sent to HBase, memstore contains compressed data, block cache contains compressed data too, etc.
> I was thinking that it might be something useful to add to a family schema, so that Put/Result do the conversion for you. The actual compression algo should also be configurable.
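
For reference, the client-side approach described in the issue can be sketched roughly as follows. This is an illustration only, not the reporter's code: the class {{ValueCodec}} and its {{compress}}/{{decompress}} methods are hypothetical names, and the sample cell is made-up data. The compressed bytes are what would be handed to a Put, and the bytes read back from a Result would go through {{decompress}}.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ValueCodec {

    // Compress a cell value before it is stored.
    static byte[] compress(byte[] value) {
        Deflater deflater = new Deflater();
        deflater.setInput(value);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end(); // release native resources
        return out.toByteArray();
    }

    // Restore the original value after it is read back.
    static byte[] decompress(byte[] stored) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(stored);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No progress and no more input means the data is truncated.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported input");
                }
                out.write(buf, 0, n);
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        // Made-up "fat cell": 50,000 integers serialized to about 200 KB.
        ByteBuffer cell = ByteBuffer.allocate(4 * 50_000);
        for (int i = 0; i < 50_000; i++) {
            cell.putInt(i % 100);
        }
        byte[] raw = cell.array();

        byte[] packed = compress(raw);
        byte[] restored = decompress(packed);

        System.out.println("raw bytes:    " + raw.length);
        System.out.println("packed bytes: " + packed.length);
        System.out.println("round trip ok: " + Arrays.equals(raw, restored));
    }
}
```

How much is saved depends entirely on the data; the sample above is highly repetitive, so it says nothing about real-world ratios.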

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira