Posted to issues@arrow.apache.org by "Lawrence Chan (JIRA)" <ji...@apache.org> on 2018/03/08 23:47:00 UTC

[jira] [Comment Edited] (ARROW-300) [Format] Add buffer compression option to IPC file format

    [ https://issues.apache.org/jira/browse/ARROW-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392124#comment-16392124 ] 

Lawrence Chan edited comment on ARROW-300 at 3/8/18 11:46 PM:
--------------------------------------------------------------

What did we decide with this? IMHO there's still a use case for compressed Arrow files due to the limited storage types in Parquet. I don't really love the idea of storing 8-bit or 16-bit ints in an INT32 and hand-waving it away with compression. My current workaround uses a fixed-length byte array, but it's pretty clunky to do this efficiently, at least in the parquet-cpp implementation. There may also be some alignment concerns with that approach that I'm just ignoring right now.
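
To make the INT32 concern concrete, here is a rough sketch using only the Python standard library. It packs the same small integers once as 1-byte values and once widened to 4-byte values, then runs zlib over both. The function name `compare_widths` and the sample data are made up for illustration; this does not touch Parquet or Arrow, it only shows the raw byte counts that compression is being asked to paper over.

```python
import random
import struct
import zlib


def compare_widths(values):
    """Return raw and zlib-compressed sizes for int8 vs. widened int32 storage.

    `values` must fit in a signed 8-bit range (-128..127).
    """
    n = len(values)
    narrow = struct.pack(f"<{n}b", *values)  # 1 byte per value
    wide = struct.pack(f"<{n}i", *values)    # 4 bytes per value (sign-extended)
    return {
        "narrow_raw": len(narrow),
        "wide_raw": len(wide),
        "narrow_zlib": len(zlib.compress(narrow)),
        "wide_zlib": len(zlib.compress(wide)),
    }


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical column of 8-bit values.
    sample = [random.randrange(-128, 128) for _ in range(100_000)]
    for name, size in compare_widths(sample).items():
        print(f"{name}: {size} bytes")
```

The widened form is always 4x larger before compression. How much of that a compressor wins back depends on the data and the codec, and the reader still pays to decompress and narrow the values again.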

Happy to help, but I'm not familiar enough with the code base to place it in the right spot. If we make a branch with some TODOs/placeholders I can probably plug in more easily.


was (Author: llchan):
What did we decide with this? IMHO there's still a use case for compressed Arrow files due to the limited storage types in Parquet. I don't really love the idea of storing 8-bit or 16-bit ints in an INT32 and hand-waving it away with compression. My current workaround uses a fixed-length byte array, but it's pretty clunky to do this efficiently, at least in the parquet-cpp implementation. There may also be some alignment concerns with that latter approach that I'm just ignoring right now.

Happy to help, but I'm not familiar enough with the code base to place it in the right spot. If we make a branch with some TODOs/placeholders I can probably plug in more easily.

> [Format] Add buffer compression option to IPC file format
> ---------------------------------------------------------
>
>                 Key: ARROW-300
>                 URL: https://issues.apache.org/jira/browse/ARROW-300
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Format
>            Reporter: Wes McKinney
>            Priority: Major
>             Fix For: 0.10.0
>
>
> It may be useful, if data is to be sent over the wire, to compress the data buffers themselves as they're being written in the file layout.
> I would propose that we keep this extremely simple, with a global buffer compression setting in the file Footer. Probably the only two compressors worth supporting out of the box would be zlib (higher compression ratios) and lz4 (better performance).
> What does everyone think?
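
As a rough illustration of the "global setting in the Footer" idea, the sketch below writes a list of byte buffers, compresses each with one file-wide codec, and records that codec once in a trailing footer. Everything here is an assumption for the sake of the example: the JSON footer, the 4-byte length suffix, and the names `write_file`, `read_file`, and `CODECS` are hypothetical and are not the Arrow IPC file layout. Only zlib is wired in because it ships with Python; lz4 would need a third-party package.

```python
import io
import json
import struct
import zlib

# Hypothetical codec table: name -> (compress, decompress).
CODECS = {
    "none": (lambda b: b, lambda b: b),
    "zlib": (zlib.compress, zlib.decompress),
}


def write_file(buffers, codec="zlib"):
    """Write compressed buffers followed by a footer naming the single codec."""
    compress, _ = CODECS[codec]
    out = io.BytesIO()
    entries = []
    for buf in buffers:
        body = compress(buf)
        # Record where each compressed buffer lives and its original size.
        entries.append(
            {"offset": out.tell(), "length": len(body), "raw_length": len(buf)}
        )
        out.write(body)
    footer = json.dumps({"codec": codec, "buffers": entries}).encode("utf-8")
    out.write(footer)
    out.write(struct.pack("<I", len(footer)))  # footer length as the last 4 bytes
    return out.getvalue()


def read_file(data):
    """Read the footer first, then decompress every buffer with its codec."""
    (footer_len,) = struct.unpack("<I", data[-4:])
    footer = json.loads(data[-4 - footer_len:-4])
    _, decompress = CODECS[footer["codec"]]
    return [
        decompress(data[e["offset"]:e["offset"] + e["length"]])
        for e in footer["buffers"]
    ]


if __name__ == "__main__":
    original = [b"a" * 1000, bytes(range(256)) * 4]
    blob = write_file(original, codec="zlib")
    assert read_file(blob) == original
    print(f"file size: {len(blob)} bytes")
```

A single file-wide setting keeps the reader simple: it learns the codec once from the footer and applies it to every buffer. The cost is that buffers which compress poorly cannot opt out individually.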


