Posted to dev@parquet.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2019/12/03 08:35:00 UTC

[jira] [Commented] (PARQUET-1622) Adding an encoding for FP data

    [ https://issues.apache.org/jira/browse/PARQUET-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986714#comment-16986714 ] 

ASF GitHub Bot commented on PARQUET-1622:
-----------------------------------------

gszadovszky commented on pull request #144: PARQUET-1622: Add BYTE_STREAM_SPLIT encoding
URL: https://github.com/apache/parquet-format/pull/144


> Adding an encoding for FP data
> ------------------------------
>
>                 Key: PARQUET-1622
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1622
>             Project: Parquet
>          Issue Type: Wish
>          Components: parquet-cpp, parquet-format, parquet-mr, parquet-thrift
>            Reporter: Martin Radev
>            Priority: Minor
>              Labels: features, pull-request-available
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Apache Parquet does not have any encodings suitable for floating-point (FP) data, and the available general-purpose compressors (zstd, gzip, etc.) do not handle FP data very well.
> It is possible to apply a simple data transformation called "stream splitting". One such transformation is "byte stream splitting", which creates K streams of length N, where K is the number of bytes in the data type (4 for floats, 8 for doubles) and N is the number of elements in the sequence.
> On average, the transformed data compresses significantly better than the original data, and in some cases compression and decompression are also faster.
> You can read a more detailed report here:
> https://drive.google.com/file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view
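The byte stream splitting transformation described in the issue can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the encoding as specified in parquet-format or implemented in parquet-cpp or parquet-mr. The function names byte_stream_split and byte_stream_merge are hypothetical, and the sketch assumes the values have already been serialized as little-endian IEEE 754 bytes (the layout Parquet's PLAIN encoding uses for FLOAT and DOUBLE). Stream k collects byte k of every element, so the output is K streams of N bytes each, concatenated.

```python
import struct
import zlib


def byte_stream_split(data: bytes, width: int) -> bytes:
    """Hypothetical helper: split N elements of `width` (K) bytes into K streams.

    Assumes `data` holds already-serialized little-endian values.
    Stream k is byte k of every element; the K streams are concatenated.
    """
    if len(data) % width != 0:
        raise ValueError("data length must be a multiple of the element width")
    # data[k::width] picks byte k of element 0, 1, ..., N-1.
    return b"".join(data[k::width] for k in range(width))


def byte_stream_merge(encoded: bytes, width: int) -> bytes:
    """Hypothetical inverse of byte_stream_split: interleave the K streams again."""
    if len(encoded) % width != 0:
        raise ValueError("data length must be a multiple of the element width")
    n = len(encoded) // width  # N, the number of elements
    out = bytearray(len(encoded))
    for k in range(width):
        # Stream k occupies encoded[k*N:(k+1)*N]; scatter it back to byte k of each element.
        out[k::width] = encoded[k * n:(k + 1) * n]
    return bytes(out)


if __name__ == "__main__":
    # Synthetic, slowly varying float32 series (illustrative data only).
    values = [1000.0 + 0.001 * i for i in range(10000)]
    raw = struct.pack(f"<{len(values)}f", *values)
    split = byte_stream_split(raw, 4)
    assert byte_stream_merge(split, 4) == raw  # the transform is lossless
    # Compare zlib output sizes for the original and the split layout.
    print("plain:", len(zlib.compress(raw)), "split:", len(zlib.compress(split)))
```

The transformation itself does not shrink the data; it only regroups bytes. For smoothly varying series, the sign and exponent bytes of neighbouring values often repeat, so grouping them into one stream gives a general-purpose compressor longer runs to work with. The `__main__` block only prints the two compressed sizes for one synthetic series and makes no claim about the result for real data.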



--
This message was sent by Atlassian Jira
(v8.3.4#803005)