Posted to dev@parquet.apache.org by "Uwe L. Korn (JIRA)" <ji...@apache.org> on 2017/05/31 07:15:05 UTC

[jira] [Commented] (PARQUET-1011) bzip2 compression

    [ https://issues.apache.org/jira/browse/PARQUET-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030750#comment-16030750 ] 

Uwe L. Korn commented on PARQUET-1011:
--------------------------------------

We can add {{bzip2}} to Parquet, but that would only change the compression; it would have no effect on splittability. By design, Parquet files are always splittable, regardless of the compression algorithm used, because compression is applied to the individual data pages inside the file rather than to the file as a whole. In particular, GZIP-compressed Parquet files are splittable too. In your case, it is probably easier to stick with GZIP than to implement {{bzip2}} in Parquet.
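
To illustrate why the codec does not matter for splittability, here is a minimal toy sketch in Python. It is not the Parquet format: the functions {{write_blocks}} and {{read_block}} and the in-memory index are hypothetical stand-ins for row groups/pages and the file footer. It only shows that when each block is compressed on its own and its byte range is recorded, a reader can decompress a block in the middle of the file without touching the rest, even with a non-splittable codec such as gzip.

```python
import gzip


def write_blocks(rows, rows_per_block):
    """Compress rows in independent blocks.

    Returns (data, index), where index is a list of (offset, length)
    byte ranges. The index plays the role of footer metadata in this
    toy model (an assumption for illustration, not Parquet's layout).
    """
    data = bytearray()
    index = []
    for start in range(0, len(rows), rows_per_block):
        chunk = "\n".join(rows[start:start + rows_per_block]).encode("utf-8")
        compressed = gzip.compress(chunk)  # each block is its own gzip stream
        index.append((len(data), len(compressed)))
        data.extend(compressed)
    return bytes(data), index


def read_block(data, index, block_no):
    """Decompress a single block using only its recorded byte range."""
    offset, length = index[block_no]
    raw = gzip.decompress(data[offset:offset + length])
    return raw.decode("utf-8").split("\n")


rows = [f"row-{i}" for i in range(10)]
data, index = write_blocks(rows, rows_per_block=4)

# A worker assigned to block 1 reads only that byte range;
# blocks 0 and 2 are never decompressed.
middle = read_block(data, index, 1)
print(middle)
```

Each block could be handed to a different worker, which is the property usually meant by "splittable". Swapping {{gzip}} for another codec in this sketch would change the block sizes but not the ability to split.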

Still, it would be interesting to see how {{bzip2}} performs compared with the currently implemented GZIP/Snappy/Brotli codecs.
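
A rough way to get a first impression of such a comparison, sketched with Python's standard library only. The {{measure}} helper and the synthetic payload are assumptions made for illustration; Snappy and Brotli are left out because they are not in the standard library, and results on real Parquet pages may look quite different.

```python
import bz2
import gzip
import time


def measure(name, compress, decompress, payload):
    """Time one compression pass and report the compression ratio."""
    start = time.perf_counter()
    packed = compress(payload)
    elapsed = time.perf_counter() - start
    # Sanity check: the codec must round-trip the payload unchanged.
    assert decompress(packed) == payload
    return {
        "codec": name,
        "ratio": len(payload) / len(packed),
        "seconds": elapsed,
    }


# Synthetic, repetitive row-like data (hypothetical; not real Parquet pages).
payload = b"".join(b"id=%d,value=%d\n" % (i, i * i % 97) for i in range(50000))

results = [
    measure("gzip", gzip.compress, gzip.decompress, payload),
    measure("bzip2", bz2.compress, bz2.decompress, payload),
]

for r in results:
    print(f"{r['codec']}: ratio {r['ratio']:.2f}, compress time {r['seconds']:.4f}s")
```

Timings from a single pass like this are noisy, so a serious comparison would repeat the measurement, include decompression time, and use representative column data.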

> bzip2 compression 
> ------------------
>
>                 Key: PARQUET-1011
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1011
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Rajasekhar Konda
>
> Hi,
> I have a requirement to implement Parquet with bzip2 compression because it's splittable. Right now, we can't use bzip2 in Pig.
> SET parquet.compression none/gzip/SNAPPY;
> Is there any way to compress with bzip2 on top of Parquet?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)