Posted to issues@flink.apache.org by "Stephan Ewen (JIRA)" <ji...@apache.org> on 2019/01/22 14:51:00 UTC

[jira] [Commented] (FLINK-11401) Allow compression on ParquetBulkWriter

    [ https://issues.apache.org/jira/browse/FLINK-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16748795#comment-16748795 ] 

Stephan Ewen commented on FLINK-11401:
--------------------------------------

I can see that being useful.

Please bear in mind that bulk writers currently must roll over on every checkpoint, because many formats (like Parquet) don't make it easy to persist intermediate state and resume writing.
Avro's row-by-row append nature, by contrast, makes it possible to write part files that span checkpoints.
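
For illustration, here is a minimal sketch of that distinction with the StreamingFileSink (Flink 1.7-era API; the paths, encoder, and factory argument are made up). Bulk formats always roll on checkpoint, while row formats can keep a part file open across checkpoints under a rolling policy:

    import org.apache.avro.generic.GenericRecord;
    import org.apache.flink.api.common.serialization.SimpleStringEncoder;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.formats.parquet.ParquetWriterFactory;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
    import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;

    public class RollingSketch {

        // Bulk format (e.g. Parquet): the sink always rolls part files on checkpoint.
        static StreamingFileSink<GenericRecord> bulkSink(ParquetWriterFactory<GenericRecord> factory) {
            return StreamingFileSink
                    .forBulkFormat(new Path("hdfs:///output/parquet"), factory)
                    .build();
        }

        // Row format: part files may span checkpoints, governed by a rolling policy.
        static StreamingFileSink<String> rowSink() {
            return StreamingFileSink
                    .forRowFormat(new Path("hdfs:///output/text"), new SimpleStringEncoder<String>("UTF-8"))
                    .withRollingPolicy(DefaultRollingPolicy.create()
                            .withMaxPartSize(128 * 1024 * 1024)
                            .withRolloverInterval(15 * 60 * 1000)
                            .build())
                    .build();
        }
    }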

One could think of letting the row formats add a header when opening a part file. That would allow the Avro writers to keep the property of writing part files across checkpoints.
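
As a rough sketch of what the compression request could build on today: the ParquetWriterFactory already accepts a custom ParquetBuilder, so a codec can be set on the underlying AvroParquetWriter builder. The class name CompressedAvroParquetBuilder and the hard-coded SNAPPY codec below are just illustrations, not the proposed API:

    import java.io.IOException;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.flink.formats.parquet.ParquetBuilder;
    import org.apache.parquet.avro.AvroParquetWriter;
    import org.apache.parquet.hadoop.ParquetWriter;
    import org.apache.parquet.hadoop.metadata.CompressionCodecName;
    import org.apache.parquet.io.OutputFile;

    /** Hypothetical builder that sets a compression codec on the Parquet writer. */
    public class CompressedAvroParquetBuilder implements ParquetBuilder<GenericRecord> {

        // Avro's Schema is not Serializable, so keep the JSON string instead.
        private final String schemaString;

        public CompressedAvroParquetBuilder(String schemaString) {
            this.schemaString = schemaString;
        }

        @Override
        public ParquetWriter<GenericRecord> createWriter(OutputFile out) throws IOException {
            Schema schema = new Schema.Parser().parse(schemaString);
            return AvroParquetWriter.<GenericRecord>builder(out)
                    .withSchema(schema)
                    .withCompressionCodec(CompressionCodecName.SNAPPY) // the knob this issue asks to expose
                    .build();
        }
    }

Usage would then be: new ParquetWriterFactory<>(new CompressedAvroParquetBuilder(schemaJson)), passed to StreamingFileSink.forBulkFormat(...).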

> Allow compression on ParquetBulkWriter
> --------------------------------------
>
>                 Key: FLINK-11401
>                 URL: https://issues.apache.org/jira/browse/FLINK-11401
>             Project: Flink
>          Issue Type: Improvement
>    Affects Versions: 1.7.1
>            Reporter: Fokko Driesprong
>            Assignee: Fokko Driesprong
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.8.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>



