Posted to issues@flink.apache.org by "Stephan Ewen (JIRA)" <ji...@apache.org> on 2018/05/23 13:26:00 UTC

[jira] [Commented] (FLINK-9411) Support parquet rolling sink writer

    [ https://issues.apache.org/jira/browse/FLINK-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487241#comment-16487241 ] 

Stephan Ewen commented on FLINK-9411:
-------------------------------------

I would strongly suggest writing a comprehensive design for this up front; otherwise, there is a big chance that the contribution cannot be added.

Parquet and similar formats require writing larger batches (for efficient columnar encoding), which collides with the bucketing sink's assumption that it can flush()/persist at any checkpoint. We first need a plan/design to handle these conflicting requirements - for example, will the compression happen only on rolling, or continuously during writing?
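To make the conflict concrete, here is a minimal sketch (plain Java, not the Flink or Parquet API; class and method names are hypothetical) of a columnar-style writer that buffers rows into row groups: at a checkpoint, only completed row groups are on disk, and the file is not readable at all until the footer is written - so a checkpoint can either roll (finish) the file or accept that in-progress rows are not yet persisted in readable form.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a columnar "bulk" writer: rows are buffered into
// row groups, and the file only becomes readable once finish() has written
// the last group and the footer. A row-wise sink could flush() at any
// checkpoint; this writer cannot.
class BufferingColumnarWriter {
    private final int rowGroupSize;                    // rows buffered before a group is emitted
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> writtenGroups = new ArrayList<>();
    private boolean footerWritten = false;

    BufferingColumnarWriter(int rowGroupSize) {
        this.rowGroupSize = rowGroupSize;
    }

    void addElement(String row) {
        buffer.add(row);
        if (buffer.size() >= rowGroupSize) {
            writtenGroups.add(new ArrayList<>(buffer)); // emit a complete row group
            buffer.clear();
        }
    }

    // At checkpoint time, only rows in completed row groups are durable
    // in readable form; buffered rows are not.
    int persistedRowsAtCheckpoint() {
        return writtenGroups.stream().mapToInt(List::size).sum();
    }

    // "Rolling" the file: write the final (possibly partial) group and the footer.
    void finish() {
        if (!buffer.isEmpty()) {
            writtenGroups.add(new ArrayList<>(buffer));
            buffer.clear();
        }
        footerWritten = true;
    }

    boolean isReadable() {
        return footerWritten;
    }
}
```

Under this model, a checkpoint that must persist all rows forces a roll, which is exactly the design decision the comment above is asking to settle.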

[~aljoscha], [~kkl0u], and I are also looking at a new version of the bucketing sink that fixes a number of shortcomings: making it work with Flink's file systems, making it work properly with S3 (eventual consistency), and supporting non-row-wise formats. A design doc for that will probably follow in the next weeks.

> Support parquet rolling sink writer
> -----------------------------------
>
>                 Key: FLINK-9411
>                 URL: https://issues.apache.org/jira/browse/FLINK-9411
>             Project: Flink
>          Issue Type: New Feature
>          Components: filesystem-connector
>            Reporter: mingleizhang
>            Assignee: Triones Deng
>            Priority: Major
>
> Like the ORC rolling sink writer supported in FLINK-9407, we should also support a Parquet rolling sink writer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)