Posted to issues@flink.apache.org by "Triones Deng (JIRA)" <ji...@apache.org> on 2018/06/21 14:30:00 UTC

[jira] [Comment Edited] (FLINK-9411) Support parquet rolling sink writer

    [ https://issues.apache.org/jira/browse/FLINK-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519405#comment-16519405 ] 

Triones Deng edited comment on FLINK-9411 at 6/21/18 2:29 PM:
--------------------------------------------------------------

[~StephanEwen] Sure, a design is necessary. ORC and Parquet files are popular in production because of their columnar storage.
I think there are two ways to support a Parquet writer:
1. Write a ParquetStreamWriter as a subclass of StreamWriterBase, the same approach as in FLINK-9407.
2. Add an HdfsWriterWrapper that owns a single delegate writer. The end user simply specifies a format such as orc or parquet, and the wrapper creates a suitable writer such as OrcStreamWriter or ParquetStreamWriter.
Sample code for the HdfsWriterWrapper:

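{code:java}
public class HdfsWriterWrapper<T> implements Writer<T> {

    // Format-specific writer that does the actual work.
    private Writer<T> delegate;
    // Requested output format, e.g. "orc" or "parquet".
    private String format;
    private Configuration configuration;
    private TableSchema tableSchema;

    public HdfsWriterWrapper(Configuration configuration, String format, Class<T> tableClass, String[] columnFields) {
        // Create the delegate writer that matches the requested format here.
    }

    // The Writer<T> methods are omitted in this sample.
}
{code}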
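
To make option 2 more concrete, here is a minimal, self-contained sketch of the delegation idea. It is not the proposed implementation: RecordWriter, TaggedStubWriter and FormatDelegatingWriter are hypothetical names invented for illustration, RecordWriter is a simplified stand-in for the connector's Writer<T> interface, and the stub only records calls in memory instead of writing ORC or Parquet files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified stand-in for the connector's Writer<T> interface.
interface RecordWriter<T> {
    void write(T element);
    void close();
}

// Hypothetical stub that only records what it was asked to do. A real
// OrcStreamWriter or ParquetStreamWriter would encode and write files.
final class TaggedStubWriter<T> implements RecordWriter<T> {
    private final String tag;
    private final List<String> log;

    TaggedStubWriter(String tag, List<String> log) {
        this.tag = tag;
        this.log = log;
    }

    @Override
    public void write(T element) {
        log.add(tag + ":" + element);
    }

    @Override
    public void close() {
        log.add(tag + ":closed");
    }
}

// Wrapper in the spirit of option 2: choose the delegate once from the
// format string, then forward every call to it.
final class FormatDelegatingWriter<T> implements RecordWriter<T> {
    private final RecordWriter<T> delegate;

    FormatDelegatingWriter(String format, List<String> log) {
        this.delegate = createDelegate(format, log);
    }

    private static <T> RecordWriter<T> createDelegate(String format, List<String> log) {
        String normalized = format.toLowerCase(Locale.ROOT);
        switch (normalized) {
            case "orc":
            case "parquet":
                return new TaggedStubWriter<>(normalized, log);
            default:
                // Fail fast so an unsupported format is caught at construction time.
                throw new IllegalArgumentException("Unsupported format: " + format);
        }
    }

    @Override
    public void write(T element) {
        delegate.write(element);
    }

    @Override
    public void close() {
        delegate.close();
    }
}

class WrapperDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        RecordWriter<String> writer = new FormatDelegatingWriter<>("parquet", log);
        writer.write("a");
        writer.write("b");
        writer.close();
        System.out.println(log); // prints [parquet:a, parquet:b, parquet:closed]
    }
}
```

With this shape, adding a new format only touches the factory method, while option 1 keeps each format in its own StreamWriterBase subclass with no shared entry point.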
Which one is better?


> Support parquet rolling sink writer
> -----------------------------------
>
>                 Key: FLINK-9411
>                 URL: https://issues.apache.org/jira/browse/FLINK-9411
>             Project: Flink
>          Issue Type: New Feature
>          Components: filesystem-connector
>            Reporter: mingleizhang
>            Assignee: Triones Deng
>            Priority: Major
>
> Like the ORC rolling sink writer in FLINK-9407, we should also support a Parquet rolling sink writer.


