Posted to issues@flink.apache.org by "zhangminglei (JIRA)" <ji...@apache.org> on 2018/07/06 12:42:00 UTC

[jira] [Comment Edited] (FLINK-9411) Support parquet rolling sink writer

    [ https://issues.apache.org/jira/browse/FLINK-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534776#comment-16534776 ] 

zhangminglei edited comment on FLINK-9411 at 7/6/18 12:41 PM:
--------------------------------------------------------------

Whether or not a short checkpoint interval leads to poor compression, I will run an actual production test with compression enabled and attach the results to FLINK-9407 in the next few days.

Actually, we have been using this PR (not yet polished) in our production environment for a long time, and we are getting very good results compared to Spark Streaming. I will attach the test results to FLINK-9407 as a reference. As for your main concern, that a short checkpoint interval will lead to poor compression: yes, I would agree, but we need a test for it. Furthermore, we all know that the longer the snapshot interval, the lower the performance impact of asynchronous or synchronous snapshotting. So I think that, in most cases, people who want high throughput and low latency are not inclined to adopt short checkpoint intervals. For example, in our UV calculation jobs, the checkpoint interval is 30 seconds. Anyway, I will still provide test results with compression enabled.
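To make the compression concern concrete, here is a rough sketch of the general effect, not a Flink or Parquet benchmark. It assumes that each "batch" stands in for the data written between two file rolls, uses `zlib` as a stand-in for whatever codec the writer would use, and runs on synthetic records; the helper names `make_records` and `compressed_size_per_batch` are hypothetical.

```python
import zlib


def make_records(n):
    # Synthetic, repetitive rows standing in for a stream of events (assumption).
    return [
        f"user={i % 50},page=/item/{i % 7},ts={1530880000 + i}\n".encode()
        for i in range(n)
    ]


def compressed_size_per_batch(records, batch_size):
    """Total compressed bytes when each batch of records is compressed on its own."""
    total = 0
    for start in range(0, len(records), batch_size):
        batch = b"".join(records[start:start + batch_size])
        total += len(zlib.compress(batch))
    return total


records = make_records(20000)
# Small batches mimic frequent rolls; one large batch mimics a single roll.
small = compressed_size_per_batch(records, 100)
large = compressed_size_per_batch(records, 20000)
print("many small batches:", small, "bytes")
print("one large batch:   ", large, "bytes")
```

Each independently compressed batch pays its own stream overhead and cannot reuse patterns seen in earlier batches, so the small-batch total comes out larger here. How large the gap is for real Parquet files at a given checkpoint interval is exactly what the production test should measure.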

I suggest merging this PR as a temporary solution to reduce users' learning curve, since some users already use {{BucketingSink}} in their projects and want this functionality, as we saw on the user mailing list a few days ago. In the short term, they may not switch to the new sink, {{StreamingFileSink}}. I think this is a good reason to merge it.

By the way, I will look at FLINK-9749 and its subtask FLINK-9753 soon and respond in more detail in the next couple of days.


> Support parquet rolling sink writer
> -----------------------------------
>
>                 Key: FLINK-9411
>                 URL: https://issues.apache.org/jira/browse/FLINK-9411
>             Project: Flink
>          Issue Type: New Feature
>          Components: filesystem-connector
>            Reporter: zhangminglei
>            Assignee: Triones Deng
>            Priority: Major
>
> As with the ORC rolling sink writer support in FLINK-9407, we should also support a Parquet rolling sink writer.


