Posted to user@flink.apache.org by Jun Zhang <82...@qq.com> on 2019/09/17 03:45:41 UTC

Re: Add Bucket File System Table Sink

Hi Kurt:

Thanks.

When I ran into this problem, I found the existing File System Connector, but it is not powerful or feature-rich enough. I also found that it is built into Flink and that many unit tests refer to it, so I did not dare to modify it casually in order to extend it.

So I developed a new connector. Later we can keep only one File System Connector and make sure that it is powerful and stable.

I will study FLIP-63 and see whether there is a better way to combine these two efforts. I am very willing to join this development.

------------------ Original Message ------------------
From: "Kurt Young" <ykt836@gmail.com>;
Sent: Tuesday, September 17, 2019, 11:19 AM
To: "Jun Zhang" <825875991@qq.com>;
Cc: "dev" <dev@flink.apache.org>; "user" <user@flink.apache.org>;
Subject: Re: Add Bucket File System Table Sink



Thanks. Let me clarify my thinking a bit more. Generally, I would
prefer that we concentrate connector functionality in a few places, especially
for the standard and most popular connectors, such as Kafka and the file
systems with their different formats. We should make these core connectors
as powerful as we can, which also prevents awkward situations
such as "if you want to use this feature, please use connectorA,
but if you want to use another feature, please use connectorB".

Best,
Kurt


On Tue, Sep 17, 2019 at 11:11 AM Jun Zhang <825875991@qq.com> wrote:

> Hi Kurt:
> Thank you very much.
> I will take a closer look at FLIP-63.
>
> The sink in this PR is built on StreamingFileSink, not
> BucketingSink; I only gave it the name "Bucket".
>
>
> On 09/17/2019 10:57, Kurt Young <ykt836@gmail.com> wrote:
>
> Hi Jun,
>
> Thanks for bringing this up. In general I'm +1 on this feature. As
> you might know, there is another ongoing effort on this kind
> of table sink, which is covered in the newly proposed rework of
> partition support [1]. In that proposal, we also want to introduce a new
> file system connector, which would cover not only partition
> support but also end-to-end exactly-once in streaming mode.
>
> I would suggest that we combine these two efforts into one. The
> benefits would be saving some review effort and reducing the number of
> core connectors, which eases our maintenance effort in the future.
> What do you think?
>
> BTW, BucketingSink is already deprecated; I think we should refer
> to StreamingFileSink instead.
>
> Best,
> Kurt
>
> [1]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-63-Rework-table-partition-support-td32770.html
>
>
&gt;
> On Tue, Sep 17, 2019 at 10:39 AM Jun Zhang <825875991@qq.com> wrote:
>
>> Hello everyone:
>> I am a user and fan of Flink, and I would like to join the Flink community. I
>> contributed my first PR a few days ago. Could anyone help review my
>> code? If something is wrong, I would be grateful for any advice.
>>
>> This PR came out of my own development work: I use SQL to read data
>> from Kafka and then write it to HDFS, and I found that there is no suitable
>> table sink. I checked the documentation and found that the File System Connector is
>> only experimental (
>> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html#file-system-connector),
>> so I wrote a Bucket File System Table Sink that supports writing streaming
>> data to HDFS and local file systems. The supported data formats are JSON, CSV,
>> Parquet, and Avro. Support for other formats, such as Protobuf and Thrift, can be added later.
>>
>> In addition, I added documentation, Python API support, unit tests,
>> an end-to-end test, and SQL Client and DDL support, and the PR compiles on Travis.
>>
>> The issue is https://issues.apache.org/jira/browse/FLINK-12584
>> Thank you very much.
>>

Re: Add Bucket File System Table Sink

Posted by Kurt Young <yk...@gmail.com>.
Great to hear.

Best,
Kurt


