Posted to dev@flink.apache.org by 刘大龙 <ld...@zju.edu.cn> on 2022/04/08 02:09:46 UTC

Re: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL

Hi, Martijn

Do you have any questions about this FLIP? Looking forward to more of your feedback.

Best,

Ron


> -----Original Message-----
> From: "刘大龙" <ld...@zju.edu.cn>
> Sent: 2022-03-29 19:33:58 (Tuesday)
> To: dev@flink.apache.org
> Cc: 
> Subject: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> 
> 
> 
> 
> > -----Original Message-----
> > From: "Martijn Visser" <ma...@ververica.com>
> > Sent: 2022-03-24 16:18:14 (Thursday)
> > To: dev <de...@flink.apache.org>
> > Cc: 
> > Subject: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > 
> > Hi Ron,
> > 
> > Thanks for creating the FLIP. You're talking about both local and remote
> > resources. With regards to remote resources, how do you see this work with
> > Flink's filesystem abstraction? I did read in the FLIP that Hadoop
> > dependencies are not packaged, but I would hope that we do that for all
> > filesystem implementations. I don't think it's a good idea to have any tight
> > coupling to file system implementations, especially if at some point we
> > could also externalize file system implementations (like we're doing for
> > connectors already). I think the FLIP would be better by not only
> > referring to "Hadoop" as a remote resource provider, but a more generic
> > term since there are more options than Hadoop.
> > 
> > I'm also thinking about security/operations implications: would it be
> > possible for bad actor X to create a JAR that either influences other
> > running jobs, leaks data or credentials or anything else? If so, I think it
> > would also be good to have an option to disable this feature completely. I
> > think there are roughly two types of companies who run Flink: those who
> > open it up for everyone to use (here the feature would be welcomed) and
> > those who need to follow certain minimum standards/have a more closed Flink
> > ecosystem. They usually want to validate a JAR upfront before making it
> > available, even at the expense of speed, because it gives them more control
> > over what will be running in their environment.
> > 
> > Best regards,
> > 
> > Martijn Visser
> > https://twitter.com/MartijnVisser82
> > 
> > 
> > On Wed, 23 Mar 2022 at 16:47, 刘大龙 <ld...@zju.edu.cn> wrote:
> > 
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: "Peter Huang" <hu...@gmail.com>
> > > > Sent: 2022-03-23 11:13:32 (Wednesday)
> > > > To: dev <de...@flink.apache.org>
> > > > Cc:
> > > > Subject: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > > >
> > > > Hi Ron,
> > > >
> > > > Thanks for reviving the discussion of the work. The design looks good. A
> > > > small typo in the FLIP is that currently it is marked as released in
> > > 1.16.
> > > >
> > > >
> > > > Best Regards
> > > > Peter Huang
> > > >
> > > >
> > > > On Tue, Mar 22, 2022 at 10:58 PM Mang Zhang <zh...@163.com> wrote:
> > > >
> > > > > Hi Yuxia,
> > > > >
> > > > >
> > > > > Thanks for your reply. Your reminder is very important!
> > > > >
> > > > >
> > > > > Since we download the file locally, remember to clean it up when the
> > > > > Flink client exits.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Best regards,
> > > > > Mang Zhang
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > At 2022-03-23 10:02:26, "罗宇侠(莫辞)"
> > > > > <lu...@alibaba-inc.com.INVALID> wrote:
> > > > > >Hi Ron, Thanks for starting this discussion, some Spark/Hive users
> > > > > will benefit from it. The FLIP looks good to me. I just have two
> > > > > minor questions:
> > > > > >1. For the syntax explanation, I see it's "Create .... function as
> > > > > identifier....". I think the word "identifier" may not be
> > > > > self-descriptive, for actually it's not a random name but the name
> > > > > of the class that provides the implementation for the function to
> > > > > be created.
> > > > > >Maybe it would be clearer to use "class_name" instead of
> > > > > "identifier", just like what Hive[1]/Spark[2] do.
> > > > > >
> > > > > >2.  >> If the resource used is a remote resource, it will first
> > > > > download the resource to a local temporary directory, which will be
> > > > > generated using UUID, and then register the local path to the user
> > > > > class loader.
> > > > > >For the above explanation in this FLIP, it seems that for such a
> > > > > statement set,
> > > > > >""
> > > > > >Create  function as org.apache.udf1 using jar 'hdfs://myudfs.jar';
> > > > > >Create  function as org.apache.udf2 using jar 'hdfs://myudfs.jar';
> > > > > >""
> > > > > > it'll download the resource 'hdfs://myudfs.jar' twice. So is it
> > > > > possible to provide some cache mechanism so that we won't need to
> > > > > download / store it twice?
> > > > > >
> > > > > >
> > > > > >Best regards,
> > > > > >Yuxia
> > > > > >[1]
> > > https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl
> > > > > >[2]
> > > > >
> > > https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-ddl-create-function.html
> > > ------------------------------------------------------------------
> > > > > >From: Mang Zhang<zh...@163.com>
> > > > > >Date: 2022-03-22 11:35:24
> > > > > >To: <de...@flink.apache.org>
> > > > > >Subject: Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > > >
> > > > > >Hi Ron, Thank you so much for this suggestion, this is very good.
> > > > > >In our company, using custom UDFs is very inconvenient: the code
> > > > > needs to be packaged into the job jar, and users cannot simply
> > > > > refer to an existing udf jar,
> > > > > >or pass in the jar reference in the startup command.
> > > > > >If we implement this feature, users can focus on their own business
> > > > > development.
> > > > > >I can also contribute if needed.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >--
> > > > > >
> > > > > >Best regards,
> > > > > >Mang Zhang
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >At 2022-03-21 14:57:32, "刘大龙" <ld...@zju.edu.cn> wrote:
> > > > > >>Hi, everyone
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>I would like to open a discussion on supporting advanced function
> > > > > DDL. This proposal is a continuation of FLIP-79, in which Flink
> > > > > function DDL is defined. Until now it is only partially released,
> > > > > as Flink function DDL with user-defined resources has not been
> > > > > clearly discussed and implemented. It is an important feature to
> > > > > support registering UDFs with custom jar resources, so users can
> > > > > use UDFs more easily without having to put jars under the
> > > > > classpath in advance.
> > > > > >>
> > > > > >>Looking forward to your feedback.
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>[1]
> > > > >
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>Best,
> > > > > >>
> > > > > >>Ron
> > > > > >>
> > > > > >>
> > > > > >
> > > > >
> > >
> > > Hi Peter, thanks for your feedback. This work also builds on your effort;
> > > thank you very much.
> > >
> 
> Hi, Martijn
> Thank you very much for the feedback, it was very useful for me.
> 1. Filesystem abstraction: With regards to remote resources, I agree with you that we should use Flink's FileSystem abstraction to support all types of file systems, including HTTP, S3, HDFS, etc., rather than binding to a specific implementation. In the first version we will give priority to supporting HDFS as a resource provider via Flink's FileSystem abstraction, since HDFS is very widely used.
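Scheme-based dispatch through a filesystem abstraction — rather than coupling to any one provider — could look roughly like this. This is an illustrative Python sketch; the registry and factory names are made up and do not reflect Flink's actual FileSystem API:

```python
from urllib.parse import urlparse

# Hypothetical registry mapping URI schemes to filesystem factories,
# mirroring how a pluggable abstraction keeps the function DDL decoupled
# from any single provider such as HDFS.
_FILESYSTEMS = {}

def register_filesystem(scheme, factory):
    _FILESYSTEMS[scheme] = factory

def filesystem_for(uri: str):
    # Resolve the implementation from the resource URI's scheme,
    # defaulting to a local filesystem when no scheme is given.
    scheme = urlparse(uri).scheme or "file"
    try:
        return _FILESYSTEMS[scheme]()
    except KeyError:
        raise ValueError(f"No filesystem registered for scheme '{scheme}'")
```

Under this shape, adding S3 or OSS support is just registering another factory, with no change to the DDL handling itself.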
> 
> 2. Security/operations implications: The point you raise is a great one; security is an issue that needs to be considered. Your starting point is that a jar needs some verification before it is used, to avoid insecure behavior. However, IMO, the validation of a jar is supposed to be done by the platform side itself: the platform needs to ensure that users have permission to use the jar and that the jar is safe. A configuration option cannot disable the syntax completely, because the user can still re-enable it with the SET command. I think the right approach is for the platform to verify rather than the engine side. In addition, current Connector/UDF/DataStream programs can also use custom jars, and these jars have the same security issues, yet Flink currently provides no option to prohibit the use of custom jars. If a user uses a custom jar, it means the user has permission to do so, and the user should then be responsible for the security of that jar. If it was compromised, it means there are loopholes in the company's permissions/network that need to be fixed. All in all, I agree with you on this point, but an option can't solve this problem.
> 
> Best,
> 
> Ron

Re: Re: [SPAM] Re: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL

Posted by 刘大龙 <ld...@zju.edu.cn>.
Thanks again to everyone for the discussion about this FLIP. I will open a vote tomorrow.

Best,
Ron


> -----Original Message-----
> From: "Jark Wu" <im...@gmail.com>
> Sent: 2022-04-19 16:03:22 (Tuesday)
> To: dev <de...@flink.apache.org>
> Cc: 
> Subject: Re: [SPAM] Re: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> 
> Thank Ron for updating the FLIP.
> 
> I think the updated FLIP has addressed Martijn's concern.
> I don't have other feedback. So +1 for a vote.
> 
> Best,
> Jark
> 
> On Fri, 15 Apr 2022 at 16:36, 刘大龙 <ld...@zju.edu.cn> wrote:
> 
> > Hi, Jingsong
> >
> > Thanks for your feedback, we will use Flink's FileSystem abstraction, so
> > HDFS, S3, and OSS will be supported.
> >
> > Best,
> >
> > Ron
> >
> > > -----Original Message-----
> > > From: "Jingsong Li" <ji...@gmail.com>
> > > Sent: 2022-04-14 17:55:03 (Thursday)
> > > To: dev <de...@flink.apache.org>
> > > Cc:
> > > Subject: [SPAM] Re: Re: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced
> > Function DDL
> > >
> > > I agree with Martijn.
> > >
> > > At least HDFS, S3, and OSS should be supported.
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Thu, Apr 14, 2022 at 4:46 PM Martijn Visser <ma...@ververica.com>
> > wrote:
> > > >
> > > > Hi Ron,
> > > >
> > > > The FLIP mentions that the priority will be set to support HDFS as a
> > > > resource provider. I'm concerned that we end up with a partially
> > > > implemented FLIP which only supports local and HDFS and then we move
> > on to
> > > > other features, as we have seen happen with others. I would argue that we
> > should
> > > > not focus on one resource provider, but that at least S3 support is
> > > > included in the same Flink release as HDFS support is.
> > > >
> > > > Best regards,
> > > >
> > > > Martijn Visser
> > > > https://twitter.com/MartijnVisser82
> > > > https://github.com/MartijnVisser
> > > >
> > > >
> > > > On Thu, 14 Apr 2022 at 08:50, 刘大龙 <ld...@zju.edu.cn> wrote:
> > > >
> > > > > Hi, everyone
> > > > >
> > > > > First of all, thanks for the valuable suggestions received about this
> > > > > FLIP. After some discussion, it looks like all concerns have been
> > addressed
> > > > > for now, so I will start a vote about this FLIP in two or three
> > days.
> > > > > Also, further feedback is very welcome.
> > > > >
> > > > > Best,
> > > > >
> > > > > Ron

Re: [SPAM] Re: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL

Posted by Jark Wu <im...@gmail.com>.
Thank Ron for updating the FLIP.

I think the updated FLIP has addressed Martijn's concern.
I don't have other feedback. So +1 for a vote.

Best,
Jark

On Fri, 15 Apr 2022 at 16:36, 刘大龙 <ld...@zju.edu.cn> wrote:

> Hi, Jingsong
>
> Thanks for your feedback, we will use flink FileSytem abstraction, so HDFS
> S3 OSS will be supported.
>
> Best,
>
> Ron
>
> > -----原始邮件-----
> > 发件人: "Jingsong Li" <ji...@gmail.com>
> > 发送时间: 2022-04-14 17:55:03 (星期四)
> > 收件人: dev <de...@flink.apache.org>
> > 抄送:
> > 主题: [SPAM] Re: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced
> Function DDL
> >
> > I agree with Martijn.
> >
> > At least, HDFS S3 OSS should be supported.
> >
> > Best,
> > Jingsong
> >
> > On Thu, Apr 14, 2022 at 4:46 PM Martijn Visser <ma...@ververica.com>
> wrote:
> > >
> > > Hi Ron,
> > >
> > > The FLIP mentions that the priority will be set to support HDFS as a
> > > resource provider. I'm concerned that we end up with a partially
> > > implemented FLIP which only supports local and HDFS and then we move
> on to
> > > other features, as we see happen with others. I would argue that we
> should
> > > not focus on one resource provider, but that at least S3 support is
> > > included in the same Flink release as HDFS support is.
> > >
> > > Best regards,
> > >
> > > Martijn Visser
> > > https://twitter.com/MartijnVisser82
> > > https://github.com/MartijnVisser
> > >
> > >
> > > On Thu, 14 Apr 2022 at 08:50, 刘大龙 <ld...@zju.edu.cn> wrote:
> > >
> > > > Hi, everyone
> > > >
> > > > First of all, thanks for the valuable suggestions received about this
> > > > FLIP. After some discussion, it looks like all concerns have been
> addressed
> > > > for now, so I will start a vote about this FLIP in two or three days
> later.
> > > > Also, further feedback is very welcome.
> > > >
> > > > Best,
> > > >
> > > > Ron
> > > >
> > > >
> > > > > -----原始邮件-----
> > > > > 发件人: "刘大龙" <ld...@zju.edu.cn>
> > > > > 发送时间: 2022-04-08 10:09:46 (星期五)
> > > > > 收件人: dev@flink.apache.org
> > > > > 抄送:
> > > > > 主题: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced
> Function
> > > > DDL
> > > > >
> > > > > Hi, Martijn
> > > > >
> > > > > Do you have any question about this FLIP? looking forward to your
> more
> > > > feedback.
> > > > >
> > > > > Best,
> > > > >
> > > > > Ron
> > > > >
> > > > >
> > > > > > -----原始邮件-----
> > > > > > 发件人: "刘大龙" <ld...@zju.edu.cn>
> > > > > > 发送时间: 2022-03-29 19:33:58 (星期二)
> > > > > > 收件人: dev@flink.apache.org
> > > > > > 抄送:
> > > > > > 主题: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced
> Function DDL
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > -----原始邮件-----
> > > > > > > 发件人: "Martijn Visser" <ma...@ververica.com>
> > > > > > > 发送时间: 2022-03-24 16:18:14 (星期四)
> > > > > > > 收件人: dev <de...@flink.apache.org>
> > > > > > > 抄送:
> > > > > > > 主题: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function
> DDL
> > > > > > >
> > > > > > > Hi Ron,
> > > > > > >
> > > > > > > Thanks for creating the FLIP. You're talking about both local
> and
> > > > remote
> > > > > > > resources. With regards to remote resources, how do you see
> this
> > > > work with
> > > > > > > Flink's filesystem abstraction? I did read in the FLIP that
> Hadoop
> > > > > > > dependencies are not packaged, but I would hope that we do
> that for
> > > > all
> > > > > > > filesystem implementation. I don't think it's a good idea to
> have
> > > > any tight
> > > > > > > coupling to file system implementations, especially if at some
> point
> > > > we
> > > > > > > could also externalize file system implementations (like we're
> doing
> > > > for
> > > > > > > connectors already). I think the FLIP would be better by not
> only
> > > > > > > referring to "Hadoop" as a remote resource provider, but a more
> > > > generic
> > > > > > > term since there are more options than Hadoop.
> > > > > > >
> > > > > > > I'm also thinking about security/operations implications:
> would it be
> > > > > > > possible for bad actor X to create a JAR that either
> influences other
> > > > > > > running jobs, leaks data or credentials or anything else? If
> so, I
> > > > think it
> > > > > > > would also be good to have an option to disable this feature
> > > > completely. I
> > > > > > > think there are roughly two types of companies who run Flink:
> those
> > > > who
> > > > > > > open it up for everyone to use (here the feature would be
> welcomed)
> > > > and
> > > > > > > those who need to follow certain minimum standards/have a more
> > > > closed Flink
> > > > > > > ecosystem). They usually want to validate a JAR upfront before
> > > > making it
> > > > > > > available, even at the expense of speed, because it gives them
> more
> > > > control
> > > > > > > over what will be running in their environment.
> > > > > > >
> > > > > > > Best regards,
> > > > > > >
> > > > > > > Martijn Visser
> > > > > > > https://twitter.com/MartijnVisser82
> > > > > > >
> > > > > > >
> > > > > > > On Wed, 23 Mar 2022 at 16:47, 刘大龙 <ld...@zju.edu.cn> wrote:
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > -----原始邮件-----
> > > > > > > > > 发件人: "Peter Huang" <hu...@gmail.com>
> > > > > > > > > 发送时间: 2022-03-23 11:13:32 (星期三)
> > > > > > > > > 收件人: dev <de...@flink.apache.org>
> > > > > > > > > 抄送:
> > > > > > > > > 主题: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function
> DDL
> > > > > > > > >
> > > > > > > > > Hi Ron,
> > > > > > > > >
> > > > > > > > > Thanks for reviving the discussion of the work. The design
> looks
> > > > good. A
> > > > > > > > > small typo in the FLIP is that currently it is marked as
> > > > released in
> > > > > > > > 1.16.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Best Regards
> > > > > > > > > Peter Huang
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Mar 22, 2022 at 10:58 PM Mang Zhang <
> zhangmang1@163.com>
> > > > wrote:
> > > > > > > > >
> > > > > > > > > > hi Yuxia,
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks for your reply. Your reminder is very important !
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Since we download the file to the local, remember to
> clean it
> > > > up when
> > > > > > > > the
> > > > > > > > > > flink client exits
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > >
> > > > > > > > > > Best regards,
> > > > > > > > > > Mang Zhang
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > At 2022-03-23 10:02:26, "罗宇侠(莫辞)"
> > > > > > > > > > <lu...@alibaba-inc.com.INVALID> wrote:
> > > > > > > > > > >Hi Ron, Thanks for starting this discussion; some Spark/Hive users will benefit from it. The FLIP looks good to me. I just have two minor questions:
> > > > > > > > > > >1. For the syntax explanation, I see it's "Create .... function as identifier....". I think the word "identifier" may not be self-descriptive, for actually it's not a random name but the name of the class that provides the implementation for the function to be created.
> > > > > > > > > > >Maybe it'll be clearer to use "class_name" in place of "identifier", just like what Hive[1]/Spark[2] do.
> > > > > > > > > > >
> > > > > > > > > > >2.  >> If the resource used is a remote resource, it will first download the resource to a local temporary directory, which will be generated using UUID, and then register the local path to the user class loader.
> > > > > > > > > > >For the above explanation in this FLIP, it seems that for such statement sets,
> > > > > > > > > > >""
> > > > > > > > > > >Create  function as org.apache.udf1 using jar 'hdfs://myudfs.jar';
> > > > > > > > > > >Create  function as org.apache.udf2 using jar 'hdfs://myudfs.jar';
> > > > > > > > > > >""
> > > > > > > > > > >it'll download the resource 'hdfs://myudfs.jar' twice. So is it possible to provide some cache mechanism so that we won't need to download/store it twice?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >Best regards,
> > > > > > > > > > >Yuxia
> > > > > > > > > > >[1] https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl
> > > > > > > > > > >[2] https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-ddl-create-function.html
> > > > > > > > > > >------------------------------------------------------------------
> > > > > > > > > > >发件人:Mang Zhang<zh...@163.com>
> > > > > > > > > > >日 期:2022年03月22日 11:35:24
> > > > > > > > > > >收件人:<de...@flink.apache.org>
> > > > > > > > > > >主 题:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > > > > > > > >
> > > > > > > > > > >Hi Ron, Thank you so much for this suggestion; this is very good.
> > > > > > > > > > >In our company, when users use a custom UDF it is very inconvenient: the code needs to be packaged into the job jar, and they cannot refer to an existing UDF jar, or they must pass the jar reference in the startup command.
> > > > > > > > > > >If we implement this feature, users can focus on their own business development.
> > > > > > > > > > >I can also contribute if needed.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >--
> > > > > > > > > > >
> > > > > > > > > > >Best regards,
> > > > > > > > > > >Mang Zhang
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >At 2022-03-21 14:57:32, "刘大龙" <ld...@zju.edu.cn> wrote:
> > > > > > > > > > >>Hi, everyone
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >>I would like to open a discussion on supporting advanced Function DDL. This proposal is a continuation of FLIP-79, in which Flink Function DDL is defined. Until now it has only been partially released, as Flink function DDL with user-defined resources was not clearly discussed and implemented. It is an important feature to support registering UDFs with custom jar resources, so users can use UDFs more easily without having to put jars under the classpath in advance.
> > > > > > > > > > >>
> > > > > > > > > > >>Looking forward to your feedback.
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >>[1]
> > > > > > > > > >
> > > > > > > >
> > > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >>Best,
> > > > > > > > > > >>
> > > > > > > > > > >>Ron
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > > > Hi Peter, thanks for your feedback. This work also includes your effort; thank you very much.
> > > > > > > >
> > > > > >
> > > > > > Hi, Martijn
> > > > > > Thank you very much for the feedback; it was very useful for me.
> > > > > > 1. Filesystem abstraction: With regards to remote resources, I agree with you that we should use Flink's FileSystem abstraction to support all types of file systems, including HTTP, S3, HDFS, etc., rather than binding to a specific implementation. In the first version, we will give priority to supporting HDFS as a resource provider via Flink's FileSystem abstraction, since HDFS is very widely used.
> > > > > >
> > > > > > 2. Security/operations implications: The point you raise is a great one; security does need to be considered. Your premise is that a jar needs some verification before it is used, to avoid insecure behavior. However, IMO, the validation of jars should be done by the platform side itself; the platform needs to ensure that users have permission to use the jar and that the jar is secure. An option cannot disable the syntax completely, as the user can still enable it via a SET command. I think the correct approach is for the platform to verify, rather than the engine side. In addition, current Connector/UDF/DataStream programs also use custom jars, and those jars have the same security issues, yet Flink currently does not provide an option to prohibit the use of custom jars. If a user uses a custom jar, it means the user has permission to do so, and then the user should be responsible for the security of that jar. If it was hacked, it means there are loopholes in the company's permissions/network, and they need to fix those problems. All in all, I agree with you on this point, but an option can't solve this problem.
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Ron
> > > >
>

Re: [SPAM] Re: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL

Posted by 刘大龙 <ld...@zju.edu.cn>.
Hi, Jingsong

Thanks for your feedback. We will use Flink's FileSystem abstraction, so HDFS, S3, and OSS will all be supported.

Best,

Ron
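The FileSystem abstraction mentioned above boils down to dispatching on the URI scheme of the jar resource, so that HDFS, S3, OSS, etc. can be supported uniformly. The following self-contained Python sketch illustrates the idea; all names here are hypothetical for illustration and none of this is Flink's actual API:

```python
from urllib.parse import urlparse

# Hypothetical registry mapping a URI scheme to a "filesystem" handler.
# In Flink this role is played by the FileSystem abstraction, so new
# providers plug in without touching the DDL code path.
FILESYSTEMS = {}

def register_filesystem(scheme, handler):
    FILESYSTEMS[scheme] = handler

def resolve(uri):
    """Pick the handler responsible for downloading this resource."""
    scheme = urlparse(uri).scheme or "file"
    try:
        return FILESYSTEMS[scheme]
    except KeyError:
        raise ValueError(f"Unsupported resource scheme: {scheme}") from None

def fetch(uri):
    """Download a resource via whichever handler owns its scheme."""
    return resolve(uri)(uri)

# Toy handlers: real ones would stream bytes from HDFS/S3/OSS.
register_filesystem("hdfs", lambda uri: f"downloaded {uri} via HDFS")
register_filesystem("s3", lambda uri: f"downloaded {uri} via S3")

print(fetch("hdfs://nn:9000/udfs/my-udfs.jar"))
```

A new provider is then just another `register_filesystem` call, which is why decoupling from any single implementation (e.g. Hadoop) matters.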

> -----原始邮件-----
> 发件人: "Jingsong Li" <ji...@gmail.com>
> 发送时间: 2022-04-14 17:55:03 (星期四)
> 收件人: dev <de...@flink.apache.org>
> 抄送: 
> 主题: [SPAM] Re: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> 
> I agree with Martijn.
> 
> At least, HDFS S3 OSS should be supported.
> 
> Best,
> Jingsong
> 
> On Thu, Apr 14, 2022 at 4:46 PM Martijn Visser <ma...@ververica.com> wrote:
> >
> > Hi Ron,
> >
> > The FLIP mentions that the priority will be set to support HDFS as a
> > resource provider. I'm concerned that we end up with a partially
> > implemented FLIP which only supports local and HDFS and then we move on to
> > other features, as we see happen with others. I would argue that we should
> > not focus on one resource provider, but that at least S3 support is
> > included in the same Flink release as HDFS support is.
> >
> > Best regards,
> >
> > Martijn Visser
> > https://twitter.com/MartijnVisser82
> > https://github.com/MartijnVisser
> >
> >
> > On Thu, 14 Apr 2022 at 08:50, 刘大龙 <ld...@zju.edu.cn> wrote:
> >
> > > Hi, everyone
> > >
> > > First of all, thanks for the valuable suggestions received about this
> > > FLIP. After some discussion, it looks like all concerns have been addressed
> > > for now, so I will start a vote on this FLIP in two or three days.
> > > Also, further feedback is very welcome.
> > >
> > > Best,
> > >
> > > Ron
> > >
> > >
> > > > -----原始邮件-----
> > > > 发件人: "刘大龙" <ld...@zju.edu.cn>
> > > > 发送时间: 2022-04-08 10:09:46 (星期五)
> > > > 收件人: dev@flink.apache.org
> > > > 抄送:
> > > > 主题: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function
> > > DDL
> > > >
> > > > Hi, Martijn
> > > >
> > > > Do you have any question about this FLIP? looking forward to your more
> > > feedback.
> > > >
> > > > Best,
> > > >
> > > > Ron
> > > >
> > > >
> > > > > -----原始邮件-----
> > > > > 发件人: "刘大龙" <ld...@zju.edu.cn>
> > > > > 发送时间: 2022-03-29 19:33:58 (星期二)
> > > > > 收件人: dev@flink.apache.org
> > > > > 抄送:
> > > > > 主题: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > -----原始邮件-----
> > > > > > 发件人: "Martijn Visser" <ma...@ververica.com>
> > > > > > 发送时间: 2022-03-24 16:18:14 (星期四)
> > > > > > 收件人: dev <de...@flink.apache.org>
> > > > > > 抄送:
> > > > > > 主题: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > > >
> > > > > > Hi Ron,
> > > > > >
> > > > > > Thanks for creating the FLIP. You're talking about both local and
> > > remote
> > > > > > resources. With regards to remote resources, how do you see this
> > > work with
> > > > > > Flink's filesystem abstraction? I did read in the FLIP that Hadoop
> > > > > > dependencies are not packaged, but I would hope that we do that for
> > > all
> > > > > > filesystem implementation. I don't think it's a good idea to have
> > > any tight
> > > > > > coupling to file system implementations, especially if at some point
> > > we
> > > > > > could also externalize file system implementations (like we're doing
> > > for
> > > > > > connectors already). I think the FLIP would be better by not only
> > > > > > referring to "Hadoop" as a remote resource provider, but a more
> > > generic
> > > > > > term since there are more options than Hadoop.
> > > > > >
> > > > > > I'm also thinking about security/operations implications: would it be
> > > > > > possible for bad actor X to create a JAR that either influences other
> > > > > > running jobs, leaks data or credentials or anything else? If so, I
> > > think it
> > > > > > would also be good to have an option to disable this feature
> > > completely. I
> > > > > > think there are roughly two types of companies who run Flink: those
> > > who
> > > > > > open it up for everyone to use (here the feature would be welcomed)
> > > and
> > > > > > those who need to follow certain minimum standards/have a more
> > > closed Flink
> > > > > > ecosystem). They usually want to validate a JAR upfront before
> > > making it
> > > > > > available, even at the expense of speed, because it gives them more
> > > control
> > > > > > over what will be running in their environment.
> > > > > >
> > > > > > Best regards,
> > > > > >
> > > > > > Martijn Visser
> > > > > > https://twitter.com/MartijnVisser82
> > > > > >
> > > > > >
> > > > > > On Wed, 23 Mar 2022 at 16:47, 刘大龙 <ld...@zju.edu.cn> wrote:
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > >

Re: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL

Posted by Jingsong Li <ji...@gmail.com>.
I agree with Martijn.

At least HDFS, S3, and OSS should be supported.

Best,
Jingsong

On Thu, Apr 14, 2022 at 4:46 PM Martijn Visser <ma...@ververica.com> wrote:
>
> Hi Ron,
>
> The FLIP mentions that the priority will be set to support HDFS as a
> resource provider. I'm concerned that we end up with a partially
> implemented FLIP which only supports local and HDFS and then we move on to
> other features, as we see happen with others. I would argue that we should
> not focus on one resource provider, but that at least S3 support is
> included in the same Flink release as HDFS support is.
>
> Best regards,
>
> Martijn Visser
> https://twitter.com/MartijnVisser82
> https://github.com/MartijnVisser
>
>
> On Thu, 14 Apr 2022 at 08:50, 刘大龙 <ld...@zju.edu.cn> wrote:
>
> > Hi, everyone
> >
> > First of all, thanks for the valuable suggestions received about this
> > FLIP. After some discussion, it looks like all concerns have been addressed
> > for now, so I will start a vote about this FLIP in two or three days later.
> > Also, further feedback is very welcome.
> >
> > Best,
> >
> > Ron
> >
> >
> > > -----原始邮件-----
> > > 发件人: "刘大龙" <ld...@zju.edu.cn>
> > > 发送时间: 2022-04-08 10:09:46 (星期五)
> > > 收件人: dev@flink.apache.org
> > > 抄送:
> > > 主题: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function
> > DDL
> > >
> > > Hi, Martijn
> > >
> > > Do you have any question about this FLIP? looking forward to your more
> > feedback.
> > >
> > > Best,
> > >
> > > Ron
> > >
> > >
> > > > -----原始邮件-----
> > > > 发件人: "刘大龙" <ld...@zju.edu.cn>
> > > > 发送时间: 2022-03-29 19:33:58 (星期二)
> > > > 收件人: dev@flink.apache.org
> > > > 抄送:
> > > > 主题: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > > >
> > > >
> > > >
> > > >
> > > > > -----原始邮件-----
> > > > > 发件人: "Martijn Visser" <ma...@ververica.com>
> > > > > 发送时间: 2022-03-24 16:18:14 (星期四)
> > > > > 收件人: dev <de...@flink.apache.org>
> > > > > 抄送:
> > > > > 主题: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > >
> > > > > Hi Ron,
> > > > >
> > > > > Thanks for creating the FLIP. You're talking about both local and
> > remote
> > > > > resources. With regards to remote resources, how do you see this
> > work with
> > > > > Flink's filesystem abstraction? I did read in the FLIP that Hadoop
> > > > > dependencies are not packaged, but I would hope that we do that for
> > all
> > > > > filesystem implementation. I don't think it's a good idea to have
> > any tight
> > > > > coupling to file system implementations, especially if at some point
> > we
> > > > > could also externalize file system implementations (like we're doing
> > for
> > > > > connectors already). I think the FLIP would be better by not only
> > > > > referring to "Hadoop" as a remote resource provider, but a more
> > generic
> > > > > term since there are more options than Hadoop.
> > > > >
> > > > > I'm also thinking about security/operations implications: would it be
> > > > > possible for bad actor X to create a JAR that either influences other
> > > > > running jobs, leaks data or credentials or anything else? If so, I
> > think it
> > > > > would also be good to have an option to disable this feature
> > completely. I
> > > > > think there are roughly two types of companies who run Flink: those
> > who
> > > > > open it up for everyone to use (here the feature would be welcomed)
> > and
> > > > > those who need to follow certain minimum standards/have a more
> > closed Flink
> > > > > ecosystem). They usually want to validate a JAR upfront before
> > making it
> > > > > available, even at the expense of speed, because it gives them more
> > control
> > > > > over what will be running in their environment.
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Martijn Visser
> > > > > https://twitter.com/MartijnVisser82
> > > > >
> > > > >
> > > > > On Wed, 23 Mar 2022 at 16:47, 刘大龙 <ld...@zju.edu.cn> wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > -----原始邮件-----
> > > > > > > 发件人: "Peter Huang" <hu...@gmail.com>
> > > > > > > 发送时间: 2022-03-23 11:13:32 (星期三)
> > > > > > > 收件人: dev <de...@flink.apache.org>
> > > > > > > 抄送:
> > > > > > > 主题: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > > > >
> > > > > > > Hi Ron,
> > > > > > >
> > > > > > > Thanks for reviving the discussion of the work. The design looks
> > good. A
> > > > > > > small typo in the FLIP is that currently it is marked as
> > released in
> > > > > > 1.16.
> > > > > > >
> > > > > > >
> > > > > > > Best Regards
> > > > > > > Peter Huang
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Mar 22, 2022 at 10:58 PM Mang Zhang <zh...@163.com>
> > wrote:
> > > > > > >
> > > > > > > > hi Yuxia,
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks for your reply. Your reminder is very important !
> > > > > > > >
> > > > > > > >
> > > > > > > > Since we download the file to the local, remember to clean it
> > up when
> > > > > > the
> > > > > > > > flink client exits
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > > Mang Zhang
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > At 2022-03-23 10:02:26, "罗宇侠(莫辞)"
> > > > > > > > <lu...@alibaba-inc.com.INVALID> wrote:
> > > > > > > > >Hi Ron, Thanks for starting this dicuss, some Spark/Hive
> > users will
> > > > > > > > benefit from it. The flip looks good to me. I just have two
> > minor
> > > > > > questions:
> > > > > > > > >1. For synax explanation, I see it's "Create .... function as
> > > > > > > > identifier....", I think the word "identifier" may not be
> > > > > > > > self-dedescriptive for actually it's not a random name but the
> > name of
> > > > > > the
> > > > > > > > class that provides the implementation for function to be
> > create.
> > > > > > > > >May be it'll be more clear to use "class_name" replace
> > "identifier"
> > > > > > just
> > > > > > > > like what Hive[1]/Spark[2] do.
> > > > > > > > >
> > > > > > > > >2.  >> If the resource used is a remote resource, it will
> > first
> > > > > > download
> > > > > > > > the resource to a local temporary directory, which will be
> > generated
> > > > > > using
> > > > > > > > UUID, and then register the local path to the user class
> > loader.
> > > > > > > > >For the above explanation in this FLIP, It seems for such
> > statement
> > > > > > sets,
> > > > > > > > >""
> > > > > > > > >Create  function as org.apache.udf1 using jar
> > 'hdfs://myudfs.jar';
> > > > > > > > >Create  function as org.apache.udf2 using jar
> > 'hdfs://myudfs.jar';
> > > > > > > > >""
> > > > > > > > > it'll download the resource 'hdfs://myudfs.jar' for twice.
> > So is it
> > > > > > > > possible to provide some cache mechanism that we won't need to
> > > > > > download /
> > > > > > > > store for twice?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >Best regards,
> > > > > > > > >Yuxia
> > > > > > > > >[1]
> > > > > >
> > https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl
> > > > > > > > >[2]
> > > > > > > >
> > > > > >
> > https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-ddl-create-function.html------------------------------------------------------------------
> > > > > > > > >发件人:Mang Zhang<zh...@163.com>
> > > > > > > > >日 期:2022年03月22日 11:35:24
> > > > > > > > >收件人:<de...@flink.apache.org>
> > > > > > > > >主 题:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > > > > > >
> > > > > > > > >Hi Ron, Thank you so much for this suggestion, this is so
> > good.
> > > > > > > > >In our company, when users use custom UDF, it is very
> > inconvenient,
> > > > > > and
> > > > > > > > the code needs to be packaged into the job jar,
> > > > > > > > >and cannot refer to the existing udf jar through the existing
> > udf jar.
> > > > > > > > >Or pass in the jar reference in the startup command.
> > > > > > > > >If we implement this feature, users can focus on their own
> > business
> > > > > > > > development.
> > > > > > > > >I can also contribute if needed.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >--
> > > > > > > > >
> > > > > > > > >Best regards,
> > > > > > > > >Mang Zhang
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >At 2022-03-21 14:57:32, "刘大龙" <ld...@zju.edu.cn> wrote:
> > > > > > > > >>Hi, everyone
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>I would like to open a discussion for support advanced
> > Function DDL,
> > > > > > > > this proposal is a continuation of FLIP-79 in which Flink
> > Function DDL
> > > > > > is
> > > > > > > > defined. Until now it is partially released as the Flink
> > function DDL
> > > > > > with
> > > > > > > > user defined resources is not clearly discussed and
> > implemented. It is
> > > > > > an
> > > > > > > > important feature for support to register UDF with custom jar
> > resource,
> > > > > > > > users can use UDF more more easily without having to put jars
> > under the
> > > > > > > > classpath in advance.
> > > > > > > > >>
> > > > > > > > >>Looking forward to your feedback.
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>[1]
> > > > > > > >
> > > > > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>Best,
> > > > > > > > >>
> > > > > > > > >>Ron
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > > > Hi, Peter, Thanks for your feedback. This work also has your
> > effort, thank
> > > > > > you very much.
> > > > > >
> > > >
> > > > Hi, Martijn
> > > > Thank you very much for the feedback, it was very useful for me.
> > > > 1. Filesystem abstraction: With regards to remote resources, I agree
> > with you that we should use Flink's FileSytem abstraction to supports all
> > types of file system, including HTTP, S3, HDFS, etc, rather than binding to
> > a specific implementation. Currently in the first version, we will give
> > priority to support HDFS as a resource provider by Flink's FileSytem
> > abstraction. HDFS is used very much.
> > > >
> > > > 2. Security/operations implications: The point you raise is a great
> > > > one; security does need to be considered. Your premise is that a jar
> > > > needs some verification before it is used, to avoid insecure behavior.
> > > > However, IMO, validating the jar should be done by the platform side
> > > > itself: the platform needs to ensure both that users have permission
> > > > to use the jar and that the jar is safe. A configuration option cannot
> > > > disable the syntax completely anyway, since the user could still
> > > > enable it via the SET command, so I think the right place to verify is
> > > > the platform rather than the engine. In addition, Connector/UDF/
> > > > DataStream programs already use custom jars, which carry the same
> > > > security issues, and Flink currently provides no option to prohibit
> > > > them. If a user uses a custom jar, it means the user has permission to
> > > > do so, and the user should then be responsible for the jar's security.
> > > > If it were hacked, that would mean there are loopholes in the
> > > > company's permissions/network that need to be fixed. All in all, I
> > > > agree with you on this point, but an option can't solve this problem.
> > > >
> > > > Best,
> > > >
> > > > Ron
> >
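The mechanism the quoted reply describes (download a remote jar into a UUID-named local temp directory, register the local path with the user class loader, and clean up when the client exits — plus the caching Yuxia asked about earlier in the thread) can be sketched roughly as follows. This is an illustrative sketch, not Flink's actual implementation: `ResourceManager` and its method names are hypothetical, and a plain local file copy stands in for a real `FileSystem` download.

```python
import shutil
import tempfile
import uuid
from pathlib import Path


class ResourceManager:
    """Hypothetical sketch of the FLIP's remote-resource handling."""

    def __init__(self):
        # All downloaded jars live under one client-scoped temp directory.
        self._local_dir = Path(tempfile.mkdtemp(prefix="flink-udf-"))
        self._cache = {}  # remote URI -> local Path (avoids double downloads)

    def register_jar(self, uri: str) -> Path:
        """Return a local path for the jar, downloading it at most once."""
        if uri in self._cache:
            return self._cache[uri]
        # Each download goes into its own UUID-named subdirectory.
        target_dir = self._local_dir / str(uuid.uuid4())
        target_dir.mkdir(parents=True)
        local_jar = target_dir / Path(uri).name
        # Stand-in for a Flink FileSystem download: plain local file copy.
        source = uri[len("file://"):] if uri.startswith("file://") else uri
        shutil.copy(source, local_jar)
        self._cache[uri] = local_jar
        return local_jar

    def close(self) -> None:
        # Clean up the downloaded jars when the client exits.
        shutil.rmtree(self._local_dir, ignore_errors=True)
        self._cache.clear()
```

Calling `register_jar` twice with the same URI returns the same local path, which is one possible answer to the double-download concern raised earlier in the thread.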

Re: Re: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL

Posted by 刘大龙 <ld...@zju.edu.cn>.
Hi, Martijn

My description in the FLIP was not very clear: we will use Flink's FileSystem abstraction to download resources, so HDFS, S3, OSS, etc. will all be supported in the first version.

Best,

Ron
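
For readers skimming the thread, the DDL under discussion (per FLIP-214) would look roughly like the examples below; the function names, class names, and jar locations are illustrative only:

```sql
-- Illustrative examples; class names and jar URIs are made up.
-- Any scheme supported by Flink's FileSystem abstraction should work.
CREATE TEMPORARY FUNCTION my_udf1 AS 'org.apache.udf1'
  USING JAR 'hdfs://namenode:8020/udfs/myudfs.jar';

CREATE FUNCTION my_udf2 AS 'org.apache.udf2'
  USING JAR 's3://my-bucket/udfs/myudfs.jar';
```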


> -----原始邮件-----
> 发件人: "Martijn Visser" <ma...@ververica.com>
> 发送时间: 2022-04-14 16:46:24 (星期四)
> 收件人: dev <de...@flink.apache.org>
> 抄送: 
> 主题: Re: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> 
> Hi Ron,
> 
> The FLIP mentions that the priority will be set to support HDFS as a
> resource provider. I'm concerned that we end up with a partially
> implemented FLIP which only supports local and HDFS and then we move on to
> other features, as we see happen with others. I would argue that we should
> not focus on one resource provider, but that at least S3 support is
> included in the same Flink release as HDFS support is.
> 
> Best regards,
> 
> Martijn Visser
> https://twitter.com/MartijnVisser82
> https://github.com/MartijnVisser
> 
> 
> On Thu, 14 Apr 2022 at 08:50, 刘大龙 <ld...@zju.edu.cn> wrote:
> 
> > Hi, everyone
> >
> > First of all, thanks for the valuable suggestions received about this
> > FLIP. After some discussion, it looks like all concerns have been addressed
> > for now, so I will start a vote about this FLIP in two or three days later.
> > Also, further feedback is very welcome.
> >
> > Best,
> >
> > Ron
> >
> >
> > > -----原始邮件-----
> > > 发件人: "刘大龙" <ld...@zju.edu.cn>
> > > 发送时间: 2022-04-08 10:09:46 (星期五)
> > > 收件人: dev@flink.apache.org
> > > 抄送:
> > > 主题: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function
> > DDL
> > >
> > > Hi, Martijn
> > >
> > > Do you have any question about this FLIP? looking forward to your more
> > feedback.
> > >
> > > Best,
> > >
> > > Ron
> > >
> > >
> > > > -----原始邮件-----
> > > > 发件人: "刘大龙" <ld...@zju.edu.cn>
> > > > 发送时间: 2022-03-29 19:33:58 (星期二)
> > > > 收件人: dev@flink.apache.org
> > > > 抄送:
> > > > 主题: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > > >
> > > >
> > > >
> > > >
> > > > > -----原始邮件-----
> > > > > 发件人: "Martijn Visser" <ma...@ververica.com>
> > > > > 发送时间: 2022-03-24 16:18:14 (星期四)
> > > > > 收件人: dev <de...@flink.apache.org>
> > > > > 抄送:
> > > > > 主题: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > >
> > > > > Hi Ron,
> > > > >
> > > > > Thanks for creating the FLIP. You're talking about both local and
> > remote
> > > > > resources. With regards to remote resources, how do you see this
> > work with
> > > > > Flink's filesystem abstraction? I did read in the FLIP that Hadoop
> > > > > dependencies are not packaged, but I would hope that we do that for
> > all
> > > > > filesystem implementation. I don't think it's a good idea to have
> > any tight
> > > > > coupling to file system implementations, especially if at some point
> > we
> > > > > could also externalize file system implementations (like we're doing
> > for
> > > > > connectors already). I think the FLIP would be better by not only
> > > > > referring to "Hadoop" as a remote resource provider, but a more
> > generic
> > > > > term since there are more options than Hadoop.
> > > > >
> > > > > I'm also thinking about security/operations implications: would it be
> > > > > possible for bad actor X to create a JAR that either influences other
> > > > > running jobs, leaks data or credentials or anything else? If so, I
> > think it
> > > > > would also be good to have an option to disable this feature
> > completely. I
> > > > > think there are roughly two types of companies who run Flink: those
> > who
> > > > > open it up for everyone to use (here the feature would be welcomed)
> > and
> > > > > those who need to follow certain minimum standards/have a more
> > closed Flink
> > > > > ecosystem). They usually want to validate a JAR upfront before
> > making it
> > > > > available, even at the expense of speed, because it gives them more
> > control
> > > > > over what will be running in their environment.
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Martijn Visser
> > > > > https://twitter.com/MartijnVisser82
> > > > >
> > > > >
> > > > > On Wed, 23 Mar 2022 at 16:47, 刘大龙 <ld...@zju.edu.cn> wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > -----原始邮件-----
> > > > > > > 发件人: "Peter Huang" <hu...@gmail.com>
> > > > > > > 发送时间: 2022-03-23 11:13:32 (星期三)
> > > > > > > 收件人: dev <de...@flink.apache.org>
> > > > > > > 抄送:
> > > > > > > 主题: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > > > >
> > > > > > > Hi Ron,
> > > > > > >
> > > > > > > Thanks for reviving the discussion of the work. The design looks
> > good. A
> > > > > > > small typo in the FLIP is that currently it is marked as
> > released in
> > > > > > 1.16.
> > > > > > >
> > > > > > >
> > > > > > > Best Regards
> > > > > > > Peter Huang
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Mar 22, 2022 at 10:58 PM Mang Zhang <zh...@163.com>
> > wrote:
> > > > > > >
> > > > > > > > hi Yuxia,
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks for your reply. Your reminder is very important !
> > > > > > > >
> > > > > > > >
> > > > > > > > Since we download the file to the local, remember to clean it
> > up when
> > > > > > the
> > > > > > > > flink client exits
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > > Mang Zhang
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > At 2022-03-23 10:02:26, "罗宇侠(莫辞)"
> > > > > > > > <lu...@alibaba-inc.com.INVALID> wrote:
> > > > > > > > >Hi Ron, Thanks for starting this dicuss, some Spark/Hive
> > users will
> > > > > > > > benefit from it. The flip looks good to me. I just have two
> > minor
> > > > > > questions:
> > > > > > > > >1. For synax explanation, I see it's "Create .... function as
> > > > > > > > identifier....", I think the word "identifier" may not be
> > > > > > > > self-dedescriptive for actually it's not a random name but the
> > name of
> > > > > > the
> > > > > > > > class that provides the implementation for function to be
> > create.
> > > > > > > > >May be it'll be more clear to use "class_name" replace
> > "identifier"
> > > > > > just
> > > > > > > > like what Hive[1]/Spark[2] do.
> > > > > > > > >
> > > > > > > > >2.  >> If the resource used is a remote resource, it will
> > first
> > > > > > download
> > > > > > > > the resource to a local temporary directory, which will be
> > generated
> > > > > > using
> > > > > > > > UUID, and then register the local path to the user class
> > loader.
> > > > > > > > >For the above explanation in this FLIP, It seems for such
> > statement
> > > > > > sets,
> > > > > > > > >""
> > > > > > > > >Create  function as org.apache.udf1 using jar
> > 'hdfs://myudfs.jar';
> > > > > > > > >Create  function as org.apache.udf2 using jar
> > 'hdfs://myudfs.jar';
> > > > > > > > >""
> > > > > > > > > it'll download the resource 'hdfs://myudfs.jar' for twice.
> > So is it
> > > > > > > > possible to provide some cache mechanism that we won't need to
> > > > > > download /
> > > > > > > > store for twice?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >Best regards,
> > > > > > > > >Yuxia
> > > > > > > > >[1]
> > > > > >
> > https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl
> > > > > > > > >[2]
> > > > > > > >
> > > > > >
> > https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-ddl-create-function.html------------------------------------------------------------------
> > > > > > > > >发件人:Mang Zhang<zh...@163.com>
> > > > > > > > >日 期:2022年03月22日 11:35:24
> > > > > > > > >收件人:<de...@flink.apache.org>
> > > > > > > > >主 题:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > > > > > >
> > > > > > > > >Hi Ron, Thank you so much for this suggestion, this is so
> > good.
> > > > > > > > >In our company, when users use custom UDF, it is very
> > inconvenient,
> > > > > > and
> > > > > > > > the code needs to be packaged into the job jar,
> > > > > > > > >and cannot refer to the existing udf jar through the existing
> > udf jar.
> > > > > > > > >Or pass in the jar reference in the startup command.
> > > > > > > > >If we implement this feature, users can focus on their own
> > business
> > > > > > > > development.
> > > > > > > > >I can also contribute if needed.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >--
> > > > > > > > >
> > > > > > > > >Best regards,
> > > > > > > > >Mang Zhang
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >At 2022-03-21 14:57:32, "刘大龙" <ld...@zju.edu.cn> wrote:
> > > > > > > > >>Hi, everyone
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>I would like to open a discussion for support advanced
> > Function DDL,
> > > > > > > > this proposal is a continuation of FLIP-79 in which Flink
> > Function DDL
> > > > > > is
> > > > > > > > defined. Until now it is partially released as the Flink
> > function DDL
> > > > > > with
> > > > > > > > user defined resources is not clearly discussed and
> > implemented. It is
> > > > > > an
> > > > > > > > important feature for support to register UDF with custom jar
> > resource,
> > > > > > > > users can use UDF more more easily without having to put jars
> > under the
> > > > > > > > classpath in advance.
> > > > > > > > >>
> > > > > > > > >>Looking forward to your feedback.
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>[1]
> > > > > > > >
> > > > > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>Best,
> > > > > > > > >>
> > > > > > > > >>Ron
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > > > Hi, Peter, Thanks for your feedback. This work also has your
> > effort, thank
> > > > > > you very much.
> > > > > >
> > > >
> > > > Hi, Martijn
> > > > Thank you very much for the feedback, it was very useful for me.
> > > > 1. Filesystem abstraction: With regards to remote resources, I agree
> > with you that we should use Flink's FileSytem abstraction to supports all
> > types of file system, including HTTP, S3, HDFS, etc, rather than binding to
> > a specific implementation. Currently in the first version, we will give
> > priority to support HDFS as a resource provider by Flink's FileSytem
> > abstraction. HDFS is used very much.
> > > >
> > > > 2. Security/operations implications: The point you are considering is
> > a great one, security is an issue that needs to be considered. Your
> > starting point is that Jar needs to have some verification done on it
> > before it is used, to avoid some non-secure behavior. However, IMO, the
> > validation of Jar is supposed to be done by the platform side itself, and
> > the platform needs to ensure that users have permission to use the jar and
> > security of Jar. Option is not able to disable the syntax completely, the
> > user can still open it by Set command. I think the most correct approach is
> > the platform to verify rather than the engine side. In addition, the
> > current Connector/UDF/DataStream program also exists using custom Jar case,
> > these Jar will also have security issues, Flink currently does not provide
> > Option to prohibit the use of custom Jar. The user used a custom Jar, which
> > means that the user has permission to do this, then the user should be
> > responsible for the security of the Jar. If it was hacked, it means that
> > there are loopholes in the company's permissions/network and they need to
> > fix these problems. All in all, I agree with you on this point, but Option
> > can't solve this problem.
> > > >
> > > > Best,
> > > >
> > > > Ron
> >

Re: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL

Posted by Martijn Visser <ma...@ververica.com>.
Hi Ron,

The FLIP mentions that the priority will be set to support HDFS as a
resource provider. I'm concerned that we end up with a partially
implemented FLIP which only supports local and HDFS and then we move on to
other features, as we have seen happen with other FLIPs. I would argue that we should
not focus on one resource provider, but that at least S3 support is
included in the same Flink release as HDFS support is.

Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82
https://github.com/MartijnVisser


On Thu, 14 Apr 2022 at 08:50, 刘大龙 <ld...@zju.edu.cn> wrote:

> Hi, everyone
>
> First of all, thanks for the valuable suggestions received about this
> FLIP. After some discussion, it looks like all concerns have been addressed
> for now, so I will start a vote about this FLIP in two or three days later.
> Also, further feedback is very welcome.
>
> Best,
>
> Ron

Re: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL

Posted by 刘大龙 <ld...@zju.edu.cn>.
Hi, everyone

First of all, thanks for all the valuable suggestions on this FLIP. After some discussion, it looks like all concerns have been addressed for now, so I will start a vote on this FLIP in two or three days. Further feedback is still very welcome.

Best,

Ron


> -----原始邮件-----
> 发件人: "刘大龙" <ld...@zju.edu.cn>
> 发送时间: 2022-04-08 10:09:46 (星期五)
> 收件人: dev@flink.apache.org
> 抄送: 
> 主题: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> 
> Hi, Martijn
> 
> Do you have any question about this FLIP? looking forward to your more feedback.
> 
> Best,
> 
> Ron