Posted to dev@flink.apache.org by 刘大龙 <ld...@zju.edu.cn> on 2022/04/15 08:35:55 UTC

Re: [SPAM] Re: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL

Hi, Jingsong

Thanks for your feedback. We will use Flink's FileSystem abstraction, so HDFS, S3, and OSS will be supported.

Best,

Ron

> -----Original Message-----
> From: "Jingsong Li" <ji...@gmail.com>
> Sent: 2022-04-14 17:55:03 (Thursday)
> To: dev <de...@flink.apache.org>
> Cc: 
> Subject: [SPAM] Re: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> 
> I agree with Martijn.
> 
> At least HDFS, S3, and OSS should be supported.
> 
> Best,
> Jingsong
> 
> On Thu, Apr 14, 2022 at 4:46 PM Martijn Visser <ma...@ververica.com> wrote:
> >
> > Hi Ron,
> >
> > The FLIP mentions that the priority will be set to support HDFS as a
> > resource provider. I'm concerned that we end up with a partially
> > implemented FLIP which only supports local and HDFS and then we move on to
> > other features, as we see happen with others. I would argue that we should
> > not focus on one resource provider, but that at least S3 support is
> > included in the same Flink release as HDFS support is.
> >
> > Best regards,
> >
> > Martijn Visser
> > https://twitter.com/MartijnVisser82
> > https://github.com/MartijnVisser
> >
> >
> > On Thu, 14 Apr 2022 at 08:50, 刘大龙 <ld...@zju.edu.cn> wrote:
> >
> > > Hi, everyone
> > >
> > > First of all, thanks for the valuable suggestions about this FLIP. After
> > > some discussion, it looks like all concerns have been addressed for now,
> > > so I will start a vote on this FLIP in two or three days. Also, further
> > > feedback is very welcome.
> > >
> > > Best,
> > >
> > > Ron
> > >
> > >
> > > > -----Original Message-----
> > > > From: "刘大龙" <ld...@zju.edu.cn>
> > > > Sent: 2022-04-08 10:09:46 (Friday)
> > > > To: dev@flink.apache.org
> > > > Cc:
> > > > Subject: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > > >
> > > > Hi, Martijn
> > > >
> > > > Do you have any questions about this FLIP? Looking forward to your further feedback.
> > > >
> > > > Best,
> > > >
> > > > Ron
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: "刘大龙" <ld...@zju.edu.cn>
> > > > > Sent: 2022-03-29 19:33:58 (Tuesday)
> > > > > To: dev@flink.apache.org
> > > > > Cc:
> > > > > Subject: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: "Martijn Visser" <ma...@ververica.com>
> > > > > > Sent: 2022-03-24 16:18:14 (Thursday)
> > > > > > To: dev <de...@flink.apache.org>
> > > > > > Cc:
> > > > > > Subject: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > > >
> > > > > > Hi Ron,
> > > > > >
> > > > > > Thanks for creating the FLIP. You're talking about both local and remote
> > > > > > resources. With regards to remote resources, how do you see this working
> > > > > > with Flink's filesystem abstraction? I did read in the FLIP that Hadoop
> > > > > > dependencies are not packaged, but I would hope that we do that for all
> > > > > > filesystem implementations. I don't think it's a good idea to have any
> > > > > > tight coupling to file system implementations, especially if at some
> > > > > > point we could also externalize file system implementations (like we're
> > > > > > already doing for connectors). I think the FLIP would be better if it did
> > > > > > not refer only to "Hadoop" as a remote resource provider, but used a more
> > > > > > generic term, since there are more options than Hadoop.
> > > > > >
> > > > > > I'm also thinking about security/operations implications: would it be
> > > > > > possible for bad actor X to create a JAR that influences other running
> > > > > > jobs, leaks data or credentials, or anything else? If so, I think it
> > > > > > would also be good to have an option to disable this feature completely.
> > > > > > I think there are roughly two types of companies who run Flink: those who
> > > > > > open it up for everyone to use (here the feature would be welcomed) and
> > > > > > those who need to follow certain minimum standards / have a more closed
> > > > > > Flink ecosystem. The latter usually want to validate a JAR upfront before
> > > > > > making it available, even at the expense of speed, because it gives them
> > > > > > more control over what will be running in their environment.
> > > > > >
> > > > > > Best regards,
> > > > > >
> > > > > > Martijn Visser
> > > > > > https://twitter.com/MartijnVisser82
> > > > > >
> > > > > >
> > > > > > On Wed, 23 Mar 2022 at 16:47, 刘大龙 <ld...@zju.edu.cn> wrote:
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: "Peter Huang" <hu...@gmail.com>
> > > > > > > > Sent: 2022-03-23 11:13:32 (Wednesday)
> > > > > > > > To: dev <de...@flink.apache.org>
> > > > > > > > Cc:
> > > > > > > > Subject: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > > > > >
> > > > > > > > Hi Ron,
> > > > > > > >
> > > > > > > > Thanks for reviving the discussion of this work. The design looks
> > > > > > > > good. A small typo in the FLIP: it is currently marked as released
> > > > > > > > in 1.16.
> > > > > > > >
> > > > > > > >
> > > > > > > > Best Regards
> > > > > > > > Peter Huang
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Mar 22, 2022 at 10:58 PM Mang Zhang <zh...@163.com>
> > > wrote:
> > > > > > > >
> > > > > > > > > Hi Yuxia,
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks for your reply. Your reminder is very important!
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Since we download the file to a local directory, remember to clean
> > > > > > > > > it up when the Flink client exits.
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > >
> > > > > > > > > Best regards,
> > > > > > > > > Mang Zhang
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > At 2022-03-23 10:02:26, "罗宇侠(莫辞)"
> > > > > > > > > <lu...@alibaba-inc.com.INVALID> wrote:
> > > > > > > > > >Hi Ron, thanks for starting this discussion; some Spark/Hive users will benefit from it. The FLIP looks good to me. I just have two minor questions:
> > > > > > > > > >1. For the syntax explanation, I see it's "Create .... function as identifier....". I think the word "identifier" may not be self-descriptive, since it is actually not a random name but the name of the class that provides the implementation for the function to be created.
> > > > > > > > > >Maybe it would be clearer to use "class_name" in place of "identifier", just like Hive[1]/Spark[2] do.
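Concretely, the Hive/Spark-style wording suggested in point 1 could look roughly like this (a hedged sketch only, not the final FLIP syntax; the function name, class, and jar URI below are made-up examples):

```sql
-- Sketch of the "class_name" wording in the Hive style (illustrative only).
CREATE FUNCTION parse_log
AS 'com.example.udf.ParseLog'                    -- class_name of the implementing class
USING JAR 'hdfs://namenode:9000/udfs/myudfs.jar';
```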
> > > > > > > > > >
> > > > > > > > > >2. >> If the resource used is a remote resource, it will first download the resource to a local temporary directory, which will be generated using a UUID, and then register the local path to the user classloader.
> > > > > > > > > >Regarding the above explanation in the FLIP, it seems that for a statement set such as
> > > > > > > > > >""
> > > > > > > > > >Create function as org.apache.udf1 using jar 'hdfs://myudfs.jar';
> > > > > > > > > >Create function as org.apache.udf2 using jar 'hdfs://myudfs.jar';
> > > > > > > > > >""
> > > > > > > > > >it'll download the resource 'hdfs://myudfs.jar' twice. So is it possible to provide some cache mechanism so that we won't need to download/store it twice?
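The download-and-cache behavior discussed above can be sketched as follows. This is a hypothetical illustration, not Flink's actual implementation: the `read_bytes` callable stands in for the remote filesystem read, and the cache is keyed by the resource URI so two CREATE FUNCTION statements sharing one jar download it only once.

```python
import tempfile
import uuid
from pathlib import Path

class ResourceDownloader:
    """Downloads remote jars into a UUID-named local temp dir, caching by URI."""

    def __init__(self):
        self.cache = {}      # remote URI -> local Path of the downloaded copy
        self.downloads = 0   # how many real downloads happened

    def fetch(self, uri, read_bytes):
        # Cache hit: reuse the previously downloaded local copy.
        if uri in self.cache:
            return self.cache[uri]
        # Cache miss: create a fresh UUID-named temp dir and download into it.
        local_dir = Path(tempfile.gettempdir()) / str(uuid.uuid4())
        local_dir.mkdir()
        local_path = local_dir / Path(uri).name
        local_path.write_bytes(read_bytes(uri))  # stand-in for the FS read
        self.cache[uri] = local_path
        self.downloads += 1
        return local_path

# Two statements referencing the same jar trigger only one download.
d = ResourceDownloader()
fake_fs = {"hdfs://myudfs.jar": b"jar-bytes"}
p1 = d.fetch("hdfs://myudfs.jar", lambda u: fake_fs[u])
p2 = d.fetch("hdfs://myudfs.jar", lambda u: fake_fs[u])
assert p1 == p2 and d.downloads == 1
```

The registered local path would then be handed to the user classloader, as the FLIP describes.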
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >Best regards,
> > > > > > > > > >Yuxia
> > > > > > > > > >[1] https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl
> > > > > > > > > >[2] https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-ddl-create-function.html
> > > > > > > > > >------------------------------------------------------------------
> > > > > > > > > >From: Mang Zhang<zh...@163.com>
> > > > > > > > > >Date: 2022-03-22 11:35:24
> > > > > > > > > >To: <de...@flink.apache.org>
> > > > > > > > > >Subject: Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > > > > > > >
> > > > > > > > > >Hi Ron, thank you so much for this suggestion; it is very good.
> > > > > > > > > >In our company, when users use a custom UDF it is very inconvenient: the code needs to be packaged into the job jar, and they cannot refer to an existing UDF jar or pass the jar reference in the startup command.
> > > > > > > > > >If we implement this feature, users can focus on their own business development.
> > > > > > > > > >I can also contribute if needed.
> > > > > > > > > >
> > > > > > > > > >--
> > > > > > > > > >
> > > > > > > > > >Best regards,
> > > > > > > > > >Mang Zhang
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >At 2022-03-21 14:57:32, "刘大龙" <ld...@zju.edu.cn> wrote:
> > > > > > > > > >>Hi, everyone
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>I would like to open a discussion on supporting advanced function DDL. This proposal is a continuation of FLIP-79, in which Flink function DDL is defined. Until now it has only been partially released, as Flink function DDL with user-defined resources was not clearly discussed and implemented. It is an important feature to support registering UDFs with custom jar resources; users can use UDFs more easily without having to put jars under the classpath in advance.
> > > > > > > > > >>
> > > > > > > > > >>Looking forward to your feedback.
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>[1]
> > > > > > > > >
> > > > > > >
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>Best,
> > > > > > > > > >>
> > > > > > > > > >>Ron
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > > > Hi Peter, thanks for your feedback. This work also builds on your
> > > > > > > efforts; thank you very much.
> > > > > > >
> > > > >
> > > > > Hi, Martijn
> > > > > Thank you very much for the feedback; it was very useful to me.
> > > > > 1. Filesystem abstraction: With regards to remote resources, I agree with
> > > > > you that we should use Flink's FileSystem abstraction to support all types
> > > > > of file systems, including HTTP, S3, HDFS, etc., rather than binding to a
> > > > > specific implementation. In the first version, we will give priority to
> > > > > supporting HDFS as a resource provider via Flink's FileSystem abstraction,
> > > > > since HDFS is very widely used.
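The idea of going through a single filesystem abstraction rather than a Hadoop-specific client can be sketched as a scheme-based lookup. This is a hypothetical toy registry; Flink's real FileSystem abstraction discovers implementations via pluggable factories rather than a dict.

```python
from urllib.parse import urlparse

# Hypothetical registry mapping URI schemes to reader callables; adding a new
# remote resource provider means registering one more entry, not changing the
# DDL code path.
READERS = {
    "hdfs": lambda uri: f"read {uri} via HDFS",
    "s3":   lambda uri: f"read {uri} via S3",
    "oss":  lambda uri: f"read {uri} via OSS",
}

def open_resource(uri):
    # Dispatch on the URI scheme; an empty scheme means a local file.
    scheme = urlparse(uri).scheme or "file"
    try:
        return READERS[scheme](uri)
    except KeyError:
        raise ValueError(f"Unsupported filesystem scheme: {scheme}")

print(open_resource("s3://bucket/udfs/myudfs.jar"))
```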
> > > > >
> > > > > 2. Security/operations implications: The point you raise is a great one;
> > > > > security is an issue that needs to be considered. Your starting point is
> > > > > that a jar needs some verification done on it before it is used, to avoid
> > > > > insecure behavior. However, IMO, the validation of jars is supposed to be
> > > > > done by the platform side itself, and the platform needs to ensure that
> > > > > users have permission to use the jar and that the jar is secure. A
> > > > > configuration option cannot disable the syntax completely, since the user
> > > > > could still enable it via the SET command. I think the most correct
> > > > > approach is for the platform to verify rather than the engine side. In
> > > > > addition, current connector/UDF/DataStream programs also use custom jars,
> > > > > and those jars can have the same security issues, yet Flink currently does
> > > > > not provide an option to prohibit the use of custom jars. If a user uses a
> > > > > custom jar, it means the user has permission to do so, and then the user
> > > > > should be responsible for the security of that jar. If it was hacked, it
> > > > > means there are loopholes in the company's permissions/network, and those
> > > > > problems need to be fixed. All in all, I agree with you on this point, but
> > > > > an option can't solve this problem.
> > > > >
> > > > > Best,
> > > > >
> > > > > Ron
> > >

Re: Re: [SPAM] Re: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL

Posted by 刘大龙 <ld...@zju.edu.cn>.
Thanks again for all the discussion about this FLIP. I will open a vote tomorrow.

Best,
Ron


> -----Original Message-----
> From: "Jark Wu" <im...@gmail.com>
> Sent: 2022-04-19 16:03:22 (Tuesday)
> To: dev <de...@flink.apache.org>
> Cc: 
> Subject: Re: [SPAM] Re: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> 
> Thanks, Ron, for updating the FLIP.
> 
> I think the updated FLIP has addressed Martijn's concern.
> I don't have other feedback. So +1 for a vote.
> 
> Best,
> Jark
> 
> On Fri, 15 Apr 2022 at 16:36, 刘大龙 <ld...@zju.edu.cn> wrote:
> 
> > Hi, Jingsong
> >
> > Thanks for your feedback. We will use Flink's FileSystem abstraction, so
> > HDFS, S3, and OSS will be supported.
> >
> > Best,
> >
> > Ron
> >

Re: [SPAM] Re: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL

Posted by Jark Wu <im...@gmail.com>.
Thank Ron for updating the FLIP.

I think the updated FLIP has addressed Martijn's concern.
I don't have other feedback. So +1 for a vote.

Best,
Jark

On Fri, 15 Apr 2022 at 16:36, 刘大龙 <ld...@zju.edu.cn> wrote:

> Hi, Jingsong
>
> Thanks for your feedback, we will use flink FileSytem abstraction, so HDFS
> S3 OSS will be supported.
>
> Best,
>
> Ron
>
> > -----原始邮件-----
> > 发件人: "Jingsong Li" <ji...@gmail.com>
> > 发送时间: 2022-04-14 17:55:03 (星期四)
> > 收件人: dev <de...@flink.apache.org>
> > 抄送:
> > 主题: [SPAM] Re: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced
> Function DDL
> >
> > I agree with Martijn.
> >
> > At least, HDFS S3 OSS should be supported.
> >
> > Best,
> > Jingsong
> >
> > On Thu, Apr 14, 2022 at 4:46 PM Martijn Visser <ma...@ververica.com>
> wrote:
> > >
> > > Hi Ron,
> > >
> > > The FLIP mentions that the priority will be set to support HDFS as a
> > > resource provider. I'm concerned that we end up with a partially
> > > implemented FLIP which only supports local and HDFS and then we move
> on to
> > > other features, as we see happen with others. I would argue that we
> should
> > > not focus on one resource provider, but that at least S3 support is
> > > included in the same Flink release as HDFS support is.
> > >
> > > Best regards,
> > >
> > > Martijn Visser
> > > https://twitter.com/MartijnVisser82
> > > https://github.com/MartijnVisser
> > >
> > >
> > > On Thu, 14 Apr 2022 at 08:50, 刘大龙 <ld...@zju.edu.cn> wrote:
> > >
> > > > Hi, everyone
> > > >
> > > > First of all, thanks for the valuable suggestions received about this
> > > > FLIP. After some discussion, it looks like all concerns have been
> addressed
> > > > for now, so I will start a vote about this FLIP in two or three days
> later.
> > > > Also, further feedback is very welcome.
> > > >
> > > > Best,
> > > >
> > > > Ron
> > > >
> > > >
> > > > > -----原始邮件-----
> > > > > 发件人: "刘大龙" <ld...@zju.edu.cn>
> > > > > 发送时间: 2022-04-08 10:09:46 (星期五)
> > > > > 收件人: dev@flink.apache.org
> > > > > 抄送:
> > > > > 主题: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced
> Function
> > > > DDL
> > > > >
> > > > > Hi, Martijn
> > > > >
> > > > > Do you have any question about this FLIP? looking forward to your
> more
> > > > feedback.
> > > > >
> > > > > Best,
> > > > >
> > > > > Ron
> > > > >
> > > > >
> > > > > > -----原始邮件-----
> > > > > > 发件人: "刘大龙" <ld...@zju.edu.cn>
> > > > > > 发送时间: 2022-03-29 19:33:58 (星期二)
> > > > > > 收件人: dev@flink.apache.org
> > > > > > 抄送:
> > > > > > 主题: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced
> Function DDL
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > -----原始邮件-----
> > > > > > > 发件人: "Martijn Visser" <ma...@ververica.com>
> > > > > > > 发送时间: 2022-03-24 16:18:14 (星期四)
> > > > > > > 收件人: dev <de...@flink.apache.org>
> > > > > > > 抄送:
> > > > > > > 主题: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function
> DDL
> > > > > > >
> > > > > > > Hi Ron,
> > > > > > >
> > > > > > > Thanks for creating the FLIP. You're talking about both local
> and
> > > > remote
> > > > > > > resources. With regards to remote resources, how do you see
> this
> > > > work with
> > > > > > > Flink's filesystem abstraction? I did read in the FLIP that
> Hadoop
> > > > > > > dependencies are not packaged, but I would hope that we do that for
> > > > > > > all filesystem implementations. I don't think it's a good idea to
> have
> > > > any tight
> > > > > > > coupling to file system implementations, especially if at some
> point
> > > > we
> > > > > > > could also externalize file system implementations (like we're
> doing
> > > > for
> > > > > > > connectors already). I think the FLIP would be better by not
> only
> > > > > > > referring to "Hadoop" as a remote resource provider, but a more
> > > > generic
> > > > > > > term since there are more options than Hadoop.
> > > > > > >
> > > > > > > I'm also thinking about security/operations implications:
> would it be
> > > > > > > possible for bad actor X to create a JAR that either
> influences other
> > > > > > > running jobs, leaks data or credentials or anything else? If
> so, I
> > > > think it
> > > > > > > would also be good to have an option to disable this feature
> > > > completely. I
> > > > > > > think there are roughly two types of companies who run Flink:
> those
> > > > who
> > > > > > > open it up for everyone to use (here the feature would be
> welcomed)
> > > > and
> > > > > > > those who need to follow certain minimum standards/have a more
> > > > closed Flink
> > > > > > > ecosystem). They usually want to validate a JAR upfront before
> > > > making it
> > > > > > > available, even at the expense of speed, because it gives them
> more
> > > > control
> > > > > > > over what will be running in their environment.
> > > > > > >
> > > > > > > Best regards,
> > > > > > >
> > > > > > > Martijn Visser
> > > > > > > https://twitter.com/MartijnVisser82
> > > > > > >
> > > > > > >
> > > > > > > On Wed, 23 Mar 2022 at 16:47, 刘大龙 <ld...@zju.edu.cn> wrote:
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > -----原始邮件-----
> > > > > > > > > 发件人: "Peter Huang" <hu...@gmail.com>
> > > > > > > > > 发送时间: 2022-03-23 11:13:32 (星期三)
> > > > > > > > > 收件人: dev <de...@flink.apache.org>
> > > > > > > > > 抄送:
> > > > > > > > > 主题: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function
> DDL
> > > > > > > > >
> > > > > > > > > Hi Ron,
> > > > > > > > >
> > > > > > > > > Thanks for reviving the discussion of the work. The design
> looks
> > > > good. A
> > > > > > > > > small typo in the FLIP is that currently it is marked as
> > > > released in
> > > > > > > > 1.16.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Best Regards
> > > > > > > > > Peter Huang
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Mar 22, 2022 at 10:58 PM Mang Zhang <
> zhangmang1@163.com>
> > > > wrote:
> > > > > > > > >
> > > > > > > > > > hi Yuxia,
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks for your reply. Your reminder is very important !
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Since we download the file locally, remember to clean it up
> > > > > > > > > > when the Flink client exits.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > >
> > > > > > > > > > Best regards,
> > > > > > > > > > Mang Zhang
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > At 2022-03-23 10:02:26, "罗宇侠(莫辞)"
> > > > > > > > > > <lu...@alibaba-inc.com.INVALID> wrote:
> > > > > > > > > > >Hi Ron, Thanks for starting this discussion, some Spark/Hive
> > > > > > > > > > >users will benefit from it. The FLIP looks good to me. I just
> > > > > > > > > > >have two minor questions:
> > > > > > > > > > >1. For the syntax explanation, I see it's "Create ....
> > > > > > > > > > >function as identifier....". I think the word "identifier" may
> > > > > > > > > > >not be self-descriptive, since it's actually not a random name
> > > > > > > > > > >but the name of the class that provides the implementation for
> > > > > > > > > > >the function to be created.
> > > > > > > > > > >Maybe it would be clearer to use "class_name" instead of
> > > > > > > > > > >"identifier", just like what Hive[1]/Spark[2] do.
> > > > > > > > > > >
> > > > > > > > > > >2.  >> If the resource used is a remote resource, it will
> > > > > > > > > > >first download the resource to a local temporary directory,
> > > > > > > > > > >which will be generated using UUID, and then register the
> > > > > > > > > > >local path to the user class loader.
> > > > > > > > > > >For the above explanation in this FLIP, it seems that for such
> > > > > > > > > > >statement sets,
> > > > > > > > > > >""
> > > > > > > > > > >Create  function as org.apache.udf1 using jar 'hdfs://myudfs.jar';
> > > > > > > > > > >Create  function as org.apache.udf2 using jar 'hdfs://myudfs.jar';
> > > > > > > > > > >""
> > > > > > > > > > >it will download the resource 'hdfs://myudfs.jar' twice. So is
> > > > > > > > > > >it possible to provide some cache mechanism so that we won't
> > > > > > > > > > >need to download / store it twice?
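> > > > > > > > > > >To illustrate the idea, such a cache could be keyed by the
> > > > > > > > > > >remote URI, for example (a rough Python sketch with
> > > > > > > > > > >hypothetical names, not the actual Flink implementation):

```python
import uuid
from pathlib import Path


class ResourceCache:
    """Sketch: download each remote jar at most once, keyed by its URI."""

    def __init__(self, base_dir: Path):
        self.base_dir = base_dir
        self._local_paths = {}   # remote URI -> local Path
        self.downloads = 0       # bookkeeping, just for this illustration

    def _download(self, uri: str) -> Path:
        # Stand-in for copying via a filesystem abstraction; here we only
        # create a uniquely named local file, following the FLIP's idea of
        # a temporary directory generated with a UUID.
        target_dir = self.base_dir / uuid.uuid4().hex
        target_dir.mkdir(parents=True)
        local = target_dir / uri.rsplit("/", 1)[-1]
        local.write_bytes(b"jar-bytes")
        self.downloads += 1
        return local

    def get_or_download(self, uri: str) -> Path:
        # Second and later lookups of the same URI reuse the local copy.
        if uri not in self._local_paths:
            self._local_paths[uri] = self._download(uri)
        return self._local_paths[uri]
```

> > > > > > > > > > >With such a cache, registering two functions from the same
> > > > > > > > > > >'hdfs://myudfs.jar' would trigger only one download.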
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >Best regards,
> > > > > > > > > > >Yuxia
> > > > > > > > > > >[1]
> > > > > > > >
> > > > https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl
> > > > > > > > > > >[2]
> > > > > > > > > >
> > > > > > > >
> > > >
> https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-ddl-create-function.html
> ------------------------------------------------------------------
> > > > > > > > > > >发件人:Mang Zhang<zh...@163.com>
> > > > > > > > > > >日 期:2022年03月22日 11:35:24
> > > > > > > > > > >收件人:<de...@flink.apache.org>
> > > > > > > > > > >主 题:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > > > > > > > >
> > > > > > > > > > >Hi Ron, Thank you so much for this suggestion, this is great.
> > > > > > > > > > >In our company, using a custom UDF is very inconvenient: the
> > > > > > > > > > >code needs to be packaged into the job jar, and users cannot
> > > > > > > > > > >simply refer to an existing UDF jar; instead the jar reference
> > > > > > > > > > >has to be passed in via the startup command.
> > > > > > > > > > >If we implement this feature, users can focus on their own
> > > > > > > > > > >business development.
> > > > > > > > > > >I can also contribute if needed.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >--
> > > > > > > > > > >
> > > > > > > > > > >Best regards,
> > > > > > > > > > >Mang Zhang
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >At 2022-03-21 14:57:32, "刘大龙" <ld...@zju.edu.cn> wrote:
> > > > > > > > > > >>Hi, everyone
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >>I would like to open a discussion on supporting advanced
> > > > > > > > > > >>function DDL. This proposal is a continuation of FLIP-79, in
> > > > > > > > > > >>which Flink function DDL is defined. Until now it has only
> > > > > > > > > > >>been partially released, since Flink function DDL with
> > > > > > > > > > >>user-defined resources was not clearly discussed and
> > > > > > > > > > >>implemented. It is an important feature to support
> > > > > > > > > > >>registering UDFs with custom jar resources, so that users
> > > > > > > > > > >>can use UDFs more easily without having to put jars under
> > > > > > > > > > >>the classpath in advance.
> > > > > > > > > > >>
> > > > > > > > > > >>Looking forward to your feedback.
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >>[1]
> > > > > > > > > >
> > > > > > > >
> > > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >>Best,
> > > > > > > > > > >>
> > > > > > > > > > >>Ron
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > > > Hi Peter, thanks for your feedback. This work also builds on your
> > > > > > > > effort, thank you very much.
> > > > > > > >
> > > > > >
> > > > > > Hi, Martijn
> > > > > > Thank you very much for the feedback, it was very useful for me.
> > > > > > 1. Filesystem abstraction: With regards to remote resources, I agree
> > > > > > with you that we should use Flink's FileSystem abstraction to support
> > > > > > all types of file systems, including HTTP, S3, HDFS, etc., rather than
> > > > > > binding to a specific implementation. In the first version, we will
> > > > > > give priority to supporting HDFS as a resource provider via Flink's
> > > > > > FileSystem abstraction, since HDFS is very widely used.
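> > > > > > To make the "not binding to a specific implementation" point
> > > > > > concrete, resolution in the spirit of Flink's FileSystem.get() could
> > > > > > be sketched roughly like this (hypothetical registry and handler
> > > > > > names; the real Flink implementation discovers FileSystem factories
> > > > > > on the classpath rather than using a hard-coded table):

```python
from urllib.parse import urlparse

# Hypothetical registry mapping a URI scheme to a fetch function. In
# Flink, the equivalent lookup is done by FileSystem.get(uri), which
# resolves a pluggable implementation for the scheme.
_PROVIDERS = {
    "file": lambda uri: f"read local file {urlparse(uri).path}",
    "hdfs": lambda uri: f"download {uri} via the HDFS connector",
    "s3":   lambda uri: f"download {uri} via the S3 connector",
}


def fetch_resource(uri: str) -> str:
    # Dispatch on the URI scheme; a bare path defaults to the local FS.
    scheme = urlparse(uri).scheme or "file"
    try:
        provider = _PROVIDERS[scheme]
    except KeyError:
        raise ValueError(f"Unsupported filesystem scheme: {scheme}")
    return provider(uri)
```

> > > > > > Adding S3 or OSS support then becomes a matter of registering
> > > > > > another provider, not changing the function DDL code path.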
> > > > > >
> > > > > > 2. Security/operations implications: The point you raise is a great
> > > > > > one; security does need to be considered. Your starting point is that
> > > > > > a jar needs some verification before it is used, to avoid insecure
> > > > > > behavior. However, IMO, validating the jar should be done by the
> > > > > > platform side itself: the platform needs to ensure that users have
> > > > > > permission to use the jar and that the jar is secure. A config option
> > > > > > cannot disable the syntax completely, because the user can still
> > > > > > enable it via the SET command. I think the correct approach is for
> > > > > > the platform to verify, rather than the engine side. In addition,
> > > > > > Connector/UDF/DataStream programs can already use custom jars, and
> > > > > > those jars have the same security issues, yet Flink currently does
> > > > > > not provide an option to prohibit the use of custom jars. If a user
> > > > > > uses a custom jar, it means the user has permission to do so, and the
> > > > > > user should then be responsible for the security of that jar. If it
> > > > > > was hacked, it means there are loopholes in the company's
> > > > > > permissions/network, and those problems need to be fixed. All in all,
> > > > > > I agree with you on this point, but a config option can't solve it.
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Ron
> > > >
>