Posted to dev@flink.apache.org by Martijn Visser <ma...@apache.org> on 2022/05/04 13:41:44 UTC
Re: [DISCUSS] FLIP-91: Support SQL Client Gateway

Hi Shengkai,

> Agreed. The FLIP mainly focuses on the Gateway. I think it's better to
> rename it to "Support SQL Gateway". WDYT?

+1

> I think it's better to integrate the Gateway into the Flink code base.
> The reasons are:
>  1. The Gateway relies on the Flink implementation, so I think we'd better
> maintain it inside Flink. It really takes us a lot of time to upgrade
> the sql-gateway in the ververica repo to the latest Flink version.
> 2. The Gateway is important to Flink itself. Many users need the
> Gateway to manage their Flink SQL jobs. Actually, Hive and Spark both
> have their gateways in their code bases.

I would like to understand what makes the upgrades so complicated. Is it
because the Gateway relies on internal interfaces? If so, should we not
consider making them public?

A downside I see with integrating the Gateway into the Flink codebase is
that a) it will not be possible to have separate releases of the Gateway,
since they will be tied to individual Flink releases, and b) if you want the
Gateway to support multiple Flink versions, I can see that becoming
complicated in Flink's release branching and support mechanism. For
example, suppose a Gateway is released with both Flink 1.16 and Flink 1.17,
and each supports Flink 1.10 up to its own release. If you then encounter a
bug in the implementation for Flink 1.12, you have to create multiple fixes,
in multiple branches, and then release multiple new Flink versions. I don't
think that the Gateway is a 'core' function of Flink which should be
included with Flink. Many users use the DataStream API, the Table API or
Python as their implementation layer. None of them need this extra
capability (even though you could argue that in the future it would be nice
to have something similar for Python).

> Because the Gateway itself relies on Flink's internal implementation, I
> think we can just use one Gateway per version. Users can manage the
> gateways with other tools.

I've left my comment above because I was going through each argument one by
one, but this was my assumption already: we should not rely on internal
interfaces for capabilities we want to support as a community. We should
instead make these interfaces public.

> After I read FLIP-91[1], I want to add an init-file option. Its
> functionality is the same as the '-i' option of the Flink SQL Client.

I don't think that an init-file option should be added to the SQL Gateway.
+1 for keeping that in the client, not the Gateway.

Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82
https://github.com/MartijnVisser


On Fri, 29 Apr 2022 at 04:26, Shengkai Fang <fs...@gmail.com> wrote:

> Hi Martijn and LuNing.
>
> Thanks for your feedback!
>
> > The FLIP is called "SQL Client Gateway", but isn't this a REST Gateway
> which would be used by Flink's SQL Client (or other applications)?
>
> Agreed. The FLIP mainly focuses on the Gateway. I think it's better to rename
> it to "Support SQL Gateway". WDYT?
>
> > From a user perspective, I would have expected that we start with the
> REST endpoint before explaining how we would integrate this into Flink. Now
> it's quite hard to first understand what we want to offer to users and if
> that will be sufficient for a first version.
>
> Hmm. Since the API is essentially a set of operations on some core
> concepts, wouldn't it be better to introduce those concepts first? But I
> agree that we should start with the REST endpoint. I have reorganized the
> content to introduce the REST endpoint first in the Public Interfaces section.
>
> > With Flink 1.15, we're introducing an OpenAPI specification. Can we
> also do this straight away for the REST Gateway?
>
> Yes. We will organize the related APIs into an OpenAPI specification.
>
> > Should we introduce the REST Gateway as part of Flink's main repository?
> > Wouldn't we be better off maintaining this in a separate repository under
> > ASF?
>
> I think it's better to integrate the Gateway into the Flink code base. The
> reasons are:
>
> 1. The Gateway relies on the Flink implementation, so I think we'd better
> maintain it inside Flink. It really takes us a lot of time to upgrade the
> sql-gateway in the ververica repo to the latest Flink version.
>
> 2. The Gateway is important to Flink itself. Many users need the
> Gateway to manage their Flink SQL jobs. Actually, Hive and Spark both have
> their gateways in their code bases.
>
> But I think it's fine to put other utilities, e.g. JDBC, in a separate
> repository under the ASF.
>
> > Ideally you would like to be able to support multiple Flink versions
> > with one version of the REST Gateway I think?
>
> > Users can gradually upgrade the Flink versions of a large number of jobs
> > in one Gateway service.
>
> Because the Gateway itself relies on Flink's internal implementation, I
> think we can just use one Gateway per version. Users can manage the
> gateways with other tools.
>
> >There's no mention of Batch or Streaming in this concept. If I recall
> >correctly, the current Flink SQL Gateway can only support Batch. How will
> >we support Streaming?
>
> > I can imagine that if a user wants to use a REST Gateway, there's also a
> > strong need to combine this with a Catalog.
>
> Yes. I have added a section about the usage of the Gateway. Users can do
> everything in the Gateway with SQL, including:
> - configuring execution parameters, including the execution mode
> - managing metadata with DDL, e.g. registering a catalog
> - submitting jobs
> ...
>
> >Will there be any requirement with JDBC, as there currently is?
>
> In FLIP-223, we implement the HiveServer2 endpoint. Users can use the
> Hive JDBC driver to connect to the Flink SQL Gateway.
>
> > Shall we name this option `sql-gateway.session.init-file` and write it
> > into the FLIP-91?
>
> Actually, we already support the -i parameter in the SQL Client. What's
> more, Hive also supports the -i parameter on the client side [1].
> I think it's fine to keep this functionality in the client rather than the
> gateway. WDYT?
>
> [1]
>
> https://github.com/apache/hive/blob/c3fa88a1b7d1475f44383fca913aecf9c664bab0/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L321
>
> Best,
> Shengkai
>
>
>
>
>
> On Thu, 28 Apr 2022 at 10:04, LuNing Wang <wa...@gmail.com> wrote:
>
> > > * Should we introduce the REST Gateway as part of Flink's main repository?
> > > Wouldn't we be better off maintaining this in a separate repository under
> > > ASF? Ideally you would like to be able to support multiple Flink versions
> > > with one version of the REST Gateway I think?
> >
> > We would be better off maintaining this in a separate repository. It is
> > important to support multiple Flink versions. Users can gradually upgrade
> > the Flink versions of a large number of jobs in one Gateway service.
> >
> > On Wed, 27 Apr 2022 at 17:54, LuNing Wang <wa...@gmail.com> wrote:
> >
> > > Hi Shengkai,
> > >
> > > After I read FLIP-91[1], I want to add an init-file option. Its
> > > functionality is the same as the '-i' option of the Flink SQL Client.
> > >
> > > When I use a catalog (HiveCatalog), I need to execute `CREATE CATALOG`
> > > every time the SQL Gateway starts, and this option would do it for me.
> > >
> > > Shall we name this option `sql-gateway.session.init-file` and add it
> > > to FLIP-91?
> > >
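As a rough illustration of what the proposed init-file option would do, here is a minimal Python sketch, assuming a hypothetical `run_init_file` helper that reads a SQL script, splits it into statements and hands each one, in order, to a caller-supplied `execute` callback. Only the option name `sql-gateway.session.init-file` comes from the thread; the helpers are illustrative, and the naive split on `;` does not handle semicolons inside string literals or comments.

```python
import os
import tempfile
from typing import Callable, List


def split_statements(script: str) -> List[str]:
    """Naively split a SQL script on ';' (ignores quotes and comments)."""
    return [stmt.strip() for stmt in script.split(";") if stmt.strip()]


def run_init_file(path: str, execute: Callable[[str], None]) -> int:
    """Execute every statement of an init file in order.

    `execute` is a hypothetical callback standing in for whatever submits
    a statement to a session; it is not a Flink API.
    Returns the number of statements executed.
    """
    with open(path, encoding="utf-8") as f:
        statements = split_statements(f.read())
    for stmt in statements:
        execute(stmt)
    return len(statements)


if __name__ == "__main__":
    script = "CREATE CATALOG myhive WITH ('type' = 'hive');\nUSE CATALOG myhive;\n"
    with tempfile.NamedTemporaryFile("w", suffix=".sql", delete=False) as tmp:
        tmp.write(script)
    executed: List[str] = []
    try:
        print(run_init_file(tmp.name, executed.append))  # 2
        print(executed)
    finally:
        os.remove(tmp.name)
```

The same loop works whether it runs in the gateway at session start or in the client before the first interactive statement, which is where the thread leans toward keeping it.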
> > > Best regards,
> > >
> > > LuNing Wang
> > >
> > > [1]https://cwiki.apache.org/confluence/display/FLINK/FLIP-91
> > >
> > > On Tue, 26 Apr 2022 at 20:32, Martijn Visser <ma...@ververica.com> wrote:
> > >
> > >> Hi Shengkai,
> > >>
> > >> Thanks for opening this discussion. I did a first brief pass over the FLIP
> > >> and I have a couple of questions/remarks:
> > >>
> > >> * The FLIP is called "SQL Client Gateway", but isn't this a REST Gateway
> > >> which would be used by Flink's SQL Client (or other applications)?
> > >>
> > >> * From a user perspective, I would have expected that we start with the
> > >> REST endpoint before explaining how we would integrate this into Flink.
> > >> Now it's quite hard to first understand what we want to offer to users
> > >> and if that will be sufficient for a first version.
> > >>
> > >> * With Flink 1.15, we're introducing an OpenAPI specification [1]. Can we
> > >> also do this straight away for the REST Gateway?
> > >>
> > >> * Should we introduce the REST Gateway as part of Flink's main repository?
> > >> Wouldn't we be better off maintaining this in a separate repository under
> > >> ASF? Ideally you would like to be able to support multiple Flink versions
> > >> with one version of the REST Gateway I think?
> > >>
> > >> * There's no mention of Batch or Streaming in this concept. If I recall
> > >> correctly, the current Flink SQL Gateway can only support Batch. How will
> > >> we support Streaming? Will there be any requirement with JDBC, as there
> > >> currently is?
> > >>
> > >> * I can imagine that if a user wants to use a REST Gateway, there's also a
> > >> strong need to combine this with a Catalog. Do you think this should be
> > >> part of this FLIP?
> > >>
> > >> Best regards,
> > >>
> > >> Martijn Visser
> > >> https://twitter.com/MartijnVisser82
> > >> https://github.com/MartijnVisser
> > >>
> > >> [1]
> > >> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobmanager
> > >>
> > >> On Sun, 24 Apr 2022 at 05:29, Shengkai Fang <fs...@gmail.com> wrote:
> > >>
> > >> > Hi Jiang,
> > >> >
> > >> > Thanks for your feedback!
> > >> >
> > >> > > Do the public interfaces of GatewayService refer to any service?
> > >> >
> > >> > We will only expose one GatewayService implementation. We will put the
> > >> > interface into the common package, so developers who want to implement
> > >> > a new endpoint can depend only on the interface package rather than on
> > >> > the implementation.
> > >> >
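The packaging split described above can be sketched in Python, assuming the interface and its single implementation live in separate modules: the endpoint only ever sees the abstract `GatewayService` and receives an implementation when it is constructed. The method `execute_statement` and the classes `EchoGatewayService` and `MyEndpoint` are hypothetical names for illustration, not the API defined in FLIP-91.

```python
from abc import ABC, abstractmethod


class GatewayService(ABC):
    """Interface that would live in the common package (hypothetical method)."""

    @abstractmethod
    def execute_statement(self, session_id: str, statement: str) -> str:
        """Submit a statement and return an operation handle."""


class EchoGatewayService(GatewayService):
    """Stand-in for the single implementation shipped with the gateway."""

    def __init__(self) -> None:
        self._counter = 0

    def execute_statement(self, session_id: str, statement: str) -> str:
        # Hand out a new, session-scoped operation handle per statement.
        self._counter += 1
        return f"{session_id}-op-{self._counter}"


class MyEndpoint:
    """A new endpoint depends only on the GatewayService interface."""

    def __init__(self, service: GatewayService) -> None:
        self._service = service

    def handle(self, session_id: str, statement: str) -> dict:
        handle = self._service.execute_statement(session_id, statement)
        return {"operationHandle": handle}


if __name__ == "__main__":
    endpoint = MyEndpoint(EchoGatewayService())
    print(endpoint.handle("s1", "SELECT 1"))  # {'operationHandle': 's1-op-1'}
```

Because `MyEndpoint` never refers to `EchoGatewayService`, an endpoint written this way only needs the interface module to compile against.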
> > >> > > What's the behavior of SQL Client Gateway working on Yarn or K8S?
> > >> > > Does the SQL Client Gateway support application or session mode on Yarn?
> > >> >
> > >> > I think the SQL Client Gateway can support submitting jobs in
> > >> > application/session mode.
> > >> >
> > >> > > Is there any event trigger in the operation state machine?
> > >> >
> > >> > Yes. I have already updated the content and added more details about
> > >> > the state machine. During the revision, I found that I had mixed up two
> > >> > concepts: job submission and job execution. In fact, we only control
> > >> > the submission mode at the gateway layer, so we don't need to map the
> > >> > JobStatus here. If the user expects synchronous behavior, i.e. waiting
> > >> > for the job execution to complete before the next statement is allowed
> > >> > to execute, then the Operation lifecycle should also contain the job's
> > >> > execution, which means users should set `table.dml-sync`.
> > >> >
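A minimal Python sketch of the distinction drawn above between job submission and job execution, assuming a simplified set of operation states (`INITIALIZED`, `RUNNING`, `FINISHED`, `CANCELED`, `ERROR`, `CLOSED`) that may not match the states defined in the FLIP. The `dml_sync` flag plays the role of `table.dml-sync`: without it the operation finishes once the job is submitted; with it the operation also covers the job's execution. `submit_job` and `wait_for_job` are hypothetical callbacks.

```python
from enum import Enum
from typing import Callable, Dict, List, Set


class OperationStatus(Enum):
    """Assumed operation states, for illustration only."""
    INITIALIZED = "INITIALIZED"
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    CANCELED = "CANCELED"
    ERROR = "ERROR"
    CLOSED = "CLOSED"


S = OperationStatus
# Legal transitions; FINISHED, CANCELED and ERROR may only move to CLOSED.
TRANSITIONS: Dict[OperationStatus, Set[OperationStatus]] = {
    S.INITIALIZED: {S.RUNNING, S.CANCELED, S.CLOSED},
    S.RUNNING: {S.FINISHED, S.CANCELED, S.ERROR, S.CLOSED},
    S.FINISHED: {S.CLOSED},
    S.CANCELED: {S.CLOSED},
    S.ERROR: {S.CLOSED},
    S.CLOSED: set(),
}


class Operation:
    def __init__(self) -> None:
        self.status = S.INITIALIZED
        self.history: List[OperationStatus] = [self.status]

    def transition(self, target: OperationStatus) -> None:
        if target not in TRANSITIONS[self.status]:
            raise ValueError(f"illegal transition {self.status.name} -> {target.name}")
        self.status = target
        self.history.append(target)

    def run_dml(self, submit_job: Callable[[], None],
                wait_for_job: Callable[[], None], dml_sync: bool) -> None:
        """Drive one DML statement through the lifecycle.

        dml_sync=False: the operation finishes once the job is submitted.
        dml_sync=True (the `table.dml-sync` case): it also waits for the
        job's execution to complete before finishing.
        """
        self.transition(S.RUNNING)
        try:
            submit_job()
            if dml_sync:
                wait_for_job()
        except Exception:
            self.transition(S.ERROR)
            raise
        self.transition(S.FINISHED)
```

Keeping the job's own status out of the operation state machine, as the reply suggests, means the gateway only has to react to two events here: submission returning and, in the synchronous case, the job completing.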
> > >> > > What's the return schema for the public interfaces of GatewayService?
> > >> > > Like getTable interface, what's the return value schema?
> > >> >
> > >> > The GatewayService API returns Java objects, and each endpoint can
> > >> > organize those objects into its expected schema. The return types are
> > >> > also listed in the section ComponetAPI#GatewayService#API. The return
> > >> > type of GatewayService#getTable is `ContextResolvedTable`.
> > >> >
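To show how one service result could be organized into different endpoint-specific schemas, here is a Python sketch in which a single object is rendered in two shapes. `ResolvedTable` is a hypothetical, heavily simplified stand-in for a Java object such as `ContextResolvedTable`, and both output schemas are assumptions, not the FLIP's REST or HiveServer2 formats.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ResolvedTable:
    """Simplified, hypothetical stand-in for an object such as ContextResolvedTable."""
    catalog: str
    database: str
    table: str
    columns: Tuple[Tuple[str, str], ...]  # (column name, type)


def to_rest_json(t: ResolvedTable) -> str:
    """REST-style view: nested JSON with camelCase keys (assumed schema)."""
    payload = {
        "identifier": {
            "catalogName": t.catalog,
            "databaseName": t.database,
            "tableName": t.table,
        },
        "columns": [{"name": n, "type": ty} for n, ty in t.columns],
    }
    return json.dumps(payload, sort_keys=True)


def to_metadata_rows(t: ResolvedTable) -> List[Tuple[str, str, str, str, str]]:
    """Row-style view, e.g. what a JDBC-like metadata call might return."""
    return [(t.catalog, t.database, t.table, n, ty) for n, ty in t.columns]


if __name__ == "__main__":
    orders = ResolvedTable("default_catalog", "default_database", "orders",
                           (("id", "BIGINT"), ("amount", "DOUBLE")))
    print(to_rest_json(orders))
    print(to_metadata_rows(orders))
```

The service stays format-agnostic; each endpoint owns its serialization, so adding a protocol does not change the service's return types.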
> > >> > > How does the user get the operation log?
> > >> >
> > >> > The OperationManager will register the LogAppender before the
> > >> > Operation executes. The LogAppender will hook into the logger and
> > >> > additionally write the logs related to the Operation to a separate
> > >> > file. When users want to fetch the Operation log, the GatewayService
> > >> > will read the content of that file and return it.
> > >> >
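The log-capture flow described above can be approximated with Python's standard `logging` module: an extra handler is registered per operation before it runs, a filter lets through only that operation's records, and fetching the log reads the file back. Tagging records via `extra={"operation_id": ...}` is an assumption made for this sketch; a JVM implementation would more likely rely on thread-local logging context (e.g. an MDC). `OperationLogManager` is a hypothetical name.

```python
import logging
import os
import tempfile
from typing import Dict


class OperationLogFilter(logging.Filter):
    """Let through only records tagged with one operation's id."""

    def __init__(self, operation_id: str) -> None:
        super().__init__()
        self.operation_id = operation_id

    def filter(self, record: logging.LogRecord) -> bool:
        return getattr(record, "operation_id", None) == self.operation_id


class OperationLogManager:
    """Hypothetical sketch: one log file per operation, read back on fetch."""

    def __init__(self, logger: logging.Logger, log_dir: str) -> None:
        self._logger = logger
        self._log_dir = log_dir
        self._handlers: Dict[str, logging.Handler] = {}

    def _path(self, operation_id: str) -> str:
        return os.path.join(self._log_dir, f"{operation_id}.log")

    def register(self, operation_id: str) -> None:
        # Called before the operation runs. The extra handler copies matching
        # records to the operation's file; existing handlers are untouched.
        handler = logging.FileHandler(self._path(operation_id), encoding="utf-8")
        handler.addFilter(OperationLogFilter(operation_id))
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
        self._logger.addHandler(handler)
        self._handlers[operation_id] = handler

    def unregister(self, operation_id: str) -> None:
        handler = self._handlers.pop(operation_id)
        self._logger.removeHandler(handler)
        handler.close()

    def fetch(self, operation_id: str) -> str:
        handler = self._handlers.get(operation_id)
        if handler is not None:
            handler.flush()
        with open(self._path(operation_id), encoding="utf-8") as f:
            return f.read()


if __name__ == "__main__":
    logger = logging.getLogger("gateway-sketch")
    logger.setLevel(logging.INFO)
    manager = OperationLogManager(logger, tempfile.mkdtemp())
    manager.register("op-1")
    logger.info("parsing statement", extra={"operation_id": "op-1"})
    logger.info("unrelated message", extra={"operation_id": "op-2"})
    print(manager.fetch("op-1"), end="")  # INFO parsing statement
    manager.unregister("op-1")
```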
> > >> > Best,
> > >> > Shengkai
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > On Fri, 22 Apr 2022 at 16:21, Nicholas Jiang <ni...@apache.org> wrote:
> > >> >
> > >> > > Hi Shengkai.
> > >> > >
> > >> > > Thanks for driving the SQL Client Gateway proposal. I have some
> > >> > > knowledge of Kyuubi and have some questions about the design:
> > >> > >
> > >> > > 1. Do the public interfaces of GatewayService refer to any service? If
> > >> > > referring to HiveService, does GatewayService need interfaces like
> > >> > > getQueryId etc.?
> > >> > >
> > >> > > 2. What's the behavior of SQL Client Gateway working on Yarn or K8S?
> > >> > > Does the SQL Client Gateway support application or session mode on
> > >> > > Yarn?
> > >> > >
> > >> > > 3. Is there any event trigger in the operation state machine?
> > >> > >
> > >> > > 4. What's the return schema for the public interfaces of
> > >> > > GatewayService? Like getTable interface, what's the return value
> > >> > > schema?
> > >> > >
> > >> > > 5. How does the user get the operation log?
> > >> > >
> > >> > > Thanks,
> > >> > > Nicholas Jiang
> > >> > >
> > >> > > On 2022/04/21 06:42:30 Shengkai Fang wrote:
> > >> > > > Hi, Flink developers.
> > >> > > >
> > >> > > > I want to start a discussion about FLIP-91: Support Flink SQL
> > >> > > > Gateway [1]. Flink SQL Gateway is a service that allows users to
> > >> > > > submit and manage their jobs in the online environment through
> > >> > > > pluggable endpoints. The reason why we introduce the Gateway with
> > >> > > > pluggable endpoints is that users have different preferences. For
> > >> > > > example, HiveServer2 users prefer to use the gateway with a
> > >> > > > HiveServer2-style API, which is supported by numerous tools, whereas
> > >> > > > some Flink-native users may prefer to use the REST API. Therefore, we
> > >> > > > propose the SQL Gateway with pluggable endpoints.
> > >> > > >
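The pluggable-endpoint idea can be sketched as a registry of endpoint factories that all wrap the same service instance, assuming the set of enabled endpoints comes from configuration as a list of type names. The identifiers `rest` and `hiveserver2` and the classes below are hypothetical; a plain dictionary stands in for whatever discovery mechanism the gateway actually uses.

```python
from typing import Callable, Dict, List


class SketchEndpoint:
    """Minimal base class for hypothetical endpoints sharing one service."""
    name = "base"

    def __init__(self, service: object) -> None:
        self.service = service
        self.started = False

    def start(self) -> None:
        # A real endpoint would bind a port and start serving requests here.
        self.started = True


class RestEndpoint(SketchEndpoint):
    name = "rest"


class HiveServer2Endpoint(SketchEndpoint):
    name = "hiveserver2"


# Registry keyed by endpoint type; stands in for real plugin discovery.
REGISTRY: Dict[str, Callable[[object], SketchEndpoint]] = {
    RestEndpoint.name: RestEndpoint,
    HiveServer2Endpoint.name: HiveServer2Endpoint,
}


def start_endpoints(types: List[str], service: object) -> List[SketchEndpoint]:
    """Create and start every configured endpoint on top of one service."""
    unknown = [t for t in types if t not in REGISTRY]
    if unknown:
        raise ValueError(f"unknown endpoint type(s): {unknown}")
    endpoints = [REGISTRY[t](service) for t in types]
    for endpoint in endpoints:
        endpoint.start()
    return endpoints


if __name__ == "__main__":
    shared_service = object()  # placeholder for the single GatewayService
    running = start_endpoints(["rest", "hiveserver2"], shared_service)
    print([e.name for e in running])  # ['rest', 'hiveserver2']
```

Supporting another protocol then means registering one more factory, without touching the service or the existing endpoints.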
> > >> > > > In the FLIP, we also propose the REST endpoint, which has APIs
> > >> > > > similar to those of the gateway in ververica/flink-sql-gateway [2].
> > >> > > > Lastly, we discuss how to use the SQL Client to submit statements to
> > >> > > > the Gateway with the REST API.
> > >> > > >
> > >> > > > I would be glad to receive your feedback about FLIP-91.
> > >> > > >
> > >> > > > Best,
> > >> > > > Shengkai
> > >> > > >
> > >> > > > [1]
> > >> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> > >> > > > [2] https://github.com/ververica/flink-sql-gateway
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> >
>