Posted to dev@flink.apache.org by yu zelin <yu...@gmail.com> on 2022/12/02 08:09:16 UTC

Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

Hi Jim,

Thanks for your feedback!

> Should this configuration be mentioned in the FLIP?

Sure. 

> some way for the server to be able to limit the number of requests it receives.

Sorry, this FLIP is dedicated to implementing the remote mode, so we
didn't consider this much. I think the option is enough for now. I will add
the improvement suggestions to ‘Future Work’.
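
To make the ‘Future Work’ item a bit more concrete: one possible way for a server
to cap the fetch requests it accepts is a per-session token bucket. The Python
sketch below is only an illustration. `TokenBucket` and its parameters are
hypothetical names, not part of the SQL Gateway or of this FLIP.

```python
import time


class TokenBucket:
    """Hypothetical per-session request limiter (not an existing Gateway class)."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec      # tokens refilled per second
        self.capacity = capacity      # maximum burst size
        self.clock = clock            # injectable so the limiter can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def try_acquire(self):
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        # Refill in proportion to the elapsed time, capped at the capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With such a limit on the server, a client that sets a very low fetch interval
would simply see some requests rejected instead of overloading the gateway.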

> I wonder if two other options are possible

Forwarding the raw format to the gateway and then to the client is possible. The raw
results from the sink are in ‘CollectResultIterator#bufferedResult’. First, we need to find
a way to get this result without wrapping it. Second, we need to construct an
‘InternalTypeInfo’ from the schema information (the data’s logical type). After
that, we can get the ‘TypeSerializer’ from it to deserialize the raw result.
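
The key point is that the raw bytes can only be decoded by someone who knows the
schema. The Python sketch below illustrates that idea with a made-up binary
layout. It is not Flink's actual serialization format, and `serialize_row`,
`deserialize_row` and the type names are assumptions for illustration only.

```python
import struct

# Assumed fixed-width encodings for a few logical types (big-endian).
_FIXED = {"INT": ">i", "BIGINT": ">q", "DOUBLE": ">d"}


def serialize_row(schema, row):
    """Encode a row to bytes; the bytes carry no type information themselves."""
    out = bytearray()
    for type_name, value in zip(schema, row):
        if type_name == "STRING":
            data = value.encode("utf-8")
            out += struct.pack(">i", len(data)) + data  # length prefix + payload
        else:
            out += struct.pack(_FIXED[type_name], value)
    return bytes(out)


def deserialize_row(schema, raw):
    """Decode the bytes again; this only works because the schema is known."""
    row, pos = [], 0
    for type_name in schema:
        if type_name == "STRING":
            (length,) = struct.unpack_from(">i", raw, pos)
            pos += 4
            row.append(raw[pos:pos + length].decode("utf-8"))
            pos += length
        else:
            fmt = _FIXED[type_name]
            (value,) = struct.unpack_from(fmt, raw, pos)
            pos += struct.calcsize(fmt)
            row.append(value)
    return row
```

In the same way, a gateway could pass the bytes through untouched as long as the
client receives the schema and can build the matching deserializer.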




> On Dec 1, 2022, at 04:54, Jim Hughes <jh...@confluent.io.INVALID> wrote:
> 
> Hi Yu,
> 
> Thanks for moving my comments to this thread!  Also, thank you for
> answering my questions; it is helping me understand the SQL Gateway
> better.
> 
> 5.
>> Our idea is to introduce a new session option (like
>> 'sql-client.result.fetch-interval') to control how frequently
>> fetch requests are sent. What do you think?
> 
> Should this configuration be mentioned in the FLIP?
> 
> One slight concern I have with having 'sql-client.result.fetch-interval' as
> a session configuration is that users could set it low and cause the client
> to send a large volume of requests to the SQL gateway.
> 
> Generally, I'd like to see some way for the server to be able to limit the
> number of requests it receives.  If that really needs to be done by a proxy
> in front of the SQL gateway, that is fine as well.  (To be clear, I don't
> think my concern here should be blocking in any way.)
> 
> 7.
>> What is the serialization lifecycle for results?
> 
> I wonder if two other options are possible:
> 3) Could the Gateway just forward the result byte array?  (Or does the
> Gateway need to deserialize the response in order to understand it for some
> reason?)
> 4) Could the JobManager prepare the results in JSON?  (Or similarly could
> the Client read the format which the JobManager sends?)
> 
> Thanks again!
> 
> Cheers,
> 
> Jim
> 
> On Wed, Nov 30, 2022 at 9:40 AM yu zelin <yu...@gmail.com> wrote:
> 
>> Hi, all
>> 
>> Thanks for Jim’s questions below. I’d like to reply to them here.
>> 
>>>  1. For the Client Parser, is it going to work with the extended syntax
>>>  from the Flink Table Store?
>>>
>>>  2. Relatedly, what will happen if an older Client tries to handle syntax
>>>  that a newer service supports?  (Suppose I use a 1.17 client with a 1.18
>>>  Gateway/system which has a new keyword.  Is there anything we should be
>>>  designing for upfront?)
>>>
>>>  3. How will client and server version mismatches be handled?  Will a
>>>  single gateway be able to support multiple endpoint versions?
>>>
>>>  4. How are commands which change a session handled?  Are those sent via
>>>  an ExecuteStatementRequest?
>>>
>>>  5. The remote POC uses polling for getting back status and getting back
>>>  results.  Would it be possible to switch to web sockets or some other
>>>  mechanism to avoid polling?  If polling is used for both, the polling
>>>  frequency should be different between local and remote configurations.
>>>
>>>  6. What does this sentence mean?  "The reason why we didn't get the sql
>>>  type in client side is because it's hard for the lightweight client-level
>>>  parser to recognize some sql type  sql, such as query with CTE.  "
>>>
>>>  7. What is the serialization lifecycle for results?  It makes sense to
>>>  have some control over whether the gateway returns results as SQL or JSON.
>>>  I'd love to see a way to avoid needing to serialize and deserialize results
>>>  on the SQL Gateway if possible.  I'm still new enough to the project that
>>>  I'm not sure if that's readily possible.  Maybe the SQL Gateway's return
>>>  type can be sent as part of the request so that the JobManager can send
>>>  back results in an advantageous format?
>>>
>>>  8. Does ErrorType need to be marked as @PublicEvolving?
>>>
>>> I'm excited for the SQL client to support gateway mode!  Given the change
>>> in design, do you think it'll still be part of the Flink 1.17 release?
>> 
>> 1. ClientParser can work with new (and unknown) SQL syntax, because if the
>> SQL type is not recognized, the statement is submitted to the gateway
>> directly.
>>
>> For more information: the proposed ClientParser only does two things:
>> (1) It tells client commands (help, clear, etc.) and SQL statements apart.
>> (2) It parses several SQL types (e.g. for a SHOW CREATE statement, we can
>> print the raw string of the result instead of a table). Here the
>> recognition of SQL types mostly affects the print style, and unrecognized
>> SQL can still be submitted to the cluster.
>> So a Client with the new ClientParser stays compatible with new syntax.
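
A minimal Python sketch of the two responsibilities described above. The function
name, the category labels and the command set are hypothetical and do not come
from the FLIP; the point is only that anything unrecognized falls through and is
still sent to the gateway.

```python
# Client-side commands that are never sent to the gateway (assumed subset).
CLIENT_COMMANDS = {"HELP", "CLEAR"}


def classify(statement):
    """Return (category, normalized_text) for one line typed into the client."""
    text = statement.strip().rstrip(";").strip()
    words = text.upper().split()
    if len(words) == 1 and words[0] in CLIENT_COMMANDS:
        return ("CLIENT_COMMAND", words[0])
    if words[:2] == ["SHOW", "CREATE"]:
        # Recognized only so the result can be printed as a raw string.
        return ("SHOW_CREATE", text)
    # Everything else, including syntax the client has never seen,
    # is forwarded to the gateway unchanged.
    return ("OTHER_SQL", text)
```

For example, a query with a CTE is classified as `OTHER_SQL` and submitted as is,
so new server-side syntax does not require a new client.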
>> 
>> 2. First, I'd like to explain that the gateway APIs and the supported
>> syntax are two different things.
>> For example, ‘configureSession’ and ‘completeStatement’ are APIs. As
>> mentioned in #1, SQL statements whose syntax is unknown will be submitted
>> to the gateway, and whether they can be executed normally depends on
>> whether the execution environment supports the syntax.
>>
>>> Is there anything we should be designing for upfront?
>>
>> ‘SqlGatewayRestAPIVersion’ has been introduced, but it is for the SQL
>> gateway APIs.
>> 
>> 3.
>>> How will client and server version mismatches be handled?
>>
>> A lower-version client stays compatible with a higher-version gateway
>> because the old interfaces won’t be deleted. When a higher-version client
>> connects to a lower-version gateway, the client should notify users if
>> they try to use unsupported features. For example, the client start option
>> ‘-i’ means using an initialization file to initialize the session.
>> We plan to use the gateway’s ‘configureSession’ to implement it. But this
>> API is not implemented in the 1.16 Gateway (SqlGatewayRestAPIVersion = V1),
>> so if the user tries to use the ‘-i’ option to start the client with the
>> 1.16 gateway, the client should tell the user that the ‘-i’ option can’t
>> be executed with a gateway whose version is lower than V2.
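
The check described above could look roughly like the following Python sketch.
The enum mirrors the V1/V2 versions mentioned in the text, but the lookup table
and the `check_option` helper are hypothetical names used for illustration only.

```python
from enum import IntEnum


class RestApiVersion(IntEnum):
    V1 = 1
    V2 = 2


# Assumed table: the minimum endpoint version each client feature needs.
REQUIRED_VERSION = {"-i": RestApiVersion.V2}


def check_option(option, gateway_version):
    """Return an error message if the gateway is too old, else None."""
    required = REQUIRED_VERSION.get(option, RestApiVersion.V1)
    if gateway_version < required:
        return (f"Can't execute '{option}' option with a gateway "
                f"whose version is lower than {required.name}.")
    return None
```

The client would run such a check after it has learned the gateway's version and
before it sends any request that relies on a newer API.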
>> 
>>> Will a single gateway be able to support multiple endpoint versions?
>> 
>> Currently, the gateway only starts the highest-version endpoint, and the
>> higher-version endpoint is compatible with the lower-version endpoint’s
>> protocol.
>> 
>> 4. Yes. Mostly, we use ‘SET’ and ‘RESET’ statements to change the session
>> configuration.
>> Note: the client can’t change the session itself (I mean, close the current
>> session and open another one). I’m not sure whether you need to change the
>> session itself?
>> 
>> 5.
>>> Would it be possible to switch to web sockets or some other mechanism
>>> to avoid polling?
>>
>> Your suggestion is very good, but this FLIP is for supporting the remote
>> client. How about treating it as future work?
>>
>>> If polling is used for both, the polling frequency should be different
>>> between local and remote configurations.
>>
>> Our idea is to introduce a new session option (like
>> 'sql-client.result.fetch-interval') to control how frequently
>> fetch requests are sent. What do you think?
>>
>> For more information: we are inclined to keep the polling behavior in this
>> version. For streaming queries, fetching results synchronously may occupy
>> gateway resources for a long period.
>> For example, if the job doesn’t return results for a long time because the
>> window has not been triggered, a synchronous fetch will keep occupying the
>> connection. In the asynchronous case, the gateway can return a
>> NOT_READY_RESULT quickly and release the resources for other clients to
>> use. I think we can improve the whole flow in the future.
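
The polling behavior described above can be sketched in Python as follows. This
is an illustration under assumptions: `fetch_once` stands in for one REST fetch
request, and the status strings and the `fetch_interval_sec` parameter are
hypothetical stand-ins for the gateway's result types and the proposed option.

```python
import time

NOT_READY, PAYLOAD, EOS = "NOT_READY", "PAYLOAD", "EOS"


def fetch_all(fetch_once, fetch_interval_sec=0.1, sleep=time.sleep):
    """Poll until the end of the stream and return all collected rows.

    fetch_once(token) must return (status, rows, next_token) without blocking.
    """
    rows, token = [], 0
    while True:
        status, data, next_token = fetch_once(token)
        if status == EOS:
            return rows
        if status == PAYLOAD:
            rows.extend(data)
            token = next_token
        # On NOT_READY the same token is requested again after the pause,
        # so the gateway connection is free in the meantime.
        sleep(fetch_interval_sec)
```

Each request returns immediately, so a job that produces no output for a long
time costs the gateway only short, periodic requests.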
>> 
>> 6. Sorry, there are mistakes in this sentence. Let me make it clear.
>>
>> We proposed to add 'ContentType' to indicate what kind of SQL the result
>> is for. In this sentence, I wanted to explain why we add 'ContentType' even
>> though the ClientParser can recognize the SQL type too.
>> It is because the proposed ClientParser can't recognize complex syntax.
>> For example, it can’t recognize a query with a CTE. So the result should
>> carry content type information to help the client know the SQL type. For
>> example, 'ContentType.QUERY_RESULT' indicates that the result is for a
>> query statement.
>> 
>> 7.
>>> What is the serialization lifecycle for results?
>> 
>> 1) Sink to JobManager    : RowData -> byte[] (serialize)
>> 2) JobManager to Gateway : byte[] -> RowData (deserialize)
>> 3) Gateway sending       : RowData -> byte[] (serialize to JSON format)
>> 4) Client receiving      : byte[] -> RowData (deserialize)
>> 
>>> Maybe the SQL Gateway's return type can be sent as part of the request
>>> so that the JobManager can send back results in an advantageous format?
>>
>> Yes. I think it's an improvement for the Client and Gateway. We have some
>> ideas. For example,
>>
>> 1) We can move the Gateway into the JobManager and reduce the Ser/De costs
>> from JM to Gateway.
>> 2) Or the Gateway can collect the data from the sink function directly
>> instead of from the JobManager.
>>
>> But I think we can leave this as future work and discuss it in another
>> thread.
>> 
>> 8. Yes.
>> 
>>> Do you think it'll still be part of the Flink 1.17 release?
>> Yes. We will try our best to finish the work.
>> 
>> Feel free to talk to me if I’m wrong or you have any other questions.
>> 
>> 
>>> On Nov 25, 2022, at 11:48, yu zelin <yu...@gmail.com> wrote:
>>> 
>>> Hi, all
>>> 
>>> I want to initiate a discussion on FLIP-275: Support Remote SQL
>>> Client Based on SQL Gateway[1].
>>> The motivation of this FLIP is that the current SQL Client allows only
>>> local connections, which cannot satisfy the common need of connecting to
>>> a remote cluster.
>>>
>>> Since FLIP-91[2] introduced the SQL Gateway, we propose to implement the
>>> Remote SQL Client based on the SQL Gateway. In our design, we propose two
>>> main changes:
>>>
>>> 1. A new remote mode client which connects to the remote gateway through
>>> the REST API.
>>> 2. Migration of the current local mode client. We propose to refactor the
>>> local client based on the SQL Gateway to unify the interface for the two
>>> modes.
>>> 
>>> Looking forward to your suggestions.
>>> 
>>> Best,
>>> Yu Zelin
>>> 
>>> [1] https://cwiki.apache.org/confluence/x/T48ODg
>>> [2] https://cwiki.apache.org/confluence/x/rIyMC
>> 
>> 


Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

Posted by Shengkai Fang <fs...@gmail.com>.
Hi, zelin.

Thanks for your update. LGTM.

Best,
Shengkai

On Tue, Dec 20, 2022 at 11:00, yu zelin <yu...@gmail.com> wrote:

> Hi all,
>
> Recently I have received some feedback about the REST Endpoint
> modification. The main point is that using ‘ResultSet’ as a part of
> ‘FetchResultsResponseBody’ is not convenient for serialization and
> deserialization. So I think it’s better to introduce a new ‘ResultInfo’ to
> carry the data. The ‘ResultInfo’ will carry the row format information and
> be serialized and deserialized according to the row format.
>
> In the FLIP, I have modified the section Public Interface -> REST Endpoint
> Modification. The main change is that ‘FetchJsonFormatResultsResponseBody’
> and ‘FetchPlainTextResultsResponseBody’ were deleted and the ‘overview of
> fetching results REST API’ was slightly modified.
>
> Best,
> Yu Zelin
>
> > On Dec 13, 2022, at 17:13, yu zelin <yu...@gmail.com> wrote:
> >
> > Hi everyone,
> >
> > Sorry for the incorrect message in my last email. I want to start the
> > vote on Wednesday as long as there are no questions in this period.
> >
> > Best,
> > Yu Zelin
> >
> > On Tue, Dec 13, 2022 at 5:08 PM yu zelin <yuzelin.yzl@gmail.com> wrote:
> >> Hi, everyone,
> >>
> >> It looks like our new design is similar to Timo’s suggestion, and
> >> considering that there has been no response from other devs for a long
> >> time, I want to start the vote on Thursday.
> >>
> >>
> >> Best,
> >> Yu Zelin
> >>
> >>> On Dec 13, 2022, at 16:23, yu zelin <yuzelin.yzl@gmail.com> wrote:
> >>>
> >>> Hi, Timo,
> >>>
> >>> Thanks for your suggestion. Recently I discussed the `RowFormat` with
> >>> @Godfrey He, @Shengkai Fang and @Jark Wu (thanks for all your
> >>> suggestions). We finally came to a consensus which is similar to your
> >>> suggestion. The details are as follows:
> >>>
> >>> 1. Add a REST query parameter ‘RowFormat’ = JSON/PLAIN_TEXT to tell the
> >>> REST Endpoint how to serialize the RowData in the ResultSet.
> >>>
> >>>     JSON format means the RowData will be serialized to JSON format,
> >>>     which contains the original LogicalType information, so it can be
> >>>     deserialized back to RowData.
> >>>
> >>>     PLAIN_TEXT format means the RowData will be serialized to
> >>>     SQL-compliant, plain strings. The SQL Client can print the strings
> >>>     directly.
> >>>
> >>> The example URI for fetching results is:
> >>> /v2/sessions/:session_handle/operations/:operation_handle/result/:token?rowFormat=PLAIN_TEXT
> >>>
> >>> 2. Introduce two response bodies for fetching results in two formats.
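
To illustrate the difference between the two formats, here is a small Python
sketch. The URI shape follows the example above; the body layout produced by
`encode_row` is invented for illustration and is not the response body defined
in the FLIP, and both function names are hypothetical.

```python
import json
from urllib.parse import urlencode


def fetch_results_uri(session_handle, operation_handle, token, row_format):
    """Build the fetch-results URI with the rowFormat query parameter."""
    if row_format not in ("JSON", "PLAIN_TEXT"):
        raise ValueError(f"unsupported row format: {row_format}")
    query = urlencode({"rowFormat": row_format})
    return (f"/v2/sessions/{session_handle}/operations/"
            f"{operation_handle}/result/{token}?{query}")


def encode_row(row, logical_types, row_format):
    """Encode one row in the requested format (assumed body layout)."""
    if row_format == "JSON":
        # Type information travels with the values, so the receiver
        # can rebuild typed rows.
        return json.dumps({"types": logical_types, "fields": row})
    # PLAIN_TEXT: every field is already a printable string.
    return json.dumps({"fields": ["NULL" if v is None else str(v) for v in row]})
```

A CLI would request `PLAIN_TEXT` and print the strings directly, while a library
that needs typed values would request `JSON`.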
> >>>
> >>> For more details, please take a look at the FLIP
> >>> [https://cwiki.apache.org/confluence/x/T48ODg].
> >>> I have updated it with an example of query response bodies in the two
> >>> formats in the section Public Interface -> REST Endpoint Modification.
> >>>
> >>>> On Dec 12, 2022, at 18:09, Timo Walther <twalthr@apache.org> wrote:
> >>>>
> >>>> Hi everyone,
> >>>>
> >>>> sorry to jump into this discussion so late.
> >>>>
> >>>> > So we decided to revert the RowFormat related changes and let the
> >>>> > client to resolve the print format.
> >>>>
> >>>> Could you elaborate a bit on this topic in the FLIP? I still believe
> >>>> that we need 2 types of output formats.
> >>>>
> >>>> Format A: for the SQL Client CLI and other interactive notebooks that
> >>>> just use SQL CAST(... AS STRING) semantics executed on the server side
> >>>>
> >>>> Format B: for JDBC SDK or other machine-readable downstream libraries
> >>>>
> >>>> Take a TIMESTAMP WITH LOCAL TIME ZONE as an example. The string
> >>>> representation depends on a session configuration option. Clients might
> >>>> not be aware of this session option, so the formatting must happen on
> >>>> the server side.
> >>>>
> >>>> However, when the downstream consumer is a library, maybe the library
> >>>> would like to get the raw millis/nanos since epoch.
> >>>>
> >>>> Also nested rows and collections might be better encoded with format
> >>>> B for libraries but interactive sessions are happy if nested types are
> >>>> already formatted server-side, so not every client needs custom code
> >>>> for the formatting.
> >>>>
> >>>> Regards,
> >>>> Timo
> >>>>
> >>>>
> >>>>
> >>>> On 06.12.22 15:13, godfrey he wrote:
> >>>>> Hi, Zelin
> >>>>>
> >>>>>> The CLI will use default print style for the non-query result.
> >>>>>
> >>>>> Please make sure the print results of EXPLAIN/DESC/SHOW CREATE TABLE
> >>>>> commands are clear.
> >>>>>
> >>>>>> We think it’s better to add the root cause to the ErrorResponseBody.
> >>>>>
> >>>>> LGTM
> >>>>>
> >>>>> Best,
> >>>>> Godfrey
> >>>>>
> >>>>> On Tue, Dec 6, 2022 at 17:51, yu zelin <yuzelin.yzl@gmail.com> wrote:
> >>>>>>
> >>>>>> Hi, Godfrey
> >>>>>>
> >>>>>> Thanks for your feedback. Below are my thoughts on your questions.
> >>>>>>
> >>>>>> 1. About RowFormat
> >>>>>> I agree with your opinion. So we decided to revert the RowFormat
> >>>>>> related changes and let the client resolve the print format.
> >>>>>>
> >>>>>> 2. About ContentType
> >>>>>> I agree that the definition of the ContentType is not clear. But how
> >>>>>> to define the statement type is another big question. So we decided
> >>>>>> to only tell query results and non-query results apart. The CLI will
> >>>>>> use the default print style for the non-query result.
> >>>>>>
> >>>>>> 3. About ErrorHandling
> >>>>>> I think reusing the current ErrorResponseBody is good, but parsing the
> >>>>>> root cause from the exception stack strings is quite hacky. We think
> >>>>>> it’s better to add the root cause to the ErrorResponseBody.
> >>>>>>
> >>>>>> 4. About Runtime REST API Modifications
> >>>>>> I agree, too. This part is moved to ‘Future Work’.
> >>>>>>
> >>>>>> Best,
> >>>>>> Yu Zelin
> >>>>>>
> >>>>>>
> >>>>>>> On Dec 5, 2022, at 18:33, godfrey he <godfreyhe@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi Zelin,
> >>>>>>>
> >>>>>>> Thanks for driving this discussion.
> >>>>>>>
> >>>>>>> I have a few comments.
> >>>>>>>
> >>>>>>>> Add RowFormat to ResultSet to indicate the format of rows.
> >>>>>>> We should not require the SqlGateway server to meet the display
> >>>>>>> requirements of a CliClient, because different CliClients may have
> >>>>>>> different display styles. The server just needs to return the data,
> >>>>>>> and the CliClient prints the result as needed. So RowFormat is not
> >>>>>>> needed.
> >>>>>>>
> >>>>>>>> Add ContentType to ResultSet to indicate what kind of data the
> >>>>>>>> result contains.
> >>>>>>> At first sight, the values of ContentType overlap: for example, a
> >>>>>>> select query will return QUERY_RESULT, but it also has JOB_ID.
> >>>>>>> OTHER is too ambiguous; I don't know which kind of query will
> >>>>>>> return OTHER.
> >>>>>>> I recommend returning the concrete type for each statement, such as
> >>>>>>> "CREATE TABLE" for "create table xx (...) with ()" and
> >>>>>>> "SELECT" for "select * from xxx". The statement type can be
> >>>>>>> maintained in `Operation`s.
> >>>>>>>
> >>>>>>>> Error Handling
> >>>>>>> I think the current design of the error handling mechanism can meet
> >>>>>>> the requirements of the CliClient; we can get the root cause from
> >>>>>>> the stack (see ErrorResponseBody#errors). If it becomes a common
> >>>>>>> requirement (for many clients) in the future, we can introduce this
> >>>>>>> interface.
> >>>>>>>
> >>>>>>>> Runtime REST API Modification for Local Client Migration
> >>>>>>> I think this part is over-engineered; it belongs to optimization.
> >>>>>>> The client does not require very high performance, and the current
> >>>>>>> design can already meet our needs.
> >>>>>>> If we find performance problems in the future, we can do such
> >>>>>>> optimizations then.
> >>>>>>> Best,
> >>>>>>> Godfrey
> >>>>>>>
> >>>>>>> On Mon, Dec 5, 2022 at 11:11, yu zelin <yuzelin.yzl@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> Hi, Shammon
> >>>>>>>>
> >>>>>>>> Thanks for your feedback. I think it’s good to support jdbc-sdk.
> >>>>>>>> However, it's not supported on the gateway side yet. In my opinion,
> >>>>>>>> this FLIP is more concerned with the SQL Client. How about putting
> >>>>>>>> “supporting jdbc-sdk” in ‘Future Work’? We can discuss how to
> >>>>>>>> implement it in another thread.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Yu Zelin
> >>>>>>>>> On Dec 2, 2022, at 18:12, Shammon FY <zjureel@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi zelin
> >>>>>>>>>
> >>>>>>>>> Thanks for driving this discussion.
> >>>>>>>>>
> >>>>>>>>> I notice that in the FLIP the sql-client will interact with the
> >>>>>>>>> sql-gateway through the `REST Client` in the `Executor`. How about
> >>>>>>>>> introducing a jdbc-sdk for the sql-gateway?
> >>>>>>>>>
> >>>>>>>>> Then the sql-client can connect to the gateway with the jdbc-sdk,
> >>>>>>>>> and other applications and tools such as jmeter can use the
> >>>>>>>>> jdbc-sdk to connect to the sql-gateway too.
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Shammon
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Fri, Dec 2, 2022 at 4:10 PM yu zelin <yuzelin.yzl@gmail.com
> <ma...@gmail.com>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi Jim,
> >>>>>>>>>>
> >>>>>>>>>> Thanks for your feedback!
> >>>>>>>>>>
> >>>>>>>>>>> Should this configuration be mentioned in the FLIP?
> >>>>>>>>>>
> >>>>>>>>>> Sure.
> >>>>>>>>>>
> >>>>>>>>>>> some way for the server to be able to limit the number of
> requests it
> >>>>>>>>>> receives.
> >>>>>>>>>> I’m sorry that this FLIP is dedicated in implementing the
> Remote mode, so
> >>>>>>>>>> we
> >>>>>>>>>> didn't consider much about this. I think the option is enough
> currently.
> >>>>>>>>>> I will add
> >>>>>>>>>> the improvement suggestions to the ‘Future Work’.
> >>>>>>>>>>
> >>>>>>>>>>> I wonder if two other options are possible
> >>>>>>>>>>
> >>>>>>>>>> To forward the raw format to gateway and then to client is
> possible. The
> >>>>>>>>>> raw
> >>>>>>>>>> results from sink is in ‘CollectResultIterator#bufferedResult’.
> First, we
> >>>>>>>>>> can find
> >>>>>>>>>> a way to get this result without wrapping it. Second,
> constructing a
> >>>>>>>>>> ‘InternalTypeInfo’.
> >>>>>>>>>> We can construct it using the schema information (data’s
> logical type).
> >>>>>>>>>> After
> >>>>>>>>>> construction, we can get the ’TypeSerializer’ to deserialize
> the raw
> >>>>>>>>>> result.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> 2022年12月1日 04:54,Jim Hughes <jh...@confluent.io.INVALID> 写道:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Yu,
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks for moving my comments to this thread!  Also, thank you
> for
> >>>>>>>>>>> answering my questions; it is helping me understand the SQL
> Gateway
> >>>>>>>>>>> better.
> >>>>>>>>>>>
> >>>>>>>>>>> 5.
> >>>>>>>>>>>> Our idea is to introduce a new session option (like
> >>>>>>>>>>> 'sql-client.result.fetch-interval') to control
> >>>>>>>>>>> the fetching requests sending frequency. What do you think?
> >>>>>>>>>>>
> >>>>>>>>>>> Should this configuration be mentioned in the FLIP?
> >>>>>>>>>>>
> >>>>>>>>>>> One slight concern I have with having
> 'sql-client.result.fetch-interval'
> >>>>>>>>>> as
> >>>>>>>>>>> a session configuration is that users could set it low and
> cause the
> >>>>>>>>>> client
> >>>>>>>>>>> to send a large volume of requests to the SQL gateway.
> >>>>>>>>>>>
> >>>>>>>>>>> Generally, I'd like to see some way for the server to be able
> to limit
> >>>>>>>>>> the
> >>>>>>>>>>> number of requests it receives.  If that really needs to be
> done by a
> >>>>>>>>>> proxy
> >>>>>>>>>>> in front of the SQL gateway, that is fine as well.  (To be
> clear, I don't
> >>>>>>>>>>> think my concern here should be blocking in any way.)
> >>>>>>>>>>>
> >>>>>>>>>>> 7.
> >>>>>>>>>>>> What is the serialization lifecycle for results?
> >>>>>>>>>>>
> >>>>>>>>>>> I wonder if two other options are possible:
> >>>>>>>>>>> 3) Could the Gateway just forward the result byte array?  (Or
> does the
> >>>>>>>>>>> Gateway need to deserialize the response in order to
> understand it for
> >>>>>>>>>> some
> >>>>>>>>>>> reason?)
> >>>>>>>>>>> 4) Could the JobManager prepare the results in JSON?  (Or
> similarly could
> >>>>>>>>>>> the Client read the format which the JobManager sends?)
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks again!
> >>>>>>>>>>>
> >>>>>>>>>>> Cheers,
> >>>>>>>>>>>
> >>>>>>>>>>> Jim
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Nov 30, 2022 at 9:40 AM yu zelin <
> yuzelin.yzl@gmail.com <ma...@gmail.com>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi, all
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks Jim’s questions below. Here I’d like to reply to them.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> 1. For the Client Parser, is it going to work with the
> extended syntax
> >>>>>>>>>>>>> from the Flink Table Store?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 2. Relatedly, what will happen if an older Client tries to
> handle
> >>>>>>>>>>>> syntax
> >>>>>>>>>>>>> that a newer service supports?  (Suppose I use a 1.17 client
> with a
> >>>>>>>>>>>> 1.18
> >>>>>>>>>>>>> Gateway/system which has a new keyword.  Is there anything
> we should
> >>>>>>>>>> be
> >>>>>>>>>>>>> designing for upfront?)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 3. How will client and server version mismatches be
> handled?  Will a
> >>>>>>>>>>>>> single gateway be able to support multiple endpoint versions?
> >>>>>>>>>>>>> 4. How are commands which change a session handled?  Are
> those sent
> >>>>>>>>>> via
> >>>>>>>>>>>>> an ExecuteStatementRequest?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 5. The remote POC uses polling for getting back status and
> getting
> >>>>>>>>>> back
> >>>>>>>>>>>>> results.  Would it be possible to switch to web sockets or
> some other
> >>>>>>>>>>>>> mechanism to avoid polling?  If polling is used for both,
> the polling
> >>>>>>>>>>>>> frequency should be different between local and remote
> configurations.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 6. What does this sentence mean?  "The reason why we didn't
> get the
> >>>>>>>>>> sql
> >>>>>>>>>>>>> type in client side is because it's hard for the lightweight
> >>>>>>>>>>>> client-level
> >>>>>>>>>>>>> parser to recognize some sql type  sql, such as query with
> CTE.  "
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 7. What is the serialization lifecycle for results?  It
> makes sense to
> >>>>>>>>>>>>> have some control over whether the gateway returns results
> as SQL or
> >>>>>>>>>>>> JSON.
> >>>>>>>>>>>>> I'd love to see a way to avoid needing to serialize and
> deserialize
> >>>>>>>>>>>> results
> >>>>>>>>>>>>> on the SQL Gateway if possible.  I'm still new enough to the
> project
> >>>>>>>>>>>> that
> >>>>>>>>>>>>> I'm not sure if that's readily possible.  Maybe the SQL
> Gateway's
> >>>>>>>>>>>> return
> >>>>>>>>>>>>> type can be sent as part of the request so that the
> JobManager can
> >>>>>>>>>> send
> >>>>>>>>>>>>> back results in an advantageous format?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 8. Does ErrorType need to be marked as @PublicEvolving?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'm excited for the SQL client to support gateway mode!
> Given the
> >>>>>>>>>> change
> >>>>>>>>>>>>> in design, do you think it'll still be part of the Flink
> 1.17 release?
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1.  ClientParser can work with new (and unknown) SQL syntax.
> It is
> >>>>>>>>>> because
> >>>>>>>>>>>> if the
> >>>>>>>>>>>> sql type is not recognized, the sql will be submitted to the
> gateway
> >>>>>>>>>>>> directly.
> >>>>>>>>>>>>
> >>>>>>>>>>>> For more information: Actually, the proposed ClientParser
> only do two
> >>>>>>>>>>>> things:
> >>>>>>>>>>>> (1) Tell client commands (help, clear, etc) and sqls apart.
> >>>>>>>>>>>> (2) parses several sql types (e.g. SHOW CREATE statement, we
> can print
> >>>>>>>>>> raw
> >>>>>>>>>>>> string
> >>>>>>>>>>>> for the SHOW CREATE result instead of table). Here the
> recognization of
> >>>>>>>>>>>> sql types
> >>>>>>>>>>>> mostly affects the print style, and unrecognized sql also can
> be
> >>>>>>>>>> submitted
> >>>>>>>>>>>> to cluster.
> >>>>>>>>>>>> So the Client with new ClientParser can work compatible with
> new syntax.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2. First, I'd like to explain that the gateway APIs and
> supported syntax
> >>>>>>>>>>>> is two things.
> >>>>>>>>>>>> For example, ‘configureSession' and 'completeStatement' are
> APIs. As
> >>>>>>>>>>>> mentioned
> >>>>>>>>>>>> in #1, the sql statements which syntax is unknown will be
> submitted to
> >>>>>>>>>> the
> >>>>>>>>>>>> gateway,
> >>>>>>>>>>>> and whether they can be executed normally depends on whether
> the
> >>>>>>>>>> execution
> >>>>>>>>>>>> environment supports the syntax.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Is there anything we should be designing for upfront?
> >>>>>>>>>>>>
> >>>>>>>>>>>> The 'SqlGatewayRestAPIVersion’ has been introduced. But it is
> for sql
> >>>>>>>>>>>> gateway APIs.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 3.
> >>>>>>>>>>>>> How will client and server version mismatches be handled?
> >>>>>>>>>>>>
> >>>>>>>>>>>> A lower version client can work compatible with a higher
> version gateway
> >>>>>>>>>>>> because the
> >>>>>>>>>>>> old interfaces won’t be deleted. When a higher version client
> connects
> >>>>>>>>>> to
> >>>>>>>>>>>> a lower version
> >>>>>>>>>>>> gateway, the client should notify the users if they try to use
> >>>>>>>>>> unsupported
> >>>>>>>>>>>> features. For
> >>>>>>>>>>>> example, the client start option ‘-i’  means using
> initialization file
> >>>>>>>>>> to
> >>>>>>>>>>>> initialize the session.
> >>>>>>>>>>>> We plan to use the gateway’s ‘configureSession’ to implement
> it. But
> >>>>>>>>>> this
> >>>>>>>>>>>> API is not
> >>>>>>>>>>>> implemented in 1.16 Gateway (SqlGatewayRestAPIVersion = V1),
> so if the
> >>>>>>>>>>>> user try to
> >>>>>>>>>>>> use ‘-i’ option to start the client with the 1.16 gateway,
> the client
> >>>>>>>>>>>> should tell the user that
> >>>>>>>>>>>> Can’t execute ‘-i’ option with gateway which version is lower
> than V2.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Will a single gateway be able to support multiple endpoint
> versions?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Currently, the gateway only starts a highest version endpoint
> and the
> >>>>>>>>>>>> higher version endpoint
> >>>>>>>>>>>> is compatible with the lower version endpoint’s protocol.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 4. Yes. Mostly, we use ’SET’ and ‘RESET’ statements to change
> the
> >>>>>>>>>> session
> >>>>>>>>>>>> configuration.
> >>>>>>>>>>>> Notice: the client can’t change the session (I mean, close
> current
> >>>>>>>>>> session
> >>>>>>>>>>>> and open another
> >>>>>>>>>>>> one). I’m not sure if you have need to change the session
> itself?
> >>>>>>>>>>>>
> >>>>>>>>>>>> 5.
> >>>>>>>>>>>>> Would it be possible to switch to web sockets or some other
> mechanism
> >>>>>>>>>>>> to avoid polling?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Your suggestion is very good, but this flip is for supporting
> the remote
> >>>>>>>>>>>> client. How about taking
> >>>>>>>>>>>> it as a future work?
> >>>>>>>>>>>>
> >>>>>>>>>>>>> If polling is used for both, the polling frequency should be
> different
> >>>>>>>>>>>> between local and remote
> >>>>>>>>>>>> configurations.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Our idea is to introduce a new session option (like
> >>>>>>>>>>>> 'sql-client.result.fetch-interval') to control
> >>>>>>>>>>>> the fetching requests sending frequency. What do you think?
> >>>>>>>>>>>>
> >>>>>>>>>>>> For more information: we are inclined to keep the polling
> behavior in
> >>>>>>>>>> this
> >>>>>>>>>>>> version. For streaming
> >>>>>>>>>>>> query, fetching results synchronously may occupy resources of
> the
> >>>>>>>>>> gateway
> >>>>>>>>>>>> in a long period.
> >>>>>>>>>>>> For example, if the job doesn’t return results for a long
> time because
> >>>>>>>>>> the
> >>>>>>>>>>>> window has not been
> >>>>>>>>>>>> triggered, the synchronously fetching will keep occupying the
> >>>>>>>>>> connection.
> >>>>>>>>>>>> In asynchronous
> >>>>>>>>>>>> situation, the gateway can return a NOT_READY_RESULT quickly
> and release
> >>>>>>>>>>>> the resources
> >>>>>>>>>>>> for other clients to use. I think we can make some
> improvements for the
> >>>>>>>>>>>> whole flow path in the
> >>>>>>>>>>>> future.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 6. Sorry, there are mistakes in this sentence. Let me make it clear.
> >>>>>>>>>>>>
> >>>>>>>>>>>> We proposed adding 'ContentType' to indicate what kind of sql the
> >>>>>>>>>>>> result is for. In this sentence, I wanted to explain why we add
> >>>>>>>>>>>> 'ContentType' even though the ClientParser can recognize the sql type too.
> >>>>>>>>>>>> It is because the proposed ClientParser can't recognize complex syntax.
> >>>>>>>>>>>> For example, it can’t recognize a query with a CTE. So the result should
> >>>>>>>>>>>> carry content type information to help the client know the sql type.
> >>>>>>>>>>>> For example, 'ContentType.QUERY_RESULT' indicates the result is for a
> >>>>>>>>>>>> query statement.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 7.
> >>>>>>>>>>>>> What is the serialization lifecycle for results?
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1) Sink to JobManager    : RowData -> Byte[] (serialize)
> >>>>>>>>>>>> 2) JobManager to Gateway : Byte[] -> RowData (deserialize)
> >>>>>>>>>>>> 3) Gateway sending       : RowData -> Byte[] (serialized to JSON format)
> >>>>>>>>>>>> 4) Client receiving      : Byte[] -> RowData (deserialize)
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Maybe the SQL Gateway's return type can be sent as part of
> the request
> >>>>>>>>>>>> so that the
> >>>>>>>>>>>> JobManager can send  back results in an advantageous format?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Yes. I think it's an improvement for the Client and Gateway.
> We have
> >>>>>>>>>> some
> >>>>>>>>>>>> ideas. For example,
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1) We can move the Gateway into the JobManager and reduce the
> Ser/De
> >>>>>>>>>> costs
> >>>>>>>>>>>> from JM to Gateway.
> >>>>>>>>>>>> 2) Or the Gateway can collect the data from the sink function
> directly
> >>>>>>>>>>>> instead of JobManager.
> >>>>>>>>>>>>
> >>>>>>>>>>>> But I think we can leave this as a future work and discuss in
> another
> >>>>>>>>>>>> thread.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 8. Yes.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Do you think it'll still be part of the Flink 1.17 release?
> >>>>>>>>>>>> Yes. We will try our best to finish the work.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Feel free to talk to me if I’m wrong or you have any other
> questions.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Nov 25, 2022, at 11:48, yu zelin <yuzelin.yzl@gmail.com <mailto:yuzelin.yzl@gmail.com>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi, all
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I want to initiate a discussion on the FLIP-275: Support
> Remote SQL
> >>>>>>>>>>>> Client Based on SQL Gateway[1].
> >>>>>>>>>>>>> The motivation of this FLIP is that the current SQL Client
> allows only
> >>>>>>>>>>>> local connection which can not satisfy
> >>>>>>>>>>>>> the common need of connecting to a remote cluster.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Since the FLIP-91[2] has introduced SQL Gateway, we proposed
> to
> >>>>>>>>>>>> implement the Remote SQL Client
> >>>>>>>>>>>>> based on SQL Gateway. In our design, we proposed two main
> changes:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 1. New remote mode client which performs connection to the
> remote
> >>>>>>>>>>>> gateway through REST API.
> >>>>>>>>>>>>> 2. Migration of the current local mode client. We proposed
> to refactor
> >>>>>>>>>>>> the local client based on SQL Gateway
> >>>>>>>>>>>>> to unify the interface for two modes.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Looking forward to your suggestions.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best,
> >>>>>>>>>>>>> Yu Zelin
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/x/T48ODg
> >>>>>>>>>>>>> [2] https://cwiki.apache.org/confluence/x/rIyMC
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> >>>
> >>
>
>

Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

Posted by yu zelin <yu...@gmail.com>.
Hi all,

Recently I received some feedback about the REST Endpoint modification. The main point
is that using ‘ResultSet’ as a part of ‘FetchResultsResponseBody’ is not convenient for serialization and
deserialization. So I think it’s better to introduce a new ‘ResultInfo’ to carry the data. The ‘ResultInfo’
will carry the row format information and be serialized and deserialized according to the row format.

In the FLIP, I have modified the section Public Interface -> REST Endpoint Modification. The main
change is that ‘FetchJsonFormatResultsResponseBody’ and ‘FetchPlainTextResultsResponseBody’
were deleted and the ‘overview of fetching results REST API’ was slightly modified.
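
To make the idea concrete, here is a minimal sketch of a result holder that carries its row format and serializes rows accordingly. It is only an illustration under assumptions: `RowFormat`, `ResultInfoSketch`, `fetchPath` and `serializeRows` are hypothetical names, not the actual Flink classes, and the path layout is taken from the example URI quoted elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical row formats, named after the JSON/PLAIN_TEXT values discussed in this thread. */
enum RowFormat { JSON, PLAIN_TEXT }

/** Hypothetical stand-in for the proposed 'ResultInfo'; not the actual Flink class. */
final class ResultInfoSketch {
    private final RowFormat rowFormat;
    private final List<String> columnNames;
    private final List<List<Object>> rows;

    ResultInfoSketch(RowFormat rowFormat, List<String> columnNames, List<List<Object>> rows) {
        this.rowFormat = rowFormat;
        this.columnNames = columnNames;
        this.rows = rows;
    }

    /** Builds the fetch path following the example URI quoted in this thread. */
    static String fetchPath(String sessionHandle, String operationHandle, long token, RowFormat format) {
        return "/v2/sessions/" + sessionHandle
                + "/operations/" + operationHandle
                + "/result/" + token
                + "?rowFormat=" + format.name();
    }

    /** Serializes each row according to the row format carried by this object. */
    List<String> serializeRows() {
        List<String> out = new ArrayList<>();
        for (List<Object> row : rows) {
            out.add(rowFormat == RowFormat.PLAIN_TEXT ? toPlainText(row) : toJson(row));
        }
        return out;
    }

    // PLAIN_TEXT: every field becomes a display string the client can print directly.
    private String toPlainText(List<Object> row) {
        List<String> fields = new ArrayList<>();
        for (Object field : row) {
            fields.add(String.valueOf(field));
        }
        return String.join(" | ", fields);
    }

    // JSON: fields keep a machine-readable shape keyed by column name.
    // Simplified: numbers, booleans and null are written bare, everything else as an escaped string.
    private String toJson(List<Object> row) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < row.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"').append(escape(columnNames.get(i))).append("\":");
            Object field = row.get(i);
            if (field == null || field instanceof Number || field instanceof Boolean) {
                sb.append(field);
            } else {
                sb.append('"').append(escape(field.toString())).append('"');
            }
        }
        return sb.append('}').toString();
    }

    // Escapes backslashes and double quotes only; enough for this sketch.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

The point of the sketch is that the format travels with the data, so one response body type can serve both the CLI (plain strings) and machine-readable consumers (JSON) instead of needing two separate body classes.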

Best,
Yu Zelin

> On Dec 13, 2022, at 17:13, yu zelin <yu...@gmail.com> wrote:
> 
> Hi everyone,
> 
> Sorry for the incorrect message in my last email. I want to start the vote on Wednesday 
> as long as there are no questions in this period.
> 
> Best,
> Yu Zelin
> 
> On Tue, Dec 13, 2022 at 5:08 PM yu zelin <yuzelin.yzl@gmail.com <ma...@gmail.com>> wrote:
>> Hi, everyone,
>> 
>> Looks like our new design is similar to Timo’s suggestion, and considering that there has
>> been no response from other devs for a long time, I want to start the vote on Thursday.
>> 
>> 
>> Best,
>> Yu Zelin
>> 
>>> On Dec 13, 2022, at 16:23, yu zelin <yuzelin.yzl@gmail.com <ma...@gmail.com>> wrote:
>>> 
>>> Hi, Timo,
>>> 
>>> Thanks for your suggestion. Recently I have discussed with @Godfrey He, @Shengkai Fang 
>>> and @Jark Wu about the `RowFormat` (Thanks for all your suggestions). We finally came to 
>>> a consensus which is similar to your suggestion. The details are as follows:
>>> 
>>> 1. Add a REST query parameter ‘RowFormat’ = JSON/PLAIN_TEXT to tell the REST Endpoint
>>> how to serialize the RowData in the ResultSet.
>>> 
>>>     JSON format means the RowData will be serialized to JSON format, which contains original 
>>>     LogicalType information, so it can be deserialized back to RowData.
>>> 
>>>     PLAIN_TEXT format means the RowData will be serialized to SQL-compliant, plain strings. 
>>>     The SQL Client can print the strings directly.
>>> 
>>> The example URI for fetching results is:
>>> > /v2/sessions/:session_handle/operations/:operation_handle/result/:token?rowFormat=PLAIN_TEXT
>>> 
>>> 2. Introduce two response bodies for fetching results in two formats.
>>> 
>>> For more details, please take a look at the FLIP [https://cwiki.apache.org/confluence/x/T48ODg]. 
>>> I have updated it with an example of query response bodies in the two formats in the section:
>>> Public Interface -> REST Endpoint Modification.
>>> 
>>>> On Dec 12, 2022, at 18:09, Timo Walther <twalthr@apache.org <ma...@apache.org>> wrote:
>>>> 
>>>> Hi everyone,
>>>> 
>>>> sorry to jump into this discussion so late.
>>>> 
>>>> > So we decided to revert the RowFormat related changes and let the client to resolve the print format.
>>>> 
>>>> Could you elaborate a bit on this topic in the FLIP? I still believe that we need 2 types of output formats.
>>>> 
>>>> Format A: for the SQL Client CLI and other interactive notebooks that just uses SQL CAST(... AS STRING) semantics executed on the server side
>>>> 
>>>> Format B: for JDBC SDK or other machine-readable downstream libraries
>>>> 
>>>> Take a TIMESTAMP WITH LOCAL TIME ZONE as an example. The string representation depends on a session configuration option. Clients might not be aware of this session option, so the formatting must happen on the server side.
>>>> 
>>>> However, when the downstream consumer is a library, maybe the library would like to get the raw millis/nanos since epoch.
>>>> 
>>>> Also nested rows and collections might be better encoded with format B for libraries but interactive sessions are happy if nested types are already formatted server-side, so not every client needs custom code for the formatting.
>>>> 
>>>> Regards,
>>>> Timo
>>>> 
>>>> 
>>>> 
>>>> On 06.12.22 15:13, godfrey he wrote:
>>>>> Hi, Zelin
>>>>>> The CLI will use default print style for the non-query result.
>>>>> Please make sure the print results of EXPLAIN/DESC/SHOW CREATE TABLE
>>>>> commands are clear.
>>>>>> We think it’s better to add the root cause to the ErrorResponseBody.
>>>>> LGTM
>>>>> Best,
>>>>> Godfrey
>>>>> yu zelin <yuzelin.yzl@gmail.com <ma...@gmail.com>> wrote on Tue, Dec 6, 2022 at 17:51:
>>>>>> 
>>>>>> Hi, Godfrey
>>>>>> 
>>>>>> Thanks for your feedback. Below are my thoughts about your questions.
>>>>>>
>>>>>> 1. About RowFormat
>>>>>> I agree with your opinion. So we decided to revert the RowFormat-related changes
>>>>>> and let the client resolve the print format.
>>>>>>
>>>>>> 2. About ContentType
>>>>>> I agree that the definition of the ContentType is not clear. But how to define the
>>>>>> statement type is another big question. So, we decided to only tell the query result
>>>>>> and non-query result apart. The CLI will use default print style for the non-query
>>>>>> result.
>>>>>>
>>>>>> 3. About ErrorHandling
>>>>>> I think reusing the current ErrorResponseBody is good, but parsing the root cause
>>>>>> from the exception stack strings is quite hacky. We think it’s better to add the
>>>>>> root cause to the ErrorResponseBody.
>>>>>>
>>>>>> 4. About Runtime REST API Modifications
>>>>>> I agree, too. This part has been moved to ‘Future Work’.
>>>>>> 
>>>>>> Best,
>>>>>> Yu Zelin
>>>>>> 
>>>>>> 
>>>>>>> On Dec 5, 2022, at 18:33, godfrey he <godfreyhe@gmail.com <ma...@gmail.com>> wrote:
>>>>>>> 
>>>>>>> Hi Zelin,
>>>>>>> 
>>>>>>> Thanks for driving this discussion.
>>>>>>> 
>>>>>>> I have a few comments,
>>>>>>> 
>>>>>>>> Add RowFormat to ResultSet to indicate the format of rows.
>>>>>>> We should not require SqlGateway server to meet the display
>>>>>>> requirements of a CliClient.
>>>>>>> Because different CliClients may have different display style. The
>>>>>>> server just need to response the data,
>>>>>>> and the CliClient prints the result as needed. So RowFormat is not needed.
>>>>>>> 
>>>>>>>> Add ContentType to ResultSet to indicate what kind of data the result contains.
>>>>>>> from my first sight, the values of ContentType are intersected, such
>>>>>>> as: A select query will return QUERY_RESULT,
>>>>>>> but it also has JOB_ID. OTHER is too ambiguous, I don't know which
>>>>>>> kind of query will return OTHER.
>>>>>>> I recommend returning the concrete type for each statement, such as
>>>>>>> "CREATE TABLE" for "create table xx (...) with ()",
>>>>>>> "SELECT" for "select * from xxx". The statement type can be maintained
>>>>>>> in `Operation`s.
>>>>>>> 
>>>>>>>> Error Handling
>>>>>>> I think current design of error handling mechanism can meet the
>>>>>>> requirement of CliClient, we can get the root cause from
>>>>>>> the stack (see ErrorResponseBody#errors). If it becomes a common
>>>>>>> requirement (for many clients) in the future,
>>>>>>> we can introduce this interface.
>>>>>>> 
>>>>>>>> Runtime REST API Modification for Local Client Migration
>>>>>>> I think this part is over-engineered, this part belongs to optimization.
>>>>>>> The client does not require very high performance, the current design
>>>>>>> can already meet our needs.
>>>>>>> If we find performance problems in the future, do such optimizations.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Godfrey
>>>>>>> 
>>>>>>> yu zelin <yuzelin.yzl@gmail.com <ma...@gmail.com>> wrote on Mon, Dec 5, 2022 at 11:11:
>>>>>>>> 
>>>>>>>> Hi, Shammon
>>>>>>>> 
>>>>>>>> Thanks for your feedback. I think it’s good to support jdbc-sdk. However,
>>>>>>>> it's not supported on the gateway side yet. In my opinion, this FLIP is more
>>>>>>>> concerned with the SQL Client. How about putting “supporting jdbc-sdk” in
>>>>>>>> ‘Future Work’? We can discuss how to implement it in another thread.
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Yu Zelin
>>>>>>>>> On Dec 2, 2022, at 18:12, Shammon FY <zjureel@gmail.com <ma...@gmail.com>> wrote:
>>>>>>>>> 
>>>>>>>>> Hi zelin
>>>>>>>>> 
>>>>>>>>> Thanks for driving this discussion.
>>>>>>>>> 
>>>>>>>>> I notice that the sql-client will interact with sql-gateway by `REST
>>>>>>>>> Client` in the `Executor` in the FLIP, how about introducing jdbc-sdk for
>>>>>>>>> sql-gateway?
>>>>>>>>> 
>>>>>>>>> Then the sql-client can connect the gateway with jdbc-sdk, on the other
>>>>>>>>> hand, the other applications and tools such as jmeter can use the jdbc-sdk
>>>>>>>>> to connect sql-gateway too.
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Shammon
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Fri, Dec 2, 2022 at 4:10 PM yu zelin <yuzelin.yzl@gmail.com <ma...@gmail.com>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi Jim,
>>>>>>>>>> 
>>>>>>>>>> Thanks for your feedback!
>>>>>>>>>> 
>>>>>>>>>>> Should this configuration be mentioned in the FLIP?
>>>>>>>>>> 
>>>>>>>>>> Sure.
>>>>>>>>>> 
>>>>>>>>>>> some way for the server to be able to limit the number of requests it receives.
>>>>>>>>>> I’m sorry, but this FLIP is dedicated to implementing the Remote mode, so we
>>>>>>>>>> didn't consider this much. I think the option is enough for now. I will add
>>>>>>>>>> the improvement suggestions to ‘Future Work’.
>>>>>>>>>>
>>>>>>>>>>> I wonder if two other options are possible
>>>>>>>>>>
>>>>>>>>>> Forwarding the raw format to the gateway and then to the client is possible. The
>>>>>>>>>> raw results from the sink are in ‘CollectResultIterator#bufferedResult’. First, we
>>>>>>>>>> can find a way to get this result without wrapping it. Second, we construct an
>>>>>>>>>> ‘InternalTypeInfo’ using the schema information (the data’s logical type).
>>>>>>>>>> After construction, we can get the ‘TypeSerializer’ to deserialize the raw result.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Dec 1, 2022, at 04:54, Jim Hughes <jh...@confluent.io.INVALID> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi Yu,
>>>>>>>>>>> 
>>>>>>>>>>> Thanks for moving my comments to this thread!  Also, thank you for
>>>>>>>>>>> answering my questions; it is helping me understand the SQL Gateway
>>>>>>>>>>> better.
>>>>>>>>>>> 
>>>>>>>>>>> 5.
>>>>>>>>>>>> Our idea is to introduce a new session option (like
>>>>>>>>>>> 'sql-client.result.fetch-interval') to control
>>>>>>>>>>> the fetching requests sending frequency. What do you think?
>>>>>>>>>>> 
>>>>>>>>>>> Should this configuration be mentioned in the FLIP?
>>>>>>>>>>> 
>>>>>>>>>>> One slight concern I have with having 'sql-client.result.fetch-interval'
>>>>>>>>>> as
>>>>>>>>>>> a session configuration is that users could set it low and cause the
>>>>>>>>>> client
>>>>>>>>>>> to send a large volume of requests to the SQL gateway.
>>>>>>>>>>> 
>>>>>>>>>>> Generally, I'd like to see some way for the server to be able to limit
>>>>>>>>>> the
>>>>>>>>>>> number of requests it receives.  If that really needs to be done by a
>>>>>>>>>> proxy
>>>>>>>>>>> in front of the SQL gateway, that is fine as well.  (To be clear, I don't
>>>>>>>>>>> think my concern here should be blocking in any way.)
>>>>>>>>>>> 
>>>>>>>>>>> 7.
>>>>>>>>>>>> What is the serialization lifecycle for results?
>>>>>>>>>>> 
>>>>>>>>>>> I wonder if two other options are possible:
>>>>>>>>>>> 3) Could the Gateway just forward the result byte array?  (Or does the
>>>>>>>>>>> Gateway need to deserialize the response in order to understand it for
>>>>>>>>>> some
>>>>>>>>>>> reason?)
>>>>>>>>>>> 4) Could the JobManager prepare the results in JSON?  (Or similarly could
>>>>>>>>>>> the Client read the format which the JobManager sends?)
>>>>>>>>>>> 
>>>>>>>>>>> Thanks again!
>>>>>>>>>>> 
>>>>>>>>>>> Cheers,
>>>>>>>>>>> 
>>>>>>>>>>> Jim
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Nov 30, 2022 at 9:40 AM yu zelin <yuzelin.yzl@gmail.com <ma...@gmail.com>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi, all
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks for Jim’s questions below. Here I’d like to reply to them.
>>>>>>>>>>>> 
>>>>>>>>>>>>> 1. For the Client Parser, is it going to work with the extended syntax
>>>>>>>>>>>>> from the Flink Table Store?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 2. Relatedly, what will happen if an older Client tries to handle
>>>>>>>>>>>> syntax
>>>>>>>>>>>>> that a newer service supports?  (Suppose I use a 1.17 client with a
>>>>>>>>>>>> 1.18
>>>>>>>>>>>>> Gateway/system which has a new keyword.  Is there anything we should
>>>>>>>>>> be
>>>>>>>>>>>>> designing for upfront?)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 3. How will client and server version mismatches be handled?  Will a
>>>>>>>>>>>>> single gateway be able to support multiple endpoint versions?
>>>>>>>>>>>>> 4. How are commands which change a session handled?  Are those sent
>>>>>>>>>> via
>>>>>>>>>>>>> an ExecuteStatementRequest?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 5. The remote POC uses polling for getting back status and getting
>>>>>>>>>> back
>>>>>>>>>>>>> results.  Would it be possible to switch to web sockets or some other
>>>>>>>>>>>>> mechanism to avoid polling?  If polling is used for both, the polling
>>>>>>>>>>>>> frequency should be different between local and remote configurations.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 6. What does this sentence mean?  "The reason why we didn't get the
>>>>>>>>>> sql
>>>>>>>>>>>>> type in client side is because it's hard for the lightweight
>>>>>>>>>>>> client-level
>>>>>>>>>>>>> parser to recognize some sql type  sql, such as query with CTE.  "
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 7. What is the serialization lifecycle for results?  It makes sense to
>>>>>>>>>>>>> have some control over whether the gateway returns results as SQL or
>>>>>>>>>>>> JSON.
>>>>>>>>>>>>> I'd love to see a way to avoid needing to serialize and deserialize
>>>>>>>>>>>> results
>>>>>>>>>>>>> on the SQL Gateway if possible.  I'm still new enough to the project
>>>>>>>>>>>> that
>>>>>>>>>>>>> I'm not sure if that's readily possible.  Maybe the SQL Gateway's
>>>>>>>>>>>> return
>>>>>>>>>>>>> type can be sent as part of the request so that the JobManager can
>>>>>>>>>> send
>>>>>>>>>>>>> back results in an advantageous format?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 8. Does ErrorType need to be marked as @PublicEvolving?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'm excited for the SQL client to support gateway mode!  Given the
>>>>>>>>>> change
>>>>>>>>>>>>> in design, do you think it'll still be part of the Flink 1.17 release?
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. ClientParser can work with new (and unknown) SQL syntax, because if the
>>>>>>>>>>>> sql type is not recognized, the sql will be submitted to the gateway
>>>>>>>>>>>> directly.
>>>>>>>>>>>>
>>>>>>>>>>>> For more information: the proposed ClientParser only does two things:
>>>>>>>>>>>> (1) Tells client commands (help, clear, etc.) and sql statements apart.
>>>>>>>>>>>> (2) Parses several sql types (e.g. for a SHOW CREATE statement, we can print
>>>>>>>>>>>> the raw string of the SHOW CREATE result instead of a table).
>>>>>>>>>>>> Here the recognition of sql types mostly affects the print style, and
>>>>>>>>>>>> unrecognized sql can still be submitted to the cluster.
>>>>>>>>>>>> So the Client with the new ClientParser is compatible with new syntax.
>>>>>>>>>>>> 
>>>>>>>>>>>> 2. First, I'd like to explain that the gateway APIs and the supported syntax
>>>>>>>>>>>> are two different things.
>>>>>>>>>>>> For example, 'configureSession' and 'completeStatement' are APIs. As mentioned
>>>>>>>>>>>> in #1, sql statements whose syntax is unknown will be submitted to the gateway,
>>>>>>>>>>>> and whether they can be executed normally depends on whether the execution
>>>>>>>>>>>> environment supports the syntax.
>>>>>>>>>>>> 
>>>>>>>>>>>>> Is there anything we should be designing for upfront?
>>>>>>>>>>>> 
>>>>>>>>>>>> The 'SqlGatewayRestAPIVersion’ has been introduced. But it is for sql
>>>>>>>>>>>> gateway APIs.
>>>>>>>>>>>> 
>>>>>>>>>>>> 3.
>>>>>>>>>>>>> How will client and server version mismatches be handled?
>>>>>>>>>>>> 
>>>>>>>>>>>> A lower version client is compatible with a higher version gateway because the
>>>>>>>>>>>> old interfaces won’t be deleted. When a higher version client connects to
>>>>>>>>>>>> a lower version gateway, the client should notify users if they try to use
>>>>>>>>>>>> unsupported features. For example, the client start option ‘-i’ means using an
>>>>>>>>>>>> initialization file to initialize the session. We plan to use the gateway’s
>>>>>>>>>>>> ‘configureSession’ to implement it. But this API is not implemented in the
>>>>>>>>>>>> 1.16 Gateway (SqlGatewayRestAPIVersion = V1), so if the user tries to use the
>>>>>>>>>>>> ‘-i’ option to start the client with the 1.16 gateway, the client should tell
>>>>>>>>>>>> the user that it can’t execute the ‘-i’ option with a gateway whose version is
>>>>>>>>>>>> lower than V2.
>>>>>>>>>>>> 
>>>>>>>>>>>>> Will a single gateway be able to support multiple endpoint versions?
>>>>>>>>>>>> 
>>>>>>>>>>>> Currently, the gateway only starts the highest version endpoint, and a
>>>>>>>>>>>> higher version endpoint is compatible with the lower version endpoint’s
>>>>>>>>>>>> protocol.
>>>>>>>>>>>> 
>>>>>>>>>>>> 4. Yes. Mostly, we use ‘SET’ and ‘RESET’ statements to change the session
>>>>>>>>>>>> configuration.
>>>>>>>>>>>> Note: the client can’t change the session (I mean, close the current session
>>>>>>>>>>>> and open another one). I’m not sure whether you need to change the session itself?
>>>>>>>>>>>> 
>>>>>>>>>>>> 5.
>>>>>>>>>>>>> Would it be possible to switch to web sockets or some other mechanism
>>>>>>>>>>>> to avoid polling?
>>>>>>>>>>>> 
>>>>>>>>>>>> Your suggestion is very good, but this FLIP is for supporting the remote
>>>>>>>>>>>> client. How about taking it as future work?
>>>>>>>>>>>> 
>>>>>>>>>>>>> If polling is used for both, the polling frequency should be different
>>>>>>>>>>>> between local and remote
>>>>>>>>>>>> configurations.
>>>>>>>>>>>> 
>>>>>>>>>>>> Our idea is to introduce a new session option (like
>>>>>>>>>>>> 'sql-client.result.fetch-interval') to control
>>>>>>>>>>>> the fetching requests sending frequency. What do you think?
>>>>>>>>>>>> 
>>>>>>>>>>>> For more information: we are inclined to keep the polling behavior in
>>>>>>>>>>>> this version. For a streaming query, fetching results synchronously may
>>>>>>>>>>>> occupy the gateway's resources for a long period. For example, if the job
>>>>>>>>>>>> doesn’t return results for a long time because the window has not been
>>>>>>>>>>>> triggered, a synchronous fetch will keep occupying the connection.
>>>>>>>>>>>> In the asynchronous case, the gateway can return a NOT_READY_RESULT
>>>>>>>>>>>> quickly and release the resources for other clients to use. I think we
>>>>>>>>>>>> can make some improvements to the whole flow path in the future.
>>>>>>>>>>>> 
>>>>>>>>>>>> 6. Sorry, there are mistakes in this sentence. Let me make it clear.
>>>>>>>>>>>>
>>>>>>>>>>>> We proposed adding 'ContentType' to indicate what kind of sql the
>>>>>>>>>>>> result is for. In this sentence, I wanted to explain why we add
>>>>>>>>>>>> 'ContentType' even though the ClientParser can recognize the sql type too.
>>>>>>>>>>>> It is because the proposed ClientParser can't recognize complex syntax.
>>>>>>>>>>>> For example, it can’t recognize a query with a CTE. So the result should
>>>>>>>>>>>> carry content type information to help the client know the sql type.
>>>>>>>>>>>> For example, 'ContentType.QUERY_RESULT' indicates the result is for a
>>>>>>>>>>>> query statement.
>>>>>>>>>>>> 
>>>>>>>>>>>> 7.
>>>>>>>>>>>>> What is the serialization lifecycle for results?
>>>>>>>>>>>> 
>>>>>>>>>>>> 1) Sink to JobManager    : RowData -> Byte[] (serialize)
>>>>>>>>>>>> 2) JobManager to Gateway : Byte[] -> RowData (deserialize)
>>>>>>>>>>>> 3) Gateway sending       : RowData -> Byte[] (serialized to JSON format)
>>>>>>>>>>>> 4) Client receiving      : Byte[] -> RowData (deserialize)
>>>>>>>>>>>> 
>>>>>>>>>>>>> Maybe the SQL Gateway's return type can be sent as part of the request
>>>>>>>>>>>> so that the
>>>>>>>>>>>> JobManager can send  back results in an advantageous format?
>>>>>>>>>>>> 
>>>>>>>>>>>> Yes. I think it's an improvement for the Client and Gateway. We have some
>>>>>>>>>>>> ideas. For example,
>>>>>>>>>>>>
>>>>>>>>>>>> 1) We can move the Gateway into the JobManager and reduce the Ser/De costs
>>>>>>>>>>>> from JM to Gateway.
>>>>>>>>>>>> 2) Or the Gateway can collect the data from the sink function directly
>>>>>>>>>>>> instead of from the JobManager.
>>>>>>>>>>>>
>>>>>>>>>>>> But I think we can leave this as future work and discuss it in another
>>>>>>>>>>>> thread.
>>>>>>>>>>>> 
>>>>>>>>>>>> 8. Yes.
>>>>>>>>>>>> 
>>>>>>>>>>>>> Do you think it'll still be part of the Flink 1.17 release?
>>>>>>>>>>>> Yes. We will try our best to finish the work.
>>>>>>>>>>>> 
>>>>>>>>>>>> Feel free to talk to me if I’m wrong or you have any other questions.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Nov 25, 2022, at 11:48, yu zelin <yuzelin.yzl@gmail.com <ma...@gmail.com>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi, all
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I want to initiate a discussion on the FLIP-275: Support Remote SQL
>>>>>>>>>>>> Client Based on SQL Gateway[1].
>>>>>>>>>>>>> The motivation of this FLIP is that the current SQL Client allows only
>>>>>>>>>>>> local connection which can not satisfy
>>>>>>>>>>>>> the common need of connecting to a remote cluster.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Since the FLIP-91[2] has introduced SQL Gateway, we proposed to
>>>>>>>>>>>> implement the Remote SQL Client
>>>>>>>>>>>>> based on SQL Gateway. In our design, we proposed two main changes:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 1. New remote mode client which performs connection to the remote
>>>>>>>>>>>> gateway through REST API.
>>>>>>>>>>>>> 2. Migration of the current local mode client. We proposed to refactor
>>>>>>>>>>>> the local client based on SQL Gateway
>>>>>>>>>>>>> to unify the interface for two modes.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Looking forward to your suggestions.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Yu Zelin
>>>>>>>>>>>>> 
>>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/x/T48ODg
>>>>>>>>>>>>> [2] https://cwiki.apache.org/confluence/x/rIyMC
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>>> 
>> 


Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

Posted by yu zelin <yu...@gmail.com>.
Hi everyone,

Sorry for the incorrect message in my last email. I want to start the vote
on Wednesday, as long as no questions come up before then.

Best,
Yu Zelin

On Tue, Dec 13, 2022 at 5:08 PM yu zelin <yu...@gmail.com> wrote:

> Hi, everyone,
>
> Looks like our new design is similar to Timo’s suggestion, and considering
> that there has been no response from other devs for a long time, I want to
> start the vote on Thursday.
>
>
> Best,
> Yu Zelin
>
> On Dec 13, 2022, at 16:23, yu zelin <yu...@gmail.com> wrote:
>
> Hi, Timo,
>
> Thanks for your suggestion. Recently I have discussed with @Godfrey He,
> @Shengkai Fang
> and @Jark Wu about the `RowFormat` (Thanks for all your suggestions). We
> finally came to
> a consensus which is similar to your suggestion. The details are as
> follows:
>
> 1. Add a REST query parameter ‘RowFormat’ = JSON/PLAIN_TEXT to tell the REST Endpoint
> how to serialize the RowData in the ResultSet.
>
>     JSON format means the RowData will be serialized to JSON format, which contains the original
>     LogicalType information, so it can be deserialized back to RowData.
>
>     PLAIN_TEXT format means the RowData will be serialized to SQL-compliant, plain strings.
>     The SQL Client can print the strings directly.
>
> The example URI for fetching results is:
> > /v2/sessions/:session_handle/operations/:operation_handle/result/:token?rowFormat=PLAIN_TEXT
>
> 2. Introduce two response bodies for fetching results in two formats.
>
> For more details, please take a look at the FLIP [https://cwiki.apache.org/confluence/x/T48ODg].
> I have updated it with an example of query response bodies in the two formats in the section:
> Public Interface -> REST Endpoint Modification.
>
> On Dec 12, 2022, at 18:09, Timo Walther <tw...@apache.org> wrote:
>
> Hi everyone,
>
> sorry to jump into this discussion so late.
>
> > So we decided to revert the RowFormat related changes and let the client
> to resolve the print format.
>
> Could you elaborate a bit on this topic in the FLIP? I still believe that
> we need 2 types of output formats.
>
> Format A: for the SQL Client CLI and other interactive notebooks that just
> uses SQL CAST(... AS STRING) semantics executed on the server side
>
> Format B: for JDBC SDK or other machine-readable downstream libraries
>
> Take a TIMESTAMP WITH LOCAL TIME ZONE as an example. The string
> representation depends on a session configuration option. Clients might not
> be aware of this session option, so the formatting must happen on the
> server side.
>
> However, when the downstream consumer is a library, maybe the library
> would like to get the raw millis/nanos since epoch.
>
> Also nested rows and collections might be better encoded with format B for
> libraries but interactive sessions are happy if nested types are already
> formatted server-side, so not every client needs custom code for the
> formatting.
>
> Regards,
> Timo
>
>
>
> On 06.12.22 15:13, godfrey he wrote:
>
> Hi, Zelin
>
> The CLI will use default print style for the non-query result.
>
> Please make sure the print results of EXPLAIN/DESC/SHOW CREATE TABLE
> commands are clear.
>
> We think it’s better to add the root cause to the ErrorResponseBody.
>
> LGTM
> Best,
> Godfrey
> yu zelin <yu...@gmail.com> wrote on Tue, Dec 6, 2022 at 17:51:
>
>
> Hi, Godfrey
>
> Thanks for your feedback. Below are my thoughts about your questions.
>
> 1. About RowFormat
> I agree with your opinion. So we decided to revert the RowFormat-related
> changes and let the client resolve the print format.
>
> 2. About ContentType
> I agree that the definition of the ContentType is not clear. But how to
> define the statement type is another big question. So, we decided to only
> tell the query result and non-query result apart. The CLI will use default
> print style for the non-query result.
>
> 3. About ErrorHandling
> I think reusing the current ErrorResponseBody is good, but parsing the root
> cause from the exception stack strings is quite hacky. We think it’s better
> to add the root cause to the ErrorResponseBody.
>
> 4. About Runtime REST API Modifications
> I agree, too. This part has been moved to ‘Future Work’.
>
> Best,
> Yu Zelin
>
>
> On Dec 5, 2022, at 18:33, godfrey he <go...@gmail.com> wrote:
>
> Hi Zelin,
>
> Thanks for driving this discussion.
>
> I have a few comments,
>
> Add RowFormat to ResultSet to indicate the format of rows.
>
> We should not require SqlGateway server to meet the display
> requirements of a CliClient.
> Because different CliClients may have different display style. The
> server just need to response the data,
> and the CliClient prints the result as needed. So RowFormat is not needed.
>
> Add ContentType to ResultSet to indicate what kind of data the result
> contains.
>
> from my first sight, the values of ContentType are intersected, such
> as: A select query will return QUERY_RESULT,
> but it also has JOB_ID. OTHER is too ambiguous, I don't know which
> kind of query will return OTHER.
> I recommend returning the concrete type for each statement, such as
> "CREATE TABLE" for "create table xx (...) with ()",
> "SELECT" for "select * from xxx". The statement type can be maintained
> in `Operation`s.
>
> Error Handling
>
> I think current design of error handling mechanism can meet the
> requirement of CliClient, we can get the root cause from
> the stack (see ErrorResponseBody#errors). If it becomes a common
> requirement (for many clients) in the future,
> we can introduce this interface.
>
> Runtime REST API Modification for Local Client Migration
>
> I think this part is over-engineered, this part belongs to optimization.
> The client does not require very high performance, the current design
> can already meet our needs.
> If we find performance problems in the future, do such optimizations.
>
> Best,
> Godfrey
>
> yu zelin <yu...@gmail.com> wrote on Mon, Dec 5, 2022 at 11:11:
>
>
> Hi, Shammon
>
> Thanks for your feedback. I think it’s good to support jdbc-sdk. However,
> it's not supported on the gateway side yet. In my opinion, this FLIP is
> more concerned with the SQL Client. How about putting “supporting jdbc-sdk” in
> ‘Future Work’? We can discuss how to implement it in another thread.
>
> Best,
> Yu Zelin
>
> On Dec 2, 2022, at 18:12, Shammon FY <zj...@gmail.com> wrote:
>
> Hi zelin
>
> Thanks for driving this discussion.
>
> I notice that the sql-client will interact with sql-gateway by `REST
> Client` in the `Executor` in the FLIP, how about introducing jdbc-sdk for
> sql-gateway?
>
> Then the sql-client can connect the gateway with jdbc-sdk, on the other
> hand, the other applications and tools such as jmeter can use the jdbc-sdk
> to connect sql-gateway too.
>
> Best,
> Shammon
>
>
> On Fri, Dec 2, 2022 at 4:10 PM yu zelin <yu...@gmail.com> wrote:
>
> Hi Jim,
>
> Thanks for your feedback!
>
> > Should this configuration be mentioned in the FLIP?
>
> Sure.
>
> > some way for the server to be able to limit the number of requests it
> > receives.
>
> I’m sorry that this FLIP is dedicated in implementing the Remote mode, so we
> didn't consider much about this. I think the option is enough currently. I will
> add the improvement suggestions to the ‘Future Work’.
>
> > I wonder if two other options are possible
>
> To forward the raw format to gateway and then to client is possible. The raw
> results from sink is in ‘CollectResultIterator#bufferedResult’. First, we can
> find a way to get this result without wrapping it. Second, constructing a
> ‘InternalTypeInfo’.
> We can construct it using the schema information (data’s logical type). After
> construction, we can get the ’TypeSerializer’ to deserialize the raw result.
>
>
>
>
> On Dec 1, 2022, at 04:54, Jim Hughes <jh...@confluent.io.INVALID> wrote:
>
> Hi Yu,
>
> Thanks for moving my comments to this thread!  Also, thank you for
> answering my questions; it is helping me understand the SQL Gateway
> better.
>
> 5.
> > Our idea is to introduce a new session option (like
> > 'sql-client.result.fetch-interval') to control
> > the fetching requests sending frequency. What do you think?
>
> Should this configuration be mentioned in the FLIP?
>
> One slight concern I have with having 'sql-client.result.fetch-interval' as
> a session configuration is that users could set it low and cause the client
> to send a large volume of requests to the SQL gateway.
>
> Generally, I'd like to see some way for the server to be able to limit the
> number of requests it receives.  If that really needs to be done by a proxy
> in front of the SQL gateway, that is fine as well.  (To be clear, I don't
> think my concern here should be blocking in any way.)
>
> 7.
> > What is the serialization lifecycle for results?
>
> I wonder if two other options are possible:
> 3) Could the Gateway just forward the result byte array?  (Or does the
> Gateway need to deserialize the response in order to understand it for some
> reason?)
> 4) Could the JobManager prepare the results in JSON?  (Or similarly could
> the Client read the format which the JobManager sends?)
>
> Thanks again!
>
> Cheers,
>
> Jim
>
> On Wed, Nov 30, 2022 at 9:40 AM yu zelin <yu...@gmail.com> wrote:
>
> Hi, all
>
> Thanks Jim’s questions below. Here I’d like to reply to them.
>
> > 1. For the Client Parser, is it going to work with the extended syntax
> > from the Flink Table Store?
> >
> > 2. Relatedly, what will happen if an older Client tries to handle syntax
> > that a newer service supports?  (Suppose I use a 1.17 client with a 1.18
> > Gateway/system which has a new keyword.  Is there anything we should be
> > designing for upfront?)
> >
> > 3. How will client and server version mismatches be handled?  Will a
> > single gateway be able to support multiple endpoint versions?
> >
> > 4. How are commands which change a session handled?  Are those sent via
> > an ExecuteStatementRequest?
> >
> > 5. The remote POC uses polling for getting back status and getting back
> > results.  Would it be possible to switch to web sockets or some other
> > mechanism to avoid polling?  If polling is used for both, the polling
> > frequency should be different between local and remote configurations.
> >
> > 6. What does this sentence mean?  "The reason why we didn't get the sql
> > type in client side is because it's hard for the lightweight client-level
> > parser to recognize some sql type  sql, such as query with CTE.  "
> >
> > 7. What is the serialization lifecycle for results?  It makes sense to
> > have some control over whether the gateway returns results as SQL or JSON.
> > I'd love to see a way to avoid needing to serialize and deserialize results
> > on the SQL Gateway if possible.  I'm still new enough to the project that
> > I'm not sure if that's readily possible.  Maybe the SQL Gateway's return
> > type can be sent as part of the request so that the JobManager can send
> > back results in an advantageous format?
> >
> > 8. Does ErrorType need to be marked as @PublicEvolving?
> >
> > I'm excited for the SQL client to support gateway mode!  Given the change
> > in design, do you think it'll still be part of the Flink 1.17 release?
>
>
> 1.  ClientParser can work with new (and unknown) SQL syntax. It is because
> if the sql type is not recognized, the sql will be submitted to the gateway
> directly.
>
> For more information: Actually, the proposed ClientParser only do two
> things:
> (1) Tell client commands (help, clear, etc) and sqls apart.
> (2) parses several sql types (e.g. SHOW CREATE statement, we can print raw
> string for the SHOW CREATE result instead of table). Here the recognization
> of sql types mostly affects the print style, and unrecognized sql also can
> be submitted to cluster.
> So the Client with new ClientParser can work compatible with new syntax.
>
> 2. First, I'd like to explain that the gateway APIs and supported syntax
> is two things.
> For example, ‘configureSession' and 'completeStatement' are APIs. As
> mentioned in #1, the sql statements which syntax is unknown will be
> submitted to the gateway, and whether they can be executed normally depends
> on whether the execution environment supports the syntax.
>
> > Is there anything we should be designing for upfront?
>
> The 'SqlGatewayRestAPIVersion’ has been introduced. But it is for sql
> gateway APIs.
>
> 3.
> > How will client and server version mismatches be handled?
>
> A lower version client can work compatible with a higher version gateway
> because the old interfaces won’t be deleted. When a higher version client
> connects to a lower version gateway, the client should notify the users if
> they try to use unsupported features. For example, the client start option
> ‘-i’  means using initialization file to initialize the session.
> We plan to use the gateway’s ‘configureSession’ to implement it. But this
> API is not implemented in 1.16 Gateway (SqlGatewayRestAPIVersion = V1), so
> if the user try to use ‘-i’ option to start the client with the 1.16
> gateway, the client should tell the user that
> Can’t execute ‘-i’ option with gateway which version is lower than V2.
>
> > Will a single gateway be able to support multiple endpoint versions?
>
> Currently, the gateway only starts a highest version endpoint and the
> higher version endpoint is compatible with the lower version endpoint’s
> protocol.
>
> 4. Yes. Mostly, we use ’SET’ and ‘RESET’ statements to change the session
> configuration.
> Notice: the client can’t change the session (I mean, close current session
> and open another one). I’m not sure if you have need to change the session
> itself?
>
> 5.
> > Would it be possible to switch to web sockets or some other mechanism
> > to avoid polling?
>
> Your suggestion is very good, but this flip is for supporting the remote
> client. How about taking it as a future work?
>
> > If polling is used for both, the polling frequency should be different
> > between local and remote configurations.
>
> Our idea is to introduce a new session option (like
> 'sql-client.result.fetch-interval') to control
> the fetching requests sending frequency. What do you think?
>
> For more information: we are inclined to keep the polling behavior in this
> version. For streaming query, fetching results synchronously may occupy
> resources of the gateway in a long period.
> For example, if the job doesn’t return results for a long time because the
> window has not been triggered, the synchronously fetching will keep
> occupying the connection. In asynchronous situation, the gateway can return
> a NOT_READY_RESULT quickly and release the resources for other clients to
> use. I think we can make some improvements for the whole flow path in the
> future.
>
> 6. Sorry for that there is mistakes in this sentence. Let me make it clear.
>
> We proposed to add 'ContentType' to indicates the result is for what kind
> of sql. In this sentence, I want to explain why we add 'ContentType' since
> the ClientParser can recognize the sql type too.
> It is because the proposed ClientParser can't recognize complex syntax.
> For example, it can’t recognize query with CTE. So the result should carry
> content type information to help the client to know the sql type. For
> example, the 'ContentType.QUERY_RESULT' indicates the result is for a
> query statement.
>
> 7.
> > What is the serialization lifecycle for results?
>
> 1) Sink to JobManager    : RowData -> Byte[ ] (serialize)
> 2) JobManager to Gateway : Byte[ ] -> RowData (deserialize)
> 3) Gateway sending       : RowData -> Byte[ ] (serialized to JSON format)
> 4) Client receiving      : Byte[ ] -> RowData (deserialize)
>
> > Maybe the SQL Gateway's return type can be sent as part of the request
> > so that the JobManager can send  back results in an advantageous format?
>
> Yes. I think it's an improvement for the Client and Gateway. We have some
> ideas. For example,
>
> 1) We can move the Gateway into the JobManager and reduce the Ser/De costs
> from JM to Gateway.
> 2) Or the Gateway can collect the data from the sink function directly
> instead of JobManager.
>
> But I think we can leave this as a future work and discuss in another
> thread.
>
> 8. Yes.
>
> > Do you think it'll still be part of the Flink 1.17 release?
>
> Yes. We will try our best to finish the work.
>
> Feel free to talk to me if I’m wrong or you have any other questions.
>
>
> On Nov 25, 2022, at 11:48, yu zelin <yu...@gmail.com> wrote:
>
> Hi, all
>
> I want to initiate a discussion on the FLIP-275: Support Remote SQL Client
> Based on SQL Gateway[1].
> The motivation of this FLIP is that the current SQL Client allows only local
> connection which can not satisfy the common need of connecting to a remote
> cluster.
>
> Since the FLIP-91[2] has introduced SQL Gateway, we proposed to implement the
> Remote SQL Client based on SQL Gateway. In our design, we proposed two main
> changes:
>
> 1. New remote mode client which performs connection to the remote gateway
> through REST API.
> 2. Migration of the current local mode client. We proposed to refactor the
> local client based on SQL Gateway to unify the interface for two modes.
>
> Looking forward to your suggestions.
>
> Best,
> Yu Zelin
>
> [1] https://cwiki.apache.org/confluence/x/T48ODg
> [2] https://cwiki.apache.org/confluence/x/rIyMC
>

Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

Posted by yu zelin <yu...@gmail.com>.
Hi, everyone,

It looks like our new design is similar to Timo’s suggestion. Since there has been
no further response from other devs for a while, I would like to start the vote on Thursday.


Best,
Yu Zelin

> On Dec 13, 2022, at 16:23, yu zelin <yu...@gmail.com> wrote:
> 
> Hi, Timo,
> 
> Thanks for your suggestion. I recently discussed the `RowFormat` with @Godfrey He,
> @Shengkai Fang and @Jark Wu (thanks for all your suggestions). We finally reached
> a consensus that is similar to your suggestion. The details are as follows:
> 
> 1. Add a REST query parameter ‘rowFormat’ (JSON or PLAIN_TEXT) to tell the REST Endpoint
> how to serialize the RowData in the ResultSet.
> 
>     JSON format means the RowData will be serialized to JSON format, which contains original 
>     LogicalType information, so it can be deserialized back to RowData.
> 
>     PLAIN_TEXT format means the RowData will be serialized to SQL-compliant, plain strings. 
>     The SQL Client can print the strings directly.
> 
> The example URI for fetching results is:
> > /v2/sessions/:session_handle/operations/:operation_handle/result/:token?rowFormat=PLAIN_TEXT
> 
> 2. Introduce two response bodies for fetching results in two formats.
> 
> For more details, please take a look at the FLIP [https://cwiki.apache.org/confluence/x/T48ODg].
> I have updated it with examples of the query response bodies in both formats, in the section
> Public Interface -> REST Endpoint Modification.
> 
>> On Dec 12, 2022, at 18:09, Timo Walther <tw...@apache.org> wrote:
>> 
>> Hi everyone,
>> 
>> sorry to jump into this discussion so late.
>> 
>> > So we decided to revert the RowFormat related changes and let the client to resolve the print format.
>> 
>> Could you elaborate a bit on this topic in the FLIP? I still believe that we need 2 types of output formats.
>> 
>> Format A: for the SQL Client CLI and other interactive notebooks that just uses SQL CAST(... AS STRING) semantics executed on the server side
>> 
>> Format B: for JDBC SDK or other machine-readable downstream libraries
>> 
>> Take a TIMESTAMP WITH LOCAL TIME ZONE as an example. The string representation depends on a session configuration option. Clients might not be aware of this session option, so the formatting must happen on the server side.
>> 
>> However, when the downstream consumer is a library, maybe the library would like to get the raw millis/nanos since epoch.
>> 
>> Also nested rows and collections might be better encoded with format B for libraries but interactive sessions are happy if nested types are already formatted server-side, so not every client needs custom code for the formatting.
>> 
>> Regards,
>> Timo
>> 
>> 
>> 
>> On 06.12.22 15:13, godfrey he wrote:
>>> Hi, Zelin
>>>> The CLI will use default print style for the non-query result.
>>> Please make sure the print results of EXPLAIN/DESC/SHOW CREATE TABLE
>>> commands are clear.
>>>> We think it’s better to add the root cause to the ErrorResponseBody.
>>> LGTM
>>> Best,
>>> Godfrey
>>> yu zelin <yu...@gmail.com> wrote on Tue, Dec 6, 2022 at 17:51:
>>>> 
>>>> Hi, Godfrey
>>>> 
>>>> Thanks for your feedback. Below are my thoughts on your questions.
>>>> 
>>>> 1. About RowFormat.
>>>> I agree with you. So we decided to revert the RowFormat related changes
>>>> and let the client to resolve the print format.
>>>> 
>>>> 2. About ContentType
>>>> I agree that the definition of the ContentType is not clear. But how to define the
>>>> statement type is another big question. So, we decided to only tell the query result
>>>> and non-query result apart. The CLI will use default print style for the non-query
>>>> result.
>>>> 
>>>> 3. About ErrorHandling
>>>> I think reusing the current ErrorResponseBody is good, but parsing the root cause
>>>> from the exception stack strings is quite hacky. We think it’s better to add the
>>>> root cause to the ErrorResponseBody.
>>>> 
>>>> 4. About Runtime REST API Modifications
>>>> I agree, too. This part has been moved to the ‘Future Work’.
>>>> 
>>>> Best,
>>>> Yu Zelin
>>>> 
>>>> 
>>>>> On Dec 5, 2022, at 18:33, godfrey he <go...@gmail.com> wrote:
>>>>> 
>>>>> Hi Zelin,
>>>>> 
>>>>> Thanks for driving this discussion.
>>>>> 
>>>>> I have a few comments,
>>>>> 
>>>>>> Add RowFormat to ResultSet to indicate the format of rows.
>>>>> We should not require SqlGateway server to meet the display
>>>>> requirements of a CliClient.
>>>>> Because different CliClients may have different display style. The
>>>>> server just need to response the data,
>>>>> and the CliClient prints the result as needed. So RowFormat is not needed.
>>>>> 
>>>>>> Add ContentType to ResultSet to indicate what kind of data the result contains.
>>>>> from my first sight, the values of ContentType are intersected, such
>>>>> as: A select query will return QUERY_RESULT,
>>>>> but it also has JOB_ID. OTHER is too ambiguous, I don't know which
>>>>> kind of query will return OTHER.
>>>>> I recommend returning the concrete type for each statement, such as
>>>>> "CREATE TABLE" for "create table xx (...) with ()",
>>>>> "SELECT" for "select * from xxx". The statement type can be maintained
>>>>> in `Operation`s.
>>>>> 
>>>>>> Error Handling
>>>>> I think current design of error handling mechanism can meet the
>>>>> requirement of CliClient, we can get the root cause from
>>>>> the stack (see ErrorResponseBody#errors). If it becomes a common
>>>>> requirement (for many clients) in the future,
>>>>> we can introduce this interface.
>>>>> 
>>>>>> Runtime REST API Modification for Local Client Migration
>>>>> I think this part is over-engineered, this part belongs to optimization.
>>>>> The client does not require very high performance, the current design
>>>>> can already meet our needs.
>>>>> If we find performance problems in the future, do such optimizations.
>>>>> 
>>>>> Best,
>>>>> Godfrey
>>>>> 
>>>>> yu zelin <yu...@gmail.com> wrote on Mon, Dec 5, 2022 at 11:11:
>>>>>> 
>>>>>> Hi, Shammon
>>>>>> 
>>>>>> Thanks for your feedback. I think it’s good to support jdbc-sdk. However,
>>>>>> it's not supported in the gateway side yet. In my opinion, this FLIP is more
>>>>>> concerned with the SQL Client. How about put “supporting jdbc-sdk” in
>>>>>> ‘Future Work’? We can discuss how to implement it in another thread.
>>>>>> 
>>>>>> Best,
>>>>>> Yu Zelin
>>>>>>> On Dec 2, 2022, at 18:12, Shammon FY <zj...@gmail.com> wrote:
>>>>>>> 
>>>>>>> Hi zelin
>>>>>>> 
>>>>>>> Thanks for driving this discussion.
>>>>>>> 
>>>>>>> I notice that the sql-client will interact with sql-gateway by `REST
>>>>>>> Client` in the `Executor` in the FLIP, how about introducing jdbc-sdk for
>>>>>>> sql-gateway?
>>>>>>> 
>>>>>>> Then the sql-client can connect the gateway with jdbc-sdk, on the other
>>>>>>> hand, the other applications and tools such as jmeter can use the jdbc-sdk
>>>>>>> to connect sql-gateway too.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Shammon
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, Dec 2, 2022 at 4:10 PM yu zelin <yu...@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Hi Jim,
>>>>>>>> 
>>>>>>>> Thanks for your feedback!
>>>>>>>> 
>>>>>>>>> Should this configuration be mentioned in the FLIP?
>>>>>>>> 
>>>>>>>> Sure.
>>>>>>>> 
>>>>>>>>> some way for the server to be able to limit the number of requests it
>>>>>>>> receives.
>>>>>>>> I’m sorry that this FLIP is dedicated in implementing the Remote mode, so
>>>>>>>> we
>>>>>>>> didn't consider much about this. I think the option is enough currently.
>>>>>>>> I will add
>>>>>>>> the improvement suggestions to the ‘Future Work’.
>>>>>>>> 
>>>>>>>>> I wonder if two other options are possible
>>>>>>>> 
>>>>>>>> To forward the raw format to gateway and then to client is possible. The
>>>>>>>> raw
>>>>>>>> results from sink is in ‘CollectResultIterator#bufferedResult’. First, we
>>>>>>>> can find
>>>>>>>> a way to get this result without wrapping it. Second, constructing a
>>>>>>>> ‘InternalTypeInfo’.
>>>>>>>> We can construct it using the schema information (data’s logical type).
>>>>>>>> After
>>>>>>>> construction, we can get the ’TypeSerializer’ to deserialize the raw
>>>>>>>> result.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Dec 1, 2022, at 04:54, Jim Hughes <jh...@confluent.io.INVALID> wrote:
>>>>>>>>> 
>>>>>>>>> Hi Yu,
>>>>>>>>> 
>>>>>>>>> Thanks for moving my comments to this thread!  Also, thank you for
>>>>>>>>> answering my questions; it is helping me understand the SQL Gateway
>>>>>>>>> better.
>>>>>>>>> 
>>>>>>>>> 5.
>>>>>>>>>> Our idea is to introduce a new session option (like
>>>>>>>>> 'sql-client.result.fetch-interval') to control
>>>>>>>>> the fetching requests sending frequency. What do you think?
>>>>>>>>> 
>>>>>>>>> Should this configuration be mentioned in the FLIP?
>>>>>>>>> 
>>>>>>>>> One slight concern I have with having 'sql-client.result.fetch-interval'
>>>>>>>> as
>>>>>>>>> a session configuration is that users could set it low and cause the
>>>>>>>> client
>>>>>>>>> to send a large volume of requests to the SQL gateway.
>>>>>>>>> 
>>>>>>>>> Generally, I'd like to see some way for the server to be able to limit
>>>>>>>> the
>>>>>>>>> number of requests it receives.  If that really needs to be done by a
>>>>>>>> proxy
>>>>>>>>> in front of the SQL gateway, that is fine as well.  (To be clear, I don't
>>>>>>>>> think my concern here should be blocking in any way.)
>>>>>>>>> 
>>>>>>>>> 7.
>>>>>>>>>> What is the serialization lifecycle for results?
>>>>>>>>> 
>>>>>>>>> I wonder if two other options are possible:
>>>>>>>>> 3) Could the Gateway just forward the result byte array?  (Or does the
>>>>>>>>> Gateway need to deserialize the response in order to understand it for
>>>>>>>> some
>>>>>>>>> reason?)
>>>>>>>>> 4) Could the JobManager prepare the results in JSON?  (Or similarly could
>>>>>>>>> the Client read the format which the JobManager sends?)
>>>>>>>>> 
>>>>>>>>> Thanks again!
>>>>>>>>> 
>>>>>>>>> Cheers,
>>>>>>>>> 
>>>>>>>>> Jim
>>>>>>>>> 
>>>>>>>>> On Wed, Nov 30, 2022 at 9:40 AM yu zelin <yu...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi, all
>>>>>>>>>> 
>>>>>>>>>> Thanks Jim’s questions below. Here I’d like to reply to them.
>>>>>>>>>> 
>>>>>>>>>>> 1. For the Client Parser, is it going to work with the extended syntax
>>>>>>>>>>> from the Flink Table Store?
>>>>>>>>>>> 
>>>>>>>>>>> 2. Relatedly, what will happen if an older Client tries to handle
>>>>>>>>>> syntax
>>>>>>>>>>> that a newer service supports?  (Suppose I use a 1.17 client with a
>>>>>>>>>> 1.18
>>>>>>>>>>> Gateway/system which has a new keyword.  Is there anything we should
>>>>>>>> be
>>>>>>>>>>> designing for upfront?)
>>>>>>>>>>> 
>>>>>>>>>>> 3. How will client and server version mismatches be handled?  Will a
>>>>>>>>>>> single gateway be able to support multiple endpoint versions?
>>>>>>>>>>> 4. How are commands which change a session handled?  Are those sent
>>>>>>>> via
>>>>>>>>>>> an ExecuteStatementRequest?
>>>>>>>>>>> 
>>>>>>>>>>> 5. The remote POC uses polling for getting back status and getting
>>>>>>>> back
>>>>>>>>>>> results.  Would it be possible to switch to web sockets or some other
>>>>>>>>>>> mechanism to avoid polling?  If polling is used for both, the polling
>>>>>>>>>>> frequency should be different between local and remote configurations.
>>>>>>>>>>> 
>>>>>>>>>>> 6. What does this sentence mean?  "The reason why we didn't get the
>>>>>>>> sql
>>>>>>>>>>> type in client side is because it's hard for the lightweight
>>>>>>>>>> client-level
>>>>>>>>>>> parser to recognize some sql type  sql, such as query with CTE.  "
>>>>>>>>>>> 
>>>>>>>>>>> 7. What is the serialization lifecycle for results?  It makes sense to
>>>>>>>>>>> have some control over whether the gateway returns results as SQL or
>>>>>>>>>> JSON.
>>>>>>>>>>> I'd love to see a way to avoid needing to serialize and deserialize
>>>>>>>>>> results
>>>>>>>>>>> on the SQL Gateway if possible.  I'm still new enough to the project
>>>>>>>>>> that
>>>>>>>>>>> I'm not sure if that's readily possible.  Maybe the SQL Gateway's
>>>>>>>>>> return
>>>>>>>>>>> type can be sent as part of the request so that the JobManager can
>>>>>>>> send
>>>>>>>>>>> back results in an advantageous format?
>>>>>>>>>>> 
>>>>>>>>>>> 8. Does ErrorType need to be marked as @PublicEvolving?
>>>>>>>>>>> 
>>>>>>>>>>> I'm excited for the SQL client to support gateway mode!  Given the
>>>>>>>> change
>>>>>>>>>>> in design, do you think it'll still be part of the Flink 1.17 release?
>>>>>>>>>> 
>>>>>>>>>> 1.  ClientParser can work with new (and unknown) SQL syntax. It is
>>>>>>>> because
>>>>>>>>>> if the
>>>>>>>>>> sql type is not recognized, the sql will be submitted to the gateway
>>>>>>>>>> directly.
>>>>>>>>>> 
>>>>>>>>>> For more information: Actually, the proposed ClientParser only do two
>>>>>>>>>> things:
>>>>>>>>>> (1) Tell client commands (help, clear, etc) and sqls apart.
>>>>>>>>>> (2) parses several sql types (e.g. SHOW CREATE statement, we can print
>>>>>>>> raw
>>>>>>>>>> string
>>>>>>>>>> for the SHOW CREATE result instead of table). Here the recognization of
>>>>>>>>>> sql types
>>>>>>>>>> mostly affects the print style, and unrecognized sql also can be
>>>>>>>> submitted
>>>>>>>>>> to cluster.
>>>>>>>>>> So the Client with new ClientParser can work compatible with new syntax.
>>>>>>>>>> 
>>>>>>>>>> 2. First, I'd like to explain that the gateway APIs and supported syntax
>>>>>>>>>> is two things.
>>>>>>>>>> For example, ‘configureSession' and 'completeStatement' are APIs. As
>>>>>>>>>> mentioned
>>>>>>>>>> in #1, the sql statements which syntax is unknown will be submitted to
>>>>>>>> the
>>>>>>>>>> gateway,
>>>>>>>>>> and whether they can be executed normally depends on whether the
>>>>>>>> execution
>>>>>>>>>> environment supports the syntax.
>>>>>>>>>> 
>>>>>>>>>>> Is there anything we should be designing for upfront?
>>>>>>>>>> 
>>>>>>>>>> The 'SqlGatewayRestAPIVersion’ has been introduced. But it is for sql
>>>>>>>>>> gateway APIs.
>>>>>>>>>> 
>>>>>>>>>> 3.
>>>>>>>>>>> How will client and server version mismatches be handled?
>>>>>>>>>> 
>>>>>>>>>> A lower version client can work compatible with a higher version gateway
>>>>>>>>>> because the
>>>>>>>>>> old interfaces won’t be deleted. When a higher version client connects
>>>>>>>> to
>>>>>>>>>> a lower version
>>>>>>>>>> gateway, the client should notify the users if they try to use
>>>>>>>> unsupported
>>>>>>>>>> features. For
>>>>>>>>>> example, the client start option ‘-i’  means using initialization file
>>>>>>>> to
>>>>>>>>>> initialize the session.
>>>>>>>>>> We plan to use the gateway’s ‘configureSession’ to implement it. But
>>>>>>>> this
>>>>>>>>>> API is not
>>>>>>>>>> implemented in 1.16 Gateway (SqlGatewayRestAPIVersion = V1), so if the
>>>>>>>>>> user try to
>>>>>>>>>> use ‘-i’ option to start the client with the 1.16 gateway, the client
>>>>>>>>>> should tell the user that
>>>>>>>>>> Can’t execute ‘-i’ option with gateway which version is lower than V2.
>>>>>>>>>> 
>>>>>>>>>>> Will a single gateway be able to support multiple endpoint versions?
>>>>>>>>>> 
>>>>>>>>>> Currently, the gateway only starts a highest version endpoint and the
>>>>>>>>>> higher version endpoint
>>>>>>>>>> is compatible with the lower version endpoint’s protocol.
>>>>>>>>>> 
>>>>>>>>>> 4. Yes. Mostly, we use ’SET’ and ‘RESET’ statements to change the
>>>>>>>> session
>>>>>>>>>> configuration.
>>>>>>>>>> Notice: the client can’t change the session (I mean, close current
>>>>>>>> session
>>>>>>>>>> and open another
>>>>>>>>>> one). I’m not sure if you have need to change the session itself?
>>>>>>>>>> 
>>>>>>>>>> 5.
>>>>>>>>>>> Would it be possible to switch to web sockets or some other mechanism
>>>>>>>>>> to avoid polling?
>>>>>>>>>> 
>>>>>>>>>> Your suggestion is very good, but this flip is for supporting the remote
>>>>>>>>>> client. How about taking
>>>>>>>>>> it as a future work?
>>>>>>>>>> 
>>>>>>>>>>> If polling is used for both, the polling frequency should be different
>>>>>>>>>> between local and remote
>>>>>>>>>> configurations.
>>>>>>>>>> 
>>>>>>>>>> Our idea is to introduce a new session option (like
>>>>>>>>>> 'sql-client.result.fetch-interval') to control
>>>>>>>>>> the fetching requests sending frequency. What do you think?
>>>>>>>>>> 
>>>>>>>>>> For more information: we are inclined to keep the polling behavior in
>>>>>>>> this
>>>>>>>>>> version. For streaming
>>>>>>>>>> query, fetching results synchronously may occupy resources of the
>>>>>>>> gateway
>>>>>>>>>> in a long period.
>>>>>>>>>> For example, if the job doesn’t return results for a long time because
>>>>>>>> the
>>>>>>>>>> window has not been
>>>>>>>>>> triggered, the synchronously fetching will keep occupying the
>>>>>>>> connection.
>>>>>>>>>> In asynchronous
>>>>>>>>>> situation, the gateway can return a NOT_READY_RESULT quickly and release
>>>>>>>>>> the resources
>>>>>>>>>> for other clients to use. I think we can make some improvements for the
>>>>>>>>>> whole flow path in the
>>>>>>>>>> future.
>>>>>>>>>> 
>>>>>>>>>> 6. Sorry for that there is mistakes in this sentence. Let me make it
>>>>>>>> clear.
>>>>>>>>>> 
>>>>>>>>>> We proposed to add 'ContentType' to indicates the result is for what
>>>>>>>> kind
>>>>>>>>>> of sql. In this sentence,
>>>>>>>>>> I want to explain why we add 'ContentType' since the ClientParser can
>>>>>>>>>> recognize the sql type too.
>>>>>>>>>> It is because the proposed ClientParser can't recognize complex syntax.
>>>>>>>>>> For example, it can’t
>>>>>>>>>> recognize query with CTE. So the result should carry content type
>>>>>>>>>> information to help the client to
>>>>>>>>>> know the sql type. For example, the 'ContentType.QUERY_RESULT' indicates
>>>>>>>>>> the result is for a
>>>>>>>>>> query statement.
>>>>>>>>>> 
>>>>>>>>>> 7.
>>>>>>>>>>> What is the serialization lifecycle for results?
>>>>>>>>>> 
>>>>>>>>>> 1) Sink to JobManager        : RowData -> Byte[ ] (serialize)
>>>>>>>>>> 2) JobManager to Gateway : Byte[ ] -> RowData (deserialize)
>>>>>>>>>> 3) Gateway sending            : RowData -> Byte[ ] (serialized to JSON
>>>>>>>>>> format)
>>>>>>>>>> 4) Client receiving               : Byte[ ] -> RowData (deserialize)
>>>>>>>>>> 
>>>>>>>>>>> Maybe the SQL Gateway's return type can be sent as part of the request
>>>>>>>>>> so that the
>>>>>>>>>> JobManager can send  back results in an advantageous format?
>>>>>>>>>> 
>>>>>>>>>> Yes. I think it's an improvement for the Client and Gateway. We have
>>>>>>>> some
>>>>>>>>>> ideas. For example,
>>>>>>>>>> 
>>>>>>>>>> 1) We can move the Gateway into the JobManager and reduce the Ser/De
>>>>>>>> costs
>>>>>>>>>> from JM to Gateway.
>>>>>>>>>> 2) Or the Gateway can collect the data from the sink function directly
>>>>>>>>>> instead of JobManager.
>>>>>>>>>> 
>>>>>>>>>> But I think we can leave this as a future work and discuss in another
>>>>>>>>>> thread.
>>>>>>>>>> 
>>>>>>>>>> 8. Yes.
>>>>>>>>>> 
>>>>>>>>>>> Do you think it'll still be part of the Flink 1.17 release?
>>>>>>>>>> Yes. We will try our best to finish the work.
>>>>>>>>>> 
>>>>>>>>>> Feel free to talk to me if I’m wrong or you have any other questions.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Nov 25, 2022, at 11:48, yu zelin <yu...@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi, all
>>>>>>>>>>> 
>>>>>>>>>>> I want to initiate a discussion on the FLIP-275: Support Remote SQL
>>>>>>>>>> Client Based on SQL Gateway[1].
>>>>>>>>>>> The motivation of this FLIP is that the current SQL Client allows only
>>>>>>>>>> local connection which can not satisfy
>>>>>>>>>>> the common need of connecting to a remote cluster.
>>>>>>>>>>> 
>>>>>>>>>>> Since the FLIP-91[2] has introduced SQL Gateway, we proposed to
>>>>>>>>>> implement the Remote SQL Client
>>>>>>>>>>> based on SQL Gateway. In our design, we proposed two main changes:
>>>>>>>>>>> 
>>>>>>>>>>> 1. New remote mode client which performs connection to the remote
>>>>>>>>>> gateway through REST API.
>>>>>>>>>>> 2. Migration of the current local mode client. We proposed to refactor
>>>>>>>>>> the local client based on SQL Gateway
>>>>>>>>>>> to unify the interface for two modes.
>>>>>>>>>>> 
>>>>>>>>>>> Looking forward to your suggestions.
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Yu Zelin
>>>>>>>>>>> 
>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/x/T48ODg
>>>>>>>>>>> [2] https://cwiki.apache.org/confluence/x/rIyMC
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>> 
> 


Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

Posted by yu zelin <yu...@gmail.com>.
Hi, Timo,

Thanks for your suggestion. I recently discussed the `RowFormat` with @Godfrey He,
@Shengkai Fang and @Jark Wu (thanks for all your suggestions). We finally reached
a consensus that is similar to your suggestion. The details are as follows:

1. Add a REST query parameter ‘rowFormat’ (JSON or PLAIN_TEXT) to tell the REST Endpoint
how to serialize the RowData in the ResultSet.

    JSON format means the RowData will be serialized to JSON format, which contains original 
    LogicalType information, so it can be deserialized back to RowData.

    PLAIN_TEXT format means the RowData will be serialized to SQL-compliant, plain strings. 
    The SQL Client can print the strings directly.

The example URI for fetching results is:
> /v2/sessions/:session_handle/operations/:operation_handle/result/:token?rowFormat=PLAIN_TEXT

2. Introduce two response bodies for fetching results in two formats.

For more details, please take a look at the FLIP [https://cwiki.apache.org/confluence/x/T48ODg].
I have updated it with examples of the query response bodies in the two formats, in the section
Public Interface -> REST Endpoint Modification.
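
As a rough illustration of the fetch URI described above, here is a minimal Python sketch that builds the path and appends the row format as a query parameter. The helper name `build_fetch_uri` and the sample handle values are hypothetical and only serve the example; the path layout and the two format values are taken from this message, and the FLIP remains the authoritative definition.

```python
from urllib.parse import quote, urlencode

# The two row formats named in this message.
ROW_FORMATS = ("JSON", "PLAIN_TEXT")


def build_fetch_uri(session_handle, operation_handle, token, row_format):
    """Build the result-fetching URI (hypothetical helper, not part of Flink)."""
    if row_format not in ROW_FORMATS:
        raise ValueError(f"unsupported rowFormat: {row_format!r}")
    # Percent-encode each path segment so unusual handle values stay valid.
    path = "/v2/sessions/{}/operations/{}/result/{}".format(
        quote(str(session_handle), safe=""),
        quote(str(operation_handle), safe=""),
        quote(str(token), safe=""),
    )
    return path + "?" + urlencode({"rowFormat": row_format})


if __name__ == "__main__":
    # "s-1" and "op-1" are made-up handles for illustration only.
    print(build_fetch_uri("s-1", "op-1", 0, "PLAIN_TEXT"))
```

An interactive client would presumably request PLAIN_TEXT and print the returned strings directly, while a client that needs typed values would request JSON.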

> On Dec 12, 2022, at 18:09, Timo Walther <tw...@apache.org> wrote:
> 
> Hi everyone,
> 
> sorry to jump into this discussion so late.
> 
> > So we decided to revert the RowFormat-related changes and let the client resolve the print format.
> 
> Could you elaborate a bit on this topic in the FLIP? I still believe that we need 2 types of output formats.
> 
> Format A: for the SQL Client CLI and other interactive notebooks that just use SQL CAST(... AS STRING) semantics executed on the server side
> 
> Format B: for JDBC SDK or other machine-readable downstream libraries
> 
> Take a TIMESTAMP WITH LOCAL TIME ZONE as an example. The string representation depends on a session configuration option. Clients might not be aware of this session option, so the formatting must happen on the server side.
> 
> However, when the downstream consumer is a library, maybe the library would like to get the raw millis/nanos since epoch.
> 
> Also nested rows and collections might be better encoded with format B for libraries but interactive sessions are happy if nested types are already formatted server-side, so not every client needs custom code for the formatting.
> 
> Regards,
> Timo
> 
> 
> 
> On 06.12.22 15:13, godfrey he wrote:
>> Hi, Zelin
>>> The CLI will use default print style for the non-query result.
>> Please make sure the print results of EXPLAIN/DESC/SHOW CREATE TABLE
>> commands are clear.
>>> We think it’s better to add the root cause to the ErrorResponseBody.
>> LGTM
>> Best,
>> Godfrey
>> On Tue, Dec 6, 2022 at 17:51, yu zelin <yu...@gmail.com> wrote:
>>> 
>>> Hi, Godfrey
>>> 
>>> Thanks for your feedback. Below are my thoughts about your questions.
>>> 
>>> 1. About RowFormat
>>> I agree with your opinion. So we decided to revert the RowFormat-related changes
>>> and let the client resolve the print format.
>>> 
>>> 2. About ContentType
>>> I agree that the definition of the ContentType is not clear. But how to define the
>>> statement type is another big question. So we decided to only tell query results
>>> and non-query results apart. The CLI will use the default print style for non-query
>>> results.
>>> 
>>> 3. About ErrorHandling
>>> I think reusing the current ErrorResponseBody is good, but parsing the root cause
>>> from the exception stack strings is quite hacky. We think it’s better to add the
>>> root cause to the ErrorResponseBody.
>>> 
>>> 4. About Runtime REST API Modifications
>>> I agree, too. This part has been moved to ‘Future Work’.
>>> 
>>> Best,
>>> Yu Zelin
>>> 
>>> 
>>>> On Dec 5, 2022, at 18:33, godfrey he <go...@gmail.com> wrote:
>>>> 
>>>> Hi Zelin,
>>>> 
>>>> Thanks for driving this discussion.
>>>> 
>>>> I have a few comments,
>>>> 
>>>>> Add RowFormat to ResultSet to indicate the format of rows.
>>>> We should not require the SqlGateway server to meet the display
>>>> requirements of a CliClient,
>>>> because different CliClients may have different display styles. The
>>>> server just needs to return the data,
>>>> and the CliClient prints the result as needed. So RowFormat is not needed.
>>>> 
>>>>> Add ContentType to ResultSet to indicate what kind of data the result contains.
>>>> At first sight, the values of ContentType overlap. For example,
>>>> a select query will return QUERY_RESULT,
>>>> but it also has a JOB_ID. OTHER is too ambiguous; I don't know which
>>>> kind of query will return OTHER.
>>>> I recommend returning the concrete type for each statement, such as
>>>> "CREATE TABLE" for "create table xx (...) with ()",
>>>> "SELECT" for "select * from xxx". The statement type can be maintained
>>>> in `Operation`s.
>>>> 
>>>>> Error Handling
>>>> I think the current design of the error handling mechanism can meet the
>>>> requirements of the CliClient; we can get the root cause from
>>>> the stack (see ErrorResponseBody#errors). If it becomes a common
>>>> requirement (for many clients) in the future,
>>>> we can introduce this interface.
>>>> 
>>>>> Runtime REST API Modification for Local Client Migration
>>>> I think this part is over-engineered; it belongs to optimization.
>>>> The client does not require very high performance, and the current design
>>>> can already meet our needs.
>>>> If we find performance problems in the future, we can do such optimizations then.
>>>> 
>>>> Best,
>>>> Godfrey
>>>> 
>>>> On Mon, Dec 5, 2022 at 11:11, yu zelin <yu...@gmail.com> wrote:
>>>>> 
>>>>> Hi, Shammon
>>>>> 
>>>>> Thanks for your feedback. I think it’s good to support jdbc-sdk. However,
>>>>> it's not supported on the gateway side yet. In my opinion, this FLIP is more
>>>>> concerned with the SQL Client. How about putting “supporting jdbc-sdk” in
>>>>> ‘Future Work’? We can discuss how to implement it in another thread.
>>>>> 
>>>>> Best,
>>>>> Yu Zelin
>>>>>> On Dec 2, 2022, at 18:12, Shammon FY <zj...@gmail.com> wrote:
>>>>>> 
>>>>>> Hi zelin
>>>>>> 
>>>>>> Thanks for driving this discussion.
>>>>>> 
>>>>>> I notice that in the FLIP the sql-client will interact with the sql-gateway through the `REST
>>>>>> Client` in the `Executor`. How about introducing a jdbc-sdk for the
>>>>>> sql-gateway?
>>>>>> 
>>>>>> Then the sql-client can connect to the gateway with the jdbc-sdk. In addition,
>>>>>> other applications and tools such as jmeter can use the jdbc-sdk
>>>>>> to connect to the sql-gateway too.
>>>>>> 
>>>>>> Best,
>>>>>> Shammon
>>>>>> 
>>>>>> 
>>>>>> On Fri, Dec 2, 2022 at 4:10 PM yu zelin <yu...@gmail.com> wrote:
>>>>>> 
>>>>>>> Hi Jim,
>>>>>>> 
>>>>>>> Thanks for your feedback!
>>>>>>> 
>>>>>>>> Should this configuration be mentioned in the FLIP?
>>>>>>> 
>>>>>>> Sure.
>>>>>>> 
>>>>>>>> some way for the server to be able to limit the number of requests it receives.
>>>>>>> I’m sorry, but this FLIP is dedicated to implementing the Remote mode, so we
>>>>>>> didn't consider this much. I think the option is enough for now. I will add
>>>>>>> the improvement suggestions to ‘Future Work’.
>>>>>>> 
>>>>>>>> I wonder if two other options are possible
>>>>>>> 
>>>>>>> Forwarding the raw format to the gateway and then to the client is possible. The raw
>>>>>>> results from the sink are in ‘CollectResultIterator#bufferedResult’. First, we can find
>>>>>>> a way to get this result without wrapping it. Second, we can construct an ‘InternalTypeInfo’
>>>>>>> using the schema information (the data’s logical type). After
>>>>>>> construction, we can get the ‘TypeSerializer’ to deserialize the raw result.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Dec 1, 2022, at 04:54, Jim Hughes <jh...@confluent.io.INVALID> wrote:
>>>>>>>> 
>>>>>>>> Hi Yu,
>>>>>>>> 
>>>>>>>> Thanks for moving my comments to this thread!  Also, thank you for
>>>>>>>> answering my questions; it is helping me understand the SQL Gateway
>>>>>>>> better.
>>>>>>>> 
>>>>>>>> 5.
>>>>>>>>> Our idea is to introduce a new session option (like
>>>>>>>>> 'sql-client.result.fetch-interval') to control
>>>>>>>>> the fetching requests sending frequency. What do you think?
>>>>>>>> 
>>>>>>>> Should this configuration be mentioned in the FLIP?
>>>>>>>> 
>>>>>>>> One slight concern I have with having 'sql-client.result.fetch-interval' as
>>>>>>>> a session configuration is that users could set it low and cause the client
>>>>>>>> to send a large volume of requests to the SQL gateway.
>>>>>>>> 
>>>>>>>> Generally, I'd like to see some way for the server to be able to limit the
>>>>>>>> number of requests it receives.  If that really needs to be done by a proxy
>>>>>>>> in front of the SQL gateway, that is fine as well.  (To be clear, I don't
>>>>>>>> think my concern here should be blocking in any way.)
>>>>>>>> 
>>>>>>>> 7.
>>>>>>>>> What is the serialization lifecycle for results?
>>>>>>>> 
>>>>>>>> I wonder if two other options are possible:
>>>>>>>> 3) Could the Gateway just forward the result byte array?  (Or does the
>>>>>>>> Gateway need to deserialize the response in order to understand it for some
>>>>>>>> reason?)
>>>>>>>> 4) Could the JobManager prepare the results in JSON?  (Or similarly could
>>>>>>>> the Client read the format which the JobManager sends?)
>>>>>>>> 
>>>>>>>> Thanks again!
>>>>>>>> 
>>>>>>>> Cheers,
>>>>>>>> 
>>>>>>>> Jim
>>>>>>>> 
>>>>>>>> On Wed, Nov 30, 2022 at 9:40 AM yu zelin <yu...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Hi, all
>>>>>>>>> 
>>>>>>>>> Thanks for Jim’s questions below. Here I’d like to reply to them.
>>>>>>>>> 
>>>>>>>>>> 1. For the Client Parser, is it going to work with the extended syntax
>>>>>>>>>> from the Flink Table Store?
>>>>>>>>>> 
>>>>>>>>>> 2. Relatedly, what will happen if an older Client tries to handle syntax
>>>>>>>>>> that a newer service supports?  (Suppose I use a 1.17 client with a 1.18
>>>>>>>>>> Gateway/system which has a new keyword.  Is there anything we should be
>>>>>>>>>> designing for upfront?)
>>>>>>>>>> 
>>>>>>>>>> 3. How will client and server version mismatches be handled?  Will a
>>>>>>>>>> single gateway be able to support multiple endpoint versions?
>>>>>>>>>> 
>>>>>>>>>> 4. How are commands which change a session handled?  Are those sent via
>>>>>>>>>> an ExecuteStatementRequest?
>>>>>>>>>> 
>>>>>>>>>> 5. The remote POC uses polling for getting back status and getting back
>>>>>>>>>> results.  Would it be possible to switch to web sockets or some other
>>>>>>>>>> mechanism to avoid polling?  If polling is used for both, the polling
>>>>>>>>>> frequency should be different between local and remote configurations.
>>>>>>>>>> 
>>>>>>>>>> 6. What does this sentence mean?  "The reason why we didn't get the sql
>>>>>>>>>> type in client side is because it's hard for the lightweight client-level
>>>>>>>>>> parser to recognize some sql type  sql, such as query with CTE.  "
>>>>>>>>>> 
>>>>>>>>>> 7. What is the serialization lifecycle for results?  It makes sense to
>>>>>>>>>> have some control over whether the gateway returns results as SQL or JSON.
>>>>>>>>>> I'd love to see a way to avoid needing to serialize and deserialize results
>>>>>>>>>> on the SQL Gateway if possible.  I'm still new enough to the project that
>>>>>>>>>> I'm not sure if that's readily possible.  Maybe the SQL Gateway's return
>>>>>>>>>> type can be sent as part of the request so that the JobManager can send
>>>>>>>>>> back results in an advantageous format?
>>>>>>>>>> 
>>>>>>>>>> 8. Does ErrorType need to be marked as @PublicEvolving?
>>>>>>>>>> 
>>>>>>>>>> I'm excited for the SQL client to support gateway mode!  Given the change
>>>>>>>>>> in design, do you think it'll still be part of the Flink 1.17 release?
>>>>>>>>> 
>>>>>>>>> 1. The ClientParser can work with new (and unknown) SQL syntax, because
>>>>>>>>> if the SQL type is not recognized, the SQL will be submitted to the gateway
>>>>>>>>> directly.
>>>>>>>>> 
>>>>>>>>> For more information: the proposed ClientParser only does two things:
>>>>>>>>> (1) It tells client commands (help, clear, etc.) and SQL statements apart.
>>>>>>>>> (2) It parses several SQL types (e.g. for a SHOW CREATE statement, we can print the raw
>>>>>>>>> string of the SHOW CREATE result instead of a table). Here the recognition of
>>>>>>>>> SQL types mostly affects the print style, and unrecognized SQL can still be submitted
>>>>>>>>> to the cluster.
>>>>>>>>> So a Client with the new ClientParser is compatible with new syntax.
>>>>>>>>> 
>>>>>>>>> 2. First, I'd like to explain that the gateway APIs and the supported syntax
>>>>>>>>> are two different things.
>>>>>>>>> For example, 'configureSession' and 'completeStatement' are APIs. As mentioned
>>>>>>>>> in #1, SQL statements whose syntax is unknown will be submitted to the gateway,
>>>>>>>>> and whether they can be executed normally depends on whether the execution
>>>>>>>>> environment supports the syntax.
>>>>>>>>> 
>>>>>>>>>> Is there anything we should be designing for upfront?
>>>>>>>>> 
>>>>>>>>> The 'SqlGatewayRestAPIVersion' has been introduced, but it is for the SQL
>>>>>>>>> gateway APIs.
>>>>>>>>> 
>>>>>>>>> 3.
>>>>>>>>>> How will client and server version mismatches be handled?
>>>>>>>>> 
>>>>>>>>> A lower version client is compatible with a higher version gateway because the
>>>>>>>>> old interfaces won’t be deleted. When a higher version client connects to
>>>>>>>>> a lower version gateway, the client should notify users if they try to use
>>>>>>>>> unsupported features. For example, the client start option ‘-i’ means using an
>>>>>>>>> initialization file to initialize the session.
>>>>>>>>> We plan to use the gateway’s ‘configureSession’ to implement it. But this
>>>>>>>>> API is not implemented in the 1.16 Gateway (SqlGatewayRestAPIVersion = V1), so if the
>>>>>>>>> user tries to use the ‘-i’ option to start the client with a 1.16 gateway, the client
>>>>>>>>> should tell the user that the ‘-i’ option can’t be executed with a gateway whose
>>>>>>>>> version is lower than V2.
>>>>>>>>> 
>>>>>>>>>> Will a single gateway be able to support multiple endpoint versions?
>>>>>>>>> 
>>>>>>>>> Currently, the gateway only starts the highest version endpoint, and the
>>>>>>>>> higher version endpoint is compatible with the lower version endpoint’s protocol.
>>>>>>>>> 
>>>>>>>>> 4. Yes. Mostly, we use ‘SET’ and ‘RESET’ statements to change the session
>>>>>>>>> configuration.
>>>>>>>>> Note: the client can’t change the session itself (I mean, close the current session
>>>>>>>>> and open another one). I’m not sure if you need to change the session itself?
>>>>>>>>> 
>>>>>>>>> 5.
>>>>>>>>>> Would it be possible to switch to web sockets or some other mechanism
>>>>>>>>>> to avoid polling?
>>>>>>>>> 
>>>>>>>>> Your suggestion is very good, but this FLIP is for supporting the remote
>>>>>>>>> client. How about taking it as future work?
>>>>>>>>> 
>>>>>>>>>> If polling is used for both, the polling frequency should be different
>>>>>>>>>> between local and remote configurations.
>>>>>>>>> 
>>>>>>>>> Our idea is to introduce a new session option (like
>>>>>>>>> 'sql-client.result.fetch-interval') to control
>>>>>>>>> how frequently fetch requests are sent. What do you think?
>>>>>>>>> 
>>>>>>>>> For more information: we are inclined to keep the polling behavior in this
>>>>>>>>> version. For a streaming query, fetching results synchronously may occupy
>>>>>>>>> gateway resources for a long period.
>>>>>>>>> For example, if the job doesn’t return results for a long time because the
>>>>>>>>> window has not been triggered, synchronous fetching will keep occupying the
>>>>>>>>> connection. In the asynchronous case, the gateway can return a NOT_READY_RESULT
>>>>>>>>> quickly and release the resources for other clients to use. I think we can make
>>>>>>>>> some improvements to the whole flow path in the future.
>>>>>>>>> 
>>>>>>>>> 6. Sorry, there are mistakes in this sentence. Let me make it clear.
>>>>>>>>> 
>>>>>>>>> We proposed to add 'ContentType' to indicate what kind of SQL the result
>>>>>>>>> is for. In this sentence, I wanted to explain why we add 'ContentType' even
>>>>>>>>> though the ClientParser can recognize the SQL type too.
>>>>>>>>> It is because the proposed ClientParser can't recognize complex syntax.
>>>>>>>>> For example, it can’t recognize a query with a CTE. So the result should carry
>>>>>>>>> content type information to help the client know the SQL type. For example,
>>>>>>>>> 'ContentType.QUERY_RESULT' indicates that the result is for a query statement.
>>>>>>>>> 
>>>>>>>>> 7.
>>>>>>>>>> What is the serialization lifecycle for results?
>>>>>>>>> 
>>>>>>>>> 1) Sink to JobManager    : RowData -> byte[] (serialize)
>>>>>>>>> 2) JobManager to Gateway : byte[] -> RowData (deserialize)
>>>>>>>>> 3) Gateway sending       : RowData -> byte[] (serialize to JSON format)
>>>>>>>>> 4) Client receiving      : byte[] -> RowData (deserialize)
>>>>>>>>> 
>>>>>>>>>> Maybe the SQL Gateway's return type can be sent as part of the request
>>>>>>>>>> so that the JobManager can send back results in an advantageous format?
>>>>>>>>> 
>>>>>>>>> Yes. I think it's an improvement for the Client and Gateway. We have some
>>>>>>>>> ideas. For example,
>>>>>>>>> 
>>>>>>>>> 1) We can move the Gateway into the JobManager and reduce the Ser/De costs
>>>>>>>>> from JM to Gateway.
>>>>>>>>> 2) Or the Gateway can collect the data from the sink function directly
>>>>>>>>> instead of from the JobManager.
>>>>>>>>> 
>>>>>>>>> But I think we can leave this as future work and discuss it in another
>>>>>>>>> thread.
>>>>>>>>> 
>>>>>>>>> 8. Yes.
>>>>>>>>> 
>>>>>>>>>> Do you think it'll still be part of the Flink 1.17 release?
>>>>>>>>> Yes. We will try our best to finish the work.
>>>>>>>>> 
>>>>>>>>> Feel free to talk to me if I’m wrong or you have any other questions.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Nov 25, 2022, at 11:48, yu zelin <yu...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi, all
>>>>>>>>>> 
>>>>>>>>>> I want to initiate a discussion on FLIP-275: Support Remote SQL Client Based on SQL Gateway[1].
>>>>>>>>>> The motivation of this FLIP is that the current SQL Client allows only local connections, which cannot satisfy
>>>>>>>>>> the common need of connecting to a remote cluster.
>>>>>>>>>> 
>>>>>>>>>> Since FLIP-91[2] has introduced the SQL Gateway, we propose to implement the Remote SQL Client
>>>>>>>>>> based on the SQL Gateway. Our design proposes two main changes:
>>>>>>>>>> 
>>>>>>>>>> 1. A new remote mode client, which connects to the remote gateway through the REST API.
>>>>>>>>>> 2. Migration of the current local mode client. We propose to refactor the local client based on the SQL Gateway
>>>>>>>>>> to unify the interface for the two modes.
>>>>>>>>>> 
>>>>>>>>>> Looking forward to your suggestions.
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Yu Zelin
>>>>>>>>>> 
>>>>>>>>>> [1] https://cwiki.apache.org/confluence/x/T48ODg
>>>>>>>>>> [2] https://cwiki.apache.org/confluence/x/rIyMC
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>> 
> 


Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

Posted by Timo Walther <tw...@apache.org>.
Hi everyone,

Sorry to jump into this discussion so late.

> So we decided to revert the RowFormat-related changes and let the client resolve the print format.

Could you elaborate a bit on this topic in the FLIP? I still believe 
that we need 2 types of output formats.

Format A: for the SQL Client CLI and other interactive notebooks that
just use SQL CAST(... AS STRING) semantics executed on the server side

Format B: for JDBC SDK or other machine-readable downstream libraries

Take a TIMESTAMP WITH LOCAL TIME ZONE as an example. The string 
representation depends on a session configuration option. Clients might 
not be aware of this session option, so the formatting must happen on 
the server side.

However, when the downstream consumer is a library, maybe the library 
would like to get the raw millis/nanos since epoch.

Also nested rows and collections might be better encoded with format B 
for libraries but interactive sessions are happy if nested types are 
already formatted server-side, so not every client needs custom code for 
the formatting.
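
To make the TIMESTAMP WITH LOCAL TIME ZONE case above concrete, here is a small Python sketch of the two output styles. It is only an illustration under stated assumptions: the function names are hypothetical, a fixed UTC offset stands in for the session time zone option, and the string layout is a simplification rather than Flink's exact CAST output.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def format_a(epoch_millis, session_offset_hours):
    """Format A: a string rendered on the server in the session time zone.

    The session time zone is modeled as a fixed UTC offset (an assumption).
    """
    session_tz = timezone(timedelta(hours=session_offset_hours))
    instant = EPOCH + timedelta(milliseconds=epoch_millis)
    return instant.astimezone(session_tz).strftime("%Y-%m-%d %H:%M:%S")


def format_b(epoch_millis):
    """Format B: the raw value, independent of any session option."""
    return epoch_millis


if __name__ == "__main__":
    value = 1670839740000  # 2022-12-12 10:09:00 UTC
    print(format_a(value, 8))  # session at UTC+8
    print(format_a(value, 0))  # session at UTC
    print(format_b(value))     # identical for every session
```

The same stored instant yields different Format A strings depending on the session, which is why a client that does not know the session option cannot reproduce them, while Format B stays stable for libraries.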

Regards,
Timo



On 06.12.22 15:13, godfrey he wrote:
> Hi, Zelin
> 
>> The CLI will use default print style for the non-query result.
> Please make sure the print results of EXPLAIN/DESC/SHOW CREATE TABLE
> commands are clear.
> 
>> We think it’s better to add the root cause to the ErrorResponseBody.
> LGTM
> 
> Best,
> Godfrey
> 
> On Tue, Dec 6, 2022 at 17:51, yu zelin <yu...@gmail.com> wrote:
>>
>> Hi, Godfrey
>>
>> Thanks for your feedback. Below are my thoughts about your questions.
>>
>> 1. About RowFormat
>> I agree with your opinion. So we decided to revert the RowFormat-related changes
>> and let the client resolve the print format.
>>
>> 2. About ContentType
>> I agree that the definition of the ContentType is not clear. But how to define the
>> statement type is another big question. So we decided to only tell query results
>> and non-query results apart. The CLI will use the default print style for non-query
>> results.
>>
>> 3. About ErrorHandling
>> I think reusing the current ErrorResponseBody is good, but parsing the root cause
>> from the exception stack strings is quite hacky. We think it’s better to add the
>> root cause to the ErrorResponseBody.
>>
>> 4. About Runtime REST API Modifications
>> I agree, too. This part has been moved to ‘Future Work’.
>>
>> Best,
>> Yu Zelin
>>
>>
>>> On Dec 5, 2022, at 18:33, godfrey he <go...@gmail.com> wrote:
>>>
>>> Hi Zelin,
>>>
>>> Thanks for driving this discussion.
>>>
>>> I have a few comments,
>>>
>>>> Add RowFormat to ResultSet to indicate the format of rows.
>>> We should not require the SqlGateway server to meet the display
>>> requirements of a CliClient,
>>> because different CliClients may have different display styles. The
>>> server just needs to return the data,
>>> and the CliClient prints the result as needed. So RowFormat is not needed.
>>>
>>>> Add ContentType to ResultSet to indicate what kind of data the result contains.
>>> At first sight, the values of ContentType overlap. For example,
>>> a select query will return QUERY_RESULT,
>>> but it also has a JOB_ID. OTHER is too ambiguous; I don't know which
>>> kind of query will return OTHER.
>>> I recommend returning the concrete type for each statement, such as
>>> "CREATE TABLE" for "create table xx (...) with ()",
>>> "SELECT" for "select * from xxx". The statement type can be maintained
>>> in `Operation`s.
>>>
>>>> Error Handling
>>> I think the current design of the error handling mechanism can meet the
>>> requirements of the CliClient; we can get the root cause from
>>> the stack (see ErrorResponseBody#errors). If it becomes a common
>>> requirement (for many clients) in the future,
>>> we can introduce this interface.
>>>
>>>> Runtime REST API Modification for Local Client Migration
>>> I think this part is over-engineered; it belongs to optimization.
>>> The client does not require very high performance, and the current design
>>> can already meet our needs.
>>> If we find performance problems in the future, we can do such optimizations then.
>>>
>>> Best,
>>> Godfrey
>>>
>>> On Mon, Dec 5, 2022 at 11:11, yu zelin <yu...@gmail.com> wrote:
>>>>
>>>> Hi, Shammon
>>>>
>>>> Thanks for your feedback. I think it’s good to support jdbc-sdk. However,
>>>> it's not supported on the gateway side yet. In my opinion, this FLIP is more
>>>> concerned with the SQL Client. How about putting “supporting jdbc-sdk” in
>>>> ‘Future Work’? We can discuss how to implement it in another thread.
>>>>
>>>> Best,
>>>> Yu Zelin
>>>>> On Dec 2, 2022, at 18:12, Shammon FY <zj...@gmail.com> wrote:
>>>>>
>>>>> Hi zelin
>>>>>
>>>>> Thanks for driving this discussion.
>>>>>
>>>>> I notice that in the FLIP the sql-client will interact with the sql-gateway through the `REST
>>>>> Client` in the `Executor`. How about introducing a jdbc-sdk for the
>>>>> sql-gateway?
>>>>>
>>>>> Then the sql-client can connect to the gateway with the jdbc-sdk. In addition,
>>>>> other applications and tools such as jmeter can use the jdbc-sdk
>>>>> to connect to the sql-gateway too.
>>>>>
>>>>> Best,
>>>>> Shammon
>>>>>
>>>>>
>>>>> On Fri, Dec 2, 2022 at 4:10 PM yu zelin <yu...@gmail.com> wrote:
>>>>>
>>>>>> Hi Jim,
>>>>>>
>>>>>> Thanks for your feedback!
>>>>>>
>>>>>>> Should this configuration be mentioned in the FLIP?
>>>>>>
>>>>>> Sure.
>>>>>>
>>>>>>> some way for the server to be able to limit the number of requests it receives.
>>>>>> I’m sorry, but this FLIP is dedicated to implementing the Remote mode, so we
>>>>>> didn't consider this much. I think the option is enough for now. I will add
>>>>>> the improvement suggestions to ‘Future Work’.
>>>>>>
>>>>>>> I wonder if two other options are possible
>>>>>>
>>>>>> Forwarding the raw format to the gateway and then to the client is possible. The raw
>>>>>> results from the sink are in ‘CollectResultIterator#bufferedResult’. First, we can find
>>>>>> a way to get this result without wrapping it. Second, we can construct an ‘InternalTypeInfo’
>>>>>> using the schema information (the data’s logical type). After
>>>>>> construction, we can get the ‘TypeSerializer’ to deserialize the raw result.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Dec 1, 2022, at 04:54, Jim Hughes <jh...@confluent.io.INVALID> wrote:
>>>>>>>
>>>>>>> Hi Yu,
>>>>>>>
>>>>>>> Thanks for moving my comments to this thread!  Also, thank you for
>>>>>>> answering my questions; it is helping me understand the SQL Gateway
>>>>>>> better.
>>>>>>>
>>>>>>> 5.
>>>>>>>> Our idea is to introduce a new session option (like
>>>>>>>> 'sql-client.result.fetch-interval') to control
>>>>>>>> the fetching requests sending frequency. What do you think?
>>>>>>>
>>>>>>> Should this configuration be mentioned in the FLIP?
>>>>>>>
>>>>>>> One slight concern I have with having 'sql-client.result.fetch-interval' as
>>>>>>> a session configuration is that users could set it low and cause the client
>>>>>>> to send a large volume of requests to the SQL gateway.
>>>>>>>
>>>>>>> Generally, I'd like to see some way for the server to be able to limit the
>>>>>>> number of requests it receives.  If that really needs to be done by a proxy
>>>>>>> in front of the SQL gateway, that is fine as well.  (To be clear, I don't
>>>>>>> think my concern here should be blocking in any way.)
>>>>>>>
>>>>>>> 7.
>>>>>>>> What is the serialization lifecycle for results?
>>>>>>>
>>>>>>> I wonder if two other options are possible:
>>>>>>> 3) Could the Gateway just forward the result byte array?  (Or does the
>>>>>>> Gateway need to deserialize the response in order to understand it for some
>>>>>>> reason?)
>>>>>>> 4) Could the JobManager prepare the results in JSON?  (Or similarly could
>>>>>>> the Client read the format which the JobManager sends?)
>>>>>>>
>>>>>>> Thanks again!
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Jim
>>>>>>>
>>>>>>> On Wed, Nov 30, 2022 at 9:40 AM yu zelin <yu...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi, all
>>>>>>>>
>>>>>>>> Thanks for Jim’s questions below. Here I’d like to reply to them.
>>>>>>>>
>>>>>>>>> 1. For the Client Parser, is it going to work with the extended syntax
>>>>>>>>> from the Flink Table Store?
>>>>>>>>>
>>>>>>>>> 2. Relatedly, what will happen if an older Client tries to handle syntax
>>>>>>>>> that a newer service supports?  (Suppose I use a 1.17 client with a 1.18
>>>>>>>>> Gateway/system which has a new keyword.  Is there anything we should be
>>>>>>>>> designing for upfront?)
>>>>>>>>>
>>>>>>>>> 3. How will client and server version mismatches be handled?  Will a
>>>>>>>>> single gateway be able to support multiple endpoint versions?
>>>>>>>>>
>>>>>>>>> 4. How are commands which change a session handled?  Are those sent via
>>>>>>>>> an ExecuteStatementRequest?
>>>>>>>>>
>>>>>>>>> 5. The remote POC uses polling for getting back status and getting back
>>>>>>>>> results.  Would it be possible to switch to web sockets or some other
>>>>>>>>> mechanism to avoid polling?  If polling is used for both, the polling
>>>>>>>>> frequency should be different between local and remote configurations.
>>>>>>>>>
>>>>>>>>> 6. What does this sentence mean?  "The reason why we didn't get the sql
>>>>>>>>> type in client side is because it's hard for the lightweight client-level
>>>>>>>>> parser to recognize some sql type  sql, such as query with CTE.  "
>>>>>>>>>
>>>>>>>>> 7. What is the serialization lifecycle for results?  It makes sense to
>>>>>>>>> have some control over whether the gateway returns results as SQL or JSON.
>>>>>>>>> I'd love to see a way to avoid needing to serialize and deserialize results
>>>>>>>>> on the SQL Gateway if possible.  I'm still new enough to the project that
>>>>>>>>> I'm not sure if that's readily possible.  Maybe the SQL Gateway's return
>>>>>>>>> type can be sent as part of the request so that the JobManager can send
>>>>>>>>> back results in an advantageous format?
>>>>>>>>>
>>>>>>>>> 8. Does ErrorType need to be marked as @PublicEvolving?
>>>>>>>>>
>>>>>>>>> I'm excited for the SQL client to support gateway mode!  Given the change
>>>>>>>>> in design, do you think it'll still be part of the Flink 1.17 release?
>>>>>>>>
>>>>>>>> 1. The ClientParser can work with new (and unknown) SQL syntax, because
>>>>>>>> if the SQL type is not recognized, the SQL will be submitted to the gateway
>>>>>>>> directly.
>>>>>>>>
>>>>>>>> For more information: the proposed ClientParser only does two things:
>>>>>>>> (1) It tells client commands (help, clear, etc.) and SQL statements apart.
>>>>>>>> (2) It parses several SQL types (e.g. for a SHOW CREATE statement, we can print the raw
>>>>>>>> string of the SHOW CREATE result instead of a table). Here the recognition of
>>>>>>>> SQL types mostly affects the print style, and unrecognized SQL can still be submitted
>>>>>>>> to the cluster.
>>>>>>>> So a Client with the new ClientParser is compatible with new syntax.
>>>>>>>>
>>>>>>>> 2. First, I'd like to explain that the gateway APIs and the supported syntax
>>>>>>>> are two different things.
>>>>>>>> For example, 'configureSession' and 'completeStatement' are APIs. As mentioned
>>>>>>>> in #1, SQL statements whose syntax is unknown will be submitted to the gateway,
>>>>>>>> and whether they can be executed normally depends on whether the execution
>>>>>>>> environment supports the syntax.
>>>>>>>>
>>>>>>>>> Is there anything we should be designing for upfront?
>>>>>>>>
>>>>>>>> The 'SqlGatewayRestAPIVersion' has been introduced, but it is for the SQL
>>>>>>>> gateway APIs.
>>>>>>>>
>>>>>>>> 3.
>>>>>>>>> How will client and server version mismatches be handled?
>>>>>>>>
>>>>>>>> A lower-version client stays compatible with a higher-version gateway
>>>>>>>> because the old interfaces won't be deleted. When a higher-version client
>>>>>>>> connects to a lower-version gateway, the client should notify users if
>>>>>>>> they try to use unsupported features. For example, the client start
>>>>>>>> option '-i' means using an initialization file to initialize the session.
>>>>>>>> We plan to use the gateway's 'configureSession' to implement it. But this
>>>>>>>> API is not implemented in the 1.16 Gateway (SqlGatewayRestAPIVersion = V1),
>>>>>>>> so if the user tries to start the client with the '-i' option against the
>>>>>>>> 1.16 gateway, the client should tell the user:
>>>>>>>> Can't execute the '-i' option with a gateway whose version is lower than V2.
>>>>>>>>
>>>>>>>>> Will a single gateway be able to support multiple endpoint versions?
>>>>>>>>
>>>>>>>> Currently, the gateway only starts the highest-version endpoint, and a
>>>>>>>> higher-version endpoint is compatible with the lower-version endpoint's
>>>>>>>> protocol.
>>>>>>>>
>>>>>>>> 4. Yes. Mostly, we use 'SET' and 'RESET' statements to change the session
>>>>>>>> configuration.
>>>>>>>> Note: the client can't change the session itself (I mean, close the
>>>>>>>> current session and open another one). I'm not sure whether you need to
>>>>>>>> change the session itself?
>>>>>>>>
>>>>>>>> 5.
>>>>>>>>> Would it be possible to switch to web sockets or some other mechanism
>>>>>>>> to avoid polling?
>>>>>>>>
>>>>>>>> Your suggestion is very good, but this FLIP is for supporting the remote
>>>>>>>> client. How about taking it as future work?
>>>>>>>>
>>>>>>>>> If polling is used for both, the polling frequency should be different
>>>>>>>>> between local and remote configurations.
>>>>>>>>
>>>>>>>> Our idea is to introduce a new session option (like
>>>>>>>> 'sql-client.result.fetch-interval') to control
>>>>>>>> the fetching requests sending frequency. What do you think?
>>>>>>>>
>>>>>>>> For more information: we are inclined to keep the polling behavior in
>>>>>>>> this version. For a streaming query, fetching results synchronously may
>>>>>>>> occupy gateway resources for a long period.
>>>>>>>> For example, if the job doesn't return results for a long time because
>>>>>>>> the window has not been triggered, synchronous fetching will keep
>>>>>>>> occupying the connection. In the asynchronous case, the gateway can
>>>>>>>> return a NOT_READY_RESULT quickly and release the resources for other
>>>>>>>> clients to use. I think we can make some improvements to the whole flow
>>>>>>>> path in the future.
>>>>>>>>
>>>>>>>> 6. Sorry, there are mistakes in this sentence. Let me make it clear.
>>>>>>>>
>>>>>>>> We proposed adding 'ContentType' to indicate what kind of SQL the result
>>>>>>>> is for. In this sentence, I wanted to explain why we add 'ContentType'
>>>>>>>> even though the ClientParser can recognize the SQL type too.
>>>>>>>> It is because the proposed ClientParser can't recognize complex syntax.
>>>>>>>> For example, it can't recognize a query with a CTE. So the result should
>>>>>>>> carry content type information to help the client know the SQL type. For
>>>>>>>> example, 'ContentType.QUERY_RESULT' indicates that the result is for a
>>>>>>>> query statement.
>>>>>>>>
>>>>>>>> 7.
>>>>>>>>> What is the serialization lifecycle for results?
>>>>>>>>
>>>>>>>> 1) Sink to JobManager    : RowData -> byte[] (serialize)
>>>>>>>> 2) JobManager to Gateway : byte[] -> RowData (deserialize)
>>>>>>>> 3) Gateway sending       : RowData -> byte[] (serialize to JSON format)
>>>>>>>> 4) Client receiving      : byte[] -> RowData (deserialize)
>>>>>>>>
>>>>>>>>> Maybe the SQL Gateway's return type can be sent as part of the request
>>>>>>>>> so that the JobManager can send back results in an advantageous format?
>>>>>>>>
>>>>>>>> Yes. I think it's an improvement for the Client and Gateway. We have some
>>>>>>>> ideas. For example,
>>>>>>>>
>>>>>>>> 1) We can move the Gateway into the JobManager and reduce the Ser/De
>>>>>>>> costs from JM to Gateway.
>>>>>>>> 2) Or the Gateway can collect the data from the sink function directly
>>>>>>>> instead of from the JobManager.
>>>>>>>>
>>>>>>>> But I think we can leave this as future work and discuss it in another
>>>>>>>> thread.
>>>>>>>>
>>>>>>>> 8. Yes.
>>>>>>>>
>>>>>>>>> Do you think it'll still be part of the Flink 1.17 release?
>>>>>>>> Yes. We will try our best to finish the work.
>>>>>>>>
>>>>>>>> Feel free to talk to me if I’m wrong or you have any other questions.
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Nov 25, 2022, at 11:48, yu zelin <yu...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi, all
>>>>>>>>>
>>>>>>>>> I want to initiate a discussion on FLIP-275: Support Remote SQL Client
>>>>>>>>> Based on SQL Gateway[1].
>>>>>>>>> The motivation of this FLIP is that the current SQL Client allows only
>>>>>>>>> local connections, which cannot satisfy the common need of connecting
>>>>>>>>> to a remote cluster.
>>>>>>>>>
>>>>>>>>> Since FLIP-91[2] has introduced the SQL Gateway, we propose to implement
>>>>>>>>> the Remote SQL Client based on the SQL Gateway. In our design, we
>>>>>>>>> propose two main changes:
>>>>>>>>>
>>>>>>>>> 1. A new remote mode client which connects to the remote gateway through
>>>>>>>>> the REST API.
>>>>>>>>> 2. Migration of the current local mode client. We propose to refactor
>>>>>>>>> the local client based on the SQL Gateway to unify the interface for the
>>>>>>>>> two modes.
>>>>>>>>>
>>>>>>>>> Looking forward to your suggestions.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Yu Zelin
>>>>>>>>>
>>>>>>>>> [1] https://cwiki.apache.org/confluence/x/T48ODg
>>>>>>>>> [2] https://cwiki.apache.org/confluence/x/rIyMC


Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

Posted by godfrey he <go...@gmail.com>.
Hi, Zelin

>The CLI will use default print style for the non-query result.
Please make sure the printed output of the EXPLAIN/DESC/SHOW CREATE TABLE
commands stays clear and readable.

> We think it’s better to add the root cause to the ErrorResponseBody.
LGTM

Best,
Godfrey

On Tue, Dec 6, 2022 at 17:51, yu zelin <yu...@gmail.com> wrote:
>
> Hi, Godfrey
>
> Thanks for your feedback. Below are my thoughts on your questions.
>
> 1. About RowFormat.
> I agree with you. We have decided to revert the RowFormat-related changes
> and let the client decide the print format.
>
> 2. About ContentType
> I agree that the definition of ContentType is not clear. But how to define the
> statement type is another big question. So, we decided to only distinguish query
> results from non-query results. The CLI will use default print style for the non-query
> result.
>
> 3. About ErrorHandling
> I think reusing the current ErrorResponseBody is good, but parsing the root cause
> from the exception stack strings is quite hacky. We think it’s better to add the
> root cause to the ErrorResponseBody.
>
> 4. About Runtime REST API Modifications
> I agree, too. This part has been moved to ‘Future Work’.
>
> Best,
> Yu Zelin
>
>
> > On Dec 5, 2022, at 18:33, godfrey he <go...@gmail.com> wrote:
> >
> > Hi Zelin,
> >
> > Thanks for driving this discussion.
> >
> > I have a few comments,
> >
> >> Add RowFormat to ResultSet to indicate the format of rows.
> > We should not require the SqlGateway server to meet the display
> > requirements of a CliClient, because different CliClients may have
> > different display styles. The server just needs to return the data,
> > and the CliClient prints the result as needed. So RowFormat is not needed.
> >
> >> Add ContentType to ResultSet to indicate what kind of data the result contains.
> > At first sight, the values of ContentType overlap. For example, a
> > select query will return QUERY_RESULT, but it also has a JOB_ID.
> > OTHER is too ambiguous; I don't know which kind of query will return OTHER.
> > I recommend returning the concrete type for each statement, such as
> > "CREATE TABLE" for "create table xx (...) with ()" and
> > "SELECT" for "select * from xxx". The statement type can be maintained
> > in `Operation`s.
> >
> >> Error Handling
> > I think the current design of the error handling mechanism can meet the
> > requirements of the CliClient; we can get the root cause from
> > the stack (see ErrorResponseBody#errors). If it becomes a common
> > requirement (for many clients) in the future,
> > we can introduce this interface.
> >
> >> Runtime REST API Modification for Local Client Migration
> > I think this part is over-engineered; it belongs to optimization.
> > The client does not require very high performance, and the current design
> > can already meet our needs.
> > If we find performance problems in the future, we can do such optimizations then.
> >
> > Best,
> > Godfrey
> >
> > On Mon, Dec 5, 2022 at 11:11, yu zelin <yu...@gmail.com> wrote:
> >>
> >> Hi, Shammon
> >>
> >> Thanks for your feedback. I think it’s good to support jdbc-sdk. However,
> >> it's not supported on the gateway side yet. In my opinion, this FLIP is more
> >> concerned with the SQL Client. How about putting “supporting jdbc-sdk” in
> >> ‘Future Work’? We can discuss how to implement it in another thread.
> >>
> >> Best,
> >> Yu Zelin
> >>> On Dec 2, 2022, at 18:12, Shammon FY <zj...@gmail.com> wrote:
> >>>
> >>> Hi zelin
> >>>
> >>> Thanks for driving this discussion.
> >>>
> >>> I notice that the sql-client will interact with sql-gateway by `REST
> >>> Client` in the `Executor` in the FLIP, how about introducing jdbc-sdk for
> >>> sql-gateway?
> >>>
> >>> Then the sql-client can connect the gateway with jdbc-sdk, on the other
> >>> hand, the other applications and tools such as jmeter can use the jdbc-sdk
> >>> to connect sql-gateway too.
> >>>
> >>> Best,
> >>> Shammon
> >>>
> >>>
> >>> On Fri, Dec 2, 2022 at 4:10 PM yu zelin <yu...@gmail.com> wrote:
> >>>
> >>>> Hi Jim,
> >>>>
> >>>> Thanks for your feedback!
> >>>>
> >>>>> Should this configuration be mentioned in the FLIP?
> >>>>
> >>>> Sure.
> >>>>
> >>>>> some way for the server to be able to limit the number of requests it
> >>>> receives.
> >>>> I’m sorry that this FLIP is dedicated in implementing the Remote mode, so
> >>>> we
> >>>> didn't consider much about this. I think the option is enough currently.
> >>>> I will add
> >>>> the improvement suggestions to the ‘Future Work’.
> >>>>
> >>>>> I wonder if two other options are possible
> >>>>
> >>>> To forward the raw format to gateway and then to client is possible. The
> >>>> raw
> >>>> results from sink is in ‘CollectResultIterator#bufferedResult’. First, we
> >>>> can find
> >>>> a way to get this result without wrapping it. Second, constructing a
> >>>> ‘InternalTypeInfo’.
> >>>> We can construct it using the schema information (data’s logical type).
> >>>> After
> >>>> construction, we can get the ’TypeSerializer’ to deserialize the raw
> >>>> result.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> On Dec 1, 2022, at 04:54, Jim Hughes <jh...@confluent.io.INVALID> wrote:
> >>>>>
> >>>>> Hi Yu,
> >>>>>
> >>>>> Thanks for moving my comments to this thread!  Also, thank you for
> >>>>> answering my questions; it is helping me understand the SQL Gateway
> >>>>> better.
> >>>>>
> >>>>> 5.
> >>>>>> Our idea is to introduce a new session option (like
> >>>>> 'sql-client.result.fetch-interval') to control
> >>>>> the fetching requests sending frequency. What do you think?
> >>>>>
> >>>>> Should this configuration be mentioned in the FLIP?
> >>>>>
> >>>>> One slight concern I have with having 'sql-client.result.fetch-interval'
> >>>> as
> >>>>> a session configuration is that users could set it low and cause the
> >>>> client
> >>>>> to send a large volume of requests to the SQL gateway.
> >>>>>
> >>>>> Generally, I'd like to see some way for the server to be able to limit
> >>>> the
> >>>>> number of requests it receives.  If that really needs to be done by a
> >>>> proxy
> >>>>> in front of the SQL gateway, that is fine as well.  (To be clear, I don't
> >>>>> think my concern here should be blocking in any way.)
> >>>>>
> >>>>> 7.
> >>>>>> What is the serialization lifecycle for results?
> >>>>>
> >>>>> I wonder if two other options are possible:
> >>>>> 3) Could the Gateway just forward the result byte array?  (Or does the
> >>>>> Gateway need to deserialize the response in order to understand it for
> >>>> some
> >>>>> reason?)
> >>>>> 4) Could the JobManager prepare the results in JSON?  (Or similarly could
> >>>>> the Client read the format which the JobManager sends?)
> >>>>>
> >>>>> Thanks again!
> >>>>>
> >>>>> Cheers,
> >>>>>
> >>>>> Jim
> >>>>>
> >>>>> On Wed, Nov 30, 2022 at 9:40 AM yu zelin <yu...@gmail.com> wrote:
> >>>>>
> >>>>>> Hi, all
> >>>>>>
> >>>>>> Thanks Jim’s questions below. Here I’d like to reply to them.
> >>>>>>
> >>>>>>> 1. For the Client Parser, is it going to work with the extended syntax
> >>>>>>> from the Flink Table Store?
> >>>>>>>
> >>>>>>> 2. Relatedly, what will happen if an older Client tries to handle
> >>>>>> syntax
> >>>>>>> that a newer service supports?  (Suppose I use a 1.17 client with a
> >>>>>> 1.18
> >>>>>>> Gateway/system which has a new keyword.  Is there anything we should
> >>>> be
> >>>>>>> designing for upfront?)
> >>>>>>>
> >>>>>>> 3. How will client and server version mismatches be handled?  Will a
> >>>>>>> single gateway be able to support multiple endpoint versions?
> >>>>>>> 4. How are commands which change a session handled?  Are those sent
> >>>> via
> >>>>>>> an ExecuteStatementRequest?
> >>>>>>>
> >>>>>>> 5. The remote POC uses polling for getting back status and getting
> >>>> back
> >>>>>>> results.  Would it be possible to switch to web sockets or some other
> >>>>>>> mechanism to avoid polling?  If polling is used for both, the polling
> >>>>>>> frequency should be different between local and remote configurations.
> >>>>>>>
> >>>>>>> 6. What does this sentence mean?  "The reason why we didn't get the
> >>>> sql
> >>>>>>> type in client side is because it's hard for the lightweight
> >>>>>> client-level
> >>>>>>> parser to recognize some sql type  sql, such as query with CTE.  "
> >>>>>>>
> >>>>>>> 7. What is the serialization lifecycle for results?  It makes sense to
> >>>>>>> have some control over whether the gateway returns results as SQL or
> >>>>>> JSON.
> >>>>>>> I'd love to see a way to avoid needing to serialize and deserialize
> >>>>>> results
> >>>>>>> on the SQL Gateway if possible.  I'm still new enough to the project
> >>>>>> that
> >>>>>>> I'm not sure if that's readily possible.  Maybe the SQL Gateway's
> >>>>>> return
> >>>>>>> type can be sent as part of the request so that the JobManager can
> >>>> send
> >>>>>>> back results in an advantageous format?
> >>>>>>>
> >>>>>>> 8. Does ErrorType need to be marked as @PublicEvolving?
> >>>>>>>
> >>>>>>> I'm excited for the SQL client to support gateway mode!  Given the
> >>>> change
> >>>>>>> in design, do you think it'll still be part of the Flink 1.17 release?
> >>>>>>
> >>>>>> 1.  ClientParser can work with new (and unknown) SQL syntax. It is
> >>>> because
> >>>>>> if the
> >>>>>> sql type is not recognized, the sql will be submitted to the gateway
> >>>>>> directly.
> >>>>>>
> >>>>>> For more information: Actually, the proposed ClientParser only do two
> >>>>>> things:
> >>>>>> (1) Tell client commands (help, clear, etc) and sqls apart.
> >>>>>> (2) parses several sql types (e.g. SHOW CREATE statement, we can print
> >>>> raw
> >>>>>> string
> >>>>>> for the SHOW CREATE result instead of table). Here the recognization of
> >>>>>> sql types
> >>>>>> mostly affects the print style, and unrecognized sql also can be
> >>>> submitted
> >>>>>> to cluster.
> >>>>>> So the Client with new ClientParser can work compatible with new syntax.
> >>>>>>
> >>>>>> 2. First, I'd like to explain that the gateway APIs and supported syntax
> >>>>>> is two things.
> >>>>>> For example, ‘configureSession' and 'completeStatement' are APIs. As
> >>>>>> mentioned
> >>>>>> in #1, the sql statements which syntax is unknown will be submitted to
> >>>> the
> >>>>>> gateway,
> >>>>>> and whether they can be executed normally depends on whether the
> >>>> execution
> >>>>>> environment supports the syntax.
> >>>>>>
> >>>>>>> Is there anything we should be designing for upfront?
> >>>>>>
> >>>>>> The 'SqlGatewayRestAPIVersion’ has been introduced. But it is for sql
> >>>>>> gateway APIs.
> >>>>>>
> >>>>>> 3.
> >>>>>>> How will client and server version mismatches be handled?
> >>>>>>
> >>>>>> A lower version client can work compatible with a higher version gateway
> >>>>>> because the
> >>>>>> old interfaces won’t be deleted. When a higher version client connects
> >>>> to
> >>>>>> a lower version
> >>>>>> gateway, the client should notify the users if they try to use
> >>>> unsupported
> >>>>>> features. For
> >>>>>> example, the client start option ‘-i’  means using initialization file
> >>>> to
> >>>>>> initialize the session.
> >>>>>> We plan to use the gateway’s ‘configureSession’ to implement it. But
> >>>> this
> >>>>>> API is not
> >>>>>> implemented in 1.16 Gateway (SqlGatewayRestAPIVersion = V1), so if the
> >>>>>> user try to
> >>>>>> use ‘-i’ option to start the client with the 1.16 gateway, the client
> >>>>>> should tell the user that
> >>>>>> Can’t execute ‘-i’ option with gateway which version is lower than V2.
> >>>>>>
> >>>>>>> Will a single gateway be able to support multiple endpoint versions?
> >>>>>>
> >>>>>> Currently, the gateway only starts a highest version endpoint and the
> >>>>>> higher version endpoint
> >>>>>> is compatible with the lower version endpoint’s protocol.
> >>>>>>
> >>>>>> 4. Yes. Mostly, we use ’SET’ and ‘RESET’ statements to change the
> >>>> session
> >>>>>> configuration.
> >>>>>> Notice: the client can’t change the session (I mean, close current
> >>>> session
> >>>>>> and open another
> >>>>>> one). I’m not sure if you have need to change the session itself?
> >>>>>>
> >>>>>> 5.
> >>>>>>> Would it be possible to switch to web sockets or some other mechanism
> >>>>>> to avoid polling?
> >>>>>>
> >>>>>> Your suggestion is very good, but this flip is for supporting the remote
> >>>>>> client. How about taking
> >>>>>> it as a future work?
> >>>>>>
> >>>>>>> If polling is used for both, the polling frequency should be different
> >>>>>> between local and remote
> >>>>>> configurations.
> >>>>>>
> >>>>>> Our idea is to introduce a new session option (like
> >>>>>> 'sql-client.result.fetch-interval') to control
> >>>>>> the fetching requests sending frequency. What do you think?
> >>>>>>
> >>>>>> For more information: we are inclined to keep the polling behavior in
> >>>> this
> >>>>>> version. For streaming
> >>>>>> query, fetching results synchronously may occupy resources of the
> >>>> gateway
> >>>>>> in a long period.
> >>>>>> For example, if the job doesn’t return results for a long time because
> >>>> the
> >>>>>> window has not been
> >>>>>> triggered, the synchronously fetching will keep occupying the
> >>>> connection.
> >>>>>> In asynchronous
> >>>>>> situation, the gateway can return a NOT_READY_RESULT quickly and release
> >>>>>> the resources
> >>>>>> for other clients to use. I think we can make some improvements for the
> >>>>>> whole flow path in the
> >>>>>> future.
> >>>>>>
> >>>>>> 6. Sorry for that there is mistakes in this sentence. Let me make it
> >>>> clear.
> >>>>>>
> >>>>>> We proposed to add 'ContentType' to indicates the result is for what
> >>>> kind
> >>>>>> of sql. In this sentence,
> >>>>>> I want to explain why we add 'ContentType' since the ClientParser can
> >>>>>> recognize the sql type too.
> >>>>>> It is because the proposed ClientParser can't recognize complex syntax.
> >>>>>> For example, it can’t
> >>>>>> recognize query with CTE. So the result should carry content type
> >>>>>> information to help the client to
> >>>>>> know the sql type. For example, the 'ContentType.QUERY_RESULT' indicates
> >>>>>> the result is for a
> >>>>>> query statement.
> >>>>>>
> >>>>>> 7.
> >>>>>>> What is the serialization lifecycle for results?
> >>>>>>
> >>>>>> 1) Sink to JobManager        : RowData -> Byte[ ] (serialize)
> >>>>>> 2) JobManager to Gateway : Byte[ ] -> RowData (deserialize)
> >>>>>> 3) Gateway sending            : RowData -> Byte[ ] (serialized to JSON
> >>>>>> format)
> >>>>>> 4) Client receiving               : Byte[ ] -> RowData (deserialize)
> >>>>>>
> >>>>>>> Maybe the SQL Gateway's return type can be sent as part of the request
> >>>>>> so that the
> >>>>>> JobManager can send  back results in an advantageous format?
> >>>>>>
> >>>>>> Yes. I think it's an improvement for the Client and Gateway. We have
> >>>> some
> >>>>>> ideas. For example,
> >>>>>>
> >>>>>> 1) We can move the Gateway into the JobManager and reduce the Ser/De
> >>>> costs
> >>>>>> from JM to Gateway.
> >>>>>> 2) Or the Gateway can collect the data from the sink function directly
> >>>>>> instead of JobManager.
> >>>>>>
> >>>>>> But I think we can leave this as a future work and discuss in another
> >>>>>> thread.
> >>>>>>
> >>>>>> 8. Yes.
> >>>>>>
> >>>>>>> Do you think it'll still be part of the Flink 1.17 release?
> >>>>>> Yes. We will try our best to finish the work.
> >>>>>>
> >>>>>> Feel free to talk to me if I’m wrong or you have any other questions.
> >>>>>>
> >>>>>>
> >>>>>>> On Nov 25, 2022, at 11:48, yu zelin <yu...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi, all
> >>>>>>>
> >>>>>>> I want to initiate a discussion on the FLIP-275: Support Remote SQL
> >>>>>> Client Based on SQL Gateway[1].
> >>>>>>> The motivation of this FLIP is that the current SQL Client allows only
> >>>>>> local connection which can not satisfy
> >>>>>>> the common need of connecting to a remote cluster.
> >>>>>>>
> >>>>>>> Since the FLIP-91[2] has introduced SQL Gateway, we proposed to
> >>>>>> implement the Remote SQL Client
> >>>>>>> based on SQL Gateway. In our design, we proposed two main changes:
> >>>>>>>
> >>>>>>> 1. New remote mode client which performs connection to the remote
> >>>>>> gateway through REST API.
> >>>>>>> 2. Migration of the current local mode client. We proposed to refactor
> >>>>>> the local client based on SQL Gateway
> >>>>>>> to unify the interface for two modes.
> >>>>>>>
> >>>>>>> Looking forward to your suggestions.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Yu Zelin
> >>>>>>>
> >>>>>>> [1] https://cwiki.apache.org/confluence/x/T48ODg
> >>>>>>> [2] https://cwiki.apache.org/confluence/x/rIyMC

Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

Posted by yu zelin <yu...@gmail.com>.
Hi, Godfrey

Thanks for your feedback. Below are my thoughts on your questions.

1. About RowFormat.
I agree with you. We have decided to revert the RowFormat-related changes
and let the client decide the print format.

2. About ContentType
I agree that the definition of ContentType is not clear. But how to define the
statement type is another big question. So, we decided to only distinguish query
results from non-query results. The CLI will use default print style for the non-query
result.
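
To illustrate the split in point 2, here is a minimal Python sketch of a client that picks its print style from the result kind alone. It is only an illustration under assumptions, not the FLIP's design: `ResultKind`, `ResultSet`, `render` and `render_default_table` are hypothetical names, and the interactive query view is reduced to a one-line summary.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Sequence


class ResultKind(Enum):
    # Hypothetical two-value marker; the FLIP may name or encode this differently.
    QUERY = "QUERY"
    NON_QUERY = "NON_QUERY"


@dataclass
class ResultSet:
    kind: ResultKind
    columns: List[str]
    rows: List[Sequence[object]]


def render_default_table(columns: List[str], rows: List[Sequence[object]]) -> str:
    """Default print style: a plain bordered table."""
    cells = [list(columns)] + [[str(v) for v in row] for row in rows]
    widths = [max(len(r[i]) for r in cells) for i in range(len(columns))]
    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def line(r: List[str]) -> str:
        return "|" + "|".join(f" {c.ljust(w)} " for c, w in zip(r, widths)) + "|"

    body = [line(r) for r in cells[1:]]
    return "\n".join([border, line(cells[0]), border, *body, border])


def render(result: ResultSet) -> str:
    """Pick the print style on the client side from the result kind alone."""
    if result.kind is ResultKind.QUERY:
        # A real CLI would open its interactive result view here.
        return f"query-view: {len(result.rows)} row(s)"
    return render_default_table(result.columns, result.rows)
```

With this shape the gateway only has to say whether a result comes from a query; everything about layout stays in the client.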

3. About ErrorHandling
I think reusing the current ErrorResponseBody is good, but parsing the root cause
from the exception stack strings is quite hacky. We think it’s better to add the
root cause to the ErrorResponseBody.
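
A rough Python sketch of the difference, under stated assumptions: the response is treated as a plain dict with an `errors` list of stack-trace strings (as in ErrorResponseBody#errors), and `rootCause` is a hypothetical field name for the proposed addition. The sample exception class names used when trying it out are made up.

```python
from typing import List, Optional


def root_cause_from_stack(errors: List[str]) -> Optional[str]:
    """Hacky approach: scan the stack-trace text for the last 'Caused by:' line."""
    prefix = "Caused by: "
    cause = None
    for entry in errors:
        for line in entry.splitlines():
            stripped = line.strip()
            if stripped.startswith(prefix):
                # Later 'Caused by:' lines are deeper causes, so keep overwriting.
                cause = stripped[len(prefix):]
    return cause


def root_cause(body: dict) -> Optional[str]:
    """Prefer an explicit field (hypothetical name 'rootCause'); else fall back to parsing."""
    explicit = body.get("rootCause")
    if explicit is not None:
        return explicit
    return root_cause_from_stack(body.get("errors", []))
```

The fallback depends on the exact text layout of Java stack traces, which is why an explicit field is the more robust option for clients.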

4. About Runtime REST API Modifications
I agree, too. This part has been moved to ‘Future Work’.

Best,
Yu Zelin


> On Dec 5, 2022, at 18:33, godfrey he <go...@gmail.com> wrote:
> 
> Hi Zelin,
> 
> Thanks for driving this discussion.
> 
> I have a few comments,
> 
>> Add RowFormat to ResultSet to indicate the format of rows.
> We should not require the SqlGateway server to meet the display
> requirements of a CliClient, because different CliClients may have
> different display styles. The server just needs to return the data,
> and the CliClient prints the result as needed. So RowFormat is not needed.
> 
>> Add ContentType to ResultSet to indicate what kind of data the result contains.
> At first sight, the values of ContentType overlap. For example, a
> select query will return QUERY_RESULT, but it also has a JOB_ID.
> OTHER is too ambiguous; I don't know which kind of query will return OTHER.
> I recommend returning the concrete type for each statement, such as
> "CREATE TABLE" for "create table xx (...) with ()" and
> "SELECT" for "select * from xxx". The statement type can be maintained
> in `Operation`s.
> 
>> Error Handling
> I think the current design of the error handling mechanism can meet the
> requirements of the CliClient; we can get the root cause from
> the stack (see ErrorResponseBody#errors). If it becomes a common
> requirement (for many clients) in the future,
> we can introduce this interface.
> 
>> Runtime REST API Modification for Local Client Migration
> I think this part is over-engineered; it belongs to optimization.
> The client does not require very high performance, and the current design
> can already meet our needs.
> If we find performance problems in the future, we can do such optimizations then.
> 
> Best,
> Godfrey
> 
> On Mon, Dec 5, 2022 at 11:11, yu zelin <yu...@gmail.com> wrote:
>> 
>> Hi, Shammon
>> 
>> Thanks for your feedback. I think it’s good to support jdbc-sdk. However,
>> it's not supported on the gateway side yet. In my opinion, this FLIP is more
>> concerned with the SQL Client. How about putting “supporting jdbc-sdk” in
>> ‘Future Work’? We can discuss how to implement it in another thread.
>> 
>> Best,
>> Yu Zelin
>>> On Dec 2, 2022, at 18:12, Shammon FY <zj...@gmail.com> wrote:
>>> 
>>> Hi zelin
>>> 
>>> Thanks for driving this discussion.
>>> 
>>> I notice that the sql-client will interact with sql-gateway by `REST
>>> Client` in the `Executor` in the FLIP, how about introducing jdbc-sdk for
>>> sql-gateway?
>>> 
>>> Then the sql-client can connect the gateway with jdbc-sdk, on the other
>>> hand, the other applications and tools such as jmeter can use the jdbc-sdk
>>> to connect sql-gateway too.
>>> 
>>> Best,
>>> Shammon
>>> 
>>> 
>>> On Fri, Dec 2, 2022 at 4:10 PM yu zelin <yu...@gmail.com> wrote:
>>> 
>>>> Hi Jim,
>>>> 
>>>> Thanks for your feedback!
>>>> 
>>>>> Should this configuration be mentioned in the FLIP?
>>>> 
>>>> Sure.
>>>> 
>>>>> some way for the server to be able to limit the number of requests it
>>>> receives.
>>>> I’m sorry that this FLIP is dedicated in implementing the Remote mode, so
>>>> we
>>>> didn't consider much about this. I think the option is enough currently.
>>>> I will add
>>>> the improvement suggestions to the ‘Future Work’.
>>>> 
>>>>> I wonder if two other options are possible
>>>> 
>>>> To forward the raw format to gateway and then to client is possible. The
>>>> raw
>>>> results from sink is in ‘CollectResultIterator#bufferedResult’. First, we
>>>> can find
>>>> a way to get this result without wrapping it. Second, constructing a
>>>> ‘InternalTypeInfo’.
>>>> We can construct it using the schema information (data’s logical type).
>>>> After
>>>> construction, we can get the ’TypeSerializer’ to deserialize the raw
>>>> result.
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Dec 1, 2022, at 04:54, Jim Hughes <jh...@confluent.io.INVALID> wrote:
>>>>> 
>>>>> Hi Yu,
>>>>> 
>>>>> Thanks for moving my comments to this thread!  Also, thank you for
>>>>> answering my questions; it is helping me understand the SQL Gateway
>>>>> better.
>>>>> 
>>>>> 5.
>>>>>> Our idea is to introduce a new session option (like
>>>>> 'sql-client.result.fetch-interval') to control
>>>>> the fetching requests sending frequency. What do you think?
>>>>> 
>>>>> Should this configuration be mentioned in the FLIP?
>>>>> 
>>>>> One slight concern I have with having 'sql-client.result.fetch-interval'
>>>> as
>>>>> a session configuration is that users could set it low and cause the
>>>> client
>>>>> to send a large volume of requests to the SQL gateway.
>>>>> 
>>>>> Generally, I'd like to see some way for the server to be able to limit
>>>> the
>>>>> number of requests it receives.  If that really needs to be done by a
>>>> proxy
>>>>> in front of the SQL gateway, that is fine as well.  (To be clear, I don't
>>>>> think my concern here should be blocking in any way.)
>>>>> 
>>>>> 7.
>>>>>> What is the serialization lifecycle for results?
>>>>> 
>>>>> I wonder if two other options are possible:
>>>>> 3) Could the Gateway just forward the result byte array?  (Or does the
>>>>> Gateway need to deserialize the response in order to understand it for
>>>> some
>>>>> reason?)
>>>>> 4) Could the JobManager prepare the results in JSON?  (Or similarly could
>>>>> the Client read the format which the JobManager sends?)
>>>>> 
>>>>> Thanks again!
>>>>> 
>>>>> Cheers,
>>>>> 
>>>>> Jim
>>>>> 
>>>>> On Wed, Nov 30, 2022 at 9:40 AM yu zelin <yu...@gmail.com> wrote:
>>>>> 
>>>>>> Hi, all
>>>>>> 
>>>>>> Thanks Jim’s questions below. Here I’d like to reply to them.
>>>>>> 
>>>>>>> 1. For the Client Parser, is it going to work with the extended syntax
>>>>>>> from the Flink Table Store?
>>>>>>> 
>>>>>>> 2. Relatedly, what will happen if an older Client tries to handle
>>>>>> syntax
>>>>>>> that a newer service supports?  (Suppose I use a 1.17 client with a
>>>>>> 1.18
>>>>>>> Gateway/system which has a new keyword.  Is there anything we should
>>>> be
>>>>>>> designing for upfront?)
>>>>>>> 
>>>>>>> 3. How will client and server version mismatches be handled?  Will a
>>>>>>> single gateway be able to support multiple endpoint versions?
>>>>>>> 4. How are commands which change a session handled?  Are those sent
>>>> via
>>>>>>> an ExecuteStatementRequest?
>>>>>>> 
>>>>>>> 5. The remote POC uses polling for getting back status and getting
>>>> back
>>>>>>> results.  Would it be possible to switch to web sockets or some other
>>>>>>> mechanism to avoid polling?  If polling is used for both, the polling
>>>>>>> frequency should be different between local and remote configurations.
>>>>>>> 
>>>>>>> 6. What does this sentence mean?  "The reason why we didn't get the
>>>> sql
>>>>>>> type in client side is because it's hard for the lightweight
>>>>>> client-level
>>>>>>> parser to recognize some sql type  sql, such as query with CTE.  "
>>>>>>> 
>>>>>>> 7. What is the serialization lifecycle for results?  It makes sense to
>>>>>>> have some control over whether the gateway returns results as SQL or
>>>>>> JSON.
>>>>>>> I'd love to see a way to avoid needing to serialize and deserialize
>>>>>> results
>>>>>>> on the SQL Gateway if possible.  I'm still new enough to the project
>>>>>> that
>>>>>>> I'm not sure if that's readily possible.  Maybe the SQL Gateway's
>>>>>> return
>>>>>>> type can be sent as part of the request so that the JobManager can
>>>> send
>>>>>>> back results in an advantageous format?
>>>>>>> 
>>>>>>> 8. Does ErrorType need to be marked as @PublicEvolving?
>>>>>>> 
>>>>>>> I'm excited for the SQL client to support gateway mode!  Given the
>>>> change
>>>>>>> in design, do you think it'll still be part of the Flink 1.17 release?
>>>>>> 
>>>>>> 1.  ClientParser can work with new (and unknown) SQL syntax. It is
>>>> because
>>>>>> if the
>>>>>> sql type is not recognized, the sql will be submitted to the gateway
>>>>>> directly.
>>>>>> 
>>>>>> For more information: Actually, the proposed ClientParser only do two
>>>>>> things:
>>>>>> (1) Tell client commands (help, clear, etc) and sqls apart.
>>>>>> (2) parses several sql types (e.g. SHOW CREATE statement, we can print
>>>> raw
>>>>>> string
>>>>>> for the SHOW CREATE result instead of table). Here the recognization of
>>>>>> sql types
>>>>>> mostly affects the print style, and unrecognized sql also can be
>>>> submitted
>>>>>> to cluster.
>>>>>> So the Client with new ClientParser can work compatible with new syntax.
>>>>>> 
>>>>>> 2. First, I'd like to explain that the gateway APIs and supported syntax
>>>>>> is two things.
>>>>>> For example, ‘configureSession' and 'completeStatement' are APIs. As
>>>>>> mentioned
>>>>>> in #1, the sql statements which syntax is unknown will be submitted to
>>>> the
>>>>>> gateway,
>>>>>> and whether they can be executed normally depends on whether the
>>>> execution
>>>>>> environment supports the syntax.
>>>>>> 
>>>>>>> Is there anything we should be designing for upfront?
>>>>>> 
>>>>>> The 'SqlGatewayRestAPIVersion’ has been introduced. But it is for sql
>>>>>> gateway APIs.
>>>>>> 
>>>>>> 3.
>>>>>>> How will client and server version mismatches be handled?
>>>>>> 
>>>>>> A lower-version client is compatible with a higher-version gateway because
>>>>>> the old interfaces won't be deleted. When a higher-version client connects
>>>>>> to a lower-version gateway, the client should notify users if they try to
>>>>>> use unsupported features. For example, the client start option '-i' means
>>>>>> using an initialization file to initialize the session. We plan to use the
>>>>>> gateway's 'configureSession' to implement it. But this API is not
>>>>>> implemented in the 1.16 Gateway (SqlGatewayRestAPIVersion = V1), so if the
>>>>>> user tries to use the '-i' option to start the client with the 1.16
>>>>>> gateway, the client should tell the user: "Can't execute the '-i' option
>>>>>> with a gateway whose version is lower than V2."
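
The client-side check described in point 3 could look roughly like the following Python sketch. `RestApiVersion` is a stand-in for SqlGatewayRestAPIVersion, and the `MIN_VERSION` table, `check_feature`, and `UnsupportedFeatureError` are hypothetical names introduced for illustration; how the client actually learns the gateway's version is not covered here.

```python
from enum import IntEnum


class RestApiVersion(IntEnum):
    """Stand-in for SqlGatewayRestAPIVersion."""
    V1 = 1
    V2 = 2


# Assumption: each client feature maps to the lowest API version supporting it.
MIN_VERSION = {"-i": RestApiVersion.V2}


class UnsupportedFeatureError(Exception):
    pass


def check_feature(option: str, gateway_version: RestApiVersion) -> None:
    """Raise a readable error if the gateway is too old for the option."""
    required = MIN_VERSION.get(option)
    if required is not None and gateway_version < required:
        raise UnsupportedFeatureError(
            f"Can't execute the '{option}' option with a gateway whose "
            f"version is lower than {required.name}."
        )
```

Options without an entry in the table are assumed to work with every gateway version, which matches the statement that old interfaces are never deleted.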
>>>>>> 
>>>>>>> Will a single gateway be able to support multiple endpoint versions?
>>>>>> 
>>>>>> Currently, the gateway starts only the highest-version endpoint, and the
>>>>>> higher-version endpoint is compatible with the lower-version endpoint's
>>>>>> protocol.
>>>>>> 
>>>>>> 4. Yes. Mostly, we use 'SET' and 'RESET' statements to change the session
>>>>>> configuration.
>>>>>> Note: the client can't change the session itself (I mean, close the
>>>>>> current session and open another one). I'm not sure whether you need to
>>>>>> change the session itself?
>>>>>> 
>>>>>> 5.
>>>>>>> Would it be possible to switch to web sockets or some other mechanism
>>>>>>> to avoid polling?
>>>>>> 
>>>>>> Your suggestion is very good, but this FLIP is for supporting the remote
>>>>>> client. How about taking it as future work?
>>>>>> 
>>>>>>> If polling is used for both, the polling frequency should be different
>>>>>>> between local and remote configurations.
>>>>>> 
>>>>>> Our idea is to introduce a new session option (like
>>>>>> 'sql-client.result.fetch-interval') to control how frequently fetch
>>>>>> requests are sent. What do you think?
>>>>>> 
>>>>>> In more detail: we are inclined to keep the polling behavior in this
>>>>>> version. For a streaming query, fetching results synchronously may occupy
>>>>>> gateway resources for a long period. For example, if the job doesn't return
>>>>>> results for a long time because the window has not been triggered, a
>>>>>> synchronous fetch will keep occupying the connection. In the asynchronous
>>>>>> case, the gateway can return a NOT_READY_RESULT quickly and release the
>>>>>> resources for other clients to use. I think we can make some improvements
>>>>>> to the whole flow path in the future.
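
To make the asynchronous polling in point 5 concrete, here is a hedged Python sketch of a client loop driven by a fetch interval. The `fetch` callable, its `(state, rows, next_token)` return shape, and the `PAYLOAD` and `EOS` state names are assumptions for this sketch; only the idea of a quick NOT_READY answer and a configurable interval comes from the thread.

```python
import time
from typing import Callable, Iterator, List, Tuple

# NOT_READY mirrors the NOT_READY_RESULT mentioned above; the other two
# state names are assumptions made for this sketch.
NOT_READY, PAYLOAD, EOS = "NOT_READY", "PAYLOAD", "EOS"

FetchFn = Callable[[int], Tuple[str, List[tuple], int]]


def poll_results(fetch: FetchFn, fetch_interval_s: float,
                 sleep: Callable[[float], None] = time.sleep) -> Iterator[tuple]:
    """Yield rows by polling; every call to `fetch` returns immediately."""
    token = 0
    while True:
        state, rows, next_token = fetch(token)
        if state == EOS:
            yield from rows
            return
        if state == NOT_READY:
            # Nothing yet (e.g. the window has not fired): wait, then retry
            # with the same token instead of holding the connection open.
            sleep(fetch_interval_s)
            continue
        yield from rows
        token = next_token
```

Because each request returns at once, the gateway connection is free between polls; the cost is that a very small interval produces many requests, which is the concern raised elsewhere in this thread about server-side limits.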
>>>>>> 
>>>>>> 6. Sorry, there are mistakes in this sentence. Let me make it clear.
>>>>>> 
>>>>>> We proposed to add 'ContentType' to indicate what kind of SQL the result
>>>>>> is for. In this sentence, I wanted to explain why we add 'ContentType'
>>>>>> even though the ClientParser can recognize the SQL type too. It is because
>>>>>> the proposed ClientParser can't recognize complex syntax. For example, it
>>>>>> can't recognize a query with a CTE. So the result should carry content type
>>>>>> information to help the client know the SQL type. For example,
>>>>>> 'ContentType.QUERY_RESULT' indicates that the result is for a query
>>>>>> statement.
>>>>>> 
>>>>>> 7.
>>>>>>> What is the serialization lifecycle for results?
>>>>>> 
>>>>>> 1) Sink to JobManager    : RowData -> byte[] (serialize)
>>>>>> 2) JobManager to Gateway : byte[] -> RowData (deserialize)
>>>>>> 3) Gateway sending       : RowData -> byte[] (serialize to JSON format)
>>>>>> 4) Client receiving      : byte[] -> RowData (deserialize)
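
The four hops listed above can be illustrated with a small Python sketch. Here `pickle` is only a stand-in for the binary serializer used between the sink and the gateway, a tuple stands in for RowData, and all function names are hypothetical; the point is just that each row is serialized and deserialized twice before the client sees it.

```python
import json
import pickle

Row = tuple  # stand-in for RowData


def sink_to_jobmanager(row: Row) -> bytes:
    """1) Serialize the row to a binary form (pickle as a stand-in)."""
    return pickle.dumps(row)


def jobmanager_to_gateway(data: bytes) -> Row:
    """2) Deserialize the binary form back into a row on the gateway."""
    return pickle.loads(data)


def gateway_send(row: Row) -> bytes:
    """3) Serialize the row again, this time to JSON for the REST response."""
    return json.dumps(list(row)).encode("utf-8")


def client_receive(data: bytes) -> Row:
    """4) Deserialize the JSON payload back into a row on the client."""
    return tuple(json.loads(data.decode("utf-8")))
```

Steps 2 and 3 are the pair the question is about: if the gateway could forward the bytes from step 1, or receive JSON directly, one deserialize/serialize round trip would disappear.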
>>>>>> 
>>>>>>> Maybe the SQL Gateway's return type can be sent as part of the request
>>>>>>> so that the JobManager can send back results in an advantageous format?
>>>>>> 
>>>>>> Yes. I think it's an improvement for the Client and Gateway. We have some
>>>>>> ideas. For example:
>>>>>> 
>>>>>> 1) We can move the Gateway into the JobManager to reduce the Ser/De costs
>>>>>> from the JM to the Gateway.
>>>>>> 2) Or the Gateway can collect the data from the sink function directly
>>>>>> instead of from the JobManager.
>>>>>> 
>>>>>> But I think we can leave this as future work and discuss it in another
>>>>>> thread.
>>>>>> 
>>>>>> 8. Yes.
>>>>>> 
>>>>>>> Do you think it'll still be part of the Flink 1.17 release?
>>>>>> Yes. We will try our best to finish the work.
>>>>>> 
>>>>>> Feel free to tell me if I'm wrong or if you have any other questions.
>>>>>> 
>>>>>> 
>>>>>>> On Nov 25, 2022, at 11:48, yu zelin <yu...@gmail.com> wrote:
>>>>>>> 
>>>>>>> Hi, all
>>>>>>> 
>>>>>>> I want to initiate a discussion on FLIP-275: Support Remote SQL Client
>>>>>>> Based on SQL Gateway[1]. The motivation of this FLIP is that the current
>>>>>>> SQL Client allows only local connections, which cannot satisfy the common
>>>>>>> need of connecting to a remote cluster.
>>>>>>> 
>>>>>>> Since FLIP-91[2] has introduced the SQL Gateway, we propose to implement
>>>>>>> the Remote SQL Client based on it. Our design proposes two main changes:
>>>>>>> 
>>>>>>> 1. A new remote mode client, which connects to the remote gateway through
>>>>>>> the REST API.
>>>>>>> 2. Migration of the current local mode client. We propose to refactor the
>>>>>>> local client on top of the SQL Gateway to unify the interface for the two
>>>>>>> modes.
>>>>>>> 
>>>>>>> Looking forward to your suggestions.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Yu Zelin
>>>>>>> 
>>>>>>> [1] https://cwiki.apache.org/confluence/x/T48ODg
>>>>>>> [2] https://cwiki.apache.org/confluence/x/rIyMC
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 


Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

Posted by godfrey he <go...@gmail.com>.
Hi Zelin,

Thanks for driving this discussion.

I have a few comments:

> Add RowFormat to ResultSet to indicate the format of rows.
We should not require the SqlGateway server to meet the display
requirements of a CliClient, because different CliClients may have
different display styles. The server just needs to respond with the data,
and the CliClient prints the result as needed. So RowFormat is not needed.

> Add ContentType to ResultSet to indicate what kind of data the result contains.
At first sight, the values of ContentType overlap: for example, a select
query will return QUERY_RESULT, but it also has a JOB_ID. OTHER is too
ambiguous; I don't know which kind of query will return OTHER.
I recommend returning the concrete type for each statement, such as
"CREATE TABLE" for "create table xx (...) with ()" and
"SELECT" for "select * from xxx". The statement type can be maintained
in `Operation`s.

> Error Handling
I think the current design of the error handling mechanism can meet the
requirements of the CliClient: we can get the root cause from
the stack (see ErrorResponseBody#errors). If it becomes a common
requirement (for many clients) in the future,
we can introduce this interface.
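
As a rough illustration of getting the root cause from the stack, the Python sketch below scans an error body for the innermost "Caused by:" line. It assumes the body is JSON with an `errors` list of strings, one of which embeds a Java-style stack trace; that layout and the `root_cause` helper are assumptions for this sketch, not a description of the gateway's guaranteed format.

```python
import json
from typing import Optional


def root_cause(error_body_json: str) -> Optional[str]:
    """Return the innermost 'Caused by:' message found in the error list."""
    errors = json.loads(error_body_json).get("errors", [])
    cause = None
    for entry in errors:
        for line in entry.splitlines():
            line = line.strip()
            if line.startswith("Caused by: "):
                # Later 'Caused by:' lines are deeper causes, so keep the last.
                cause = line[len("Caused by: "):]
    return cause
```

If no entry contains a "Caused by:" line, the function returns `None` and a client would presumably fall back to printing the entries as they are.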

> Runtime REST API Modification for Local Client Migration
I think this part is over-engineered; it belongs to optimization.
The client does not require very high performance, and the current design
can already meet our needs.
If we find performance problems in the future, we can do such optimizations then.

Best,
Godfrey


Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

Posted by yu zelin <yu...@gmail.com>.
Hi, Shammon

Thanks for your feedback. I think it's good to support jdbc-sdk. However,
it's not supported on the gateway side yet. In my opinion, this FLIP is more
concerned with the SQL Client. How about putting "supporting jdbc-sdk" in
'Future Work'? We can discuss how to implement it in another thread.

Best,
Yu Zelin
> 2022年12月2日 18:12,Shammon FY <zj...@gmail.com> 写道:
> 
> Hi zelin
> 
> Thanks for driving this discussion.
> 
> I notice that the sql-client will interact with sql-gateway by `REST
> Client` in the `Executor` in the FLIP, how about introducing jdbc-sdk for
> sql-gateway?
> 
> Then the sql-client can connect the gateway with jdbc-sdk, on the other
> hand, the other applications and tools such as jmeter can use the jdbc-sdk
> to connect sql-gateway too.
> 
> Best,
> Shammon
> 
> 
> On Fri, Dec 2, 2022 at 4:10 PM yu zelin <yu...@gmail.com> wrote:
> 
>> Hi Jim,
>> 
>> Thanks for your feedback!
>> 
>>> Should this configuration be mentioned in the FLIP?
>> 
>> Sure.
>> 
>>> some way for the server to be able to limit the number of requests it
>> receives.
>> I’m sorry that this FLIP is dedicated in implementing the Remote mode, so
>> we
>> didn't consider much about this. I think the option is enough currently.
>> I will add
>> the improvement suggestions to the ‘Future Work’.
>> 
>>> I wonder if two other options are possible
>> 
>> To forward the raw format to gateway and then to client is possible. The
>> raw
>> results from sink is in ‘CollectResultIterator#bufferedResult’. First, we
>> can find
>> a way to get this result without wrapping it. Second, constructing a
>> ‘InternalTypeInfo’.
>> We can construct it using the schema information (data’s logical type).
>> After
>> construction, we can get the ’TypeSerializer’ to deserialize the raw
>> result.
>> 
>> 
>> 
>> 
>>> 2022年12月1日 04:54,Jim Hughes <jh...@confluent.io.INVALID> 写道:
>>> 
>>> Hi Yu,
>>> 
>>> Thanks for moving my comments to this thread!  Also, thank you for
>>> answering my questions; it is helping me understand the SQL Gateway
>>> better.
>>> 
>>> 5.
>>>> Our idea is to introduce a new session option (like
>>> 'sql-client.result.fetch-interval') to control
>>> the fetching requests sending frequency. What do you think?
>>> 
>>> Should this configuration be mentioned in the FLIP?
>>> 
>>> One slight concern I have with having 'sql-client.result.fetch-interval'
>> as
>>> a session configuration is that users could set it low and cause the
>> client
>>> to send a large volume of requests to the SQL gateway.
>>> 
>>> Generally, I'd like to see some way for the server to be able to limit
>> the
>>> number of requests it receives.  If that really needs to be done by a
>> proxy
>>> in front of the SQL gateway, that is fine as well.  (To be clear, I don't
>>> think my concern here should be blocking in any way.)
>>> 
>>> 7.
>>>> What is the serialization lifecycle for results?
>>> 
>>> I wonder if two other options are possible:
>>> 3) Could the Gateway just forward the result byte array?  (Or does the
>>> Gateway need to deserialize the response in order to understand it for
>> some
>>> reason?)
>>> 4) Could the JobManager prepare the results in JSON?  (Or similarly could
>>> the Client read the format which the JobManager sends?)
>>> 
>>> Thanks again!
>>> 
>>> Cheers,
>>> 
>>> Jim
>>> 
>>> On Wed, Nov 30, 2022 at 9:40 AM yu zelin <yu...@gmail.com> wrote:
>>> 
>>>> Hi, all
>>>> 
>>>> Thanks Jim’s questions below. Here I’d like to reply to them.
>>>> 
>>>>> 1. For the Client Parser, is it going to work with the extended syntax
>>>>> from the Flink Table Store?
>>>>> 
>>>>> 2. Relatedly, what will happen if an older Client tries to handle
>>>> syntax
>>>>> that a newer service supports?  (Suppose I use a 1.17 client with a
>>>> 1.18
>>>>> Gateway/system which has a new keyword.  Is there anything we should
>> be
>>>>> designing for upfront?)
>>>>> 
>>>>> 3. How will client and server version mismatches be handled?  Will a
>>>>> single gateway be able to support multiple endpoint versions?
>>>>> 4. How are commands which change a session handled?  Are those sent
>> via
>>>>> an ExecuteStatementRequest?
>>>>> 
>>>>> 5. The remote POC uses polling for getting back status and getting
>> back
>>>>> results.  Would it be possible to switch to web sockets or some other
>>>>> mechanism to avoid polling?  If polling is used for both, the polling
>>>>> frequency should be different between local and remote configurations.
>>>>> 
>>>>> 6. What does this sentence mean?  "The reason why we didn't get the
>> sql
>>>>> type in client side is because it's hard for the lightweight
>>>> client-level
>>>>> parser to recognize some sql type  sql, such as query with CTE.  "
>>>>> 
>>>>> 7. What is the serialization lifecycle for results?  It makes sense to
>>>>> have some control over whether the gateway returns results as SQL or
>>>> JSON.
>>>>> I'd love to see a way to avoid needing to serialize and deserialize
>>>> results
>>>>> on the SQL Gateway if possible.  I'm still new enough to the project
>>>> that
>>>>> I'm not sure if that's readily possible.  Maybe the SQL Gateway's
>>>> return
>>>>> type can be sent as part of the request so that the JobManager can
>> send
>>>>> back results in an advantageous format?
>>>>> 
>>>>> 8. Does ErrorType need to be marked as @PublicEvolving?
>>>>> 
>>>>> I'm excited for the SQL client to support gateway mode!  Given the
>> change
>>>>> in design, do you think it'll still be part of the Flink 1.17 release?
>>>> 
>>>> 1.  ClientParser can work with new (and unknown) SQL syntax. It is
>> because
>>>> if the
>>>> sql type is not recognized, the sql will be submitted to the gateway
>>>> directly.
>>>> 
>>>> For more information: Actually, the proposed ClientParser only do two
>>>> things:
>>>> (1) Tell client commands (help, clear, etc) and sqls apart.
>>>> (2) parses several sql types (e.g. SHOW CREATE statement, we can print
>> raw
>>>> string
>>>> for the SHOW CREATE result instead of table). Here the recognization of
>>>> sql types
>>>> mostly affects the print style, and unrecognized sql also can be
>> submitted
>>>> to cluster.
>>>> So the Client with new ClientParser can work compatible with new syntax.
>>>> 
>>>> 2. First, I'd like to explain that the gateway APIs and supported syntax
>>>> is two things.
>>>> For example, ‘configureSession' and 'completeStatement' are APIs. As
>>>> mentioned
>>>> in #1, the sql statements which syntax is unknown will be submitted to
>> the
>>>> gateway,
>>>> and whether they can be executed normally depends on whether the
>> execution
>>>> environment supports the syntax.
>>>> 
>>>>> Is there anything we should be designing for upfront?
>>>> 
>>>> The 'SqlGatewayRestAPIVersion’ has been introduced. But it is for sql
>>>> gateway APIs.
>>>> 
>>>> 3.
>>>>> How will client and server version mismatches be handled?
>>>> 
>>>> A lower version client can work compatible with a higher version gateway
>>>> because the
>>>> old interfaces won’t be deleted. When a higher version client connects
>> to
>>>> a lower version
>>>> gateway, the client should notify the users if they try to use
>> unsupported
>>>> features. For
>>>> example, the client start option ‘-i’  means using initialization file
>> to
>>>> initialize the session.
>>>> We plan to use the gateway’s ‘configureSession’ to implement it. But
>> this
>>>> API is not
>>>> implemented in 1.16 Gateway (SqlGatewayRestAPIVersion = V1), so if the
>>>> user try to
>>>> use ‘-i’ option to start the client with the 1.16 gateway, the client
>>>> should tell the user that
>>>> Can’t execute ‘-i’ option with gateway which version is lower than V2.
>>>> 
>>>>> Will a single gateway be able to support multiple endpoint versions?
>>>> 
>>>> Currently, the gateway only starts a highest version endpoint and the
>>>> higher version endpoint
>>>> is compatible with the lower version endpoint’s protocol.
>>>> 
>>>> 4. Yes. Mostly, we use ’SET’ and ‘RESET’ statements to change the
>> session
>>>> configuration.
>>>> Notice: the client can’t change the session (I mean, close current
>> session
>>>> and open another
>>>> one). I’m not sure if you have need to change the session itself?
>>>> 
>>>> 5.
>>>>> Would it be possible to switch to web sockets or some other mechanism
>>>> to avoid polling?
>>>> 
>>>> Your suggestion is very good, but this flip is for supporting the remote
>>>> client. How about taking
>>>> it as a future work?
>>>> 
>>>>> If polling is used for both, the polling frequency should be different
>>>> between local and remote
>>>> configurations.
>>>> 
>>>> Our idea is to introduce a new session option (like
>>>> 'sql-client.result.fetch-interval') to control
>>>> the fetching requests sending frequency. What do you think?
>>>> 
>>>> For more information: we are inclined to keep the polling behavior in
>> this
>>>> version. For streaming
>>>> query, fetching results synchronously may occupy resources of the
>> gateway
>>>> in a long period.
>>>> For example, if the job doesn’t return results for a long time because
>> the
>>>> window has not been
>>>> triggered, the synchronously fetching will keep occupying the
>> connection.
>>>> In asynchronous
>>>> situation, the gateway can return a NOT_READY_RESULT quickly and release
>>>> the resources
>>>> for other clients to use. I think we can make some improvements for the
>>>> whole flow path in the
>>>> future.
>>>> 
>>>> 6. Sorry for that there is mistakes in this sentence. Let me make it
>> clear.
>>>> 
>>>> We proposed to add 'ContentType' to indicates the result is for what
>> kind
>>>> of sql. In this sentence,
>>>> I want to explain why we add 'ContentType' since the ClientParser can
>>>> recognize the sql type too.
>>>> It is because the proposed ClientParser can't recognize complex syntax.
>>>> For example, it can’t
>>>> recognize query with CTE. So the result should carry content type
>>>> information to help the client to
>>>> know the sql type. For example, the 'ContentType.QUERY_RESULT' indicates
>>>> the result is for a
>>>> query statement.
>>>> 
>>>> 7.
>>>>> What is the serialization lifecycle for results?
>>>> 
>>>> 1) Sink to JobManager        : RowData -> Byte[ ] (serialize)
>>>> 2) JobManager to Gateway : Byte[ ] -> RowData (deserialize)
>>>> 3) Gateway sending            : RowData -> Byte[ ] (serialized to JSON
>>>> format)
>>>> 4) Client receiving               : Byte[ ] -> RowData (deserialize)
>>>> 
>>>>> Maybe the SQL Gateway's return type can be sent as part of the request
>>>> so that the
>>>> JobManager can send  back results in an advantageous format?
>>>> 
>>>> Yes. I think it's an improvement for the Client and Gateway. We have
>> some
>>>> ideas. For example,
>>>> 
>>>> 1) We can move the Gateway into the JobManager and reduce the Ser/De
>> costs
>>>> from JM to Gateway.
>>>> 2) Or the Gateway can collect the data from the sink function directly
>>>> instead of JobManager.
>>>> 
>>>> But I think we can leave this as a future work and discuss in another
>>>> thread.
>>>> 
>>>> 8. Yes.
>>>> 
>>>>> Do you think it'll still be part of the Flink 1.17 release?
>>>> Yes. We will try our best to finish the work.
>>>> 
>>>> Feel free to talk to me if I’m wrong or you have any other questions.
>>>> 
>>>> 
>>>>> 2022年11月25日 11:48,yu zelin <yu...@gmail.com> 写道:
>>>>> 
>>>>> Hi, all
>>>>> 
>>>>> I want to initiate a discussion on FLIP-275: Support Remote SQL Client
>>>>> Based on SQL Gateway[1].
>>>>> The motivation of this FLIP is that the current SQL Client allows only
>>>>> local connections, which cannot satisfy the common need of connecting
>>>>> to a remote cluster.
>>>>> 
>>>>> Since FLIP-91[2] has introduced the SQL Gateway, we propose to implement
>>>>> the Remote SQL Client based on the SQL Gateway. In our design, we propose
>>>>> two main changes:
>>>>> 
>>>>> 1. A new remote mode client which connects to the remote gateway
>>>>> through the REST API.
>>>>> 2. Migration of the current local mode client. We propose to refactor
>>>>> the local client based on the SQL Gateway to unify the interface for
>>>>> the two modes.
>>>>> 
>>>>> Looking forward to your suggestions.
>>>>> 
>>>>> Best,
>>>>> Yu Zelin
>>>>> 
>>>>> [1] https://cwiki.apache.org/confluence/x/T48ODg
>>>>> [2] https://cwiki.apache.org/confluence/x/rIyMC


Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

Posted by Shammon FY <zj...@gmail.com>.
Hi Zelin,

Thanks for driving this discussion.

I notice that in the FLIP the sql-client will interact with the sql-gateway
through a `REST Client` in the `Executor`. How about introducing a jdbc-sdk
for the sql-gateway?

Then the sql-client can connect to the gateway with the jdbc-sdk. In
addition, other applications and tools such as JMeter can use the jdbc-sdk
to connect to the sql-gateway too.
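
To make the SDK idea a bit more concrete, here is a minimal sketch of the
layering, not the FLIP's design. It uses Python with a DB-API-style cursor as
a stand-in for JDBC, and the gateway is replaced by an in-memory fake so the
example runs without a cluster. Every name in it (FakeGateway, Cursor,
connect, fetch_interval, and the NOT_READY / PAYLOAD / EOS reply types) is
hypothetical and only loosely modeled on the polling behaviour discussed in
this thread.

```python
import time
from typing import Any, Dict, List, Tuple

Row = Tuple[Any, ...]


class FakeGateway:
    """In-memory stand-in for a SQL gateway (hypothetical; no real REST calls)."""

    def __init__(self, not_ready_polls: int = 1) -> None:
        self._not_ready_polls = not_ready_polls
        self._ops: Dict[str, Dict[str, Any]] = {}

    def open_session(self) -> str:
        return "session-1"

    def execute_statement(self, session: str, sql: str) -> str:
        op = f"op-{len(self._ops) + 1}"
        # Canned result batches; a real gateway would run the statement.
        self._ops[op] = {
            "pending": self._not_ready_polls,
            "batches": [[(1, "a")], [(2, "b")]],
        }
        return op

    def fetch_results(self, session: str, op: str, token: int) -> Dict[str, Any]:
        state = self._ops[op]
        if state["pending"] > 0:
            # Results are not available yet: answer quickly, keep the same token.
            state["pending"] -= 1
            return {"type": "NOT_READY", "rows": [], "next_token": token}
        if token >= len(state["batches"]):
            return {"type": "EOS", "rows": [], "next_token": None}
        return {"type": "PAYLOAD", "rows": state["batches"][token],
                "next_token": token + 1}


class Cursor:
    """SDK-side wrapper that hides session handling and result polling."""

    def __init__(self, gateway: FakeGateway, session: str,
                 fetch_interval: float = 0.0) -> None:
        self._gateway = gateway
        self._session = session
        self._fetch_interval = fetch_interval
        self._op = ""
        self._token = 0

    def execute(self, sql: str) -> None:
        self._op = self._gateway.execute_statement(self._session, sql)
        self._token = 0

    def fetchall(self) -> List[Row]:
        rows: List[Row] = []
        while True:
            reply = self._gateway.fetch_results(self._session, self._op, self._token)
            if reply["type"] == "EOS":
                return rows
            if reply["type"] == "NOT_READY":
                # Wait before polling again instead of holding a connection open.
                time.sleep(self._fetch_interval)
                continue
            rows.extend(reply["rows"])
            self._token = reply["next_token"]


def connect(gateway: FakeGateway, fetch_interval: float = 0.0) -> Cursor:
    """Open a session and return a cursor bound to it."""
    return Cursor(gateway, gateway.open_session(), fetch_interval)


cursor = connect(FakeGateway(not_ready_polls=2))
cursor.execute("SELECT 1")
rows = cursor.fetchall()
```

With a layer like this, the sql-client and other tools would only see
connect / execute / fetchall, while the REST exchange stays inside the SDK.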

Best,
Shammon

