Posted to dev@flink.apache.org by Ferenc Csaky <fe...@pm.me.INVALID> on 2023/06/22 16:25:04 UTC

[DISCUSS] Persistent SQL Gateway

Hello devs,

I would like to open a discussion about persistence possibilities for the SQL Gateway. At Cloudera, we are happy to see the work already done on this project and are looking for ways to utilize it on our platform as well, but currently it lacks some features that would be essential in our case, and that is where we could help out.

I am not sure how much thought has gone into gateway persistence specifics already, and this feature could be implemented in fundamentally different ways, so I think the first step should be to agree on the basics.

First, in my opinion, persistence should be an optional feature of the gateway that can be enabled if desired. There are a lot of implementation details to sort out, but there are a few major directions to follow:

- Utilize the Hive catalog: The Hive catalog can already be used to have persistent meta-objects, so the crucial thing missing in this case is support for other catalogs. Personally, I would not pursue this option, because in my opinion it would limit the usability of this feature too much.
- Serialize the session as is: Save the whole session (or its context) [1] as is to durable storage, so it can be kept and picked up again.
- Serialize the required elements (catalogs, tables, functions, etc.), not necessarily as a whole: The main point here would be to serialize a different object, so the persistent data will not be that sensitive to changes of the session (or its context). There are numerous factors to weigh here, such as keeping the model close to the session itself so the boilerplate required for the mapping stays minimal, or focusing on saving only what is actually necessary, which makes the persistent storage more portable (see the sketch below).
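
To make the third direction a bit more concrete, here is a rough sketch of what such a decoupled snapshot could look like. All class and field names below are purely illustrative assumptions for the sake of this discussion; nothing here exists in Flink today:

    import java.io.Serializable;
    import java.util.HashMap;
    import java.util.LinkedHashMap;
    import java.util.Map;

    /**
     * Hypothetical snapshot of the elements worth persisting from a gateway session,
     * deliberately independent of the Session/SessionContext classes themselves.
     */
    public final class SessionSnapshot implements Serializable {

        /** Catalog name -> the options that were passed to CREATE CATALOG ... WITH (...). */
        private final Map<String, Map<String, String>> catalogOptions = new HashMap<>();

        /** Object identifier (catalog.db.object) -> the DDL that re-creates the table/view/function. */
        private final Map<String, String> creationStatements = new LinkedHashMap<>();

        /** Session configuration to restore, e.g. 'execution.runtime-mode'. */
        private final Map<String, String> sessionConfig = new HashMap<>();

        public void addCatalog(String name, Map<String, String> options) {
            catalogOptions.put(name, options);
        }

        public void addCreationStatement(String identifier, String ddl) {
            creationStatements.put(identifier, ddl);
        }

        public void setConfigOption(String key, String value) {
            sessionConfig.put(key, value);
        }
    }

On restore, the gateway would re-register the catalogs from the stored options and replay the saved DDL, instead of deserializing live session objects.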

WDYT?

Cheers,
F

[1] https://github.com/apache/flink/blob/master/flink-table/flink-sql-gateway/src/main/java/org/apache/flink/table/gateway/service/session/Session.java

Re: [DISCUSS] Persistent SQL Gateway

Posted by Ferenc Csaky <fe...@pm.me.INVALID>.
Hi Shammon,

Thank you for your answer and explanation. My latest experiment was a SELECT query and my assumptions were based on that; INSERT works as described.

Regarding the state of FLIP-295, I just checked out the recently created Jira tickets [1], and if I can help out with any part, please let me know.

Cheers,
F

[1] https://issues.apache.org/jira/browse/FLINK-32427



Re: [DISCUSS] Persistent SQL Gateway

Posted by Shammon FY <zj...@gmail.com>.
Hi Ferenc,

If I understand correctly, there are two types of jobs in the SQL Gateway: `SELECT` and `NON-SELECT`, such as `INSERT`.

1. `SELECT` jobs need to collect results from the Flink cluster within the corresponding SQL Gateway session, and when that session is closed, the job should be canceled. These jobs are generally short, OLAP-like queries, so I think this behavior may be acceptable.

2. `NON-SELECT` jobs may be batch or streaming jobs, and once they have been submitted successfully, they won't be killed or canceled even if the session or the SQL Gateway is closed. After such jobs are successfully submitted, their lifecycle is no longer managed by the SQL Gateway (see the sketch below).
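
To illustrate the difference outside of the Gateway, here is a small sketch with plain Table API code. It assumes that tables named `orders` and `sink` already exist; it is only meant to show the coupling, not Gateway internals:

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;
    import org.apache.flink.table.api.TableResult;
    import org.apache.flink.types.Row;
    import org.apache.flink.util.CloseableIterator;

    public class JobLifecycleExample {
        public static void main(String[] args) throws Exception {
            TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

            // 1. SELECT: the client holds a result iterator, so the query's lifetime is
            //    coupled to whoever collects the rows (here: this client).
            try (CloseableIterator<Row> rows = tEnv.executeSql("SELECT * FROM orders").collect()) {
                rows.forEachRemaining(System.out::println);
            } // closing the iterator cancels a still-running streaming query

            // 2. INSERT: executeSql returns once the job is submitted; afterwards the job
            //    runs on the cluster independently of this client.
            TableResult result = tEnv.executeSql("INSERT INTO sink SELECT * FROM orders");
            result.getJobClient().ifPresent(c -> System.out.println("Submitted job " + c.getJobID()));
        }
    }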

I don't know whether this covers your usage scenario. Could you describe yours so we can test and confirm?

Best,
Shammon FY



Re: [DISCUSS] Persistent SQL Gateway

Posted by Ferenc Csaky <fe...@pm.me.INVALID>.
Hi Jark,

In the current implementation, any job submitted via the SQL Gateway has to go through a session, because all the operations are grouped under sessions.

Starting from there, if I close a session, that will close the "SessionContext", which closes the "OperationManager" [1], and the "OperationManager" closes all submitted operations tied to that session [2], which results in closing all the jobs executed in that session.
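
A condensed paraphrase of that close chain, just to make the coupling explicit (hypothetical class names, not the literal Flink source referenced in [1] and [2]):

    import java.util.ArrayList;
    import java.util.List;

    // Simplified model of the behavior described above.
    class SessionContextModel {
        final OperationManagerModel operationManager = new OperationManagerModel();

        void close() {
            // closing the session context also closes its operation manager [1]
            operationManager.close();
        }
    }

    class OperationManagerModel {
        // one cancel callback per operation submitted in this session
        final List<Runnable> cancelJobCallbacks = new ArrayList<>();

        void close() {
            // closing the manager closes every operation of the session [2],
            // which in turn cancels the Flink jobs those operations started
            cancelJobCallbacks.forEach(Runnable::run);
        }
    }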

Maybe I am missing something, but my experience is that the jobs I submit via the SQL Gateway get cleaned up when the gateway session is closed.

WDYT?

Cheers,
F

[1] https://github.com/apache/flink/blob/149a5e34c1ed8d8943c901a98c65c70693915811/flink-table/flink-sql-gateway/src/main/java/org/apache/flink/table/gateway/service/context/SessionContext.java#L204
[2] https://github.com/apache/flink/blob/149a5e34c1ed8d8943c901a98c65c70693915811/flink-table/flink-sql-gateway/src/main/java/org/apache/flink/table/gateway/service/operation/OperationManager.java#L194




Re: [DISCUSS] Persistent SQL Gateway

Posted by Jark Wu <im...@gmail.com>.
Hi Ferenc,

But the job lifecycle isn't tied to the SQL Gateway session.
Even if the session is closed, the running jobs are not affected.

Best,
Jark





Re: [DISCUSS] Persistent SQL Gateway

Posted by Ferenc Csaky <fe...@pm.me.INVALID>.
Hi Jark,

Thank you for pointing out FLIP-295 about catalog persistence, I was not aware of its current state. However, as far as I can see, persistent catalogs are necessary, but not sufficient for achieving a "persistent gateway".

The current implementation ties the job lifecycle to the SQL Gateway session, so if a session gets closed, all of its jobs get canceled. Addressing that would be the next step, I think. Is there any work or thought regarding this aspect? We are definitely willing to help out on this front.

Cheers,
F



Re: [DISCUSS] Persistent SQL Gateway

Posted by Jark Wu <im...@gmail.com>.
Hi Ferenc,

Making the SQL Gateway an easy-to-use platform infrastructure for Flink SQL is one of the important roadmap items [1].

The persistence ability of the SQL Gateway is a major piece of work for the 1.18 release. One of the persistence demands is that the registered catalogs are currently kept in memory and lost when the Gateway restarts. There is an accepted FLIP (FLIP-295) [2] targeting this issue, which will let the Gateway persist the registered catalog information into files or databases.
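
Just to make that idea concrete, here is a minimal sketch of what persisting registered catalog information into files could look like, using plain java.util.Properties. It is purely illustrative and not the interfaces proposed in FLIP-295:

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Properties;

    /** Stores the WITH (...) options of each registered catalog as a .properties file. */
    public final class FileCatalogConfigStore {

        private final Path storageDir;

        public FileCatalogConfigStore(Path storageDir) {
            this.storageDir = storageDir;
        }

        /** Called after CREATE CATALOG succeeds, so the definition survives a Gateway restart. */
        public void save(String catalogName, Properties options) throws IOException {
            Files.createDirectories(storageDir);
            try (OutputStream out = Files.newOutputStream(storageDir.resolve(catalogName + ".properties"))) {
                options.store(out, "options of catalog " + catalogName);
            }
        }

        /** Called on startup to re-register catalogs that were created before the restart. */
        public Properties load(String catalogName) throws IOException {
            Properties options = new Properties();
            try (InputStream in = Files.newInputStream(storageDir.resolve(catalogName + ".properties"))) {
                options.load(in);
            }
            return options;
        }
    }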

I'm not sure whether this is something you are looking for.

Best,
Jark


[1]: https://flink.apache.org/roadmap/#a-unified-sql-platform
[2]:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-295%3A+Support+lazy+initialization+of+catalogs+and+persistence+of+catalog+configurations
