Posted to user@flink.apache.org by dz902 <dz...@gmail.com> on 2022/03/15 11:14:34 UTC

Setting S3 as State Backend in SQL Client

Hi,

I'm using Flink 1.14 and was unable to set S3 as the state backend. I tried
a combination of:

SET state.backend='filesystem';
SET state.checkpoints.dir='s3://xxx/checkpoints/';
SET state.backend.fs.checkpointdir='s3://xxx/checkpoints/';
SET state.checkpoint-storage='filesystem';

As well as:

SET state.backend='hashmap';

This covers both the legacy 1.13 way and the new 1.14 way of doing it.

None worked. In the Web UI I see checkpoints being made to the Job Manager
continuously. Configuration reads:

- Checkpoint Storage: JobManagerCheckpointStorage
- State Backend: HashMapStateBackend

Is this a bug? Is there a way to set state backend to S3 using SQL Client?

Thanks,
Dai

Re: Setting S3 as State Backend in SQL Client

Posted by Dian Fu <di...@gmail.com>.
Could you show a code snippet?

On Mon, Mar 21, 2022 at 4:22 PM dz902 <dz...@gmail.com> wrote:

> Hi Dian,
>
> I have tried that with Flink v1.14 on YARN using session mode.
> Unfortunately it does not work. However, StreamExecutionEnvironment's
> set_state_backend() worked.

Re: Setting S3 as State Backend in SQL Client

Posted by dz902 <dz...@gmail.com>.
Hi Dian,

I have tried that with Flink v1.14 on YARN using session mode.
Unfortunately it does not work. However, StreamExecutionEnvironment's
set_state_backend() worked.
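
For anyone landing here later, the workaround that worked can be sketched roughly as follows. This is a minimal sketch, assuming PyFlink 1.14: FsStateBackend is the legacy backend that takes a checkpoint URI directly (it is deprecated in 1.14 and may be absent in later releases), the function name create_table_env and the 10-second interval are illustrative choices, and the S3 path is the placeholder from the original post. The PyFlink imports sit inside the function so the file can be loaded without a Flink installation.

```python
# Placeholder path from the original post, not a real bucket.
CHECKPOINT_DIR = "s3://xxx/checkpoints/"


def create_table_env(checkpoint_dir=CHECKPOINT_DIR):
    """Build a StreamTableEnvironment whose jobs checkpoint to checkpoint_dir.

    Hypothetical helper: the state backend is set on the
    StreamExecutionEnvironment first, and the table environment is then
    created on top of that environment.
    """
    # Imported lazily so this module loads even where PyFlink is missing.
    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.datastream.state_backend import FsStateBackend
    from pyflink.table import StreamTableEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()
    # Per-job state backend; state snapshots go to the given file system URI.
    env.set_state_backend(FsStateBackend(checkpoint_dir))
    # Checkpoint every 10 seconds (interval is in milliseconds).
    env.enable_checkpointing(10_000)
    return StreamTableEnvironment.create(env)
```

Calling create_table_env() needs a working PyFlink installation and an S3 filesystem plugin on the cluster; the returned table environment can then run SQL with execute_sql() as usual.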

On Mon, Mar 21, 2022 at 1:11 PM Dian Fu <di...@gmail.com> wrote:

> Hi Dai,
>
> Regarding how to set per-job configuration for the state backend, could
> you check if the documentation [1] is what you wanted?
>
> Regards,
> Dian
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/python/table/table_environment/#statebackend-checkpoint-and-restart-strategy

Re: Setting S3 as State Backend in SQL Client

Posted by Dian Fu <di...@gmail.com>.
Hi Dai,

Regarding how to set per-job configuration for the state backend, could you
check if the documentation [1] is what you wanted?

Regards,
Dian

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/python/table/table_environment/#statebackend-checkpoint-and-restart-strategy

On Thu, Mar 17, 2022 at 5:43 PM dz902 <dz...@gmail.com> wrote:

> Thanks for the info!
>
> I guess one cannot change the state backend within the SQL Client in v1.13 and
> v1.14. I have tested both.
>
> As for flink-conf.yaml, the SQL Client does not take per-job configuration from
> there, so it does not work. Changing flink-conf.yaml actually changes
> cluster-wide configuration defaults, so restarting the cluster is needed.
>
> Also, I found that the Python Table API has no per-job configuration for the
> state backend, which is frustrating and confusing. One has to create a
> StreamExecutionEnvironment and use that to
> create a StreamTableEnvironment, just to set a per-job state backend.
>
> The docs could use some serious revamping. Hope this helps other people.

Re: Setting S3 as State Backend in SQL Client

Posted by dz902 <dz...@gmail.com>.
Thanks for the info!

I guess one cannot change the state backend within the SQL Client in v1.13 and
v1.14. I have tested both.

As for flink-conf.yaml, the SQL Client does not take per-job configuration from
there, so it does not work. Changing flink-conf.yaml actually changes
cluster-wide configuration defaults, so restarting the cluster is needed.

Also, I found that the Python Table API has no per-job configuration for the
state backend, which is frustrating and confusing. One has to create a
StreamExecutionEnvironment and use that to
create a StreamTableEnvironment, just to set a per-job state backend.

The docs could use some serious revamping. Hope this helps other people.


On Thu, Mar 17, 2022 at 4:34 PM Martijn Visser <ma...@apache.org>
wrote:

> If you want to use S3 to store checkpoints, you'll have to configure this
> in your flink-conf.yaml. Keep in mind that if you make a change in that
> file, you'll have to restart your cluster before it has any effect.

Re: Setting S3 as State Backend in SQL Client

Posted by Martijn Visser <ma...@apache.org>.
If you want to use S3 to store checkpoints, you'll have to configure this
in your flink-conf.yaml. Keep in mind that if you make a change in that
file, you'll have to restart your cluster before it has any effect.
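
As a rough sketch of what such entries could look like, the short Python snippet below prints flink-conf.yaml lines built from the option keys already quoted in this thread. The helper name render_flink_conf is made up for illustration, s3://xxx/checkpoints/ is the placeholder path from the original post, and an S3 filesystem plugin is assumed to be set up on the cluster already.

```python
# Option keys as quoted earlier in this thread (Flink 1.14 style); the bucket
# path is the placeholder from the original post, not a real location.
CHECKPOINT_OPTIONS = {
    "state.backend": "hashmap",
    "state.checkpoint-storage": "filesystem",
    "state.checkpoints.dir": "s3://xxx/checkpoints/",
}


def render_flink_conf(options):
    """Render options as flat `key: value` lines, as used in flink-conf.yaml."""
    return "\n".join(f"{key}: {value}" for key, value in options.items())


if __name__ == "__main__":
    # Paste the printed lines into flink-conf.yaml, then restart the cluster.
    print(render_flink_conf(CHECKPOINT_OPTIONS))
```

After a restart, the Web UI should no longer report JobManagerCheckpointStorage for new jobs if the settings were picked up.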

On Wed, 16 Mar 2022 at 17:31, dz902 <dz...@gmail.com> wrote:

> So anyway my question is, how would I use S3 to store checkpoints? I could
> even store savepoints fine with S3, just that I was unable to change state
> backend and checkpoints dir for SQL Client.

Re: Setting S3 as State Backend in SQL Client

Posted by dz902 <dz...@gmail.com>.
So anyway my question is, how would I use S3 to store checkpoints? I could
even store savepoints fine with S3, just that I was unable to change state
backend and checkpoints dir for SQL Client.

On Thu, Mar 17, 2022 at 12:25 AM dz902 <dz...@gmail.com> wrote:

> I believe we were not on the same page (literally) :)
>
> [image: image.png]

Re: Setting S3 as State Backend in SQL Client

Posted by dz902 <dz...@gmail.com>.
I believe we were not on the same page (literally) :)

[image: image.png]

On Wed, Mar 16, 2022 at 4:12 PM Martijn Visser <ma...@apache.org>
wrote:

> Hi dz902,
>
> I actually can't find that sentence on the website you've linked to. It
> does state "The following sections list all available options that can be
> used to adjust Flink Table & SQL API programs." So that list contains the
> available options that you can use. The options that you're trying are not
> included in the list, so you can't set those options from the SQL Client.
>
> Options that you set in flink-conf.yaml are applicable to the jobs on the
> cluster, so also the ones created via the SQL Client.
>
> Best regards,
>
> Martijn Visser
> https://twitter.com/MartijnVisser82

Re: Setting S3 as State Backend in SQL Client

Posted by Martijn Visser <ma...@apache.org>.
Hi dz902,

I actually can't find that sentence on the website you've linked to. It
does state "The following sections list all available options that can be
used to adjust Flink Table & SQL API programs." So that list contains the
available options that you can use. The options that you're trying are not
included in the list, so you can't set those options from the SQL Client.

Options that you set in flink-conf.yaml are applicable to the jobs on the
cluster, so also the ones created via the SQL Client.

Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82


On Wed, 16 Mar 2022 at 08:43, dz902 <dz...@gmail.com> wrote:

> Hi,
>
> Per the SQL Client doc (
> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/table/sqlclient/)
> I see this:
>
> > SQL Client Configuration
> > You can configure the SQL client by setting the options below, or any
> > valid Flink configuration entry:
>
> So any valid Flink configuration should work? For example,
> execution.checkpointing.interval='10s' works despite not being listed there.
>

Re: Setting S3 as State Backend in SQL Client

Posted by dz902 <dz...@gmail.com>.
Hi,

Per the SQL Client doc (
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/table/sqlclient/)
I see this:

> SQL Client Configuration
> You can configure the SQL client by setting the options below, or any
> valid Flink configuration entry:

So any valid Flink configuration should work? For example,
execution.checkpointing.interval='10s' works despite not being listed there.


On Wed, Mar 16, 2022 at 3:07 PM Paul Lam <pa...@gmail.com> wrote:

> Hi,
>
> If I remember correctly, SET operations support only a limited set of
> configurations.
>
> Most of them are table options that are listed on table configuration [1]
> plus some pipeline options.
>
> State backend options are likely not among them.
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/table/config/
>
> Best,
> Paul Lam

Re: Setting S3 as State Backend in SQL Client

Posted by Paul Lam <pa...@gmail.com>.
Hi,

If I remember correctly, SET operations support only a limited set of configurations.

Most of them are table options that are listed on table configuration [1] plus some pipeline options.

State backend options are likely not among them.

[1] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/table/config/

Best,
Paul Lam

> On Mar 15, 2022, at 19:21, dz902 <dz...@gmail.com> wrote:
> 
> Just tried editing flink-conf.yaml and it seems SQL Client does not respect that either. Is this intended behavior?


Re: Setting S3 as State Backend in SQL Client

Posted by dz902 <dz...@gmail.com>.
Just tried editing flink-conf.yaml and it seems SQL Client does not respect
that either. Is this intended behavior?

On Tue, Mar 15, 2022 at 7:14 PM dz902 <dz...@gmail.com> wrote:

> Hi,
>
> I'm using Flink 1.14 and was unable to set S3 as state backend. I tried
> combination of:
>
> SET state.backend='filesystem';
> SET state.checkpoints.dir='s3://xxx/checkpoints/';
> SET state.backend.fs.checkpointdir='s3://xxx/checkpoints/';
> SET state.checkpoint-storage='filesystem'
>
> As well as:
>
> SET state.backend='hashmap';
>
> Which covered both legacy 1.13 way to do it and 1.14 new way to do it.
>
> None worked. In the Web UI I see checkpoints being made to the Job Manager
> continuously. Configuration reads:
>
> - Checkpoint Storage: JobManagerCheckpointStorage
> - State Backend: HashMapStateBackend
>
> Is this a bug? Is there a way to set state backend to S3 using SQL Client?
>
> Thanks,
> Dai
>
>
>