You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by Catlyn Kong <ca...@yelp.com> on 2019/08/27 17:54:40 UTC

Re: [External] Re: --state_backend PipelineOption not supported in python when running on Flink

Hi Max,

Thanks for getting back to me. Will create a ticket, my username is catlynk.

Cheers,
Catlyn

On Tue, Aug 27, 2019 at 3:50 AM Maximilian Michels <mx...@apache.org> wrote:

> Hi Catlyn,
>
> This option has never worked outside the Java SDK where it originates
> from. For the upcoming Beam 2.16.0 release, we have replaced this option
> with a factory class:
>
> https://github.com/apache/beam/blob/da6c1a8f435f5583811785050808a2311db94047/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java#L152
>
> We could bundle a default factory class for the RocksDB state backend
> and make this easier to configure from Python. Do you mind opening a
> JIRA issue for this? I can give you permissions for
> https://issues.apache.org/jira/projects/BEAM/issues if you create an
> account.
>
> In any case, you could also configure the Flink cluster to use RocksDB.
> Beam will use whatever state backend is configured.
>
> Thanks,
> Max
>
> On 23.08.19 02:57, Catlyn Kong wrote:
> > Hi all,
> >
> > I'm experimenting with checkpoints/savepoints in Beam (version 2.14)
> > when using a Flink (version 1.6.4) runner. Flink was able to take
> > periodic checkpoints when I setup the flink-conf.yaml correctly. But I
> > was thinking if it's possible to set the StateBackend on a per job level
> > by flagging the --state_backend=RocksDBStateBackend option since it's
> > said to be supported here
> > <https://beam.apache.org/documentation/runners/flink/>.
> >
> > But instead I got the following error:
> > RuntimeError: Pipeline failed in state FAILED:
> > com.fasterxml.jackson.databind.exc.InvalidDefinitionException: Cannot
> > construct instance of `org.apache.flink.runtime.state.StateBackend` (no
> > Creators, like default construct, exist): abstract types either need to
> > be mapped to concrete types, have custom deserializer, or contain
> > additional type information
> >  at [Source: (String)""RocksDBStateBackend""; line: 1, column: 1]
> >
> > I then saw that there's:
> > @JasonIgnore
> > StateBackend getStateBackend();
> >
> > I'm wondering if this is not supported in python yet? If yes then do we
> > have plans to support this in the near future?
> >
> > Best,
> > Catlyn
>

Re: [External] Re: --state_backend PipelineOption not supported in python when running on Flink

Posted by Catlyn Kong <ca...@yelp.com>.
Hi Max,

Filed https://issues.apache.org/jira/browse/BEAM-8112 to track this. It'll
be nice to also point it out in the python documentation here that it's not
supported yet: https://beam.apache.org/documentation/runners/flink/.

Thanks,
Catlyn

On Wed, Aug 28, 2019 at 2:50 AM Maximilian Michels <mx...@apache.org> wrote:

> Hi Catlyn,
>
> I granted you contributor permissions in JIRA which allows you to
> create/assign tickets.
>
> Cheers,
> Max
>
> On 27.08.19 19:54, Catlyn Kong wrote:
> > Hi Max,
> >
> > Thanks for getting back to me. Will create a ticket, my username is
> catlynk.
> >
> > Cheers,
> > Catlyn
> >
> > On Tue, Aug 27, 2019 at 3:50 AM Maximilian Michels <mxm@apache.org
> > <ma...@apache.org>> wrote:
> >
> >     Hi Catlyn,
> >
> >     This option has never worked outside the Java SDK where it originates
> >     from. For the upcoming Beam 2.16.0 release, we have replaced this
> >     option
> >     with a factory class:
> >
> https://github.com/apache/beam/blob/da6c1a8f435f5583811785050808a2311db94047/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java#L152
> >
> >     We could bundle a default factory class for the RocksDB state backend
> >     and make this easier to configure from Python. Do you mind opening a
> >     JIRA issue for this? I can give you permissions for
> >     https://issues.apache.org/jira/projects/BEAM/issues if you create an
> >     account.
> >
> >     In any case, you could also configure the Flink cluster to use
> RocksDB.
> >     Beam will use whatever state backend is configured.
> >
> >     Thanks,
> >     Max
> >
> >     On 23.08.19 02:57, Catlyn Kong wrote:
> >      > Hi all,
> >      >
> >      > I'm experimenting with checkpoints/savepoints in Beam (version
> 2.14)
> >      > when using a Flink (version 1.6.4) runner. Flink was able to take
> >      > periodic checkpoints when I setup the flink-conf.yaml correctly.
> >     But I
> >      > was thinking if it's possible to set the StateBackend on a per
> >     job level
> >      > by flagging the --state_backend=RocksDBStateBackend option since
> it's
> >      > said to be supported here
> >      > <https://beam.apache.org/documentation/runners/flink/>.
> >      >
> >      > But instead I got the following error:
> >      > RuntimeError: Pipeline failed in state FAILED:
> >      > com.fasterxml.jackson.databind.exc.InvalidDefinitionException:
> Cannot
> >      > construct instance of
> >     `org.apache.flink.runtime.state.StateBackend` (no
> >      > Creators, like default construct, exist): abstract types either
> >     need to
> >      > be mapped to concrete types, have custom deserializer, or contain
> >      > additional type information
> >      >  at [Source: (String)""RocksDBStateBackend""; line: 1, column: 1]
> >      >
> >      > I then saw that there's:
> >      > @JasonIgnore
> >      > StateBackend getStateBackend();
> >      >
> >      > I'm wondering if this is not supported in python yet? If yes then
> >     do we
> >      > have plans to support this in the near future?
> >      >
> >      > Best,
> >      > Catlyn
> >
>

Re: [External] Re: --state_backend PipelineOption not supported in python when running on Flink

Posted by Maximilian Michels <mx...@apache.org>.
Hi Catlyn,

I granted you contributor permissions in JIRA which allows you to 
create/assign tickets.

Cheers,
Max

On 27.08.19 19:54, Catlyn Kong wrote:
> Hi Max,
> 
> Thanks for getting back to me. Will create a ticket, my username is catlynk.
> 
> Cheers,
> Catlyn
> 
> On Tue, Aug 27, 2019 at 3:50 AM Maximilian Michels <mxm@apache.org 
> <ma...@apache.org>> wrote:
> 
>     Hi Catlyn,
> 
>     This option has never worked outside the Java SDK where it originates
>     from. For the upcoming Beam 2.16.0 release, we have replaced this
>     option
>     with a factory class:
>     https://github.com/apache/beam/blob/da6c1a8f435f5583811785050808a2311db94047/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java#L152
> 
>     We could bundle a default factory class for the RocksDB state backend
>     and make this easier to configure from Python. Do you mind opening a
>     JIRA issue for this? I can give you permissions for
>     https://issues.apache.org/jira/projects/BEAM/issues if you create an
>     account.
> 
>     In any case, you could also configure the Flink cluster to use RocksDB.
>     Beam will use whatever state backend is configured.
> 
>     Thanks,
>     Max
> 
>     On 23.08.19 02:57, Catlyn Kong wrote:
>      > Hi all,
>      >
>      > I'm experimenting with checkpoints/savepoints in Beam (version 2.14)
>      > when using a Flink (version 1.6.4) runner. Flink was able to take
>      > periodic checkpoints when I setup the flink-conf.yaml correctly.
>     But I
>      > was thinking if it's possible to set the StateBackend on a per
>     job level
>      > by flagging the --state_backend=RocksDBStateBackend option since it's
>      > said to be supported here
>      > <https://beam.apache.org/documentation/runners/flink/>.
>      >
>      > But instead I got the following error:
>      > RuntimeError: Pipeline failed in state FAILED:
>      > com.fasterxml.jackson.databind.exc.InvalidDefinitionException: Cannot
>      > construct instance of
>     `org.apache.flink.runtime.state.StateBackend` (no
>      > Creators, like default construct, exist): abstract types either
>     need to
>      > be mapped to concrete types, have custom deserializer, or contain
>      > additional type information
>      >  at [Source: (String)""RocksDBStateBackend""; line: 1, column: 1]
>      >
>      > I then saw that there's:
>      > @JasonIgnore
>      > StateBackend getStateBackend();
>      >
>      > I'm wondering if this is not supported in python yet? If yes then
>     do we
>      > have plans to support this in the near future?
>      >
>      > Best,
>      > Catlyn
>