Posted to dev@flink.apache.org by bupt_ljy <bu...@163.com> on 2018/08/17 18:54:42 UTC

Re: [Proposal] Utilities for reading, transforming and creating Streaming savepoints

Hi,
+1, I think this will be a great tool for Flink, especially the part about
creating new state. In production we are really worried about the availability
of our savepoints, because the logic that generates them lives inside Flink and
we don't have a good way to validate it. With this tool we could quickly
construct new state for our programs even if the savepoint data is broken.
It’s great, thanks!


Original Message
Sender: Jamie Grier <jgrier@lyft.com>
Recipient: dev <dev@flink.apache.org>
Date: Saturday, Aug 18, 2018 02:32
Subject: Re: [Proposal] Utilities for reading, transforming and creating Streaming savepoints


This is great, Gyula! A colleague here at Lyft has also done some work
around bootstrapping DataStream programs, and we've also talked a bit about
doing this by running DataSet programs.

On Fri, Aug 17, 2018 at 3:28 AM, Gyula Fóra <gyula.fora@gmail.com> wrote:

Hi All!

I want to share with you a little project we have been working on at King
(with some help from some dataArtisans folks). I think this would be a
valuable addition to Flink and solve a bunch of outstanding production
use-cases and headaches around state bootstrapping and state analytics.

We have built a quick and dirty POC implementation on top of Flink 1.6;
please check the README for some nice examples to get a quick idea:
https://github.com/king/bravo

*Short story*
Bravo is a convenient state reader and writer library leveraging Flink's
batch processing capabilities. It supports processing and writing Flink
streaming savepoints. At the moment it only supports processing RocksDB
savepoints, but this can be extended in the future to other state backends
and checkpoint types.

Our goal is to cover a few basic features:

- Converting keyed states to Flink DataSets for processing and analytics
- Reading/writing non-keyed operator states
- Bootstrapping keyed states from Flink DataSets and creating new valid
  savepoints
- Transforming existing savepoints by replacing/changing some states

Some example use-cases:

- Point-in-time state analytics across all operators and keys
- Bootstrapping the state of a streaming job from external resources such
  as a database or filesystem
- Validating and potentially repairing corrupted state of a streaming job
- Changing the max parallelism of a job

Our main goal is to start working together with other Flink production
users and make this something useful that can be part of Flink. So if you
have use-cases, please talk to us :)

I have also started a Google doc which contains a little bit more info
than the README and could be a starting place for discussions:
https://docs.google.com/document/d/103k6wPX20kMu5H3SOOXSg5PZIaYpwdhqBMr-ppkFL5E/edit?usp=sharing

I know there are a bunch of rough edges and bugs (and no tests), but our
motto is: if you are not embarrassed, you released too late :)

Please let me know what you think!

Cheers,
Gyula
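
To make the feature list above concrete, here is a minimal sketch of what
reading, repairing, and rewriting keyed state could look like with a library
of this kind. The class and method names (Savepoint, StateMetadataUtils,
OperatorStateReader, OperatorStateWriter, readKeyedStates, addValueState,
writeSavepoint), the operator uid, the state name, and the HDFS paths are
illustrative assumptions for this sketch, not necessarily Bravo's actual API;
the README linked above has the real examples. Only the DataSet calls are
standard Flink 1.6 Java API.

    // Hypothetical sketch only: OperatorStateReader, OperatorStateWriter,
    // KeyedStateReader-style helpers and StateMetadataUtils are placeholders,
    // not necessarily Bravo's real API. The DataSet API usage is standard Flink.
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;

    public class SavepointRepairJob {

        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // 1) Load the savepoint metadata (placeholder helper and path).
            Savepoint savepoint =
                    StateMetadataUtils.loadSavepoint("hdfs:///savepoints/savepoint-123");

            // 2) Read one operator's keyed ValueState ("count") as a DataSet of
            //    (key, value) pairs for batch-side analytics.
            OperatorStateReader reader =
                    new OperatorStateReader(env, savepoint, "counter-operator-uid");
            DataSet<Tuple2<String, Long>> counters =
                    reader.readKeyedStates("count", String.class, Long.class);

            // 3) Repair or bootstrap the state on the batch side, e.g. clamp
            //    negative counters back to zero.
            DataSet<Tuple2<String, Long>> repaired = counters.map(
                    new MapFunction<Tuple2<String, Long>, Tuple2<String, Long>>() {
                        @Override
                        public Tuple2<String, Long> map(Tuple2<String, Long> kv) {
                            return Tuple2.of(kv.f0, Math.max(kv.f1, 0L));
                        }
                    });

            // 4) Replace the state and write out a new, valid savepoint that the
            //    streaming job can be restored from.
            OperatorStateWriter writer =
                    new OperatorStateWriter(savepoint, "counter-operator-uid");
            writer.addValueState("count", repaired);
            writer.writeSavepoint("hdfs:///savepoints/savepoint-123-repaired");
        }
    }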

Re: [Proposal] Utilities for reading, transforming and creating Streaming savepoints

Posted by 陈梓立 <wa...@gmail.com>.
Hi,

also +1.
As vino said, savepoints are not compatible across Flink versions.
I have heard a lot of users say, "my previous program does not
work any more!"
If these utilities provided such migration functionality, it would be perfect!

Best,
tison.



Re: [Proposal] Utilities for reading, transforming and creating Streaming savepoints

Posted by vino yang <ya...@gmail.com>.
Hi,

+1 from my side. Considering that savepoints are not compatible across
every version of Flink, it would be very useful if this tool could
convert savepoints between different Flink versions.

Thanks, vino.
