Posted to dev@beam.apache.org by Maximilian Michels <mx...@apache.org> on 2019/05/08 18:45:28 UTC

Coder Evolution

Hi,

I'm looking into updating the Flink Runner to Flink version 1.8. Since 
version 1.7, Flink has offered a new optional interface for Coder evolution*.

When a Flink pipeline is checkpointed, CoderSnapshots are written out 
alongside the checkpointed data. When the pipeline is restored from 
that checkpoint, the CoderSnapshots are restored and used to 
reinstantiate the Coders.

Furthermore, there is a compatibility and migration check between the 
old and the new Coder. This lets us determine whether (a sketch of the 
interface follows the list):

  - The serializer did not change or is compatible (ok)
  - The serialization format of the coder changed (ok after migration)
  - The coder needs to be reconfigured and we know how to do that based
    on the old version (ok after reconfiguration)
  - The coder is incompatible (error)
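
To make the list above concrete, here is a minimal sketch of what 
implementing Flink's interface looks like. This is only an illustration: 
MyCoderSnapshot is a made-up name, the element type is simplified to 
String, and the exact signatures may differ slightly between 1.7 and 1.8.

    import java.io.IOException;
    import org.apache.flink.api.common.typeutils.TypeSerializer;
    import org.apache.flink.api.common.typeutils.TypeSerializerSchemaCompatibility;
    import org.apache.flink.api.common.typeutils.TypeSerializerSnapshot;
    import org.apache.flink.api.common.typeutils.base.StringSerializer;
    import org.apache.flink.core.memory.DataInputView;
    import org.apache.flink.core.memory.DataOutputView;

    // Written out alongside checkpointed data; on restore it reinstantiates
    // the old serializer and checks the new serializer against it.
    public class MyCoderSnapshot implements TypeSerializerSnapshot<String> {

      @Override
      public int getCurrentVersion() {
        return 1;
      }

      @Override
      public void writeSnapshot(DataOutputView out) throws IOException {
        // Persist whatever configuration is needed to rebuild the serializer.
      }

      @Override
      public void readSnapshot(int readVersion, DataInputView in,
          ClassLoader userCodeClassLoader) throws IOException {
        // Read the persisted configuration back.
      }

      @Override
      public TypeSerializer<String> restoreSerializer() {
        // Rebuild the old serializer from the restored configuration.
        return StringSerializer.INSTANCE;
      }

      @Override
      public TypeSerializerSchemaCompatibility<String> resolveSchemaCompatibility(
          TypeSerializer<String> newSerializer) {
        // The four outcomes in the list above map onto compatibleAsIs(),
        // compatibleAfterMigration(), compatibleWithReconfiguredSerializer(...)
        // and incompatible().
        return TypeSerializerSchemaCompatibility.compatibleAsIs();
      }
    }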

I was wondering about the Coder evolution story in Beam. The current 
state is that checkpointed Beam pipelines are only guaranteed to run 
with the same Beam version and pipeline version. A newer version of 
either might break the checkpoint format without any way to migrate the 
state.

Should we start thinking about supporting Coder evolution in Beam?

Thanks,
Max


* Coders are called TypeSerializers in Flink land. The interface is 
TypeSerializerSnapshot.

Re: Coder Evolution

Posted by Lukasz Cwik <lc...@google.com>.
Yes, having evolution actually work is quite difficult. For example, take
the case of a map-based side input where you try to look up some value by a
key. The runner will have stored a bunch of this data using the old format.
Would you ask that lookups be done using the old format or the new format,
or would you recode all the data from the old format to the new format
(assuming our SDKs/protocols could do all of these things)?

1) Lookups using either the old format or the new format have an issue:
you'll look up the wrong thing, because the old format may encode
completely differently than the new format.
2) Recoding all the data from the old format to the new format may reduce
the number of keys. For example, let's say we have a schema with two
fields, A and B, that is used as the key, and that schema is now reduced to
just the first field A. How do you combine all the values that map onto the
same encoded key (this is where the discussion of only mergeable state vs.
special SDK methods such as OnMerge is very relevant[1])? The sketch after
this list illustrates the collision.
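
Here is a toy illustration of point 2. Everything is hypothetical, with
string concatenation standing in for encoding, but it shows how dropping
field B from the key collapses distinct old keys onto one new key,
forcing the runner to merge their values:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class KeyCollision {
      // Old "coder": key is fields A and B. New "coder": key is field A only.
      static String encodeOld(String a, String b) { return a + "|" + b; }
      static String encodeNew(String a) { return a; }

      public static void main(String[] args) {
        // State as the runner stored it under the old key encoding.
        Map<String, List<Integer>> oldState = new HashMap<>();
        oldState.put(encodeOld("user1", "us"), List.of(1, 2));
        oldState.put(encodeOld("user1", "de"), List.of(3));

        // Recoding to the new key encoding: both old keys become "user1",
        // so the runner must somehow merge the values, which only works
        // if the state is logically mergeable.
        Map<String, List<Integer>> newState = new HashMap<>();
        for (Map.Entry<String, List<Integer>> e : oldState.entrySet()) {
          String a = e.getKey().split("\\|")[0];
          newState.computeIfAbsent(encodeNew(a), k -> new ArrayList<>())
              .addAll(e.getValue());
        }
        System.out.println(newState); // e.g. {user1=[1, 2, 3]}, merge order unspecified
      }
    }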

I haven't seen any solution where you perform encoding lineage tracking
that works across multiple "evolutions", and recoding the data requires us
to have a solution for logically merging user data stored by the runner.

1:
https://lists.apache.org/thread.html/ccc0d548e440b63897b6784cd7896c266498df64c9c63ce6c52ae098@%3Cdev.beam.apache.org%3E


Re: Coder Evolution

Posted by Kenneth Knowles <ke...@apache.org>.
This needs additional design and discussion for the schema and non-schema
cases. The two should hopefully fit together (with schema-oriented
compatibility as a particular implementation within the more generic
framework), but I wouldn't take that as a requirement. So far we have
focused on compatibility, not migration, so the approach is inherently
limited but has a simple API.

In the schema case, I think there are many mature relational migration
frameworks that have tackled forward and backward migration with great
success. And they are much more analogous to pipeline update scenarios
than protobuf, which is more targeted at independent services evolving at
different cadences, often without coordination. These approaches would
probably complement Flink's notion of having a migration from one
serialization format to another. Two that I've used in past lives are
Django (Python web apps) and iOS migrations. There's nothing radical:
just small, reversible transformations you author, with support libraries
doing the heavy lifting. A sketch of that shape follows.
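
To sketch that shape (hypothetical names, not an existing Beam or Flink
API): each migration is a small reversible step over a generic record,
and a support library runs the steps in order on update and in reverse
on rollback.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical: one reversible migration step over an untyped record.
    interface Migration {
      Map<String, Object> apply(Map<String, Object> record);   // old -> new
      Map<String, Object> revert(Map<String, Object> record);  // new -> old
    }

    // Example step: rename the field "userName" to "user_name".
    class RenameUserName implements Migration {
      @Override
      public Map<String, Object> apply(Map<String, Object> record) {
        Map<String, Object> out = new HashMap<>(record);
        out.put("user_name", out.remove("userName"));
        return out;
      }

      @Override
      public Map<String, Object> revert(Map<String, Object> record) {
        Map<String, Object> out = new HashMap<>(record);
        out.put("userName", out.remove("user_name"));
        return out;
      }
    }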

Kenn


Re: Coder Evolution

Posted by Lukasz Cwik <lc...@google.com>.
Coder evolution is a critical part of both updating and rolling back a
pipeline. One-way evolution would make rollbacks more difficult for users.

Only allowing rollback by resuming from snapshots is helpful for
recovering pipelines whose internal state, like user state, is known to
be bad, but it ignores any external side effects that may have been
correct. Being able to perform a rollback would allow people to keep
internal state changes that reflect external writes, if the known user
state is good.


Re: Coder Evolution

Posted by Reuven Lax <re...@google.com>.
For schemas it requires some design and discussion. One approach is to
allow one-way evolution the way protos and BigQuery do. Essentially this
means we allow adding new fields and making existing fields optional, and
any other change is disallowed. A sketch of such a check follows.
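
A minimal sketch of that one-way compatibility check. The Field and
schema types here are hypothetical stand-ins for illustration, not
Beam's actual Schema API:

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    public class OneWayEvolution {

      static class Field {
        final String name;
        final String type;
        final boolean optional;
        Field(String name, String type, boolean optional) {
          this.name = name;
          this.type = type;
          this.optional = optional;
        }
      }

      // New schema is acceptable iff every old field survives with the same
      // type, no field goes from optional back to required, and every added
      // field is optional (so data written with the old schema still decodes).
      static boolean isAllowed(List<Field> oldSchema, List<Field> newSchema) {
        Map<String, Field> newByName = new HashMap<>();
        for (Field f : newSchema) {
          newByName.put(f.name, f);
        }
        Set<String> oldNames = new HashSet<>();
        for (Field old : oldSchema) {
          oldNames.add(old.name);
          Field replacement = newByName.get(old.name);
          if (replacement == null) return false;                  // field removed
          if (!replacement.type.equals(old.type)) return false;   // type changed
          if (old.optional && !replacement.optional) return false; // tightened
        }
        for (Field f : newSchema) {
          if (!oldNames.contains(f.name) && !f.optional) return false; // required addition
        }
        return true;
      }
    }

This mirrors BigQuery's rule of only allowing added NULLABLE columns and
relaxing REQUIRED columns to NULLABLE.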


Re: Coder Evolution

Posted by Maximilian Michels <mx...@apache.org>.
Thanks for the references Luke! I thought that there may have been prior 
discussions, so this thread could be a good place to consolidate.

> Dataflow also has an update feature, but it's limited by the fact that Beam does not have a good concept of Coder evolution. As a result we try very hard never to change important Coders,

Trying not to break Coders is a fair approach and could work fine for 
Beam itself, if the Coders were designed really carefully. But what 
about custom Coders users may have written? AvroCoder or ProtoCoder 
would be good candidates for forward compatibility, but even these do 
not have migration functionality built in.
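
For what it's worth, Avro itself already supports this kind of
resolution if the reader is given both the writer's and the reader's
schema; what AvroCoder lacks is a way to obtain the writer's schema at
restore time. A sketch using plain Avro (the method and schemas are
hypothetical, the Avro calls are real):

    import java.io.IOException;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryDecoder;
    import org.apache.avro.io.DecoderFactory;

    public class AvroMigration {
      // Decode bytes written with the old (writer) schema into the new
      // (reader) schema; Avro fills added fields from their defaults.
      static GenericRecord decodeWithResolution(
          byte[] data, Schema writerSchema, Schema readerSchema) throws IOException {
        GenericDatumReader<GenericRecord> reader =
            new GenericDatumReader<>(writerSchema, readerSchema);
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(data, null);
        return reader.read(null, decoder);
      }
    }

This is exactly the gap a CoderSnapshot-style mechanism would fill:
something has to persist the writer schema alongside the checkpoint.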

Is schema evolution already part of SchemaCoder? It's definitely a good 
candidate for evolution because a schema gives us insight into the 
structure a Coder encodes, but as for how to actually perform the 
evolution, it looks like this is still an open question.

-Max


Re: Coder Evolution

Posted by Reuven Lax <re...@google.com>.
Dataflow also has an update feature, but it's limited by the fact that Beam
does not have a good concept of Coder evolution. As a result we try very
hard never to change important Coders, which sometimes makes development of
parts of Beam much more difficult. I think Beam would benefit greatly from
having a first-class concept of Coder evolution.

BTW for schemas there is a natural way of defining evolution that can be
handled by SchemaCoder.


Re: Coder Evolution

Posted by Lukasz Cwik <lc...@google.com>.
There was a thread about coder update in the past here[1]. Also, Reuven
sent out a doc[2] about pipeline drain and update which was discussed in
this thread[3]. I believe there have been more references to pipeline
update in other threads when people tried to change coder encodings in the
past as well.

Reuven/Dan are the best contacts for how this works inside of Google, the
limitations, and other ideas that have been proposed.

1:
https://lists.apache.org/thread.html/f3b2daa740075cc39dc04f08d1eaacfcc2d550cecc04e27be024cf52@%3Cdev.beam.apache.org%3E
2:
https://docs.google.com/document/d/1NExwHlj-2q2WUGhSO4jTu8XGhDPmm3cllSN8IMmWci8/edit#
3:
https://lists.apache.org/thread.html/37c9aa4aa5011801a7f060bf3f53e687e539fa6154cc9f1c544d4f7a@%3Cdev.beam.apache.org%3E
