You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Lukáš Drbal <lu...@gmail.com> on 2021/09/03 09:45:12 UTC

Flink non-portable runner: unable to restore savepoint in 2.32.0

Hi,

We are using beam + non-portable flink runner and now we are trying to
upgrade from 2.31.0 to 2.32.0. Savepoint restore failed with:

Caused by: java.lang.IllegalStateException: Could not Java-deserialize
TypeSerializer while restoring checkpoint metadata for serializer snapshot
'org.apache.beam.runners.flink.translation.types.CoderTypeSerializer$LegacySnapshot'.
Please update to the TypeSerializerSnapshot interface that removes Java
Serialization to avoid this problem in the future.
at
org.apache.flink.api.common.typeutils.TypeSerializerConfigSnapshot.restoreSerializer(TypeSerializerConfigSnapshot.java:145)
at
org.apache.flink.runtime.state.StateSerializerProvider.previousSchemaSerializer(StateSerializerProvider.java:186)
at
org.apache.flink.runtime.state.StateSerializerProvider.currentSchemaSerializer(StateSerializerProvider.java:161)
at
org.apache.flink.runtime.state.RegisteredOperatorStateBackendMetaInfo.getPartitionStateSerializer(RegisteredOperatorStateBackendMetaInfo.java:112)
at
org.apache.flink.runtime.state.OperatorStateRestoreOperation.restore(OperatorStateRestoreOperation.java:93)
at
org.apache.flink.runtime.state.DefaultOperatorStateBackendBuilder.build(DefaultOperatorStateBackendBuilder.java:80)
... 16 more
Caused by: java.io.InvalidClassException:
org.apache.beam.sdk.transforms.windowing.IntervalWindow$IntervalWindowCoder;
local class incompatible: stream classdesc serialVersionUID =
-4033763546256145084, local class serialVersionUID = -3349635311633734096

That's because IntervalWindowCoder changed [1]. As I understand restore
savepoint may work but there is no guarantee and this may happens.

From my perspective this leads to very bad experience for users using beam.
Is there any plan to change it? Flink itself should support coders
evolution.
Is this handled better in portable runner? How is this handled by another
runners, for example in Dataflow? Any plan to change this?

L.

[1]
https://github.com/apache/beam/commit/7d98ad2b6554258eeccf9f7c1b017f9a001bfd87#diff-9bd0fd4fa4cbbd6ddb29b194c39d6d46a4f229aa0f1db9ed79ec2adea7971854R178

Re: Flink non-portable runner: unable to restore savepoint in 2.32.0

Posted by Luke Cwik <lc...@google.com>.
On Fri, Sep 3, 2021 at 2:45 AM Lukáš Drbal <lu...@gmail.com> wrote:

> Hi,
>
> We are using beam + non-portable flink runner and now we are trying to
> upgrade from 2.31.0 to 2.32.0. Savepoint restore failed with:
>
> Caused by: java.lang.IllegalStateException: Could not Java-deserialize
> TypeSerializer while restoring checkpoint metadata for serializer snapshot
> 'org.apache.beam.runners.flink.translation.types.CoderTypeSerializer$LegacySnapshot'.
> Please update to the TypeSerializerSnapshot interface that removes Java
> Serialization to avoid this problem in the future.
> at
> org.apache.flink.api.common.typeutils.TypeSerializerConfigSnapshot.restoreSerializer(TypeSerializerConfigSnapshot.java:145)
> at
> org.apache.flink.runtime.state.StateSerializerProvider.previousSchemaSerializer(StateSerializerProvider.java:186)
> at
> org.apache.flink.runtime.state.StateSerializerProvider.currentSchemaSerializer(StateSerializerProvider.java:161)
> at
> org.apache.flink.runtime.state.RegisteredOperatorStateBackendMetaInfo.getPartitionStateSerializer(RegisteredOperatorStateBackendMetaInfo.java:112)
> at
> org.apache.flink.runtime.state.OperatorStateRestoreOperation.restore(OperatorStateRestoreOperation.java:93)
> at
> org.apache.flink.runtime.state.DefaultOperatorStateBackendBuilder.build(DefaultOperatorStateBackendBuilder.java:80)
> ... 16 more
> Caused by: java.io.InvalidClassException:
> org.apache.beam.sdk.transforms.windowing.IntervalWindow$IntervalWindowCoder;
> local class incompatible: stream classdesc serialVersionUID =
> -4033763546256145084, local class serialVersionUID = -3349635311633734096
>
> That's because IntervalWindowCoder changed [1]. As I understand restore
> savepoint may work but there is no guarantee and this may happens.
>
> From my perspective this leads to very bad experience for users using
> beam. Is there any plan to change it? Flink itself should support coders
> evolution.
>
Is this handled better in portable runner?
>

For well known coders Java serialization is not involved. This covers
things like strings, longs, window coders, ... but is still an issue for
coders that rely on Java serialization to store their internal state. See
https://github.com/apache/beam/blob/410ad7699621e28433d81809f6b9c42fe7bd6a60/model/pipeline/src/main/proto/beam_runner_api.proto#L787
for a current list.

Hopefully schemas become more popular and that should help since they won't
rely on Java serialization of the coder.

How is this handled by another runners, for example in Dataflow?
>

Dataflow has the same concept where it uses well known coders but is still
limited in this case for coders that rely on Java serialization to manage
the serialVersionUID correctly.

Any plan to change this?
>

Dataflow will migrate to the portable proto representation for coders
instead of its own implementation around CloudObject based coders.


>
> L.
>
> [1]
> https://github.com/apache/beam/commit/7d98ad2b6554258eeccf9f7c1b017f9a001bfd87#diff-9bd0fd4fa4cbbd6ddb29b194c39d6d46a4f229aa0f1db9ed79ec2adea7971854R178
>
>
>
>