Posted to issues@flink.apache.org by "Aljoscha Krettek (JIRA)" <ji...@apache.org> on 2018/02/20 13:16:00 UTC

[jira] [Commented] (FLINK-8716) AvroSerializer does not use schema of snapshot

    [ https://issues.apache.org/jira/browse/FLINK-8716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370037#comment-16370037 ] 

Aljoscha Krettek commented on FLINK-8716:
-----------------------------------------

I'm starting to think that this might be on purpose. I think we can only ever read with the newest schema, because otherwise we would get into trouble with RocksDB, where not all data is read and rewritten between savepoints. (You alluded to this in your mailing list e-mail.) If we do two schema updates with savepoints in between, we will no longer have the first schema, but RocksDB might still hold data that was written using the first schema.
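To make the concern concrete, here is a minimal sketch in plain Java. It is a toy model, not Flink or RocksDB code: the class name, the method name and the idea of tagging each entry with an integer schema version are all assumptions made for illustration. It assumes that after a restore only two schemas are known (the one stored in the latest savepoint and the new one), and that only keys that were touched in between have been rewritten.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical toy model; none of these names exist in Flink.
class StaleSchemaDemo {

    /**
     * Returns the keys whose bytes were written with a schema version that is
     * not in the set of versions still known after a restore.
     */
    static List<String> keysWithUnknownWriterSchema(Map<String, Integer> writtenWithVersion,
                                                    Set<Integer> knownVersions) {
        List<String> lost = new ArrayList<>();
        // TreeMap gives a deterministic (sorted) iteration order.
        for (Map.Entry<String, Integer> e : new TreeMap<>(writtenWithVersion).entrySet()) {
            if (!knownVersions.contains(e.getValue())) {
                lost.add(e.getKey());
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        Map<String, Integer> state = new TreeMap<>();
        state.put("cold-key", 1); // written under schema v1, never touched again
        state.put("warm-key", 1);

        // Savepoint 1 stores v1; the job restarts with schema v2.
        state.put("warm-key", 2); // only keys that are touched get rewritten

        // Savepoint 2 stores v2; the job restarts with schema v3.
        // Assumption: only v2 (from the savepoint) and v3 (new) are known now.
        System.out.println(keysWithUnknownWriterSchema(state, Set.of(2, 3)));
        // prints "[cold-key]"
    }
}
```

In this model "cold-key" is still laid out according to v1, but v1 is no longer available anywhere, which is the situation described above.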

[~StephanEwen] What do you think? You reworked those parts recently.

> AvroSerializer does not use schema of snapshot
> ----------------------------------------------
>
>                 Key: FLINK-8716
>                 URL: https://issues.apache.org/jira/browse/FLINK-8716
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.4.0
>            Reporter: Arvid Heise
>            Priority: Major
>
> The new AvroSerializer stores the schema in the snapshot and uses it to validate compatibility.
> However, it does not use the schema of the snapshot while reading the data. As a result, this version fails on any change to the data layout (so it currently supports little more than renaming).
>  [https://github.com/apache/flink/blob/f3a2197a23524048200ae2b4712d6ed833208124/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/typeutils/AvroSerializer.java#L265]
>  needs to use the schema from
>  [https://github.com/apache/flink/blob/f3a2197a23524048200ae2b4712d6ed833208124/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/typeutils/AvroSerializer.java#L188]
>  as the first parameter. Accordingly, a readSchema field needs to be set
>  in #ensureCompatibility and relayed in #duplicate. Note that the readSchema is passed as the write schema parameter to the DatumReader, as it was the schema that was used to write the data.
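The point about passing the snapshot schema as the write schema can be sketched without Avro itself. The following is a toy model in plain Java, not the AvroSerializer code and not Avro's API: a "schema" is just an ordered list of field names, the "bytes" are values in the writer's field order, and `decode`, `snapshotSchema`, `currentSchema` and the single shared default value are hypothetical names and simplifications. The one property it borrows from Avro is that the binary encoding carries no field names, so positions only make sense relative to the schema that wrote them.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of writer/reader schema resolution; not Avro's API.
class SchemaResolutionSketch {

    static Map<String, String> decode(List<String> writerFields,
                                      List<String> readerFields,
                                      List<String> encodedValues,
                                      String defaultValue) {
        if (encodedValues.size() != writerFields.size()) {
            throw new IllegalArgumentException("data layout does not match the writer schema");
        }
        // Step 1: interpret the positional values using the schema that wrote them.
        Map<String, String> written = new LinkedHashMap<>();
        for (int i = 0; i < writerFields.size(); i++) {
            written.put(writerFields.get(i), encodedValues.get(i));
        }
        // Step 2: project onto the reader schema, filling fields the writer did not have.
        Map<String, String> result = new LinkedHashMap<>();
        for (String field : readerFields) {
            result.put(field, written.getOrDefault(field, defaultValue));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> snapshotSchema = List.of("id", "name");          // schema used to write the data
        List<String> currentSchema = List.of("name", "email", "id"); // schema of the running job
        List<String> stored = List.of("42", "alice");

        // Snapshot schema as writer schema, current schema as reader schema.
        System.out.println(decode(snapshotSchema, currentSchema, stored, "n/a"));
        // prints "{name=alice, email=n/a, id=42}"
    }
}
```

Calling `decode(currentSchema, currentSchema, stored, "n/a")` instead throws in this model, which mirrors the failure on layout changes reported here. In Avro, the corresponding hook is the two-schema `GenericDatumReader` constructor, which takes the writer schema first and the reader schema second.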



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)