Posted to dev@avro.apache.org by "Mateusz Mrozewski (Jira)" <ji...@apache.org> on 2020/03/24 23:17:00 UTC

[jira] [Created] (AVRO-2779) Schema evolution and adding fields to nested records

Mateusz Mrozewski created AVRO-2779:
---------------------------------------

             Summary: Schema evolution and adding fields to nested records
                 Key: AVRO-2779
                 URL: https://issues.apache.org/jira/browse/AVRO-2779
             Project: Apache Avro
          Issue Type: Bug
          Components: java
    Affects Versions: 1.9.2
            Reporter: Mateusz Mrozewski


I have a producer that sometimes adds new fields to the schema. The producer usually gets updated first and starts producing serialized records with the new fields (data is sent via Kafka).

I have a consumer that should be able to read the data from Kafka even when it was produced with a newer schema; new fields can be ignored until the consumer gets updated.

I noticed that adding two fields, one at the top level and one in the nested record, yields unexpected results.

Old schema:
{code:java}
{
  "namespace" : "some.namespace",
  "name" : "MyRecord",
  "type" : "record",
  "fields" : [
    {"name": "field1", "type": "long"},
    {
      "name": "nested",
      "type": {
        "type" : "record",
        "name" : "nestedRecord",
        "fields" : [
          {"name": "nestedField1", "type": "long"}
        ]
      }
    }
  ]
}
{code}
New schema:
{code:java}
{
  "namespace" : "some.namespace",
  "name" : "MyRecord",
  "type" : "record",
  "fields" : [
    {"name": "field1", "type": "long"},
    {"name": "field2", "type": "long"},
    {
      "name": "nested",
      "type": {
        "type" : "record",
        "name" : "nestedRecord",
        "fields" : [
          {"name": "nestedField1", "type": "long"},
          {"name": "nestedField2", "type": "long"}
        ]
      }
    }
  ]
}
{code}
And example code:
{code:java}
Schema.Parser parser = new Schema.Parser();
InputStream fin = new FileInputStream("src/main/resources/schemas/old.json");
Schema oldSchema = parser.parse(fin);

Schema.Parser parser2 = new Schema.Parser();
fin = new FileInputStream("src/main/resources/schemas/new.json");
Schema newSchema = parser2.parse(fin);

GenericData.Record nested = new GenericRecordBuilder(newSchema.getField("nested").schema())
        .set("nestedField1", 3L)
        .set("nestedField2", 4L)
        .build();

GenericData.Record newRecord = new GenericRecordBuilder(newSchema)
        .set("field1", 1L)
        .set("field2", 2L)
        .set("nested", nested)
        .build();

GenericData gd1 = new GenericData();
RawMessageEncoder<GenericRecord> encoder = new RawMessageEncoder<>(gd1, newSchema);
ByteBuffer encoded = encoder.encode(newRecord);

GenericData gd2 = new GenericData();
RawMessageDecoder<GenericRecord> decoder = new RawMessageDecoder<>(gd2, oldSchema);
GenericRecord record = decoder.decode(encoded);

System.out.println(record.get("field1")); // prints 1
System.out.println(record.get("field2")); // prints null
System.out.println(record.get("totally-fake-field")); // prints null
System.out.println(((GenericRecord) record.get("nested")).get("nestedField1")); // prints 2!
System.out.println(((GenericRecord) record.get("nested")).get("nestedField2")); // prints null
{code}
Is this expected behavior? Should this kind of schema evolution be supported?
{code}
Is this an expected behavior? Should such schema evolution be supported?
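
One possible reading of the "prints 2" result, offered as a sketch and not as a confirmed diagnosis: Avro's binary encoding writes record fields in declaration order with no field names or tags, and longs are written as zig-zag variable-length integers. A decoder that is given only the old schema would therefore read the bytes positionally, so the second long in the stream (field2 = 2) lands in nestedField1. The class below is hypothetical and does not use the Avro library; PositionalDecodeSketch, writeWithNewLayout and readWithOldLayout are names invented for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical illustration class; it imitates the byte layout only and is not Avro code.
class PositionalDecodeSketch {

    // Zig-zag plus variable-length encoding, as the Avro spec describes for "long".
    static void writeLong(ByteArrayOutputStream out, long value) {
        long z = (value << 1) ^ (value >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Inverse of writeLong: read 7 bits per byte, then undo the zig-zag mapping.
    static long readLong(ByteBuffer in) {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = in.get() & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);
    }

    // Writer side: the new schema's fields, concatenated in declaration order.
    static byte[] writeWithNewLayout(long field1, long field2,
                                     long nestedField1, long nestedField2) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, field1);
        writeLong(out, field2);
        writeLong(out, nestedField1);
        writeLong(out, nestedField2);
        return out.toByteArray();
    }

    // Reader side: the old schema's layout (field1, then nested.nestedField1),
    // applied to the bytes without any knowledge of the writer's schema.
    static long[] readWithOldLayout(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes);
        long field1 = readLong(in);
        long nestedField1 = readLong(in);
        return new long[] {field1, nestedField1};
    }

    public static void main(String[] args) {
        byte[] bytes = writeWithNewLayout(1L, 2L, 3L, 4L);
        long[] decoded = readWithOldLayout(bytes);
        System.out.println("field1 = " + decoded[0]);              // prints field1 = 1
        System.out.println("nested.nestedField1 = " + decoded[1]); // prints nested.nestedField1 = 2
    }
}
```

If this reading is right, the consumer would need the writer's schema to skip the added fields. RawMessageDecoder also has a constructor that takes both a writer schema and a reader schema; passing newSchema as the writer schema and oldSchema as the reader schema should let Avro's schema resolution drop field2 and nestedField2, at the cost of the consumer having to obtain the writer schema by some other route.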



--
This message was sent by Atlassian Jira
(v8.3.4#803005)