Posted to dev@avro.apache.org by "Mateusz Mrozewski (Jira)" <ji...@apache.org> on 2020/03/24 23:54:00 UTC

[jira] [Updated] (AVRO-2779) Schema evolution and adding fields to nested records

     [ https://issues.apache.org/jira/browse/AVRO-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mateusz Mrozewski updated AVRO-2779:
------------------------------------
    Description: 
I have a producer that sometimes adds new fields to its schema. The producer usually gets updated first and starts producing records serialized with the new schema (the data is sent via Kafka).

I have a consumer that should be able to read the data from Kafka even when it was produced with the newer schema - the new fields can simply be ignored until the consumer gets updated.

I noticed that adding two fields, one at the top level and one in a nested record, yields unexpected results.

Old schema:
{code:java}
{
  "namespace" : "some.namespace",
  "name" : "MyRecord",
  "type" : "record",
  "fields" : [
    {"name": "field1", "type": "long"},
    {
      "name": "nested",
      "type": {
        "type" : "record",
        "name" : "nestedRecord",
        "fields" : [
          {"name": "nestedField1", "type": "long"}
        ]
      }
    }
  ]
}
{code}
New Schema:
{code:java}
{
  "namespace" : "some.namespace",
  "name" : "MyRecord",
  "type" : "record",
  "fields" : [
    {"name": "field1", "type": "long"},
    {"name": "field2", "type": "long"},
    {
      "name": "nested",
      "type": {
        "type" : "record",
        "name" : "nestedRecord",
        "fields" : [
          {"name": "nestedField1", "type": "long"},
          {"name": "nestedField2", "type": "long"}
        ]
      }
    }
  ]
}
{code}
And example code:
{code:java}
import java.io.FileInputStream;
import java.io.InputStream;
import java.nio.ByteBuffer;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;
import org.apache.avro.message.RawMessageDecoder;
import org.apache.avro.message.RawMessageEncoder;

Schema.Parser parser = new Schema.Parser();
InputStream fin = new FileInputStream("src/main/resources/schemas/old.json");
Schema oldSchema = parser.parse(fin);

Schema.Parser parser2 = new Schema.Parser();
fin = new FileInputStream("src/main/resources/schemas/new.json");
Schema newSchema = parser2.parse(fin);

GenericData.Record nested = new GenericRecordBuilder(newSchema.getField("nested").schema())
        .set("nestedField1", 3L)
        .set("nestedField2", 4L)
        .build();
GenericData.Record newRecord = new GenericRecordBuilder(newSchema)
        .set("field1", 1L)
        .set("field2", 2L)
        .set("nested", nested)
        .build();

GenericData gd1 = new GenericData();
RawMessageEncoder<GenericRecord> encoder = new RawMessageEncoder<>(gd1, newSchema);
ByteBuffer encoded = encoder.encode(newRecord);

GenericData gd2 = new GenericData();
RawMessageDecoder<GenericRecord> decoder = new RawMessageDecoder<>(gd2, oldSchema);
GenericRecord record = decoder.decode(encoded);

System.out.println(record.get("field1")); // prints 1
System.out.println(record.get("field2")); // prints null
System.out.println(record.get("totally-fake-field")); // prints null
System.out.println(((GenericRecord) record.get("nested")).get("nestedField1")); // prints 2!
System.out.println(((GenericRecord) record.get("nested")).get("nestedField2")); // prints null
{code}
Is this expected behavior? Should this kind of schema evolution be supported?
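
For reference: Avro's binary encoding carries no field names or tags, so a decoder that only knows the old schema reads the bytes positionally - it interprets the writer's field2 value as the nested record's nestedField1, which is why nestedField1 comes back as 2 above. Resolving the writer schema against the reader schema should avoid this. Below is a minimal, self-contained sketch; it assumes the three-argument RawMessageDecoder(model, writeSchema, readSchema) constructor, and inlines the schemas instead of reading them from files:
{code:java}
import java.nio.ByteBuffer;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;
import org.apache.avro.message.RawMessageDecoder;
import org.apache.avro.message.RawMessageEncoder;

public class ResolveDemo {
  // Same schemas as above, inlined so the snippet is self-contained.
  static final String OLD_SCHEMA =
      "{\"namespace\":\"some.namespace\",\"name\":\"MyRecord\",\"type\":\"record\",\"fields\":["
      + "{\"name\":\"field1\",\"type\":\"long\"},"
      + "{\"name\":\"nested\",\"type\":{\"type\":\"record\",\"name\":\"nestedRecord\",\"fields\":["
      + "{\"name\":\"nestedField1\",\"type\":\"long\"}]}}]}";
  static final String NEW_SCHEMA =
      "{\"namespace\":\"some.namespace\",\"name\":\"MyRecord\",\"type\":\"record\",\"fields\":["
      + "{\"name\":\"field1\",\"type\":\"long\"},"
      + "{\"name\":\"field2\",\"type\":\"long\"},"
      + "{\"name\":\"nested\",\"type\":{\"type\":\"record\",\"name\":\"nestedRecord\",\"fields\":["
      + "{\"name\":\"nestedField1\",\"type\":\"long\"},"
      + "{\"name\":\"nestedField2\",\"type\":\"long\"}]}}]}";

  static GenericRecord decodeWithResolution() throws Exception {
    Schema oldSchema = new Schema.Parser().parse(OLD_SCHEMA);
    Schema newSchema = new Schema.Parser().parse(NEW_SCHEMA);
    // Build and encode a record with the new (writer) schema, as above.
    GenericData.Record nested = new GenericRecordBuilder(newSchema.getField("nested").schema())
        .set("nestedField1", 3L)
        .set("nestedField2", 4L)
        .build();
    GenericData.Record record = new GenericRecordBuilder(newSchema)
        .set("field1", 1L)
        .set("field2", 2L)
        .set("nested", nested)
        .build();
    ByteBuffer encoded =
        new RawMessageEncoder<GenericRecord>(new GenericData(), newSchema).encode(record);
    // Decode with BOTH schemas: (model, writeSchema, readSchema).
    RawMessageDecoder<GenericRecord> decoder =
        new RawMessageDecoder<>(new GenericData(), newSchema, oldSchema);
    return decoder.decode(encoded);
  }

  public static void main(String[] args) throws Exception {
    GenericRecord decoded = decodeWithResolution();
    System.out.println(decoded.get("field1"));
    System.out.println(((GenericRecord) decoded.get("nested")).get("nestedField1"));
  }
}
{code}
With resolution, the decoder can skip the writer-only fields instead of misreading them, so nestedField1 should come back as 3 rather than 2.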



> Schema evolution and adding fields to nested records
> ----------------------------------------------------
>
>                 Key: AVRO-2779
>                 URL: https://issues.apache.org/jira/browse/AVRO-2779
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.9.2
>            Reporter: Mateusz Mrozewski
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)