Posted to user@avro.apache.org by Dan Schmitt <da...@gmail.com> on 2020/01/14 20:19:38 UTC

Schema modification with 1.9.1 Java API

Background:
I have devices generating Avro files and dropping them in S3.
The S3 consumers want to push some more data into those files, so they
have a lambda that does a copy/transform to add it.
For some reason they wrote their initial code against the 1.0 release of
Avro and added fields to the records with a default of null, which now
breaks the 1.9.1 type checking (it manifested as avro-tools 1.9.1
barfing on the files).
They were told:
1) Stop using the 10-year-old library; use the new stuff.
2) The way to default a typed value to null is a union of null/type,
with null first (see the example below).
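
For example, a nullable string field looks like this in the schema JSON
(null first in the union, plus an explicit null default; the field name
is just for illustration):

    {"name": "comment", "type": ["null", "string"], "default": null}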

Their first fix attempt read all the records of the existing file and
re-created them all with the union null/type built via SchemaBuilder.
This changed the shape of the data and added a bunch of null defaults
that were not valid (a rough sketch of that call style follows).
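
For reference, that style of SchemaBuilder call looks roughly like this
(untested sketch; the record and field names are made up):

    import org.apache.avro.Schema
    import org.apache.avro.SchemaBuilder

    Schema s = SchemaBuilder.record("Telemetry").namespace("com.example")
        .fields()
        .name("comment").type().unionOf()
            .nullType().and().stringType().endUnion()
        .nullDefault()
        .endRecord()

(SchemaBuilder's optionalString() shorthand builds the same null-first
union with a null default.)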

My suggested code (in Groovy, using the Java 1.9.1 library) to copy the
records as-is, and to build a resolving schema so you can get generic
Avro data to manipulate, was:

import org.apache.avro.Schema
import org.apache.avro.Schema.Field
import org.apache.avro.Schema.Type
import org.apache.avro.file.DataFileReader
import org.apache.avro.generic.GenericDatumReader
import org.apache.avro.generic.GenericRecord
import org.apache.avro.io.DatumReader

for (fileName in args) {
    DatumReader<GenericRecord> datumReader = new GenericDatumReader<>()
    DataFileReader<GenericRecord> dataFileReader =
        new DataFileReader<>(new File(fileName), datumReader)
    Schema schema = dataFileReader.getSchema()

    // Copy the existing fields. A Field already attached to a schema has
    // its position set, and createRecord() rejects such fields, so reset
    // position to -1 (Groovy lets us poke the private field).
    List<Field> theFields = new ArrayList<Field>()
    for (f in schema.getFields()) {
        f.position = -1
        theFields.add(f)
    }

    // Add the new field with a non-null default.
    Field fieldWithDefault = new Field("withDefault",
        Schema.create(Type.STRING), "", "Spoon")
    fieldWithDefault.position = -1   // already -1 on a fresh Field
    theFields.add(fieldWithDefault)

    Schema newSchema = Schema.createRecord(schema.getName(), "",
        schema.getNamespace(), false, theFields)
    System.out.println(newSchema)
    dataFileReader.close()
}
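
The write side that actually populates the data would then go inside the
per-file loop, before the reader is closed, along these lines (untested
sketch; GenericRecordBuilder fills in the declared default for the new
field, and the output file name is made up):

    import org.apache.avro.file.DataFileWriter
    import org.apache.avro.generic.GenericDatumWriter
    import org.apache.avro.generic.GenericRecordBuilder

    DataFileWriter<GenericRecord> writer = new DataFileWriter<>(
        new GenericDatumWriter<GenericRecord>(newSchema))
    writer.create(newSchema, new File(fileName + ".new"))
    while (dataFileReader.hasNext()) {
        GenericRecord oldRecord = dataFileReader.next()
        GenericRecordBuilder builder = new GenericRecordBuilder(newSchema)
        // Copy the old fields by name; the added field is left unset so
        // build() falls back to its declared default ("Spoon").
        for (f in schema.getFields()) {
            builder.set(f.name(), oldRecord.get(f.name()))
        }
        writer.append(builder.build())
    }
    writer.close()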

The f.position = -1 hack to get a Field that is addable to another
schema felt wrong (createRecord() rejects a field whose position was
already set by its original schema, throwing "Field already used"), but
it seems to work.  Is there a better idiom for "I want to add a field
to a record and populate it with data" that I missed?
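
P.S. One thing I spotted but have not tried: 1.9 appears to add a Field
copy constructor that clones the name/doc/default/order onto a fresh
(position-less) Field, which would avoid poking the private field:

    for (f in schema.getFields()) {
        theFields.add(new Field(f, f.schema()))  // fresh copy, position unset
    }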

         Dan