Posted to user@avro.apache.org by KV 59 <kv...@gmail.com> on 2021/12/05 16:31:21 UTC

Converting SpecificRecord to GenericRecord of different Schema Versions

Hi All,

Here is my situation: I have a SpecificRecord for a Schema S and I want to
convert it into a GenericRecord of a compatible Schema T (an older
version of S). I have seen many examples of this, but every strategy
serializes the SpecificRecord to either bytes or JSON and deserializes it
back into a GenericRecord. This seems inefficient, especially when the
records are huge and arrive in a high-volume streaming scenario like mine.

I cannot simply cast the SpecificRecord to GenericRecord because of some
type incompatibilities like Enums and Instants.

I have been looking at the SpecificDatumWriter/SpecificDatumReader sources
and trying to build a Mapper that simply sets the values in the
GenericRecord, but I cannot write such a mapper without access to the
protected and private methods in those classes.

The same problem arises when converting a POJO to a GenericRecord.


I would appreciate your input and recommendations.

Thanks
Kishore

Re: Converting SpecificRecord to GenericRecord of different Schema Versions

Posted by Scott Reynolds <sr...@twilio.com>.
I wrote code to do this and it is used in production at high volume.

It is a recursive implementation that reads the next schema field or record
from the desired schema and gets the field from the original object by name.
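
To make the shape of that recursion concrete, here is a minimal,
library-free sketch. It is not the production code described above, and it
deliberately avoids the Avro API: Node is a hypothetical stand-in for
org.apache.avro.Schema, records are modelled as plain Maps, and the names
ByNameProjection and project are invented for illustration. It only shows
the walk driven by the desired schema, with each field fetched from the
source by name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Library-free model of a by-name recursive projection (all types are hypothetical stand-ins). */
final class ByNameProjection {

    /** Stand-in for a schema node: a primitive leaf, a record with named fields, or an array. */
    static final class Node {
        enum Kind { PRIMITIVE, RECORD, ARRAY }

        final Kind kind;
        final Map<String, Node> fields; // RECORD only, in declaration order
        final Node items;               // ARRAY only

        private Node(Kind kind, Map<String, Node> fields, Node items) {
            this.kind = kind;
            this.fields = fields;
            this.items = items;
        }

        static Node primitive() { return new Node(Kind.PRIMITIVE, null, null); }
        static Node record(Map<String, Node> fields) { return new Node(Kind.RECORD, fields, null); }
        static Node array(Node items) { return new Node(Kind.ARRAY, null, items); }
    }

    /** Walks the desired (target) schema and pulls each value from the source by field name. */
    static Object project(Node target, Object value) {
        if (value == null) {
            return null; // a real mapper would apply the target field's default here
        }
        switch (target.kind) {
            case RECORD: {
                Map<?, ?> source = (Map<?, ?>) value;
                Map<String, Object> out = new LinkedHashMap<>();
                // Only fields named in the target are copied; extra source fields are dropped.
                for (Map.Entry<String, Node> field : target.fields.entrySet()) {
                    out.put(field.getKey(), project(field.getValue(), source.get(field.getKey())));
                }
                return out;
            }
            case ARRAY: {
                List<Object> out = new ArrayList<>();
                for (Object item : (List<?>) value) {
                    out.add(project(target.items, item));
                }
                return out;
            }
            default:
                // Primitive leaf: pass through. Enum, fixed, and logical-type
                // translation would be extra branches at this point.
                return value;
        }
    }
}
```

In an Avro-based version the Map would presumably be replaced by an
IndexedRecord and Node by the target Schema. Fields that exist only in the
newer schema fall away naturally because the loop is driven by the older
one.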

The tricky part is dealing with enumerations (as you noted). Here is the
code that makes that translation, where fieldSchema is the schema of the
desired enum field in the GenericRecord and unknownValue is the Enum from
the SpecificRecord:

new GenericData.EnumSymbol(fieldSchema, unknownValue.toString().toUpperCase());

Fixed also required a special branch:

new GenericData.Fixed(fieldSchema, (byte[])unknownValue);

And finally logical types required the use of conversions:

Conversions.convertToRawType(unknownValue, fieldSchema,
    fieldSchema.getLogicalType(),
    conversions.get(fieldSchema.getLogicalType().getName()));

On Mon, Dec 6, 2021, 4:32 AM Martin Grigorov <mg...@apache.org> wrote:

> Hi,
>
> You will need to write a new org.apache.avro.io.Encoder.
> If you succeed in making it, then please share it with the community via
> a Pull Request!
> If you don't - please create an issue and we will try to help!

Re: Converting SpecificRecord to GenericRecord of different Schema Versions

Posted by Martin Grigorov <mg...@apache.org>.
Hi,

You will need to write a new org.apache.avro.io.Encoder.
If you succeed in making it, then please share it with the community via
a Pull Request!
If you don't - please create an issue and we will try to help!
