Posted to user@avro.apache.org by Gary Steelman <ga...@gmail.com> on 2014/02/19 01:21:29 UTC
General-Purpose Serialization and Deserialization for Avro-Generated SpecificRecords
Hi all,
Here's my use case: I've got a bunch of different Java objects generated
from Avro schema files. So the class definition headers look something like
this: public class MyObject extends
org.apache.avro.specific.SpecificRecordBase implements
org.apache.avro.specific.SpecificRecord. I've got many other types than
MyObject too. I need to write a method which can serialize (from MyObject
or another class to byte[]) and deserialize (from byte[] to MyObject or
another class) in memory (not writing to disk).
I couldn't figure out how to write one method to handle it for
SpecificRecord, so I tried serializing/deserializing these things as
GenericRecord instead:
public static byte[] serializeFromAvro(GenericRecord gr) {
    try {
        DatumWriter<GenericRecord> writer2 =
            new GenericDatumWriter<GenericRecord>(gr.getSchema());
        ByteArrayOutputStream bao2 = new ByteArrayOutputStream();
        BinaryEncoder encoder2 =
            EncoderFactory.get().directBinaryEncoder(bao2, null);
        writer2.write(gr, encoder2);
        byte[] avroBytes2 = bao2.toByteArray();
        return avroBytes2;
    } catch (IOException e) {
        LOG.debug(e);
        return null;
    }
}

// Here I use a DataType enum and the AvroSchemaFactory to quickly
// retrieve a Schema object for a supported DataType.
public static GenericRecord deserializeFromAvro(byte[] avroBytes,
        DataType dataType) {
    try {
        Schema schema = AvroSchemaFactory.getInstance().getSchema(dataType);
        DatumReader<GenericRecord> reader2 =
            new GenericDatumReader<GenericRecord>(schema);
        ByteArrayInputStream bai2 = new ByteArrayInputStream(avroBytes);
        BinaryDecoder decoder2 =
            DecoderFactory.get().directBinaryDecoder(bai2, null);
        GenericRecord gr2 = reader2.read(null, decoder2);
        return gr2;
    } catch (Exception e) {
        LOG.debug(e);
        return null;
    }
}
And I use them like this:
// Remember MyObject is the SpecificRecord implementing class.
MyObject x = new MyObject();
byte[] avroBytes = serializeFromAvro(x);
MyObject x2 = (MyObject) deserializeFromAvro(avroBytes, DataType.MyObject);
Which results in this:
java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to datatypes.generated.avro.MyObject
Is there an easier way to achieve my use case, or some way I can fix my
methods to allow the sort of behavior I want?
Thanks,
Gary
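Editor's note: the exception above comes down to the runtime class of the object the reader returns. A generic reader builds the generic container (GenericData$Record in the stack trace), and that class is not a subclass of the generated MyObject, so the downcast is rejected. The sketch below reproduces only that mechanism in plain Java. It does not use Avro, and Rec, GenericRec, GeneratedRec and CastDemo are hypothetical stand-ins invented for illustration.

```java
// Plain-Java stand-ins; none of these types come from Avro.
interface Rec {}

// Plays the role of GenericData.Record: one container class for every schema.
final class GenericRec implements Rec {}

// Plays the role of a generated class such as MyObject.
final class GeneratedRec implements Rec {}

final class CastDemo {
    // Like a generic reader: always builds the generic container.
    static Rec readGeneric() {
        return new GenericRec();
    }

    // Returns true when the downcast to the generated class is rejected at runtime.
    static boolean castFails(Rec r) {
        try {
            GeneratedRec ignored = (GeneratedRec) r;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(castFails(readGeneric()));
        System.out.println(castFails(new GeneratedRec()));
    }
}
```

The cast compiles because both classes share an interface, but it can only succeed when the reader actually constructs the generated class, which is what the SpecificDatumReader-based replies in this thread arrange.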
Re: General-Purpose Serialization and Deserialization for
Avro-Generated SpecificRecords
Posted by Devin Suiter RDX <ds...@rdx.com>.
Guys,
These are great examples. Awesome. Thanks!
*Devin Suiter*
Jr. Data Solutions Software Engineer
100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com
RE: General-Purpose Serialization and Deserialization for
Avro-Generated SpecificRecords
Posted by Dave McAlpin <dm...@inome.com>.
That's great, Gary. Thanks for the follow-up.
Dave
Re: General-Purpose Serialization and Deserialization for
Avro-Generated SpecificRecords
Posted by Gary Steelman <ga...@gmail.com>.
Hey all, I've adapted Dave's solution to serialize to/from byte[] rather
than JSON. Thanks a lot! The two methods are below:
@SuppressWarnings("unchecked")
public static <T> byte[] avroSerialize(Class<T> clazz, Object object) {
    byte[] ret = null;
    try {
        // Only generated SpecificRecord instances are supported.
        if (object == null || !(object instanceof SpecificRecord)) {
            return null;
        }
        T record = (T) object;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Encoder e = EncoderFactory.get().directBinaryEncoder(out, null);
        SpecificDatumWriter<T> w = new SpecificDatumWriter<T>(clazz);
        w.write(record, e);
        e.flush();
        ret = out.toByteArray();
    } catch (IOException e) {
        LOG.debug(e);
    }
    return ret;
}

// Note: the schema parameter is unused here; the reader takes its schema from clazz.
public static <T> T avroDeserialize(byte[] avroBytes, Class<T> clazz,
        Schema schema) {
    T ret = null;
    try {
        ByteArrayInputStream in = new ByteArrayInputStream(avroBytes);
        Decoder d = DecoderFactory.get().directBinaryDecoder(in, null);
        SpecificDatumReader<T> reader = new SpecificDatumReader<T>(clazz);
        ret = reader.read(null, d);
    } catch (IOException e) {
        LOG.debug(e);
    }
    return ret;
}
And they're called like so:
MyObject x = new MyObject();
byte[] avroBytes = avroSerialize(x.getClass(), x);
MyObject y = avroDeserialize(avroBytes, MyObject.class, MyObject.SCHEMA$);
Thanks,
Gary
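Editor's note: the two methods above rely on a class token (Class<T>) so that one generic method can hand back the caller's concrete type without a cast at the call site. The sketch below shows that pattern in plain Java, with a bounded type parameter taking the place of the instanceof check and the unchecked cast. It does not use Avro: Wire, Point and RoundTrip are hypothetical stand-ins, and a DataOutputStream stands in for the binary encoder. Failures are rethrown rather than logged and turned into null, which is a design choice of this sketch and not something the thread prescribes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for SpecificRecord: a type that can write and read itself.
interface Wire {
    void writeTo(DataOutputStream out) throws IOException;
    void readFrom(DataInputStream in) throws IOException;
}

// Hypothetical stand-in for a generated class such as MyObject.
final class Point implements Wire {
    int x;
    int y;

    Point() {}

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public void writeTo(DataOutputStream out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    public void readFrom(DataInputStream in) throws IOException {
        x = in.readInt();
        y = in.readInt();
    }
}

final class RoundTrip {
    // The bound on T replaces the instanceof check and the unchecked cast.
    static <T extends Wire> byte[] serialize(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            value.writeTo(out);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The class token tells the method which concrete type to build and return.
    static <T extends Wire> T deserialize(byte[] data, Class<T> clazz) {
        try {
            T value = clazz.getDeclaredConstructor().newInstance();
            value.readFrom(new DataInputStream(new ByteArrayInputStream(data)));
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + clazz.getName(), e);
        }
    }

    public static void main(String[] args) {
        Point copy = deserialize(serialize(new Point(3, 4)), Point.class);
        System.out.println(copy.x + "," + copy.y);
    }
}
```

Applied to the Avro methods, the same idea would presumably mean declaring the type parameter as T extends SpecificRecord and accepting a T instead of an Object. That variant is an untested assumption here.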
RE: General-Purpose Serialization and Deserialization for
Avro-Generated SpecificRecords
Posted by Gary Steelman <ga...@gmail.com>.
Thank you Dave, I appreciate it. I'll give those a shot and let you know
how it goes.
-Gary
RE: General-Purpose Serialization and Deserialization for
Avro-Generated SpecificRecords
Posted by Dave McAlpin <dm...@inome.com>.
Here are some utility functions we've used for serialization to and from JSON. Something similar should work for binary.
@SuppressWarnings("unchecked")
public <T> String avroEncodeAsJson(Class<T> clazz, Object object) {
    String avroEncodedJson = null;
    try {
        // Only generated SpecificRecord instances are supported.
        if (object == null || !(object instanceof SpecificRecord)) {
            return null;
        }
        T record = (T) object;
        Schema schema = ((SpecificRecord) record).getSchema();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Encoder e = EncoderFactory.get().jsonEncoder(schema, out);
        SpecificDatumWriter<T> w = new SpecificDatumWriter<T>(clazz);
        w.write(record, e);
        e.flush();
        // Decode explicitly as UTF-8 instead of the platform default charset.
        avroEncodedJson = new String(out.toByteArray(), StandardCharsets.UTF_8);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return avroEncodedJson;
}

public <T> T jsonDecodeToAvro(String inputString, Class<T> className, Schema schema) {
    T returnObject = null;
    try {
        JsonDecoder jsonDecoder = DecoderFactory.get().jsonDecoder(schema, inputString);
        SpecificDatumReader<T> reader = new SpecificDatumReader<T>(className);
        returnObject = reader.read(null, jsonDecoder);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return returnObject;
}
Dave
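Editor's note: when the encoder's bytes are turned back into a String, the charset matters. The single-argument String(byte[]) constructor decodes with the JVM's default charset, so the result can differ between machines once the JSON contains non-ASCII text. The plain-Java sketch below shows the difference. It assumes the bytes are UTF-8 (which, as far as I know, is what Avro's JSON encoder writes), and CharsetDemo with its two helpers is a hypothetical name used only for illustration.

```java
import java.nio.charset.StandardCharsets;

final class CharsetDemo {
    // Decodes with an explicit charset, independent of the JVM default.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Simulates a platform whose default charset is not UTF-8.
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // A JSON document containing one non-ASCII character.
        String json = "{\"name\":\"Zo\u00eb\"}";
        byte[] utf8 = json.getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeUtf8(utf8).equals(json));
        System.out.println(decodeLatin1(utf8).equals(json));
    }
}
```

Pure-ASCII records round-trip the same way under either charset, which is why this kind of mismatch tends to go unnoticed until real data arrives.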