Posted to user@avro.apache.org by Gary Steelman <ga...@gmail.com> on 2014/02/19 01:21:29 UTC

General-Purpose Serialization and Deserialization for Avro-Generated SpecificRecords

Hi all,

Here's my use case: I have a number of Java classes generated from Avro
schema files. Each class declaration looks something like this:
public class MyObject extends org.apache.avro.specific.SpecificRecordBase
implements org.apache.avro.specific.SpecificRecord. There are many
generated types besides MyObject. I need one method that serializes any of
them to byte[] and one that deserializes a byte[] back to the right class,
entirely in memory (no writing to disk).

I couldn't figure out how to write a single method that handles every
SpecificRecord, so I tried serializing and deserializing them as
GenericRecord instead:

  public static byte[] serializeFromAvro(GenericRecord gr) {
    try {
      DatumWriter<GenericRecord> writer2 = new GenericDatumWriter<GenericRecord>(gr.getSchema());
      ByteArrayOutputStream bao2 = new ByteArrayOutputStream();
      BinaryEncoder encoder2 = EncoderFactory.get().directBinaryEncoder(bao2, null);
      writer2.write(gr, encoder2);
      byte[] avroBytes2 = bao2.toByteArray();
      return avroBytes2;
    } catch (IOException e) {
      LOG.debug(e);
      return null;
    }
  }

  // Here I use a DataType enum and the AvroSchemaFactory to quickly retrieve a Schema object for a supported DataType.
  public static GenericRecord deserializeFromAvro(byte[] avroBytes, DataType dataType) {
    try {
      Schema schema = AvroSchemaFactory.getInstance().getSchema(dataType);
      DatumReader<GenericRecord> reader2 = new GenericDatumReader<GenericRecord>(schema);
      ByteArrayInputStream bai2 = new ByteArrayInputStream(avroBytes);
      BinaryDecoder decoder2 = DecoderFactory.get().directBinaryDecoder(bai2, null);
      GenericRecord gr2 = reader2.read(null, decoder2);
      return gr2;
    } catch (Exception e) {
      LOG.debug(e);
      return null;
    }
  }

I use them like this:

// Remember MyObject is the SpecificRecord implementing class.
MyObject x = new MyObject();
byte[] avroBytes = serializeFromAvro(x);
MyObject x2 = (MyObject) deserializeFromAvro(avroBytes, DataType.MyObject);

The cast on the last line fails with:
java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to datatypes.generated.avro.MyObject

Is there an easier way to achieve my use case, or some way I can fix my
methods to allow the sort of behavior I want?

Thanks,
Gary
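
A note on the exception: it is ordinary Java casting behaviour. A downcast
succeeds only if the object was constructed as that class or a subclass of
it. The message shows that the generic reader built a GenericData$Record,
and MyObject extends SpecificRecordBase rather than GenericData.Record, so
the cast cannot succeed whatever the bytes contain. Below is a toy model of
that, runnable on the JDK alone. Rec, GenericRec, readGeneric and
readSpecific are hypothetical stand-ins, not Avro types, and this MyObject
is hand-written rather than generated.

```java
import java.util.HashMap;
import java.util.Map;

public class CastDemo {
    /** Hypothetical stand-in for a common record interface (not an Avro type). */
    public interface Rec {
        void put(String field, Object value);
        Object get(String field);
    }

    /** Stand-in for a generic, map-backed record. */
    public static class GenericRec implements Rec {
        private final Map<String, Object> fields = new HashMap<>();
        @Override public void put(String field, Object value) { fields.put(field, value); }
        @Override public Object get(String field) { return fields.get(field); }
    }

    /** Stand-in for a generated class. It is a sibling of GenericRec, not a subclass. */
    public static class MyObject implements Rec {
        private Object id;
        public MyObject() { }
        @Override public void put(String field, Object value) { if ("id".equals(field)) id = value; }
        @Override public Object get(String field) { return "id".equals(field) ? id : null; }
    }

    /** Always builds a GenericRec, whatever the caller hopes to cast it to. */
    public static Rec readGeneric(Map<String, Object> data) {
        Rec rec = new GenericRec();
        data.forEach(rec::put);
        return rec;
    }

    /** Builds an instance of the requested class, so no downcast is needed. */
    public static <T extends Rec> T readSpecific(Map<String, Object> data, Class<T> clazz) {
        try {
            T rec = clazz.getDeclaredConstructor().newInstance();
            data.forEach(rec::put);
            return rec;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + clazz.getName(), e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> data = new HashMap<>();
        data.put("id", 7);

        MyObject ok = readSpecific(data, MyObject.class);
        System.out.println("specific read gives " + ok.getClass().getSimpleName());

        try {
            MyObject bad = (MyObject) readGeneric(data); // compiles, fails at run time
            System.out.println("unexpected: " + bad);
        } catch (ClassCastException e) {
            System.out.println("generic read cannot be cast: " + e.getMessage());
        }
    }
}
```

In Avro terms, the readSpecific side of this model corresponds to reading
with a SpecificDatumReader built from the target class instead of a
GenericDatumReader, which is the approach taken in the replies.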

Re: General-Purpose Serialization and Deserialization for Avro-Generated SpecificRecords

Posted by Devin Suiter RDX <ds...@rdx.com>.
Guys,

These are great examples. Awesome. Thanks!

*Devin Suiter*
Jr. Data Solutions Software Engineer
100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com



RE: General-Purpose Serialization and Deserialization for Avro-Generated SpecificRecords

Posted by Dave McAlpin <dm...@inome.com>.
That's great Gary. Thanks for the follow up.

Dave

Re: General-Purpose Serialization and Deserialization for Avro-Generated SpecificRecords

Posted by Gary Steelman <ga...@gmail.com>.
Hey all, I've adapted Dave's solution to serialize to/from byte[] rather
than JSON. Thanks a lot! The two methods are below:

  @SuppressWarnings("unchecked")
  public static <T> byte[] avroSerialize(Class<T> clazz, Object object) {
    byte[] ret = null;
    try {
      if (object == null || !(object instanceof SpecificRecord)) {
        return null;
      }

      T record = (T) object;
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      Encoder e = EncoderFactory.get().directBinaryEncoder(out, null);
      SpecificDatumWriter<T> w = new SpecificDatumWriter<T>(clazz);
      w.write(record, e);
      e.flush();
      ret = out.toByteArray();
    } catch (IOException e) {
      LOG.debug(e);
    }

    return ret;
  }

  public static <T> T avroDeserialize(byte[] avroBytes, Class<T> clazz, Schema schema) {
    T ret = null;
    try {
      ByteArrayInputStream in = new ByteArrayInputStream(avroBytes);
      Decoder d = DecoderFactory.get().directBinaryDecoder(in, null);
      SpecificDatumReader<T> reader = new SpecificDatumReader<T>(clazz);
      ret = reader.read(null, d);
    } catch (IOException e) {
      LOG.debug(e);
    }

    return ret;
  }

And they're called like so:
MyObject x = new MyObject();
byte[] avroBytes = avroSerialize(x.getClass(), x);
MyObject y = avroDeserialize(avroBytes, MyObject.class, MyObject.SCHEMA$);

Thanks,
Gary
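
A possible tightening of these signatures, offered as a sketch rather than
tested Avro code: if T is bounded by the record interface and the record is
passed as T, the Object parameter, the instanceof check and the unchecked
(T) cast are no longer needed. Failures can also be thrown instead of being
logged at debug level and turned into a null return. To keep the block
runnable on the JDK alone, Avro is replaced here by a toy DataOutputStream
encoding. Rec, this MyObject and both helpers are hypothetical stand-ins,
with Rec playing the role of SpecificRecord.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class BoundedHelpers {
    /** Hypothetical stand-in for SpecificRecord: knows how to write and read its own fields. */
    public interface Rec {
        void writeTo(DataOutputStream out) throws IOException;
        void readFrom(DataInputStream in) throws IOException;
    }

    /** Hand-written stand-in for a generated class. */
    public static class MyObject implements Rec {
        public int id;
        public String name = "";
        public MyObject() { }
        @Override public void writeTo(DataOutputStream out) throws IOException {
            out.writeInt(id);
            out.writeUTF(name);
        }
        @Override public void readFrom(DataInputStream in) throws IOException {
            id = in.readInt();
            name = in.readUTF();
        }
    }

    /** T is bounded, so only records are accepted and no cast is required. */
    public static <T extends Rec> byte[] serialize(T record) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            record.writeTo(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // surface the failure instead of returning null
        }
        return bytes.toByteArray();
    }

    /** The Class<T> token decides which class is instantiated and returned. */
    public static <T extends Rec> T deserialize(byte[] data, Class<T> clazz) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            T record = clazz.getDeclaredConstructor().newInstance();
            record.readFrom(in);
            return record;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot instantiate " + clazz.getName(), e);
        }
    }

    public static void main(String[] args) {
        MyObject x = new MyObject();
        x.id = 42;
        x.name = "example";
        MyObject y = deserialize(serialize(x), MyObject.class);
        System.out.println(y.id + " " + y.name);
    }
}
```

With Avro, the same shape would presumably keep SpecificDatumWriter and
SpecificDatumReader inside the two methods, as in the code above. Note also
that the schema argument of avroDeserialize is never used in its body, so
the Class<T> token alone appears to be enough there.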



RE: General-Purpose Serialization and Deserialization for Avro-Generated SpecificRecords

Posted by Gary Steelman <ga...@gmail.com>.
Thank you Dave, I appreciate it. I'll give those a shot and let you know
how it goes.

-Gary

RE: General-Purpose Serialization and Deserialization for Avro-Generated SpecificRecords

Posted by Dave McAlpin <dm...@inome.com>.
Here are some utility functions we've used for serialization to and from JSON. Something similar should work for binary.

public <T> String avroEncodeAsJson(Class<T> clazz, Object object) {
    String avroEncodedJson = null;
    try {
        if (object == null || !(object instanceof SpecificRecord)) {
            return null;
        }
        T record = (T) object;
        Schema schema = ((SpecificRecord) record).getSchema();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Encoder e = EncoderFactory.get().jsonEncoder(schema, out);
        SpecificDatumWriter<T> w = new SpecificDatumWriter<T>(clazz);
        w.write(record, e);
        e.flush();
        avroEncodedJson = new String(out.toByteArray());
    } catch (IOException e) {
        e.printStackTrace();
    }

    return avroEncodedJson;
}

public <T> T jsonDecodeToAvro(String inputString, Class<T> className, Schema schema) {
    T returnObject = null;
    try {
        JsonDecoder jsonDecoder = DecoderFactory.get().jsonDecoder(schema, inputString);
        SpecificDatumReader<T> reader = new SpecificDatumReader<T>(className);
        returnObject = reader.read(null, jsonDecoder);
    } catch (IOException e) {
        e.printStackTrace();
    }

    return returnObject;
}

Dave
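
One detail in avroEncodeAsJson that may be worth checking: new
String(out.toByteArray()) decodes with the platform's default charset.
Assuming the JSON encoder writes UTF-8 (which is what Avro's JsonEncoder
appears to do), non-ASCII field values could come back garbled on a JVM
whose default charset is something else. A minimal sketch of decoding
explicitly, using only the JDK; utf8String is a hypothetical helper name and
the JSON literal is made-up sample data:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    /** Decodes the buffer as UTF-8 regardless of the platform default charset. */
    public static String utf8String(ByteArrayOutputStream out) {
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate an encoder that wrote UTF-8 JSON into the buffer.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] json = "{\"name\":\"Zo\u00eb\"}".getBytes(StandardCharsets.UTF_8);
        out.write(json, 0, json.length);

        System.out.println(utf8String(out));
    }
}
```

The decoding side is less exposed, since jsonDecodeToAvro already takes a
String rather than raw bytes.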
