Posted to user@avro.apache.org by Tom van den Berge <to...@gmail.com> on 2017/01/06 12:50:54 UTC

Binding to java objects using Jackson

Hi,

I would like to serialize Java objects to Avro and deserialize Avro data back
into Java objects. If I'm correct, Avro offers three possibilities:

GenericDatumReader/Writer
SpecificDatumReader/Writer
ReflectDatumReader/Writer

The GenericDatum approach uses a generic, map-like structure (GenericRecord).
Converting it into the Java objects I want to use would require additional
code, and creating and maintaining that code makes this approach impractical
for me.
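To illustrate the kind of glue code this means, here is a minimal sketch. It assumes a hypothetical immutable User class with name and age fields, and it uses a plain Map as a stand-in for Avro's GenericRecord so that it runs without Avro on the classpath. With real Avro, the lookups would be record.get("name") on a GenericRecord, and string fields typically arrive as a CharSequence such as Utf8 rather than as java.lang.String.

```java
import java.util.HashMap;
import java.util.Map;

public class GenericToPojo {

    // Hypothetical immutable target class: final fields, no setters.
    public static final class User {
        private final String name;
        private final int age;

        public User(String name, int age) {
            this.name = name;
            this.age = age;
        }

        public String getName() { return name; }
        public int getAge() { return age; }
    }

    // Hand-written mapping from a map-like record to the POJO.
    // The Map stands in for Avro's GenericRecord (assumption for this sketch).
    // Strings are read as CharSequence because Avro often returns Utf8.
    static User toUser(Map<String, Object> record) {
        String name = ((CharSequence) record.get("name")).toString();
        int age = (Integer) record.get("age");
        return new User(name, age);
    }

    public static void main(String[] args) {
        Map<String, Object> record = new HashMap<>();
        record.put("name", new StringBuilder("alice")); // a CharSequence, like Utf8
        record.put("age", 42);
        User u = toUser(record);
        System.out.println(u.getName() + " " + u.getAge()); // prints "alice 42"
    }
}
```

Every field of every record type needs a line like these, and each one has to be kept in sync with the schema by hand, which is the maintenance cost described above.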

The SpecificDatum approach works with classes that Avro generates from a
schema. These Java objects are instantiated automatically, which is what I
want. But it requires me to use the generated classes in my code, and, as
with all generated code, it's unwise to modify them by hand. The generated
classes also extend an Avro class, which restricts how I can use them too
much.

The ReflectDatum approach uses reflection to instantiate classes, and
doesn't come with the restrictions of the generated classes. But it requires
the classes to be Java beans, so they must have public setter methods for
all properties. That rules this approach out for me, since the classes I'd
like to use are designed to be immutable.

To summarize, all three options force me to design my classes in a specific
way, which is too restrictive for me. I'm a big fan of Jackson, because it
allows me to deserialize to any POJO, regardless of how the POJO is
designed. So I was thinking how great it would be if I could use Jackson to
deserialize from Avro to Java objects (and vice versa).

Then I found jackson-dataformat-avro
(https://github.com/FasterXML/jackson-dataformats-binary/tree/master/avro),
a Jackson extension for serializing to Avro and back. This works great,
but, as far as I could find, it only serializes a single object. It does
not provide a way to serialize multiple objects to a single file or stream,
i.e. to generate an Avro object container file, which is what I would
actually like to achieve.
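For context, the Avro specification describes an object container file as a header followed by data blocks. Each block holds an object count (a long), the byte size of the serialized objects (a long), the serialized objects themselves, and the file's 16-byte sync marker; longs use zig-zag variable-length encoding. The sketch below frames already-encoded records into one such block. It only illustrates the layout and is not Avro's DataFileWriter: it skips the file header (magic bytes, metadata, schema), assumes the null codec, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

public class AvroBlockFraming {

    // Avro encodes a long as a zig-zag varint: zig-zag maps signed to unsigned
    // (0->0, -1->1, 1->2, ...), then 7 bits per byte, high bit = "more bytes follow".
    static void writeLong(ByteArrayOutputStream out, long n) {
        long v = (n << 1) ^ (n >> 63); // zig-zag
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    // Frames already-encoded records (null codec assumed) as one data block:
    // object count, byte size, concatenated records, 16-byte sync marker.
    static byte[] frameBlock(List<byte[]> encodedRecords, byte[] syncMarker) {
        if (syncMarker.length != 16) {
            throw new IllegalArgumentException("sync marker must be 16 bytes");
        }
        ByteArrayOutputStream data = new ByteArrayOutputStream();
        for (byte[] r : encodedRecords) {
            data.write(r, 0, r.length);
        }
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        writeLong(block, encodedRecords.size());
        writeLong(block, data.size());
        block.write(data.toByteArray(), 0, data.size());
        block.write(syncMarker, 0, syncMarker.length);
        return block.toByteArray();
    }
}
```

For example, two records holding the Avro strings "a" and "b" (encoded as 0x02 0x61 and 0x02 0x62) produce the block header bytes 0x04 (count 2) and 0x08 (size 4). Note that nothing inside the block marks where one record ends and the next begins; only parsing with the schema reveals the boundaries.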

For serialization, I found a way to do this (at least it seems to work): I
use the DataFileWriter.appendEncoded(ByteBuffer) method to write objects
that I serialized with jackson-dataformat-avro.

But I failed to find a way to deserialize an object container file using
Jackson. I tried the following: I created my own JacksonDatumReader, which
basically does this:

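// Inside JacksonDatumReader<T>, which implements Avro's DatumReader<T>:
public T read(T reuse, Decoder in) throws IOException {
   // Expose the decoder's remaining input as an InputStream
   InputStream inputStream = ((BinaryDecoder) in).inputStream();
   // Let Jackson parse and bind one object from that stream
   return objectReader.readValue(inputStream);
}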

The objectReader is a Jackson ObjectReader created from a
com.fasterxml.jackson.dataformat.avro.AvroMapper; it does the actual
parsing and binding.

This works fine for the very first object in the file, but it fails on the
second. The problem seems to be that by the time the second object is read,
the InputStream has already been fully consumed: there are no bytes left to
read. It seems that the Decoder passed in wraps an entire data block, which
contains the data for multiple objects, and the same Decoder instance is
passed in again for the second object. The decoder keeps its read position
internally, but for the second object the entire input appears to have been
read already, so an EOFException is thrown.
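The symptom is consistent with the parser reading ahead: if a parser fills its own buffer from the shared stream, it consumes bytes belonging to later records, and because Avro does not length-prefix individual records inside a block, those bytes are lost to the next read. Below is a simplified, stdlib-only model of that suspected cause; it is not Avro or Jackson code. Whether jackson-dataformat-avro's parser buffers in exactly this way is an assumption, and the 8000-byte buffer size and all names (ReadAheadModel, readGreedy, readExact) are hypothetical. Records are modelled as Avro-encoded strings.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical model of the failure, not Avro or Jackson code.
public class ReadAheadModel {

    // Reads one zig-zag varint long from the stream, one byte at a time.
    static long readLong(InputStream in) throws IOException {
        long v = 0;
        int shift = 0;
        int b;
        do {
            b = in.read();
            if (b < 0) throw new EOFException();
            v |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (v >>> 1) ^ -(v & 1); // undo zig-zag
    }

    // Schema-aware read: consumes exactly the bytes of one Avro string,
    // leaving the stream positioned at the next record.
    static String readExact(InputStream in) throws IOException {
        int len = (int) readLong(in);
        byte[] buf = new byte[len];
        int off = 0;
        while (off < len) {
            int n = in.read(buf, off, len - off);
            if (n < 0) throw new EOFException();
            off += n;
        }
        return new String(buf, StandardCharsets.UTF_8);
    }

    // Models a parser with its own read-ahead buffer: it pulls up to 8000
    // bytes from the shared stream, decodes one string, and drops the rest.
    static String readGreedy(InputStream in) throws IOException {
        byte[] buffer = new byte[8000];
        int filled = 0;
        int n;
        while (filled < buffer.length
                && (n = in.read(buffer, filled, buffer.length - filled)) > 0) {
            filled += n;
        }
        if (filled == 0) throw new EOFException("no bytes left");
        return readExact(new ByteArrayInputStream(buffer, 0, filled));
    }
}
```

On a block holding the strings "a" and "b" (bytes 0x02 0x61 0x02 0x62), calling readGreedy twice on the same stream returns "a" and then throws EOFException, matching the reported behaviour, while readExact returns "a" and then "b".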

Does anyone know a way to make this JacksonDatumReader work? Or does anyone
know of another way to deserialize an Avro file using Jackson data binding?

Thanks,
Tom