Posted to user@avro.apache.org by Wei Yang <we...@weizilla.com> on 2011/07/13 21:15:23 UTC

DataFileWriter/Reader vs Encoder/Decoders in Java

I'm just learning Avro but am quite confused about the different ways of
writing and reading data in Java.

Along with the required GenericDatumWriter/Reader (or the Specific
equivalents), it seems like there are two ways of writing/reading data: one
using DataFileWriter/Reader and one using Encoder/Decoders.

From the code that I've played around with and seen, the DataFileWriter
method only writes in binary and will encode the schema into the output.
With the Encoder/Decoder method, the output can be JSON or binary but will
not contain the schema. Therefore, when using that method, the writer schema
file is required on the reader side.
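A minimal sketch of the Encoder/Decoder route described above, assuming a recent Avro 1.x Java release (EncoderFactory/DecoderFactory) and a hypothetical two-field User schema; the class and helper names are illustrative only. It shows that the encoded bytes carry only field values, so the reader has to be handed the writer's schema separately.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class RawBinaryRoundTrip {
  // Hypothetical example schema (not from the thread).
  static final Schema SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\"}]}");

  static GenericRecord user(String name, int age) {
    GenericRecord r = new GenericData.Record(SCHEMA);
    r.put("name", name);
    r.put("age", age);
    return r;
  }

  // Encoder route: only the field values are written, never the schema.
  static byte[] encode(GenericRecord datum) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
    new GenericDatumWriter<GenericRecord>(datum.getSchema()).write(datum, encoder);
    encoder.flush(); // binaryEncoder() is buffered, so flush before taking the bytes
    return out.toByteArray();
  }

  // Decoder route: the caller must already have the writer's schema.
  static GenericRecord decode(byte[] bytes, Schema writerSchema) throws IOException {
    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
    return new GenericDatumReader<GenericRecord>(writerSchema).read(null, decoder);
  }

  public static void main(String[] args) throws IOException {
    byte[] bytes = encode(user("Ann", 30));
    System.out.println(bytes.length);
    System.out.println(decode(bytes, SCHEMA).get("name"));
  }
}
```

For the record ("Ann", 30) the output is 5 bytes: one length byte and three bytes for the string, plus one zig-zag varint byte for 30. Nothing in those bytes identifies the schema.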

So my questions are: 1.) Is there a way to encode the schema into the output
using the Encoder/Decoder method? 2.) Is there a way to encode the schema
into a JSON output? 3.) How do the DataFileWriter/Reader,
GenericDatumWriter/Reader and Encoder/Decoder all fit together
architecturally?

Re: DataFileWriter/Reader vs Encoder/Decoders in Java

Posted by Wei Yang <we...@weizilla.com>.
Ahhh ok. I thought that JSON encoding was a fully implemented standard that
could be used as an alternative to binary. Since it's not, it makes sense
that there's no application-level API for it.

This clears up a lot of confusion. Thanks for your help Doug!
On Jul 13, 2011 3:57 PM, "Doug Cutting" <cu...@apache.org> wrote:
> There is not currently a standard Avro json-encoded file format.
>
> Note that most Avro implementations don't support the JSON encoding, and
> we want file formats to be maximally interoperable, so we perhaps
> shouldn't encourage folks to use the JSON encoding for persistent data.
>
> I have in one instance used the following textual format for
> JSON-encoded Avro data:
> - first line of file is schema
> - one JSON object per following line in file, conforming to schema
>
> Perhaps we could standardize on something like this if there are
> applications for an Avro textual, line-based file format.
>
> Doug

Re: DataFileWriter/Reader vs Encoder/Decoders in Java

Posted by Doug Cutting <cu...@apache.org>.
There is not currently a standard Avro json-encoded file format.

Note that most Avro implementations don't support the JSON encoding, and
we want file formats to be maximally interoperable, so we perhaps
shouldn't encourage folks to use the JSON encoding for persistent data.

I have in one instance used the following textual format for
JSON-encoded Avro data:
 - first line of file is schema
 - one JSON object per following line in file, conforming to schema

Perhaps we could standardize on something like this if there are
applications for an Avro textual, line-based file format.
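A sketch of the line-based textual layout Doug describes (schema on the first line, one JSON datum per following line), built from Avro's JSON Encoder/Decoder. This illustrates that ad hoc layout only; it is not a standard Avro container. It assumes a recent Avro 1.x release, and the User schema and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;

public class JsonLinesAvro {
  // Hypothetical example schema (not from the thread).
  static final Schema SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\"}]}");

  static GenericRecord user(String name, int age) {
    GenericRecord r = new GenericData.Record(SCHEMA);
    r.put("name", name);
    r.put("age", age);
    return r;
  }

  // Line 1: the schema as compact JSON; each following line: one JSON-encoded datum.
  static String write(Schema schema, List<GenericRecord> records) throws IOException {
    StringBuilder text = new StringBuilder(schema.toString()).append('\n');
    GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);
    for (GenericRecord record : records) {
      // A fresh encoder per record keeps each datum on its own line.
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      Encoder encoder = EncoderFactory.get().jsonEncoder(schema, buf);
      writer.write(record, encoder);
      encoder.flush();
      text.append(new String(buf.toByteArray(), StandardCharsets.UTF_8).trim()).append('\n');
    }
    return text.toString();
  }

  // Reads the schema from line 1, then decodes every remaining line with it.
  static List<GenericRecord> read(String text) throws IOException {
    BufferedReader in = new BufferedReader(new StringReader(text));
    Schema schema = new Schema.Parser().parse(in.readLine());
    GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
    List<GenericRecord> records = new ArrayList<>();
    for (String line = in.readLine(); line != null; line = in.readLine()) {
      if (line.isEmpty()) {
        continue; // tolerate blank lines
      }
      Decoder decoder = DecoderFactory.get().jsonDecoder(schema, line);
      records.add(reader.read(null, decoder));
    }
    return records;
  }
}
```

Creating one encoder per record keeps each datum on its own line without relying on how the JSON encoder separates consecutive top-level values.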

Doug

On 07/13/2011 01:27 PM, Wei Yang wrote:
> I read somewhere that it's supposed to be easy to switch between binary
> encoding and JSON encoding for application testing and debugging. I also
> read that schemas were always added to the data output so the schema
> file itself is not required on the reader side. But after playing around
> with the code, I only figured out how to save the schema with the data in
> binary (DataFileWriter method) or without the schema in JSON (Encoder
> method). That's when I became confused as to which method to use and if
> it's possible to save data in JSON with a schema.
> 

Re: DataFileWriter/Reader vs Encoder/Decoders in Java

Posted by Wei Yang <we...@weizilla.com>.
Thanks for the architecture clarification! That helps a lot.

I read somewhere that it's supposed to be easy to switch between binary
encoding and JSON encoding for application testing and debugging. I also
read that schemas were always added to the data output so the schema file
itself is not required on the reader side. But after playing around with the
code, I only figured out how to save the schema with the data in binary
(DataFileWriter method) or without the schema in JSON (Encoder method).
That's when I became confused as to which method to use and if it's possible
to save data in JSON with a schema.
On Jul 13, 2011 2:45 PM, "Doug Cutting" <cu...@apache.org> wrote:
> On 07/13/2011 12:15 PM, Wei Yang wrote:
>
>> So my questions are: 1.) Is there a way to encode the schema into the
>> output using the Encoder/Decoder method? 2.) Is there a way to encode
>> the schema into a JSON output?
>
> One can encode the schema as a string. What is the use case you have in
> mind?
>
>> 3.) How do the DataFileWriter/Reader,
>> GenericDatumWriter/Reader and Encoder/Decoder all fit together
>> architecturally?
>
> Encoder and Decoder are the lowest level APIs. They primarily encode
> and decode primitive values. These are not generally used directly by
> applications, although an application might use them as an event-oriented
> API, akin to XML's SAX.
>
> DatumReader and DatumWriter are mid-level APIs. They serialize and
> de-serialize Java objects using an Encoder or Decoder. Different
> versions are used for different Java representations of a Schema. These
> are used by folks implementing Avro data containers, like data files or
> RPC.
>
> DataFileReader and DataFileWriter are the application-level APIs. They
> permit one to read and write objects from files.
>
> Doug

Re: DataFileWriter/Reader vs Encoder/Decoders in Java

Posted by Doug Cutting <cu...@apache.org>.
On 07/13/2011 12:15 PM, Wei Yang wrote:

> So my questions are: 1.) Is there a way to encode the schema into the
> output using the Encoder/Decoder method? 2.) Is there a way to encode
> the schema into a JSON output?

One can encode the schema as a string.  What is the use case you have in
mind?
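One reading of "encode the schema as a string", sketched with the raw binary Encoder: write the schema's JSON text as an ordinary Avro string, then the datum, and have the reader parse that schema before decoding. This framing is an assumption for illustration and is not a standard Avro format (the data file is the standard way to ship a schema with data). It assumes a recent Avro 1.x release; the names and User schema are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class SchemaPrefixedBinary {
  // Hypothetical example schema (not from the thread).
  static final Schema SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\"}]}");

  static GenericRecord user(String name, int age) {
    GenericRecord r = new GenericData.Record(SCHEMA);
    r.put("name", name);
    r.put("age", age);
    return r;
  }

  // Hypothetical framing: [schema JSON as an Avro string][datum].
  static byte[] write(Schema schema, GenericRecord datum) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
    encoder.writeString(schema.toString()); // the schema travels as plain string data
    new GenericDatumWriter<GenericRecord>(schema).write(datum, encoder);
    encoder.flush();
    return out.toByteArray();
  }

  // The reader recovers the writer's schema from the stream, then decodes the datum.
  static GenericRecord read(byte[] bytes) throws IOException {
    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
    Schema writerSchema = new Schema.Parser().parse(decoder.readString());
    return new GenericDatumReader<GenericRecord>(writerSchema).read(null, decoder);
  }
}
```

The same idea works with a JSON Encoder, but either way both sides must agree on this private framing, which is why a shared container format is usually preferable.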

> 3.) How do the DataFileWriter/Reader,
> GenericDatumWriter/Reader and Encoder/Decoder all fit together
> architecturally?

Encoder and Decoder are the lowest-level APIs.  They primarily encode
and decode primitive values.  These are not generally used directly by
applications, although an application might use them as an event-oriented
API, akin to XML's SAX.
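To illustrate the lowest level, a sketch that drives a BinaryEncoder and BinaryDecoder directly with primitive calls (writeInt, writeString, writeBoolean and their read counterparts), assuming a recent Avro 1.x release. No schema is involved: the reader must issue the same sequence of calls the writer made. The method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class PrimitiveEncoding {
  // Emits three primitive values in a fixed order; no schema is written.
  static byte[] writePrimitives(int count, String label, boolean flag) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
    encoder.writeInt(count);     // zig-zag varint
    encoder.writeString(label);  // varint length, then UTF-8 bytes
    encoder.writeBoolean(flag);  // single byte
    encoder.flush();
    return out.toByteArray();
  }

  // Must read back in exactly the order the values were written.
  static String readPrimitives(byte[] bytes) throws IOException {
    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
    int count = decoder.readInt();
    String label = decoder.readString();
    boolean flag = decoder.readBoolean();
    return count + "," + label + "," + flag;
  }
}
```

Encoding (7, "hi", true) yields 5 bytes: one varint byte for 7, one length byte plus two bytes for "hi", and one byte for the boolean.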

DatumReader and DatumWriter are mid-level APIs.  They serialize and
deserialize Java objects using an Encoder or Decoder.  Different
implementations are used for different Java representations of a Schema
(for example, GenericDatumWriter, SpecificDatumWriter and ReflectDatumWriter).
These are used by folks implementing Avro data containers, like data files
or RPC.
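A sketch of the mid-level layer: the same GenericDatumWriter/GenericDatumReader pair runs over either a binary or a JSON Encoder/Decoder, which is the binary/JSON switch Wei mentions for testing and debugging. It assumes a recent Avro 1.x release; the boolean flag, helper names and User schema are hypothetical. Neither output carries the schema, so the reader still supplies it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;

public class TwoEncoders {
  // Hypothetical example schema (not from the thread).
  static final Schema SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\"}]}");

  static GenericRecord user(String name, int age) {
    GenericRecord r = new GenericData.Record(SCHEMA);
    r.put("name", name);
    r.put("age", age);
    return r;
  }

  // The DatumWriter is unchanged; only the Encoder underneath it differs.
  static byte[] serialize(GenericRecord datum, boolean json) throws IOException {
    Schema schema = datum.getSchema();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    Encoder encoder = json
        ? EncoderFactory.get().jsonEncoder(schema, out)
        : EncoderFactory.get().binaryEncoder(out, null);
    new GenericDatumWriter<GenericRecord>(schema).write(datum, encoder);
    encoder.flush();
    return out.toByteArray();
  }

  // Likewise, the DatumReader is unchanged; only the Decoder differs.
  static GenericRecord deserialize(byte[] bytes, Schema schema, boolean json) throws IOException {
    Decoder decoder = json
        ? DecoderFactory.get().jsonDecoder(schema, new ByteArrayInputStream(bytes))
        : DecoderFactory.get().binaryDecoder(bytes, null);
    return new GenericDatumReader<GenericRecord>(schema).read(null, decoder);
  }
}
```

Swapping SpecificDatumWriter or ReflectDatumWriter in for GenericDatumWriter would change the Java representation without touching the Encoder, which is the layering Doug describes.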

DataFileReader and DataFileWriter are the application-level APIs.  They
permit one to read and write objects from files.
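A sketch of the application-level API, assuming a recent Avro 1.x release and a hypothetical User schema: DataFileWriter stores the schema in the file header, so DataFileReader can be given a GenericDatumReader with no schema and pick up the writer's schema from the file. Class and method names are illustrative.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class DataFileRoundTrip {
  // Hypothetical example schema (not from the thread).
  static final Schema SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\"}]}");

  static GenericRecord user(String name, int age) {
    GenericRecord r = new GenericData.Record(SCHEMA);
    r.put("name", name);
    r.put("age", age);
    return r;
  }

  static void writeUsers(File file, List<GenericRecord> users) throws IOException {
    try (DataFileWriter<GenericRecord> writer =
        new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
      writer.create(SCHEMA, file); // writes the header, including the schema
      for (GenericRecord u : users) {
        writer.append(u);
      }
    }
  }

  static List<GenericRecord> readUsers(File file) throws IOException {
    List<GenericRecord> users = new ArrayList<>();
    // No schema given here: the reader takes the writer's schema from the file header.
    try (DataFileReader<GenericRecord> reader =
        new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
      for (GenericRecord u : reader) {
        users.add(u);
      }
    }
    return users;
  }
}
```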

Doug