Posted to user@avro.apache.org by Pratyush Chandra <ch...@gmail.com> on 2013/01/07 13:42:15 UTC

Embedding schema with binary encoding

I am able to serialize with binary encoding to a file using the following:
        FileOutputStream outputStream = new FileOutputStream(file);
        Encoder e = EncoderFactory.get().binaryEncoder(outputStream, null);
        DatumWriter<GenericRecord> datumWriter =
                new GenericDatumWriter<GenericRecord>(schema);
        GenericRecord message1 = new GenericData.Record(schema);
        message1.put("to", "Alyssa");
        datumWriter.write(message1, e);
        e.flush();
        outputStream.close();

But the output file contains only the serialized data and not the schema.
How can I add the schema as well?

Thanks
Pratyush Chandra
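
For context on why the file above holds no schema: Avro's binary encoding
writes only the field values, in schema order. The following is a minimal,
stdlib-only sketch of that encoding, assuming a record schema with a single
plain string field "to" (the actual schema is not shown in this thread). The
class and method names are made up for illustration. A string is written as
a zigzag varint length followed by its UTF-8 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class AvroStringBytes {
    // Zigzag-encode a long, then emit it as a varint:
    // 7 bits per byte, least-significant group first, high bit = "more follows".
    static void writeLong(long n, ByteArrayOutputStream out) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // An Avro string is its UTF-8 byte length (as a long) followed by the bytes.
    static byte[] encodeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(utf8.length, out);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Under the single-string-field assumption, this is the whole record.
        System.out.println(toHex(encodeString("Alyssa"))); // prints 0c416c79737361
    }
}
```

Nothing in those seven bytes names the field or its type, so a reader
cannot decode them without being given the schema separately.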

Re: Embedding schema with binary encoding

Posted by Pratyush Chandra <ch...@gmail.com>.
Thanks, Scott. I have also realized that the default is binary encoding, not JSON.

On Thu, Jan 10, 2013 at 12:52 AM, Scott Carey <sc...@apache.org> wrote:

> An Avro data file always stores the schema in JSON form in its header.
> There may be an old JIRA ticket about writing the schema in a more compact
> form. The data in the file is always encoded in Avro binary form,
> optionally compressed with snappy or deflate (the algorithm gzip uses),
> and written in blocks of variable size.


-- 
Pratyush Chandra

Re: Embedding schema with binary encoding

Posted by Scott Carey <sc...@apache.org>.
An Avro data file always stores the schema in JSON form in its header.
There may be an old JIRA ticket about writing the schema in a more compact
form. The data in the file is always encoded in Avro binary form,
optionally compressed with snappy or deflate (the algorithm gzip uses),
and written in blocks of variable size.
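
To make the header layout concrete, here is a minimal sketch, assuming the
Avro Java library (org.apache.avro) is on the classpath. The schema, the
class name AvroHeaderPeek, and the helper names are hypothetical. It writes
one deflate-compressed record to memory, then reads the header metadata
back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.avro.Schema;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

final class AvroHeaderPeek {
    // Hypothetical schema: the thread does not show the real one.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Message\","
        + "\"fields\":[{\"name\":\"to\",\"type\":\"string\"}]}");

    // Write a one-record container file into memory, compressed with deflate.
    static byte[] writeCompressed() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
            writer.setCodec(CodecFactory.deflateCodec(6)); // must precede create()
            writer.create(SCHEMA, out);                    // writes the header
            GenericRecord record = new GenericData.Record(SCHEMA);
            record.put("to", "Alyssa");
            writer.append(record);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Look up one metadata entry (for example "avro.schema") in the header.
    static String headerValue(byte[] file, String key) {
        try (DataFileStream<GenericRecord> in = new DataFileStream<>(
                 new ByteArrayInputStream(file), new GenericDatumReader<GenericRecord>())) {
            return in.getMetaString(key);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] file = writeCompressed();
        System.out.println(headerValue(file, "avro.schema")); // schema as JSON
        System.out.println(headerValue(file, "avro.codec"));  // codec name
    }
}
```

The file begins with the four magic bytes 'O', 'b', 'j', 1, followed by
the metadata map that holds the "avro.schema" and "avro.codec" entries.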

On 1/8/13 1:49 AM, "Pratyush Chandra" <ch...@gmail.com> wrote:

> Hi Scott,
> 
> I can find an example of JSON encoding with DataFileWriter, which embeds
> the schema, but I cannot find a DataFileWriter example for binary encoding
> with the schema.
> 
> Thanks
> Pratyush



Re: Embedding schema with binary encoding

Posted by Pratyush Chandra <ch...@gmail.com>.
Hi Scott,

I can find an example of JSON encoding with DataFileWriter, which embeds
the schema, but I cannot find a DataFileWriter example for binary encoding
with the schema.

Thanks
Pratyush

On Tue, Jan 8, 2013 at 2:56 PM, Scott Carey <sc...@apache.org> wrote:

> Calling toString() on a Schema returns it in JSON form. However, you most
> likely do not want to invent your own file format for Avro data.
>
> Use DataFileWriter, which will manage the schema for you, along with
> compression, metadata, and the ability to seek to the middle of the file.
> The resulting file is also readable by several other languages and tools.


-- 
Pratyush Chandra

Re: Embedding schema with binary encoding

Posted by Scott Carey <sc...@apache.org>.
Calling toString() on a Schema returns it in JSON form. However, you most
likely do not want to invent your own file format for Avro data.

Use DataFileWriter, which will manage the schema for you, along with
compression, metadata, and the ability to seek to the middle of the file.
The resulting file is also readable by several other languages and tools.
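
A minimal sketch of the DataFileWriter route, assuming the Avro Java
library (org.apache.avro) is on the classpath. The schema, the class name
EmbedSchemaExample, and the helper names are hypothetical, since the thread
does not show the real schema. The records are still binary-encoded; only
the schema in the file header is JSON.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

final class EmbedSchemaExample {
    // Hypothetical schema with the single field used in the original post.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Message\","
        + "\"fields\":[{\"name\":\"to\",\"type\":\"string\"}]}");

    // Write one record to an Avro container file; the schema goes in the header.
    static void write(File file) {
        GenericRecord message1 = new GenericData.Record(SCHEMA);
        message1.put("to", "Alyssa");
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
            writer.create(SCHEMA, file);
            writer.append(message1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the first record back without supplying a schema:
    // the reader takes it from the file header.
    static String readFirstTo(File file) {
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
            return reader.next().get("to").toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File file = new File(System.getProperty("java.io.tmpdir"), "message.avro");
        write(file);
        System.out.println(readFirstTo(file));
    }
}
```

Compared with the original snippet, DataFileWriter replaces the explicit
Encoder and FileOutputStream, and closing it flushes the last block.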

On 1/7/13 4:42 AM, "Pratyush Chandra" <ch...@gmail.com> wrote:

> I am able to serialize with binary encoding to a file using the following:
>         FileOutputStream outputStream = new FileOutputStream(file);
>         Encoder e = EncoderFactory.get().binaryEncoder(outputStream, null);
>         DatumWriter<GenericRecord> datumWriter =
>                 new GenericDatumWriter<GenericRecord>(schema);
>         GenericRecord message1 = new GenericData.Record(schema);
>         message1.put("to", "Alyssa");
>         datumWriter.write(message1, e);
>         e.flush();
>         outputStream.close();
>
> But the output file contains only the serialized data and not the schema.
> How can I add the schema as well?
> 
> Thanks
> Pratyush Chandra