Posted to user@avro.apache.org by Vinod Jammula <vi...@ericsson.com> on 2013/04/09 07:47:30 UTC

Enabling compression

Hi,

I have a CSV string which I want to serialize, compress and write to a 
database.

I have the following code to serialize the string:

ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
Encoder e = EncoderFactory.get().binaryEncoder(outputStream, null);
GenericDatumWriter<GenericRecord> w = new GenericDatumWriter<GenericRecord>(schema);
w.write(record, e);
// binaryEncoder() returns a buffering encoder, so flush before reading the bytes
e.flush();
byte[] avroBytes = outputStream.toByteArray();


And the following code to deserialize and process the record:

DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
Decoder decoder = DecoderFactory.get().binaryDecoder(avroBytes, null);
GenericRecord record = reader.read(null, decoder);


I found that compression is available with DataFileWriter and DataFileReader, 
but how do I enable compression for an Avro-serialized buffer?

Thanks and Regards,
Vinod

Re: Enabling compression

Posted by Harsh J <ha...@cloudera.com>.
Hi Vinod,

In Avro, compression is provided only at the file container level
(i.e. block compression).

For compressing a simple byte array, you can rely on Hadoop's
compression classes, such as GzipCodec [1], to compress the byte
stream directly (by wrapping it in a compressed output stream [2]
obtained from the codec's helper method [3]).

Something like this, for example (I've not tested it out):

ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
GzipCodec codec = ReflectionUtils.newInstance(GzipCodec.class, new Configuration());
OutputStream compressedOutputStream = codec.createOutputStream(outputStream);
[… Encode over compressedOutputStream, etc. …]

[1] - http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/compress/GzipCodec.html
[2] - http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/compress/CompressorStream.html
[3] - http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/compress/GzipCodec.html#createOutputStream(java.io.OutputStream)
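
If pulling in Hadoop just for this is not desirable, the same wrapping idea
can be sketched with the JDK's own java.util.zip classes. This is a swapped-in
technique, not the GzipCodec approach above, and it is only a minimal sketch:
the GzipBytes class and its method names are hypothetical, and the byte array
passed in stands in for the avroBytes buffer produced by the encoder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper (not part of Avro or Hadoop): gzip a byte buffer
// before storing it, and gunzip it again after reading it back.
class GzipBytes {

    // Compress an already-serialized buffer, e.g. the avroBytes array.
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the gzip stream writes the trailer; without it the
        // output is truncated and cannot be decompressed.
        try (GZIPOutputStream gzip = new GZIPOutputStream(sink)) {
            gzip.write(raw);
        }
        return sink.toByteArray();
    }

    // Restore the original bytes so they can be handed to binaryDecoder().
    static byte[] decompress(byte[] packed) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                sink.write(chunk, 0, n);
            }
        }
        return sink.toByteArray();
    }
}
```

With this, GzipBytes.compress(avroBytes) would be what goes into the database,
and GzipBytes.decompress(...) would run before the bytes are passed to the
decoder. For a single small record the gzip header and trailer can outweigh
the savings, so it is worth measuring on real data.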




-- 
Harsh J