Posted to user@avro.apache.org by Vinod Jammula <vi...@ericsson.com> on 2013/04/09 07:47:30 UTC
Enabling compression
Hi,
I have a CSV string that I want to serialize, compress, and write to a
database.
*I have the following code to serialize the string:*
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
Encoder e = EncoderFactory.get().binaryEncoder(outputStream, null);
GenericDatumWriter<GenericRecord> w = new GenericDatumWriter<GenericRecord>(schema);
w.write(record, e);
e.flush(); // binaryEncoder buffers its output, so flush before reading the bytes
byte[] avroBytes = outputStream.toByteArray();
*I use the following code to deserialize and process the record:*
DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
Decoder decoder = DecoderFactory.get().binaryDecoder(avroBytes, null);
GenericRecord record = reader.read(decoder, null);
I can see that DataFileWriter and DataFileReader support compression, but how
do I enable compression for an Avro-serialized buffer?
Thanks and Regards,
Vinod
Re: Enabling compression
Posted by Harsh J <ha...@cloudera.com>.
Hi Vinod,
In Avro, compression is provided only at the file container level
(i.e. block compression).
For compressing a simple byte array, you can rely on Hadoop's
compression classes, such as GzipCodec [1], to compress the byte
stream directly (by wrapping it in a compressed output stream [2]
obtained from its helper method [3]).
Something like this, for example (I've not tested it out):
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
GzipCodec codec = ReflectionUtils.newInstance(GzipCodec.class, new Configuration());
OutputStream compressedOutputStream = codec.createOutputStream(outputStream);
[… Encode over compressedOutputStream, etc. …]
[1] - http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/compress/GzipCodec.html
[2] - http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/compress/CompressorStream.html
[3] - http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/compress/GzipCodec.html#createOutputStream(java.io.OutputStream)
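If pulling in Hadoop just for this is not desirable, the same
wrap-the-stream idea can be sketched with the JDK's own java.util.zip
classes instead of GzipCodec. This is only a minimal sketch under
assumptions: GzipBytes and its compress/decompress methods are
hypothetical names (not part of Avro or Hadoop), and the CSV-like byte
array in main merely stands in for the avroBytes produced by the encoder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration; not an Avro or Hadoop class.
final class GzipBytes {

    // Gzip-compress an already serialized buffer (e.g. Avro binary output).
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the gzip stream writes the trailer, so read the
        // bytes only after the try-with-resources block has finished.
        try (GZIPOutputStream gzip = new GZIPOutputStream(sink)) {
            gzip.write(raw);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return sink.toByteArray();
    }

    // Reverse of compress(): returns the original serialized bytes.
    static byte[] decompress(byte[] packed) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-in for the avroBytes produced by the binary encoder.
        byte[] avroBytes =
            "id,name\n1,alice\n2,bob\n".getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(avroBytes);
        byte[] restored = decompress(packed);
        System.out.println("round trip ok: " + Arrays.equals(avroBytes, restored));
    }
}
```

The compressed array is what would be written to the database; on the
read side, decompress it first and then hand the result to
DecoderFactory.get().binaryDecoder(...) as before. For very small
records the gzip header and trailer can outweigh the savings, so it is
worth measuring on real data.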
--
Harsh J