Posted to user@avro.apache.org by amit nanda <am...@gmail.com> on 2013/08/19 18:12:00 UTC

Avro file Compression

I am trying to compress the Avro files that I am writing. For that I am using
the latest Avro C with the "deflate" option, but I do not see any difference
in the file size.

Is there a particular type of data that this works on, or is there some
additional setting that needs to be configured for this to work?

Re: Avro file Compression

Posted by Scott Carey <sc...@apache.org>.
The file format compresses in blocks, and the block size is configurable.
Compression works across all the objects in a block, so it helps small
objects as well as large ones, as long as the total block size is large enough.
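To make the block behavior concrete, here is a minimal sketch in plain C. It
does not use the Avro C library: `block_buffer`, `block_append` and
`block_flush` are hypothetical names, and the flush step only counts blocks
where a real writer would run the codec (for example deflate) over the
buffered bytes. What it shows is that one compression call sees every record
in the block, so the block size, not the record size, decides how much context
the compressor gets. In Avro C the block size is a setting of the file writer,
fixed when the writer is created.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a container-file writer: serialized records are
 * buffered into one block, and the whole block is compressed at once. */
#define BLOCK_CAPACITY 4096

typedef struct {
    unsigned char data[BLOCK_CAPACITY];
    size_t used;           /* bytes buffered in the current block */
    size_t block_size;     /* flush threshold, clamped to BLOCK_CAPACITY */
    size_t records;        /* records in the current block */
    size_t blocks_flushed; /* number of blocks handed to the "codec" */
} block_buffer;

static void block_init(block_buffer *b, size_t block_size) {
    b->used = 0;
    b->block_size = block_size < BLOCK_CAPACITY ? block_size : BLOCK_CAPACITY;
    b->records = 0;
    b->blocks_flushed = 0;
}

/* A real writer would compress b->data[0..used) here as ONE unit, so
 * repeats between neighboring records can be found. We only count it. */
static void block_flush(block_buffer *b) {
    if (b->used == 0)
        return;
    b->blocks_flushed++;
    b->used = 0;
    b->records = 0;
}

/* Append one serialized record, flushing first if it would not fit.
 * Returns 0 on success, -1 if the record is larger than a whole block. */
static int block_append(block_buffer *b, const void *rec, size_t len) {
    if (len > b->block_size)
        return -1;
    if (b->used + len > b->block_size)
        block_flush(b);
    memcpy(b->data + b->used, rec, len);
    b->used += len;
    b->records++;
    return 0;
}
```

With 50-byte records, a 1000-byte block puts 20 records into each compression
call, while a 100-byte block puts in only 2. The larger block gives deflate
more neighboring data to find repeats in, at the cost of more memory per block.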

I have found that I can increase the compression ratio by ordering the
objects carefully so that neighboring records have more in common.
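The ordering trick can be sketched the same way. This is an illustration under
stated assumptions, not a measurement: it sorts serialized records
lexicographically with `qsort` (a real pipeline would sort on a meaningful key
such as a customer id or a timestamp) and uses the total shared prefix between
adjacent records as a rough, hypothetical proxy for the redundancy deflate can
exploit. Deflate only looks back 32 KB, so similar records help most when they
sit close together.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Length of the common prefix of two NUL-terminated strings. */
static size_t common_prefix(const char *a, const char *b) {
    size_t n = 0;
    while (a[n] != '\0' && a[n] == b[n])
        n++;
    return n;
}

/* Rough proxy for how much neighboring records repeat each other: the sum
 * of common-prefix lengths between adjacent records. NOT a deflate size. */
static size_t neighbor_overlap(const char *const *recs, size_t n) {
    size_t total = 0;
    for (size_t i = 1; i < n; i++)
        total += common_prefix(recs[i - 1], recs[i]);
    return total;
}

/* qsort comparator: elements are pointers to strings. */
static int cmp_str(const void *x, const void *y) {
    const char *const *a = x;
    const char *const *b = y;
    return strcmp(*a, *b);
}

/* Reorder records so that similar ones become neighbors (here: plain
 * lexicographic order of the serialized text). */
static void order_records(const char **recs, size_t n) {
    qsort(recs, n, sizeof *recs, cmp_str);
}
```

For example, the interleaved records "region=north;sku=1",
"region=south;sku=2", "region=north;sku=3" and "region=south;sku=4" share 21
prefix bytes between neighbors; after sorting, the two north records and the
two south records are adjacent and the total rises to 41.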

From:  Bill Baird <bi...@traxtech.com>
Reply-To:  "user@avro.apache.org" <us...@avro.apache.org>
Date:  Thursday, August 22, 2013 7:47 AM
To:  "user@avro.apache.org" <us...@avro.apache.org>
Subject:  Re: Avro file Compression


Re: Avro file Compression

Posted by Bill Baird <bi...@traxtech.com>.
As with any compression, how much you get depends on the size and nature of
the data.  I have objects that take 4 or 5 KB unserialized and serialize to
1.5 to 3 KB, or about 2 to 1.  However, with the same object structure
(several nested arrays, lots of strings and numbers, basic business data),
data that is 17 MB uncompressed deflates to 1 MB, or 17 to 1.  For very small
objects, deflate will actually produce a larger output, but it does quite
well as the size of the data being deflated grows.
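The point about very small objects can be made concrete with the framing cost
of deflate itself. The Avro specification says the "deflate" codec writes each
block with raw deflate (RFC 1951), and RFC 1951 lets an encoder emit data as
"stored" (uncompressed) blocks of at most 65,535 bytes, each wrapped in 5
bytes of framing: a header byte plus the LEN and NLEN fields. The sketch
below, with a hypothetical helper name, computes the size of such a
stored-only stream. It ignores the object count, byte size and 16-byte sync
marker that Avro adds around every block.

```c
#include <stdint.h>

/* Size of a raw deflate (RFC 1951) stream that emits n input bytes as
 * stored blocks: at most 65535 bytes per block, 5 bytes of framing each
 * (header byte, 2-byte LEN, 2-byte NLEN). Empty input still needs one
 * empty final block. */
static uint64_t stored_deflate_size(uint64_t n) {
    const uint64_t max_block = 65535;
    uint64_t blocks = (n == 0) ? 1 : (n + max_block - 1) / max_block;
    return n + 5 * blocks;
}
```

For a 10-byte object the stored form is 15 bytes, so when the encoder cannot
find enough repeats to beat storing, the output is larger than the input. For
17 MB (17,825,792 bytes) the same framing adds only 1,365 bytes.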

Bill


On Wed, Aug 21, 2013 at 11:31 PM, Harsh J <ha...@cloudera.com> wrote:

> Can you share your test? There is an example at
> http://svn.apache.org/repos/asf/avro/trunk/lang/c/examples/quickstop.c
> which has the right calls for using a file writer with a deflate codec
> - is yours similar?

Re: Avro file Compression

Posted by Harsh J <ha...@cloudera.com>.
Can you share your test? There is an example at
http://svn.apache.org/repos/asf/avro/trunk/lang/c/examples/quickstop.c
which has the right calls for using a file writer with a deflate codec
- is yours similar?
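If the file size does not change at all, one thing worth checking is whether
the codec actually reached the writer. In Avro C the codec is chosen when the
file writer is created (quickstop.c, for instance, passes the codec name to
`avro_file_writer_create_with_codec`), and the Avro specification records it
in the file header: the magic bytes `Obj` plus 0x01, then a metadata map whose
`avro.codec` entry names the codec, with "null" (no compression) meaning the
entry is absent. The sketch below reads that map from a buffer in plain C;
`header_codec` and `read_long` are hypothetical helpers, not Avro C
functions, and they stop at the end of the metadata map.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode one Avro long (zig-zag varint). Returns bytes consumed, 0 on error. */
static size_t read_long(const unsigned char *p, size_t len, int64_t *out) {
    uint64_t u = 0;
    for (size_t i = 0; i < len && i < 10; i++) {
        u |= (uint64_t)(p[i] & 0x7f) << (7 * i);
        if ((p[i] & 0x80) == 0) {
            *out = (int64_t)(u >> 1) ^ -(int64_t)(u & 1);
            return i + 1;
        }
    }
    return 0;
}

/* Copy the "avro.codec" value of an object container header into codec
 * (NUL-terminated). Returns 0 on success, -1 on a malformed header or if
 * codec_cap is too small. A missing entry yields the default "null". */
static int header_codec(const unsigned char *buf, size_t len,
                        char *codec, size_t codec_cap) {
    static const unsigned char magic[4] = {'O', 'b', 'j', 1};
    const char *found = "null";
    size_t found_len = 4, pos = 4, used;

    if (len < 4 || memcmp(buf, magic, 4) != 0)
        return -1;
    for (;;) {
        int64_t count, block_bytes;
        if ((used = read_long(buf + pos, len - pos, &count)) == 0)
            return -1;
        pos += used;
        if (count == 0)
            break;                      /* end of the metadata map */
        if (count < 0) {                /* negative count: byte size follows */
            if (count == INT64_MIN)
                return -1;
            count = -count;
            if ((used = read_long(buf + pos, len - pos, &block_bytes)) == 0)
                return -1;
            pos += used;
        }
        for (int64_t i = 0; i < count; i++) {
            int64_t klen, vlen;
            const unsigned char *key, *val;
            used = read_long(buf + pos, len - pos, &klen);
            if (used == 0 || klen < 0 || (uint64_t)klen > len - pos - used)
                return -1;
            pos += used; key = buf + pos; pos += (size_t)klen;
            used = read_long(buf + pos, len - pos, &vlen);
            if (used == 0 || vlen < 0 || (uint64_t)vlen > len - pos - used)
                return -1;
            pos += used; val = buf + pos; pos += (size_t)vlen;
            if (klen == 10 && memcmp(key, "avro.codec", 10) == 0) {
                found = (const char *)val;
                found_len = (size_t)vlen;
            }
        }
    }
    if (found_len >= codec_cap)
        return -1;
    memcpy(codec, found, found_len);
    codec[found_len] = '\0';
    return 0;
}
```

In practice `buf` would hold the beginning of the file, enough to cover the
embedded schema. A result of "null" for a file that was meant to be deflated
means the codec setting never took effect.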




-- 
Harsh J