Posted to users@kafka.apache.org by "James A. Robinson" <ji...@stanford.edu> on 2012/08/03 02:39:13 UTC

compression ratios?

Hi folks,

We've got a system where we're pushing small XML documents, produced
as part of an event stream, through Kafka to another service.  Each of
these messages tends to be only around 600 to 900 bytes in length.

I was wondering if any of you had statistics on the average
compression ratio for a given message format you use, when the
publisher is configured to compress kafka messages using gzip?

I'm expecting that the compression ratio won't be very high if Kafka
is compressing each individual message (versus compressing entire
message sets). In our test we were seeing a compression ratio of
perhaps 25%, and I think that's about what I'd expect for per-message
compression.
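
For illustration, here is a small sketch of the effect I mean, using only
Python's standard-library gzip (not Kafka itself) on a hypothetical XML
record of roughly the size described above; the record contents and the
batch size of 200 are made-up stand-ins:

```python
import gzip

def make_record(i):
    # Hypothetical XML event, a few hundred bytes, standing in for the
    # 600-900 byte documents described above.
    body = " ".join(f"field-{i}-{j}" for j in range(60))
    return (
        f"<event><id>{i:08d}</id><type>page-view</type>"
        f"<timestamp>2012-08-02T17:39:{i % 60:02d}Z</timestamp>"
        f"<body>{body}</body></event>"
    ).encode("utf-8")

records = [make_record(i) for i in range(200)]

# Per-message compression: every record pays the gzip header and
# dictionary warm-up cost on its own, so the ratio is modest.
per_message = sum(len(gzip.compress(r)) for r in records) / sum(map(len, records))

# Message-set compression: one gzip stream over all 200 records, so
# redundancy shared across messages is exploited.
joined = b"".join(records)
batched = len(gzip.compress(joined)) / len(joined)

print(f"per-message ratio: {per_message:.2f}")
print(f"batched ratio:     {batched:.2f}")
```

Under these assumptions the batched ratio comes out noticeably better
than the per-message one, though the actual numbers depend entirely on
the data.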

Jim

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
James A. Robinson                       jim.robinson@stanford.edu
Stanford University HighWire Press      http://highwire.stanford.edu/
+1 650 7237294 (Work)                   +1 650 7259335 (Fax)

Re: compression ratios?

Posted by "James A. Robinson" <ji...@stanford.edu>.
>> On Thu, Aug 2, 2012 at 5:39 PM, James A. Robinson
>> <jim.robinson@stanford.edu> wrote:
>>
>> ...
>> message sets). In our test we were seeing a compression ratio of
>> perhaps 25%, and I think that's about what I'd expect for
>> per-message

On Thu, Aug 2, 2012 at 6:46 PM, Jun Rao <ju...@gmail.com> wrote:
>
> The compression ratio is going to be very data-dependent. If you can
> compress to 1/4 of the original size, that's pretty good. At
> LinkedIn, we compress batches of up to 200 messages using gzip. The
> compressed data is about 1/3 of the original data.

Thanks very much.  I didn't phrase my original email very well.  What
I was trying to indicate was that the size of the broker log was about
1/4 smaller than the size of the original input data.

Jim


Re: compression ratios?

Posted by Jun Rao <ju...@gmail.com>.
Jim,

The compression ratio is going to be very data-dependent. If you can
compress to 1/4 of the original size, that's pretty good. At LinkedIn, we
compress batches of up to 200 messages using gzip. The compressed data is
about 1/3 of the original data.
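
(For reference, turning on gzip in the producer is a configuration
change. The property names below are from the 0.7-era producer config,
so treat this as a sketch; the topic name is hypothetical, and newer
clients use compression.type=gzip instead.)

```properties
# Kafka 0.7-era producer properties; newer clients use compression.type=gzip.
# 0 = no compression, 1 = gzip.
compression.codec=1
# Comma-separated list of topics to compress; "events" is a hypothetical
# topic name. If unset, compression.codec applies to all topics.
compressed.topics=events
```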

Thanks,

Jun

On Thu, Aug 2, 2012 at 5:39 PM, James A. Robinson
<jim.robinson@stanford.edu> wrote:

> Hi folks,
>
> We've got a system where we're pushing small XML documents, produced
> as part of an event stream, through Kafka to another service.  Each of
> these messages tends to be only around 600 to 900 bytes in length.
>
> I was wondering if any of you had statistics on the average
> compression ratio for a given message format you use, when the
> publisher is configured to compress kafka messages using gzip?
>
> I'm expecting that the compression ratio won't be very high if Kafka
> is compressing each individual message (versus compressing entire
> message sets). In our test we were seeing a compression ratio of
> perhaps 25%, and I think that's about what I'd expect for per-message
> compression.
>
> Jim
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> James A. Robinson                       jim.robinson@stanford.edu
> Stanford University HighWire Press      http://highwire.stanford.edu/
> +1 650 7237294 (Work)                   +1 650 7259335 (Fax)
>