Posted to users@kafka.apache.org by Shantanu Deshmukh <sh...@gmail.com> on 2021/05/12 05:39:35 UTC

Kafka 2.0.0 - How to verify if Kafka compression is working

I am trying snappy compression on my producer. Here's my setup

Kafka - 2.0.0
Spring-Kafka - 2.1.2

Here's my producer config

compressed producer ==========

configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
configProps.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
configProps.put(ProducerConfig.LINGER_MS_CONFIG, 10);

config of un-compressed producer ============

configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
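
For reference, here is a minimal, self-contained sketch of the compressed-producer
test (assumptions on my side: a broker on localhost:9092 and a topic named
"compressed-string-test"; the repetitive payload is just a stand-in for my real
string):

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class CompressionTest {
    public static void main(String[] args) {
        Map<String, Object> configProps = new HashMap<>();
        configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        configProps.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
        configProps.put(ProducerConfig.LINGER_MS_CONFIG, 10);

        // Build a repetitive ~0.85 MB payload so a compressed batch should shrink.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 50_000; i++) {
            sb.append("{\"field\":\"value\"}");
        }
        String payload = sb.toString();

        try (Producer<String, String> producer = new KafkaProducer<>(configProps)) {
            for (int i = 0; i < 1000; i++) {
                producer.send(new ProducerRecord<>("compressed-string-test", payload));
            }
        }
    }
}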

My payload is almost 1 MB worth of string. After sending 1000 compressed and
1000 uncompressed such messages, this is the result:
=======================
[shantanu@oc0148610736 uncompressed-string-test-0]$ du -hsc
/data/compressed-string-test-0/*
8.0K /data/compressed-string-test-0/00000000000000000000.index
990M /data/compressed-string-test-0/00000000000000000000.log
12K /data/compressed-string-test-0/00000000000000000000.timeindex
4.0K /data/compressed-string-test-0/leader-epoch-checkpoint
990M total

[shantanu@oc0148610736 uncompressed-string-test-0]$ du -shc
/data/uncompressed-string-test-0/*
8.0K    /data/uncompressed-string-test-0/00000000000000000000.index
992M    /data/uncompressed-string-test-0/00000000000000000000.log
12K /data/uncompressed-string-test-0/00000000000000000000.timeindex
4.0K    /data/uncompressed-string-test-0/leader-epoch-checkpoint
992M    total
=======================

Here we can see the difference is merely 2 MB. Is compression even working?
I used the DumpLogSegments tool:
=======================
[shantanu@oc0148610736 kafka_2.11-2.0.0]$ sh bin/kafka-run-class.sh
kafka.tools.DumpLogSegments --files
/data/compressed-string-test-0/00000000000000000000.log --print-data-log |
head | grep compresscodec

offset: 0 position: 0 CreateTime: 1620744081357 isvalid: true keysize:
-1 valuesize: 1039999 magic: 2 compresscodec: SNAPPY producerId: -1
producerEpoch: -1 sequence: -1 isTransactional: false headerKeys: []
payload: klxhbpyxmcazvhekqnltuenwhsewjjfmctcqyrppellyfqglfnvhqctlfplslhpuulknsncbgzzndizwmlnelotcbniyprdgihdazwn
=======================

I can see SNAPPY is mentioned as compression codec. But the difference
between compressed and uncompressed disk size is negligible.

I tried gzip later on, and the results are:
=======================
[shantanu@oc0148610736 uncompressed-string-test-0]$ du -hsc
/data/compressed-string-test-0/*
8.0K /data/compressed-string-test-0/00000000000000000000.index
640M /data/compressed-string-test-0/00000000000000000000.log
12K /data/compressed-string-test-0/00000000000000000000.timeindex
4.0K /data/compressed-string-test-0/leader-epoch-checkpoint
640M total
=======================

So gzip seems to have worked somehow. I tried lz4 compression as well;
results were the same as with snappy.

Is snappy/lz4 compression really working here? Gzip seems to be working, but
I have read in many places that snappy gives the best balance of CPU usage to
compression ratio, so we want to go ahead with snappy.

Please help

*Thanks & Regards,*
*Shantanu*

Re: Kafka 2.0.0 - How to verify if Kafka compression is working

Posted by Scott Carey <sc...@gmail.com>.
If it is not compressed, you should be able to see the raw strings in the
partition data file with a text reader such as 'less' (mixed with binary
content).


Re: Kafka 2.0.0 - How to verify if Kafka compression is working

Posted by Shantanu Deshmukh <sh...@gmail.com>.
Hi Scott,
Thanks for responding.

As part of my test, I have also run my own program wherein I used both
random strings and JSON messages.
I was aware that snappy may not perform well on random strings, so I used a
10 KB JSON message and sent 10,000 messages of that same JSON.
So with this test payload the entire message is the same, repeating over and
over again.

At least with that it should have compressed. But to no avail!

I also applied batch.size=10000 and linger.ms=10, and still it didn't work.
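
One more thing worth noting (an assumption on my part, not a confirmed
diagnosis): the producer compresses whole record batches, so with
batch.size=10000 (bytes) a ~10 KB message fills a batch on its own, and
repetition across messages can never be exploited. Something like this would
give the batcher room to pack several records into one compressed batch (the
values are illustrative):

configProps.put(ProducerConfig.BATCH_SIZE_CONFIG, 200_000); // bytes; room for several 10 KB records
configProps.put(ProducerConfig.LINGER_MS_CONFIG, 50);       // wait up to 50 ms for a batch to fill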


Re: Kafka 2.0.0 - How to verify if Kafka compression is working

Posted by Scott Carey <sc...@gmail.com>.
Snappy and lz4 do not have entropy encoders (
https://en.wikipedia.org/wiki/Entropy_encoding).  If your data is random
text they will not compress.  If your text is a string of all zeros or any
repeating pattern, it will compress significantly. If it's something like
JSON or XML, it will compress.

I suspect you either aren't using real world data, or haven't compared the
compression with different types of data (json?  web pages?   numbers?).
No compression test is of much use unless you specify _what_ you are trying
to compress and either construct a realistic corpus for your use case, or
test with a few well defined types of real data that might be similar to
your expected use case.

  Gzip and zstandard have entropy encoding (Huffman for gzip, and a
combination of Huffman and ANS for zstandard).  With these, even if your
text is purely random _text_, it will compress somewhat since text doesn't
use all 256 possible byte values and so it can use less than 8 bits per
character in the encoding.
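
For instance, a quick way to see this outside Kafka entirely (a rough sketch;
it assumes snappy-java, org.xerial.snappy, on the classpath, which is the same
snappy implementation the Kafka clients ship with):

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.GZIPOutputStream;
import org.xerial.snappy.Snappy;

public class EntropyDemo {
    public static void main(String[] args) throws Exception {
        // 1 MB of random lowercase text: no repeats for snappy/lz4 to find.
        byte[] random = new byte[1_000_000];
        Random r = new Random(42);
        for (int i = 0; i < random.length; i++) {
            random[i] = (byte) ('a' + r.nextInt(26));
        }

        // 1 MB of a repeating JSON-like pattern: highly compressible.
        byte[] json = new byte[1_000_000];
        byte[] pattern = "{\"field\":\"value\"}".getBytes(StandardCharsets.US_ASCII);
        for (int i = 0; i < json.length; i++) {
            json[i] = pattern[i % pattern.length];
        }

        System.out.println("snappy random: " + Snappy.compress(random).length);
        System.out.println("snappy json:   " + Snappy.compress(json).length);
        System.out.println("gzip   random: " + gzipSize(random));
        System.out.println("gzip   json:   " + gzipSize(json));
    }

    static int gzipSize(byte[] data) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.size();
    }
}

Snappy should leave the random text essentially unchanged, gzip should still
shrink it somewhat (only 26 of 256 byte values appear), and both should
collapse the repetitive JSON-like buffer.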




Re: Kafka 2.0.0 - How to verify if Kafka compression is working

Posted by Shantanu Deshmukh <sh...@gmail.com>.
I have some updates on this.
I tried this on the latest Kafka 2.8 and ran my application. Results are the
same: snappy and lz4 don't seem to be working, as uncompressed and compressed
storage both measure the same.

*I even tried the kafka-producer-perf-test tool*. Below are the results:

Without any compression:
==========================>>
sh bin/kafka-producer-perf-test.sh --num-records 100000 --throughput 10000
--record-size 102400 --topic perf-test-uncompressed --producer-props
*compression.type=none* bootstrap.servers=localhost:9092 --print-metrics

100000 records sent, *862.113558 records/sec (84.19 MB/sec)*, 376.08 ms avg
latency, 1083.00 ms max latency, 371 ms 50th, 610 ms 95th, 778 ms 99th,
1061 ms 99.9th.
...
producer-topic-metrics:*compression-rate*:{client-id=producer-1,
topic=perf-test-uncompressed}   : *1.000*

With snappy compression:
==========================>>
sh bin/kafka-producer-perf-test.sh --num-records 100000 --throughput 10000
--record-size 102400 --topic perf-test-uncompressed --producer-props
*compression.type=snappy batch.size=100000 linger.ms=5*
bootstrap.servers=localhost:9092 --print-metrics

100000 records sent, *599.905215 records/sec (58.58 MB/sec)*, 540.79 ms avg
latency, 1395.00 ms max latency, 521 ms 50th, 816 ms 95th, 1016 ms 99th,
1171 ms 99.9th.
...
producer-topic-metrics:*compression-rate*:{client-id=producer-1,
topic=perf-test-uncompressed}   : *1.001*

<<==========================
The compression-rate above didn't change even with batch.size=100000 and
linger.ms=5 applied.

With gzip compression:
*==========================>>*
sh bin/kafka-producer-perf-test.sh --num-records 100000 --throughput 10000
--record-size 102400 --topic perf-test-compressed --producer-props
*compression.type=gzip* bootstrap.servers=localhost:9092 *batch.size=100000
linger.ms=5* --print-metrics

100000 records sent, *200.760078 records/sec (19.61 MB/sec)*, 1531.40 ms
avg latency, 2744.00 ms max latency, 1514 ms 50th, 1897 ms 95th, 2123 ms
99th, 2610 ms 99.9th.
...
producer-topic-metrics:*compression-rate*:{client-id=producer-1,
topic=perf-test-compressed}   : *0.635*

*<<============================*

To summarise:

compression type | messages sent | throughput                            | effective compression-rate
none             | 100000        | 862.113558 records/sec (84.19 MB/sec) | 1.000
snappy           | 100000        | 599.905215 records/sec (58.58 MB/sec) | 1.001
gzip             | 100000        | 200.760078 records/sec (19.61 MB/sec) | 0.635

In short, snappy = uncompressed! Why is this happening?
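
For what it's worth, the same compression-rate metric can also be read from
inside the application; here is a small sketch, assuming an already-built
KafkaProducer named "producer":

// Print every compression-rate metric the producer currently exposes.
producer.metrics().forEach((name, metric) -> {
    if (name.name().contains("compression-rate")) {
        System.out.println(name.group() + ":" + name.name() + ":"
                + name.tags() + " = " + metric.metricValue());
    }
});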


Re: Kafka 2.0.0 - How to verify if Kafka compression is working

Posted by Shantanu Deshmukh <sh...@gmail.com>.
Hey Nitin,

I have already done that. I used the dump-log-segments option, and I can see
the codec used is snappy/gzip/lz4. My question is: only gzip is giving me
compression; the rest are equivalent to uncompressed storage.


Re: Kafka 2.0.0 - How to verify if Kafka compression is working

Posted by nitin agarwal <ni...@gmail.com>.
You can read the data from the disk and see the compression type:
https://thehoard.blog/how-kafkas-storage-internals-work-3a29b02e026

Thanks,
Nitin
