Posted to users@kafka.apache.org by Frank Zhou <fr...@itiviti.com> on 2019/12/12 11:27:37 UTC

Message batch & compression doesn't work as expected

Hi,

I am testing message batching and compression in the Kafka client. I have
enabled batching along with compression, with batch.size set to 3M,
linger.ms set to 5000 ms and compression.type set to gzip (whole producer
config attached). I then used Wireshark to check the details.

The first issue our team noticed is that the compression codec seems to be
wrong. We set it to gzip, but Wireshark displays a different codec, such as
Snappy in the attached screenshot. (Not sure whether this is a Wireshark
issue or a real issue in Kafka; the rest of the packet details look fine in
Wireshark.)

The second issue is that, although we set the linger time and batch size so
high, the producer still sends Produce requests to the server much more
frequently than we expected. Each message is around 200 bytes before
batching and compression, and all the messages we generated during testing
add up to around 200 KB, so we expected far fewer packets than this (the
screenshot shows only a small part of them; the total number is 1472).
[image: 2019-12-12_19h00_15.png]
Is there some config we are missing, or is the config incorrect, leading to
this?
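
A rough back-of-the-envelope check of the expectation above, using only the
figures quoted in this message. It is a sketch under stated assumptions:
"3M" is read as 3 * 1024 * 1024 bytes, "200 KB" as 200 * 1000 bytes, all
records are assumed to land in one partition, and record and batch header
overhead is ignored. The name expected_batches is made up for illustration.

```python
import math

MESSAGE_BYTES = 200            # approximate size of one message (from the post)
TOTAL_BYTES = 200 * 1000       # ~200 KB of messages in total (from the post)
BATCH_SIZE = 3 * 1024 * 1024   # assumption: "3M" means 3 MiB
OBSERVED_PACKETS = 1472        # total reported from the Wireshark capture


def expected_batches(total_bytes: int, batch_size: int) -> int:
    """Number of batches needed if each batch is filled before it is sent."""
    return max(1, math.ceil(total_bytes / batch_size))


message_count = TOTAL_BYTES // MESSAGE_BYTES
batches = expected_batches(TOTAL_BYTES, BATCH_SIZE)

print(f"messages generated: {message_count}")
print(f"batches if filled to batch.size: {batches}")
print(f"observed packets per message: {OBSERVED_PACKETS / message_count:.2f}")
```

With these figures a single batch could in principle hold the whole test
run, which is why the packet count looks high, although the capture may
also include responses and other protocol traffic.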


-- 
*Frank Zhou*
R&D, Itiviti
Java Developer
D +852 2521 7480
frank.zhou@itiviti.com

______________________________

itiviti.com <https://www.itiviti.com/>

*The information contained in or attached to this email is strictly
confidential. If you are not the intended recipient, please notify us
immediately by telephone and return the message to us.*

*Email communications by definition contain personal information. The
Itiviti group of companies is subject to European data protection
regulations. Itiviti’s Privacy Notice is available at www.itiviti.com
<http://www.itiviti.com/>. Itiviti expects the recipient of this email to
be compliant with Itiviti’s Privacy Notice and applicable regulations.
Please advise us immediately at dataprotectionteam@Itiviti.com if you are
not compliant with these.*


Re: Message batch & compression doesn't work as expected

Posted by Frank Zhou <fr...@itiviti.com>.
Hi,

I think the root cause of this is transactions. The producer we are using
has transactions enabled, and batching then does not work as expected. Once
we turn them off, the situation is back to normal: I can see batching being
performed, and everything looks fine. Do transactions not work well with
batching?
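
One possible explanation, offered as an assumption rather than something
confirmed in this thread: in the Java client, commitTransaction() is
documented to flush any unsent records before committing, so a producer
that commits after every message would send each message right away,
regardless of linger.ms and batch.size. The toy model below is plain Python
with no Kafka calls; forced_flushes and its parameters are hypothetical
names, and how many messages the application puts in one transaction is not
stated anywhere in the thread.

```python
def forced_flushes(message_count: int, messages_per_transaction: int) -> int:
    """Count flushes in a toy model with one flush per transaction commit.

    Assumes no batch fills up or lingers out on its own between commits.
    """
    if messages_per_transaction < 1:
        raise ValueError("messages_per_transaction must be at least 1")
    # Ceiling division: a final, partly filled transaction still commits.
    return -(-message_count // messages_per_transaction)


# Roughly 1000 messages of 200 bytes make up the 200 KB test run.
for per_txn in (1, 100, 1000):
    print(f"{per_txn} message(s) per transaction -> "
          f"{forced_flushes(1000, per_txn)} forced flush(es)")
```

If this model applies, grouping more messages into each transaction would
be the lever for getting larger batches while keeping transactions enabled.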

On Fri, Dec 13, 2019 at 10:09 AM Frank Zhou <fr...@itiviti.com> wrote:

> [...]

Re: Message batch & compression doesn't work as expected

Posted by Frank Zhou <fr...@itiviti.com>.
Hi,

Definitely will check this out, thanks. We just started tuning recently,
and we are quite new to the Kafka world. The problem we are facing is that,
with batch.size and linger.ms both set, the batches sent out don't seem to
meet either condition. We tried more "reasonable" values as well and it
just doesn't seem to work, which is why we tested with a more extreme case.
We are using a 2.2.2 Kafka server with a 2.3.1 Kafka client; not sure if
there is any compatibility issue between them. We are also using
transactions, as the same message is pushed to multiple topics in our case.

On Thu, Dec 12, 2019 at 9:10 PM M. Manna <ma...@gmail.com> wrote:

> [...]

Re: Message batch & compression doesn't work as expected

Posted by "M. Manna" <ma...@gmail.com>.
Frank,

On Thu, 12 Dec 2019 at 11:28, Frank Zhou <fr...@itiviti.com> wrote:

> [...]
>
I recently tuned our GCP-based test cluster using a batch size of 800K, no
compression, and no linger.ms. We got the desired consistency and desired
throughput. We used version 2.3.0, though I don't suppose that matters much
here.

https://www.youtube.com/watch?v=oQe7PpDDdzA

The above shows a very good and detailed analysis done by Becket regarding
throughput calculation and estimation. Have you checked this to see what
matches your scenario?
The sending is impacted by linger.ms too, not just batch.size. So tuning
them together is a bit tricky. Perhaps you want to see which one you need
more.
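
The interplay between the two settings can be sketched as a simple rule: a
batch becomes sendable once it is full or once it has waited for linger.ms,
whichever comes first. This is a simplified illustration in plain Python,
not the client's actual code; batch_ready is a made-up name, and the real
producer has further triggers (for example an explicit flush or closing the
producer) that are left out here. The values are the ones from Frank's
message, with "3M" assumed to mean 3 MiB.

```python
def batch_ready(batch_bytes: int, waited_ms: int,
                batch_size: int, linger_ms: int) -> bool:
    """Return True when a batch is full or has lingered long enough."""
    return batch_bytes >= batch_size or waited_ms >= linger_ms


BATCH_SIZE = 3 * 1024 * 1024   # assumption: "3M" means 3 MiB
LINGER_MS = 5000

# 200 KB accumulated after 100 ms: neither condition is met yet.
print(batch_ready(200_000, 100, BATCH_SIZE, LINGER_MS))
# Same 200 KB after the full linger time: sent although far from full.
print(batch_ready(200_000, 5000, BATCH_SIZE, LINGER_MS))
# A full batch is sendable immediately, without waiting for linger.ms.
print(batch_ready(BATCH_SIZE, 0, BATCH_SIZE, LINGER_MS))
```

Under this rule alone, a 200 KB test run with these settings would wait for
the linger timeout, so sends that happen much earlier suggest some other
trigger is involved.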



