Posted to dev@kafka.apache.org by ijuma <gi...@git.apache.org> on 2017/06/02 10:17:55 UTC

[GitHub] kafka pull request #3205: KAFKA-5236; Increase the block/buffer size when co...

GitHub user ijuma opened a pull request:

    https://github.com/apache/kafka/pull/3205

    KAFKA-5236; Increase the block/buffer size when compressing with Snappy and Gzip

    We had originally increased Snappy’s block size as part of KAFKA-3704. However,
    we had some issues with excessive memory usage in the producer and we reverted
    it in 7c6ee8d5e.
    
    After more investigation, we found and fixed the underlying reason memory usage seemed
    to grow much more than expected in KAFKA-3747 (included in 0.10.0.1).
    
    In 0.10.2, we changed the broker to use the same classes as the producer, and as a
    result the broker’s block size for Snappy changed from 32 KB to 1 KB. As reported in
    KAFKA-5236, the on-disk size is, in some cases, 50% larger when the data is compressed
    with a 1 KB block size instead of 32 KB.
    
    As discussed in KAFKA-3704, it may be worth making this configurable and/or allocating
    the compression buffers from the producer pool. However, for 0.11.0.0, I think the
    simplest thing to do is to default to 32 KB for Snappy (the default if no block size
    is provided).
    
    I also increased the Gzip buffer size. 1 KB is too small, and the GZIPOutputStream
    default is smaller still (512 bytes). 8 KB (the default buffer size for
    BufferedOutputStream) seemed like a reasonable default.
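
The Gzip buffer size discussed above is the `size` argument of `java.util.zip.GZIPOutputStream(OutputStream, int)`, whose default is 512 bytes. As a minimal sketch, assuming only the JDK, the example below shows one observable effect of that parameter: deflated output is staged in the buffer before being written to the underlying stream, so a larger buffer means fewer, larger writes. The class names, the counting wrapper, and the random input are hypothetical illustrations, not Kafka's implementation, and the sketch makes no claim about throughput in the producer or broker.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration of GZIPOutputStream's buffer-size parameter.
final class GzipBufferDemo {

    // Wraps a stream and counts the write calls that reach it.
    static final class CountingOutputStream extends FilterOutputStream {
        int writes;

        CountingOutputStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(int b) throws IOException {
            writes++;
            out.write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            writes++;
            out.write(b, off, len);
        }
    }

    // Deterministic, essentially incompressible input (made-up test data).
    static byte[] randomBytes(int n, long seed) {
        byte[] data = new byte[n];
        new Random(seed).nextBytes(data);
        return data;
    }

    // Gzips `data` with a GZIPOutputStream buffer of `bufferSize` bytes and
    // returns how many write calls reached the underlying stream.
    static int writesWithBufferSize(byte[] data, int bufferSize) {
        CountingOutputStream sink = new CountingOutputStream(new ByteArrayOutputStream());
        try (GZIPOutputStream gzip = new GZIPOutputStream(sink, bufferSize)) {
            gzip.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.writes;
    }

    public static void main(String[] args) {
        byte[] data = randomBytes(64 * 1024, 42L);
        System.out.println("512 B buffer -> " + writesWithBufferSize(data, 512) + " writes");
        System.out.println("8 KB buffer  -> " + writesWithBufferSize(data, 8 * 1024) + " writes");
    }
}
```

Random input is used so the compressed output is roughly as large as the input, which makes the difference in write counts easy to see; with highly compressible data the gap would be smaller.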

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ijuma/kafka kafka-5236-snappy-block-size

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/3205.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3205
    
----
commit ef4af6757575e694c109074b67e59704ff437b56
Author: Ismael Juma <is...@juma.me.uk>
Date:   2017-06-02T10:17:23Z

    KAFKA-5236; Increase the block/buffer size when compressing with Snappy and Gzip

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] kafka pull request #3205: KAFKA-5236; Increase the block/buffer size when co...

Posted by asfgit <gi...@git.apache.org>.
GitHub user asfgit closed the pull request at:

    https://github.com/apache/kafka/pull/3205

