Posted to dev@kafka.apache.org by "Jordan Shaw (JIRA)" <ji...@apache.org> on 2015/09/25 21:30:04 UTC

[jira] [Commented] (KAFKA-2189) Snappy compression of message batches less efficient in 0.8.2.1

    [ https://issues.apache.org/jira/browse/KAFKA-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908553#comment-14908553 ] 

Jordan Shaw commented on KAFKA-2189:
------------------------------------

Hi all,
Does this affect only 0.8.2.1, or also 0.8.2? We are on 0.8.2 and just completed a full rebalance across our brokers, yet some brokers are at 70% disk utilization while others are at 30%. Thanks.

> Snappy compression of message batches less efficient in 0.8.2.1
> ---------------------------------------------------------------
>
>                 Key: KAFKA-2189
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2189
>             Project: Kafka
>          Issue Type: Bug
>          Components: build, compression, log
>    Affects Versions: 0.8.2.1
>            Reporter: Olson,Andrew
>            Assignee: Ismael Juma
>            Priority: Blocker
>              Labels: trivial
>             Fix For: 0.9.0.0, 0.8.2.2
>
>         Attachments: KAFKA-2189.patch
>
>
> We are using snappy compression and noticed a substantial increase (about 2.25x) in log filesystem space consumption after upgrading a Kafka cluster from 0.8.1.1 to 0.8.2.1. This appears to be caused by messages being recompressed individually (or possibly with a much smaller buffer or dictionary?) instead of as a batch, as sent by producers. We eventually tracked the change in compression ratio/scope down to commit [1], which updated the snappy version from 1.0.5 to 1.1.1.3. The Kafka client version does not appear to be relevant, as we can reproduce this with both the 0.8.1.1 and 0.8.2.1 Producer.
> Here are the log files from our troubleshooting that contain the same set of 1000 messages, for batch sizes of 1, 10, 100, and 1000. f9d9b was the last commit with 0.8.1.1-like behavior prior to f5ab8 introducing the issue.
> {noformat}
> -rw-rw-r-- 1 kafka kafka 404967 May 12 11:45 /var/kafka2/f9d9b-batch-1-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka 119951 May 12 11:45 /var/kafka2/f9d9b-batch-10-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka  89645 May 12 11:45 /var/kafka2/f9d9b-batch-100-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka  88279 May 12 11:45 /var/kafka2/f9d9b-batch-1000-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka 402837 May 12 11:41 /var/kafka2/f5ab8-batch-1-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka 382437 May 12 11:41 /var/kafka2/f5ab8-batch-10-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka 364791 May 12 11:41 /var/kafka2/f5ab8-batch-100-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka 380693 May 12 11:41 /var/kafka2/f5ab8-batch-1000-0/00000000000000000000.log
> {noformat}
> [1] https://github.com/apache/kafka/commit/f5ab8e1780cf80f267906e3259ad4f9278c32d28 
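
To make the comparison in the listing above easier to read, here is a minimal sketch that computes how much larger each f5ab8 log segment is than its f9d9b counterpart. The byte counts are copied from the listing; the names SIZES and growth_factors are hypothetical and only used for this illustration.

```python
# Log segment sizes in bytes, copied from the directory listing in the
# issue description. Keys are producer batch sizes; values are
# (size at commit f9d9b, size at commit f5ab8).
SIZES = {
    1: (404967, 402837),
    10: (119951, 382437),
    100: (89645, 364791),
    1000: (88279, 380693),
}


def growth_factors(sizes):
    """Return {batch_size: f5ab8_bytes / f9d9b_bytes} for each batch size."""
    return {batch: after / before for batch, (before, after) in sizes.items()}


if __name__ == "__main__":
    for batch, factor in growth_factors(SIZES).items():
        print(f"batch={batch:>4}: {factor:.2f}x")
```

With a batch size of 1 the two commits produce nearly identical sizes, while for batch sizes of 10, 100, and 1000 the f5ab8 segments are roughly 3.2x, 4.1x, and 4.3x larger. That pattern fits the reporter's hypothesis that batching no longer improves the compression ratio after the snappy update, though these numbers alone do not prove the cause.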



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)