Posted to dev@kafka.apache.org by "Neha Narkhede (JIRA)" <ji...@apache.org> on 2012/10/31 17:25:14 UTC

[jira] [Updated] (KAFKA-595) Producer side compression is unnecessary

     [ https://issues.apache.org/jira/browse/KAFKA-595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-595:
--------------------------------

    Issue Type: Improvement  (was: Bug)
    
> Producer side compression is unnecessary
> ----------------------------------------
>
>                 Key: KAFKA-595
>                 URL: https://issues.apache.org/jira/browse/KAFKA-595
>             Project: Kafka
>          Issue Type: Improvement
>    Affects Versions: 0.8
>            Reporter: Neha Narkhede
>              Labels: feature, features
>
> Compression can be used to store data in less space (less I/O) and/or transfer it less expensively (better use of network bandwidth). Often the two go hand in hand, such as when compressed data is written to a disk: the disk I/O takes less time, since fewer bits are being transferred, and the data occupies less storage on the disk after the transfer. Unfortunately, the time to compress the data can exceed the savings gained from transferring less data, resulting in overall degradation.
> After KAFKA-506, the network usage gains we used to get by compressing data at the producers are exceeded by the cost of decompressing and re-compressing data at the server side. Compressing to save on network costs makes sense either to reduce contention in a wide-area network caused by multiple point-to-point connections or to transfer data efficiently over low-bandwidth (cross-DC) networks. In the case of producer-server connections, neither is typically true, which means we might not benefit from producer side compression at all in most production deployments of Kafka. On the contrary, it might actually hurt performance, since most production deployments turn on compression for all topics.
> The main benefit of compressing data in Kafka is to transfer data efficiently across data centers when setting up mirrored Kafka clusters. The performance benefit also applies to real-time consumers, especially when multiple groups of consumers consume the same topic. If data is compressed on the server side instead, which we do anyway, we get the I/O savings as well as efficient network transfer on the server-consumer links.
> I don't have numbers to quantify the performance impact of re-compression yet, since other changes need to be made before this can be tested correctly.
> Thoughts?
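
The trade-off described in the issue can be sketched as a break-even check: compression helps only when the CPU time spent on it is smaller than the transfer time it saves on a given link. The Python below is a rough illustration, not Kafka code. All function names are hypothetical, gzip stands in for whatever codec is configured, and the two bandwidth figures are made-up examples of a LAN and a slow cross-DC link.

```python
import gzip
import time
from typing import Tuple


def transfer_seconds(num_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Time to move num_bytes over a link with the given bandwidth."""
    return num_bytes / bandwidth_bytes_per_s


def compression_pays_off(raw_bytes: int, compressed_bytes: int,
                         cpu_seconds: float,
                         bandwidth_bytes_per_s: float) -> bool:
    """True when the CPU time spent is less than the transfer time saved."""
    saved = transfer_seconds(raw_bytes - compressed_bytes, bandwidth_bytes_per_s)
    return cpu_seconds < saved


def broker_recompress(compressed_batch: bytes) -> Tuple[bytes, float]:
    """Decompress and re-compress a batch; return the new batch and elapsed seconds.

    Hypothetical stand-in for the server-side work the issue attributes
    to KAFKA-506; it does not reproduce the broker's actual code path.
    """
    start = time.perf_counter()
    payload = gzip.decompress(compressed_batch)
    recompressed = gzip.compress(payload)
    return recompressed, time.perf_counter() - start


if __name__ == "__main__":
    # Synthetic, highly repetitive payload; real messages will compress differently.
    batch = b'{"event":"page_view","user":12345}\n' * 20000

    start = time.perf_counter()
    producer_batch = gzip.compress(batch)
    producer_cost = time.perf_counter() - start

    _, broker_cost = broker_recompress(producer_batch)
    total_cpu = producer_cost + broker_cost

    # Illustrative link speeds in bytes per second (assumptions, not measurements).
    for label, bandwidth in (("1 Gbit/s LAN", 125e6), ("10 Mbit/s WAN", 1.25e6)):
        verdict = compression_pays_off(len(batch), len(producer_batch),
                                       total_cpu, bandwidth)
        print(f"{label}: compression pays off = {verdict}")
```

The result depends on the payload, the codec, and the hardware, so this only shows how the comparison could be set up. It does not supply the numbers the issue says are still missing.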

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira