Posted to dev@kafka.apache.org by "Ismael Juma (JIRA)" <ji...@apache.org> on 2016/02/19 00:45:18 UTC

[jira] [Commented] (KAFKA-3250) release tarball is unnecessarily large due to duplicate libraries

    [ https://issues.apache.org/jira/browse/KAFKA-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153383#comment-15153383 ] 

Ismael Juma commented on KAFKA-3250:
------------------------------------

https://github.com/apache/kafka/pull/693 may fix it.

> release tarball is unnecessarily large due to duplicate libraries
> -----------------------------------------------------------------
>
>                 Key: KAFKA-3250
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3250
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Gwen Shapira
>
> Between 0.8.2.2 and 0.9.0, our release tarballs grew from 17M to 34M. We thought it was just due to new libraries and dependencies. But:
> 1. If you untar Kafka into a directory and check the directory size (du -sh), it is around 28M, smaller than the tarball. Recompressing gives you a 25M tarball.
> 2. If you list the original tar contents and grep for "snappy", you see it 4 times in the tarball.
> Clearly we are creating a tarball with duplicates (and we didn't before).
> I think it's due to how we generate the tarball from core while pulling other projects into the libs/ directory along with their dependencies (which overlap).
> We need to find out how to sort it out (possibly with excludes).
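
The duplicate check described in the issue (listing the tar contents and looking for repeated entries) can be sketched in Python with the standard tarfile module. This is only an illustration and not part of the Kafka build: the helper names duplicate_members and build_demo_archive, and the jar names in the demo archive, are hypothetical.

```python
import collections
import io
import tarfile


def duplicate_members(tar):
    """Return {member name: count} for file names stored more than once."""
    counts = collections.Counter(m.name for m in tar.getmembers() if m.isfile())
    return {name: n for name, n in counts.items() if n > 1}


def build_demo_archive():
    """Build a small in-memory .tgz that stores one jar twice (hypothetical names)."""
    names = [
        "kafka/libs/snappy-java.jar",
        "kafka/libs/snappy-java.jar",  # same path added a second time
        "kafka/libs/kafka-clients.jar",
    ]
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in names:
            data = b"placeholder"
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


# Scan the demo archive and report every path that appears more than once.
with tarfile.open(fileobj=build_demo_archive(), mode="r:gz") as tar:
    dupes = duplicate_members(tar)

for name, n in sorted(dupes.items()):
    print(f"{n}x {name}")
```

To run the same check against a real release, open the tarball by path with tarfile.open(path) instead of the in-memory demo archive. Any path reported with a count above 1 is stored in the archive more than once.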



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)