Posted to common-dev@hadoop.apache.org by "Sean Busbey (JIRA)" <ji...@apache.org> on 2015/03/05 20:57:38 UTC
[jira] [Created] (HADOOP-11680) Deduplicate jars in convenience binary distribution
Sean Busbey created HADOOP-11680:
------------------------------------
Summary: Deduplicate jars in convenience binary distribution
Key: HADOOP-11680
URL: https://issues.apache.org/jira/browse/HADOOP-11680
Project: Hadoop Common
Issue Type: Improvement
Components: build
Reporter: Sean Busbey
Assignee: Sean Busbey
Pulled from the discussion on HADOOP-11656, where Colin wrote:
{quote}
bq. Andrew wrote: One additional note related to this, we can spend a lot of time right now distributing 100s of MBs of jar dependencies when launching a YARN job. Maybe this is ameliorated by the new shared distributed cache, but I've heard this come up quite a bit as a complaint. If we could meaningfully slim down our client, it could lead to a nice win.
I'm frustrated that nobody responded to my earlier suggestion that we de-duplicate jars. This would drastically reduce the size of our install, and without rearchitecting anything.
In fact I was so frustrated that I decided to write a program to do it myself and measure the delta. Here it is:
Before:
{code}
du -h /h
249M /h
{code}
After:
{code}
du -h /h
140M /h
{code}
Seems like deduplicating jars would be a much better project than splitting into a client jar, if we really cared about this.
<snip>
{quote}
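The program mentioned in the quote is not included in this message, so the following is only a minimal sketch of one way such deduplication could work, not the tool that produced the numbers above (249M before, 140M after, roughly a 44% reduction). It assumes that jars with identical content can safely be replaced by relative symlinks to a single kept copy, which may not hold for every packaging or classpath setup. The function names (sha256_of, find_duplicate_jars, deduplicate_jars) are hypothetical and chosen for illustration.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large jars are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_jars(root):
    """Group regular .jar files under `root` by content digest.

    Returns only the groups that contain more than one file. Each group
    is sorted so that its first path can serve as the copy to keep.
    Existing symlinks are skipped so a second run finds nothing new.
    """
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.endswith(".jar") and not os.path.islink(path):
                by_digest[sha256_of(path)].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


def deduplicate_jars(root):
    """Replace duplicate jars with relative symlinks.

    Assumption: symlinks are acceptable in the distribution layout.
    Returns the number of bytes reclaimed.
    """
    saved = 0
    for paths in find_duplicate_jars(root):
        keep = paths[0]
        for dup in paths[1:]:
            saved += os.path.getsize(dup)
            os.remove(dup)
            # A relative target keeps the tree relocatable.
            target = os.path.relpath(keep, start=os.path.dirname(dup))
            os.symlink(target, dup)
    return saved
```

Calling find_duplicate_jars on an unpacked distribution directory first gives a read-only report of what would be merged; deduplicate_jars modifies the tree in place, so it is best tried on a copy.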
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)