Posted to dev@pig.apache.org by Alejandro Abdelnur <tu...@cloudera.com> on 2010/12/09 01:21:11 UTC

Re: Pig 0.7 Release Engineering and how to drop 18 MB out of the distro size

+1

This will greatly simplify (or rather, enable) the use of Pig from within
other systems (like Oozie), as it will allow proper component dependency
resolution.

Thanks.

Alejandro

On Thu, Dec 9, 2010 at 3:37 AM, Stephen Watt <sw...@us.ibm.com> wrote:

> Hi Folks
>
> I've been doing some release engineering around Pig 0.7 and thought I
> would share this in case any of you have it baked into a distribution.
> Using the techniques below, you can shrink the current distro from 44MB to
> a runtime-only distro of 26MB. Also, if I've missed something, or if
> anything I'm suggesting here has negative ramifications, I'd love to know.
>
> 1) Delete everything in the lib directory and copy the following files
> into it:
>      commons-el.jar
>      commons-httpclient-3.0.1.jar
>      commons-logging-1.0.4.jar
>      hadoop-0.20.2-core.jar
>      hbase-0.20.6.jar
>      hbase-0.20.6-test.jar
>      jline-0.9.94.jar
>      log4j-1.2.15.jar
> 2) Delete the Pig jars in $PIG_HOME except pig-0.7.1-dev-core.jar, and copy
> that jar into the lib directory.
> 3) Add the following to bin/pig so that grunt still works:
>
> # add every jar in lib/ to the classpath
> for f in "$PIG_DIR"/lib/*.jar; do
>    CLASSPATH="${CLASSPATH}:$f"
> done
>
> Lastly, some observations:
>
> - According to its JIRA ticket, automaton.jar is part of Pig 0.8, so what
> is the jar doing in Pig 0.7?
>
> - Those who ship Pig need to do legal scans on the software to ensure that
> all the dependencies (the jars in the lib folder) have friendly licenses and
> can be shipped along with the base project. Creating files like
> Hadoop20.jar, in which Hadoop, all of its dependencies, and a number of
> classes of undetermined origin are compiled into a single jar, makes this
> extremely difficult. I'd like to propose that future releases ship an
> independent jar for each project in lib. Otherwise, for each class we have
> to work out which project it belongs to (to determine its license) and
> which version it is, based on the package name and the dates of the
> classes.
>
> Regards
> Steve Watt
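Steps 1 and 2 of the quoted mail can be scripted. What follows is a minimal sketch, not something from the thread. It assumes the first argument is the unpacked Pig 0.7 home directory, that the core jar is named pig-0.7.1-dev-core.jar as in the mail, and that the listed dependency jars have already been collected in a separate directory (the second argument; the mail does not say where the jars come from, so that directory and the `slim_pig_lib` name are hypothetical). Following the mail, the core jar is copied rather than moved, so anything that still expects it in $PIG_HOME keeps working.

```shell
#!/bin/sh
# Sketch of steps 1 and 2: rebuild lib/ as a runtime-only set of jars.
# Assumptions: $1 is the Pig home directory; $2 is a hypothetical directory
# that already holds the dependency jars named in the mail.
set -eu

slim_pig_lib() {
  pig_home=$1
  jar_src=$2

  # Step 1: delete everything in lib/, then copy back only these jars.
  rm -rf "$pig_home/lib"
  mkdir -p "$pig_home/lib"
  for j in commons-el.jar commons-httpclient-3.0.1.jar \
           commons-logging-1.0.4.jar hadoop-0.20.2-core.jar \
           hbase-0.20.6.jar hbase-0.20.6-test.jar \
           jline-0.9.94.jar log4j-1.2.15.jar; do
    cp "$jar_src/$j" "$pig_home/lib/"   # fails (and stops) if a jar is missing
  done

  # Step 2: delete every Pig jar in the home directory except the core jar,
  # and copy the core jar into lib/.
  for p in "$pig_home"/pig*.jar; do
    [ -e "$p" ] || continue             # the glob matched nothing
    if [ "$(basename "$p")" = "pig-0.7.1-dev-core.jar" ]; then
      cp "$p" "$pig_home/lib/"
    else
      rm -f "$p"
    fi
  done
}
```

Usage would be something like `slim_pig_lib "$PIG_HOME" /path/to/collected/jars` (placeholder path), followed by the bin/pig change from step 3, which is left as a manual edit here.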
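For the licensing concern about bundled jars such as Hadoop20.jar, one way to narrow down which projects a jar contains is to count its classes by package prefix. This is a sketch under the assumption that package names roughly identify the originating project; it does not establish versions or licenses, which still need to be checked by hand. The `package_inventory` function name is hypothetical; it reads jar entry names one per line on stdin, so it can be fed from `unzip -Z1` or `jar tf`.

```shell
#!/bin/sh
# Reads jar entry names (one per line) on stdin and prints how many .class
# files fall under each package prefix, truncated to $1 components (default 3).
package_inventory() {
  depth=${1:-3}
  grep '\.class$' |
    awk -F/ -v d="$depth" '
      BEGIN { d += 0 }                      # force a numeric depth
      {
        n = (NF - 1 < d) ? NF - 1 : d       # skip the class file name itself
        p = $1
        for (i = 2; i <= n; i++) p = p "." $i
        if (n == 0) p = "(default package)"
        count[p]++
      }
      END { for (p in count) print count[p], p }' |
    sort -k2                                # order by package prefix
}
```

For example, `unzip -Z1 "$PIG_HOME/lib/hadoop20.jar" | package_inventory 3` lists prefixes such as org.apache.hadoop with their class counts; a larger depth separates projects that share a prefix, such as the individual org.apache.commons libraries.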