Posted to dev@pig.apache.org by Stefan Groschupf <sg...@101tec.com> on 2008/02/13 20:12:37 UTC

hadoop15 & hadoop14 both in lib

Hi,
sorry for the traffic.
Why is there a hadoop14 and a hadoop15 jar in lib?
Wouldn't one be enough? As far as I understand the code, 15 is required
since generics are used.
Thanks for any clarification.
Stefan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101tec Inc.
Menlo Park, California, USA
http://www.101tec.com



Re: hadoop15 & hadoop14 both in lib

Posted by Stefan Groschupf <sg...@101tec.com>.
On Feb 19, 2008, at 3:54 AM, Craig Macdonald wrote:
>> As mentioned earlier we need to write a pig shell script anyhow  
>> from my point of view.
> What would the shell script do that the Perl script pig.pl couldn't  
> be made to do?


Oops, I never paid attention to the Perl script. I guess nothing,
except remove the Perl dependency and introduce a dependency on a
shell (cygwin on Windows). But since hadoop already requires cygwin on
Windows anyway, I guess I would prefer that.



Re: hadoop15 & hadoop14 both in lib

Posted by Craig Macdonald <cr...@dcs.gla.ac.uk>.
Stefan Groschupf wrote:
> On Feb 16, 2008, at 1:56 AM, Andrzej Bialecki wrote:
>> If you want to keep hadoop-related jars separate from other jars, you 
>> could put them all together in a lib/hadoop subdir.
> +1, I like that idea.
> As mentioned earlier we need to write a pig shell script anyhow from 
> my point of view.
What would the shell script do that the Perl script pig.pl couldn't be 
made to do?

C

Re: hadoop15 & hadoop14 both in lib

Posted by Stefan Groschupf <sg...@101tec.com>.
On Feb 16, 2008, at 1:56 AM, Andrzej Bialecki wrote:
> If you want to keep hadoop-related jars separate from other jars,  
> you could put them all together in a lib/hadoop subdir.

+1, I like that idea.
As mentioned earlier, from my point of view we need to write a pig
shell script anyhow.


Re: hadoop15 & hadoop14 both in lib

Posted by Andrzej Bialecki <ab...@getopt.org>.
Alan Gates wrote:
> A few answers to your questions.
> 
> The hadoopX.jar files in pig's lib directory are not the standard hadoop 
> jars.  They differ in two ways.  First, we recreate a hadoop jar that 
> rolls in all the jars needed to compile with hadoop.  This is somewhere 
> around 15 jars.  Second, we have a small hack we add for historical 
> reasons.  We need to resolve both of those issues.  Once we do we can 
> use stock hadoop jars instead of carrying along our own.


If you want to keep hadoop-related jars separate from other jars, you
could put them all together in a lib/hadoop subdir. Re-packaging jars is
confusing: you lose the versioning information of dependent jars, and
some jars may depend on specific values in MANIFEST, which repackaging
may have dropped.
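
To illustrate the point about MANIFEST values, here is a minimal Java sketch that reads the Implementation-Version attribute from a jar's manifest. The class name ManifestVersionSketch and its helper methods are hypothetical and are not part of Pig or Hadoop; the sketch writes two throwaway jars to a temporary location, one with the attribute and one without, to show what a repackaged jar that dropped the attribute would look like to a caller.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical illustration only; not code from Pig or Hadoop.
public class ManifestVersionSketch {

    /** Returns Implementation-Version from the jar's manifest, or null if absent. */
    static String implementationVersion(File jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            Manifest manifest = jarFile.getManifest();
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes()
                    .getValue(Attributes.Name.IMPLEMENTATION_VERSION);
        }
    }

    /** Writes a jar with no entries whose manifest carries the version, if one is given. */
    static File writeJar(String version) throws IOException {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Manifest-Version must be set, otherwise the manifest is written empty.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        if (version != null) {
            main.put(Attributes.Name.IMPLEMENTATION_VERSION, version);
        }
        File jar = File.createTempFile("manifest-sketch", ".jar");
        jar.deleteOnExit();
        try (JarOutputStream out =
                 new JarOutputStream(new FileOutputStream(jar), manifest)) {
            // No class files needed; only the manifest matters here.
        }
        return jar;
    }

    public static void main(String[] args) throws IOException {
        // A jar that kept its version attribute.
        System.out.println(implementationVersion(writeJar("0.15.0")));
        // A jar whose attribute was dropped, as repackaging might do.
        System.out.println(implementationVersion(writeJar(null)));
    }
}
```

Code that relies on such an attribute gets null from the second jar, which is the kind of silent loss described above.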

Regarding the hack: we had similar problems in Nutch. If changes are
required to core Hadoop, perhaps it's better to submit them to Hadoop
for inclusion. If they are a temporary hack, perhaps a facade class is a
better approach. In some cases in Nutch we had to use a patched library
anyway, which was then clearly marked as such, and diffs from the stock
version were available in JIRA.
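
A minimal Java sketch of the facade idea, under loud assumptions: StockClient stands in for a class from the unmodified library, ClientFacade is the single class the application would call, and the whitespace workaround is invented purely for illustration. None of these names or behaviours come from Pig, Hadoop or Nutch.

```java
// Hypothetical illustration only; not code from Pig, Hadoop or Nutch.
public class FacadeSketch {

    /** Stand-in for a class from the stock, unmodified library. */
    static class StockClient {
        String run(String jobName) {
            return "submitted:" + jobName;
        }
    }

    /**
     * The facade: the only class the rest of the application calls.
     * The temporary workaround lives here instead of in a patched
     * copy of the library jar.
     */
    static class ClientFacade {
        private final StockClient client = new StockClient();
        private final boolean applyWorkaround;

        ClientFacade(boolean applyWorkaround) {
            this.applyWorkaround = applyWorkaround;
        }

        String submit(String jobName) {
            // Invented workaround: pretend the stock library mishandles
            // surrounding whitespace, so trim before delegating.
            String name = applyWorkaround ? jobName.trim() : jobName;
            return client.run(name);
        }
    }

    public static void main(String[] args) {
        System.out.println(new ClientFacade(true).submit("  wordcount "));
        System.out.println(new ClientFacade(false).submit("wordcount"));
    }
}
```

With this shape, removing the hack later means changing one class, and the stock jar can stay on the classpath under its original name.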

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


RE: hadoop15 & hadoop14 both in lib

Posted by Olga Natkovich <ol...@yahoo-inc.com>.
I am working on integrating 0.16 and will remove the 0.14 jar as part of that update.

Olga 

> -----Original Message-----
> From: Alan Gates [mailto:gates@yahoo-inc.com] 
> Sent: Friday, February 15, 2008 4:02 PM
> To: pig-dev@incubator.apache.org
> Subject: Re: hadoop15 & hadoop14 both in lib
> 
> A few answers to your questions.
> 
> The hadoopX.jar files in pig's lib directory are not the 
> standard hadoop jars.  They differ in two ways.  First, we 
> recreate a hadoop jar that rolls in all the jars needed to 
> compile with hadoop.  This is somewhere around 15 jars.  
> Second, we have a small hack we add for historical reasons.  
> We need to resolve both of those issues.  Once we do we can 
> use stock hadoop jars instead of carrying along our own.
> 
> The reason for having multiple versions is to support 
> compilation against multiple versions of hadoop.  I'm not 
> sure what the use of
> hadoop14 is since we can't compile against it anymore.  But 
> once we've tested a build against hadoop16 (coming soon), 
> we'll need a library for that.  And we will be able to build 
> against either 15 or 16.
> 
> Alan.
> 
> Benjamin Francisoud wrote:
> > Stefan Groschupf wrote:
> >>> Also, I think the Pig project should follow the common 
> practice and 
> >>> NOT rename third-party libraries, i.e. in this case to keep the 
> >>> original name of hadoop-0.15.0.jar (if indeed it was that Hadoop 
> >>> release).
> >>
> >> 100 % agreed. What would be great!
> >> What would be a perfect solution would be using 
> >> http://ant.apache.org/ivy/.
> >> However as I understand it this required that the hadoop 
> developers 
> >> publish there releases into a repository.
> >> However not sure if hadoop developers are willing to do that. It 
> >> would help quite a lot for many other projects as well.
> >>
> >> Stefan
> > Even if I think ivy is great pig has so few libs (4 if I exclude
> > hadoop14) that I think a "classical" lib folder holding jars (with 
> > version numbers in the jars name) could be enough...
> >
> > http://svn.apache.org/repos/asf/incubator/pig/trunk/lib/
> >
> > my 2 cents
> >
> 

Re: hadoop15 & hadoop14 both in lib

Posted by Alan Gates <ga...@yahoo-inc.com>.
A few answers to your questions.

The hadoopX.jar files in pig's lib directory are not the standard hadoop 
jars.  They differ in two ways.  First, we recreate a hadoop jar that 
rolls in all the jars needed to compile with hadoop.  This is somewhere 
around 15 jars.  Second, we have a small hack we add for historical 
reasons.  We need to resolve both of those issues.  Once we do we can 
use stock hadoop jars instead of carrying along our own.

The reason for having multiple versions is to support compilation 
against multiple versions of hadoop.  I'm not sure what the use of 
hadoop14 is since we can't compile against it anymore.  But once we've 
tested a build against hadoop16 (coming soon), we'll need a library for 
that.  And we will be able to build against either 15 or 16.

Alan.

Benjamin Francisoud wrote:
> Stefan Groschupf wrote:
>>> Also, I think the Pig project should follow the common practice and 
>>> NOT rename third-party libraries, i.e. in this case to keep the 
>>> original name of hadoop-0.15.0.jar (if indeed it was that Hadoop 
>>> release).
>>
>> 100 % agreed. What would be great!
>> What would be a perfect solution would be using 
>> http://ant.apache.org/ivy/.
>> However as I understand it this required that the hadoop developers 
>> publish there releases into a repository.
>> However not sure if hadoop developers are willing to do that. It 
>> would help quite a lot for many other projects as well.
>>
>> Stefan
> Even if I think ivy is great pig has so few libs (4 if I exclude 
> hadoop14) that I think a "classical" lib folder holding jars (with 
> version numbers in the jars name) could be enough...
>
> http://svn.apache.org/repos/asf/incubator/pig/trunk/lib/
>
> my 2 cents
>

Re: hadoop15 & hadoop14 both in lib

Posted by Benjamin Francisoud <be...@joost.com>.
Stefan Groschupf wrote:
>> Also, I think the Pig project should follow the common practice and 
>> NOT rename third-party libraries, i.e. in this case to keep the 
>> original name of hadoop-0.15.0.jar (if indeed it was that Hadoop 
>> release).
>
> 100 % agreed. What would be great!
> What would be a perfect solution would be using 
> http://ant.apache.org/ivy/.
> However as I understand it this required that the hadoop developers 
> publish there releases into a repository.
> However not sure if hadoop developers are willing to do that. It would 
> help quite a lot for many other projects as well.
>
> Stefan
Even though I think ivy is great, pig has so few libs (4 if I exclude
hadoop14) that I think a "classical" lib folder holding jars (with
version numbers in the jar names) could be enough...

http://svn.apache.org/repos/asf/incubator/pig/trunk/lib/

my 2 cents


Re: hadoop15 & hadoop14 both in lib

Posted by Stefan Groschupf <sg...@101tec.com>.
> Also, I think the Pig project should follow the common practice and  
> NOT rename third-party libraries, i.e. in this case to keep the  
> original name of hadoop-0.15.0.jar (if indeed it was that Hadoop  
> release).

100% agreed. That would be great!
A perfect solution would be to use http://ant.apache.org/ivy/.
However, as I understand it, this requires that the hadoop developers
publish their releases to a repository.
I am not sure whether the hadoop developers are willing to do that,
though. It would help quite a lot for many other projects as well.

Stefan


Re: hadoop15 & hadoop14 both in lib

Posted by Andrzej Bialecki <ab...@getopt.org>.
Benjamin Francisoud wrote:
> Stefan Groschupf wrote:
>> Hi,
>> sorry for the traffic.
>> Why is there a hadoop14 and a hadoop15 jar in lib?
>> Wouldn't be one enough? As far I understand the code 15 is required 
>> since generics are used.
>> Thanks for any clarification.
>> Stefan
> The build.xml exclude hadoop14 from classpath when building...
> 
> As a user, I don't need 14 backward compatibility, our production 
> cluster is using 15.

Also, I think the Pig project should follow the common practice and NOT 
rename third-party libraries, i.e. in this case to keep the original 
name of hadoop-0.15.0.jar (if indeed it was that Hadoop release).



-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: hadoop15 & hadoop14 both in lib

Posted by Stefan Groschupf <sg...@101tec.com>.
> The build.xml exclude hadoop14 from classpath when building...

So can we remove hadoop14 from the project then? Can one of the
contributors please do that?
Thanks.
Stefan


Re: hadoop15 & hadoop14 both in lib

Posted by Benjamin Francisoud <be...@joost.com>.
Stefan Groschupf wrote:
> Hi,
> sorry for the traffic.
> Why is there a hadoop14 and a hadoop15 jar in lib?
> Wouldn't be one enough? As far I understand the code 15 is required 
> since generics are used.
> Thanks for any clarification.
> Stefan
The build.xml excludes hadoop14 from the classpath when building...

As a user, I don't need 14 backward compatibility, our production 
cluster is using 15.