Posted to dev@crunch.apache.org by Josh Wills <jw...@cloudera.com> on 2013/03/11 22:01:14 UTC

refactoring crunch-archetype

Hey Matthias,

I cc'd everyone else on here, but since this was your module, I thought it
best to solicit your opinion before refactoring it.

We never managed to get crunch-archetype working w/hadoop 2.x, which is
apparently deprecating the lib/* trick for including client dependencies in
favor of the -libjars option (see
http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
and http://architects.dzone.com/articles/using-libjars-option-hadoop )

The way that I have found to do this in Maven is to use the
copy-dependencies option of the maven-dependency-plugin and include a shell
script in a bin/ directory that knows how to set up the HADOOP_CLASSPATH and
libjars arguments for use with hadoop jar. Although this approach is more
complex than the lib/* trick, it will be able to support hadoop 1.x as well
as hadoop 2.x.
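
A minimal sketch of what such a bin/ script might look like, assuming
copy-dependencies has placed the jars in a directory such as target/lib. The
function name join_jars, the jar name, and the main class below are
illustrative and not part of the existing archetype. It relies on
HADOOP_CLASSPATH being a colon-separated list and -libjars taking a
comma-separated list, and it assumes the job's main class runs through
ToolRunner so that -libjars is actually parsed.

```shell
#!/bin/sh
# Sketch only: helper for a bin/ launcher script (names are hypothetical).

# Print all *.jar files found in directory $1, joined with separator $2.
# Prints an empty line when the directory contains no jars.
join_jars() {
  dir=$1
  sep=$2
  out=""
  for jar in "$dir"/*.jar; do
    # Skip the unexpanded pattern when nothing matches.
    [ -e "$jar" ] || continue
    if [ -z "$out" ]; then
      out=$jar
    else
      out="$out$sep$jar"
    fi
  done
  printf '%s\n' "$out"
}

# Intended use (assumed paths and class name, left commented out on purpose):
#   LIB_DIR=target/lib
#   HADOOP_CLASSPATH=$(join_jars "$LIB_DIR" ":")
#   export HADOOP_CLASSPATH
#   hadoop jar target/my-app.jar com.example.MyApp \
#       -libjars "$(join_jars "$LIB_DIR" ",")" input output
```

HADOOP_CLASSPATH makes the dependencies visible to the client JVM that submits
the job, while -libjars ships them to the tasks, which is why the same jar list
is built twice with different separators.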

Do you have any objections to me taking this on, and/or any other landmines
I should keep an eye out for?

Thanks!
Josh

-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>

Re: refactoring crunch-archetype

Posted by Josh Wills <jo...@gmail.com>.
No, please go ahead and fix it. I think (as per Matthias' comments) it's
going to be something that's going to take me a few weeks at least.

J



Re: refactoring crunch-archetype

Posted by Gabriel Reid <ga...@gmail.com>.
Hi guys,

Speaking of the archetype, I just tried to use it today (actually for the first time) and it seems that there's an issue with it -- when I tried to run the generated project within Eclipse, I ran into a class versioning issue. Namely, the version of commons-codec that is pulled in by commons-httpclient doesn't match up with the version used by Hadoop.

I was going to fix this (by adding an exclusion for commons-codec to the commons-httpclient dependency). Josh, if you're going to do some work on the archetype in the short term, I'll just leave this as it is and it can get tackled as part of the refactoring of the archetype. Were you planning on doing this refactoring fairly soon? If not, I'll fix the archetype now.

- Gabriel




Re: refactoring crunch-archetype

Posted by Matthias Friedrich <ma...@mafr.de>.
Hi,

sure, feel free to take this on. The tricky thing is to make sure that
the generated project has correct dependencies for both Hadoop 1 and 2.

Last time I tried this (and failed due to bugs in the archetype plugin),
I used Velocity templates and introduced a new archetype variable so
that the user could select whether they're creating a Hadoop 1 or 2 project.
Maybe you can get it working; there has since been a new release of the
archetype plugin.

Shout if you need any help.

Regards,
  Matthias
