Posted to general@hadoop.apache.org by Alejandro Abdelnur <tu...@cloudera.com> on 2011/08/02 23:13:04 UTC

getting started building Mavenized hadoop common

With the HADOOP-6671 commit the way of building hadoop common has changed
significantly.

While the wiki explains these changes, and there is a BUILDING.txt file,
I expect the changes will still trip up many of you.

Because of this I've put together some brief notes.

Thanks.

Alejandro

----------------------
NEW LAYOUT

After updating the trunk you'll see the following directory changes at the
top level:

 Removed: common/
 New: hadoop-common/, hadoop-project/, hadoop-annotations/,
hadoop-assemblies/

* hadoop-common/ is the new common/ and its sub-dirs are organized following
the standard Maven project layout.
* hadoop-project/ contains the Hadoop project root POM; all dependency versions
are defined there
* hadoop-annotations/ contains the Hadoop public/private annotation classes
* hadoop-assemblies/ contains the assembly files that create the
distribution directories layout

----------------------
BUILDING REQUIREMENTS

The only new build requirement is Maven 3 (it must be version 3 or later).

The environment variable FORREST_HOME must be set if building the documentation.

----------------------
FIRST MAVEN BUILD

It must be run from the trunk/ directory.

Run: 'mvn install -DskipTests'

This will install the different submodules
(project/annotations/assemblies/common) into the local Maven cache
(~/.m2/repository).

After this is done, you can build from the hadoop-common/ directory.

NOTE: this will not be required once the SNAPSHOTS Maven repo has the
snapshots published.
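
For example, a first-build session could look like this (paths assume a
trunk checkout):

  cd trunk
  mvn install -DskipTests    # installs project/annotations/assemblies/common into ~/.m2/repository
  cd hadoop-common
  mvn compile                # later builds can be run from here
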
----------------------
TARGET/ IS THE NEW BUILD/

The new build directory is target/

----------------------
USING AN IDE

Eclipse and IntelliJ will recognize and open the project from the POM file.

Make sure you run 'mvn test -DskipTests' every time you start from a clean
target/ directory, as Maven generates code required for testing and sets up
some directories under target/

----------------------
BUILDING

Run 'mvn compile'

To compile native code add '-Pnative'
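
For example:

  mvn compile                # Java code only
  mvn compile -Pnative       # Java code plus the native code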

----------------------
RUNNING TESTCASES

Run 'mvn test -Dtest=TESTCASECLASS'

To run multiple test cases, separate the class names with commas.

To run all test cases, don't specify '-Dtest=...'

NOTE: TESTCASECLASS is just the test case class name, no package name, no
extension.
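
For example (TestConfiguration and TestIPC are just illustrative class names):

  mvn test -Dtest=TestConfiguration            # a single test case
  mvn test -Dtest=TestConfiguration,TestIPC    # multiple test cases
  mvn test                                     # all test cases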

----------------------
CREATING THE TAR

Run 'mvn package -Pbintar -DskipTests'

NOTE: The '-Ptar' profile will create the legacy layout, but the Hadoop
scripts will not work with the legacy layout (this was already the case
before HADOOP-6671)
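
For example:

  mvn package -Pbintar -DskipTests    # new layout, works with the Hadoop scripts
  mvn package -Ptar -DskipTests       # legacy layout, the scripts will not work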

----------------------
RUNNING THE HADOOP SCRIPTS IN DEVELOPMENT

Run 'mvn package -Pbintar -DskipTests'

The Hadoop scripts can then be executed from the
hadoop-common/target/hadoop-common-0.23.0-SNAPSHOT-bin/bin/ directory.
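
For example, to sanity-check a build (assuming the version is still
0.23.0-SNAPSHOT):

  mvn package -Pbintar -DskipTests
  cd hadoop-common/target/hadoop-common-0.23.0-SNAPSHOT-bin/bin
  ./hadoop version           # any script in this directory can be run in place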

----------------------

Re: getting started building Mavenized hadoop common

Posted by Scott Carey <sc...@richrelevance.com>.
Also note that if you trigger a profile that changes the repo, it
_replaces_ the other repos; it does not 'merge' them. So if you configure
a profile that defines an internal Maven repo, you have just 'erased' the
other repos from the POV of your build.

You must either define both your internal repo AND the external ones in
the POM,
OR
configure your internal repo to cascade to an external one for things it
does not find.

Generally, the latter is recommended because it gives your organization
more control over what is pulled and from where.
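
A quick way to see what the build actually resolves (both goals are standard
maven-help-plugin goals) is, for example:

  mvn help:effective-settings    # the settings Maven is really using, incl. active profiles
  mvn help:effective-pom         # the merged POM, incl. the effective <repositories>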


Re: getting started building Mavenized hadoop common

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
Joep,

To use a settings.xml in a location other than ~/.m2:

mvn --settings /home/foo/mysettings.xml ....

To use a local cache in a location other than ~/.m2/repository:

mvn -Dmaven.repo.local=<path> ...
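
For example, both can be combined in a single invocation (paths are
hypothetical):

  mvn --settings /home/foo/mysettings.xml -Dmaven.repo.local=/home/foo/repo install -DskipTests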

Hope this helps.

Alejandro


RE: getting started building Mavenized hadoop common

Posted by "Rottinghuis, Joep" <jr...@ebay.com>.
My ~/.m2/settings.xml has an activeProfile defined for an internal Maven repo to be able to resolve 0.22 snapshots internally.

When trying the
mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true
step this failed because it tried to resolve 0.23-SNAPSHOT against our internal repo. 

After first running 
mvn install -DskipTests
the problem was resolved.
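
In other words, the sequence that worked was:

  mvn install -DskipTests    # seed the local repo with the 0.23-SNAPSHOT artifacts
  mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true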

This makes me wonder if it is possible to pass something like -Dmaven.repo.local or -Dsettings.localRepository on the command-line?
It would also be nice if I can specify different active profiles per build through properties.
Right now that all seems to come from the same shared ~/.m2/settings.xml.

Cheers,

Joep


Re: getting started building Mavenized hadoop common

Posted by Eli Collins <el...@cloudera.com>.

Never mind, was missing the mvn eclipse:eclipse step.

Re: getting started building Mavenized hadoop common

Posted by Eli Collins <el...@cloudera.com>.

Using Eclipse Helios, when I follow these instructions (selecting the
top-level Hadoop directory as the root directory), I just get MapReduceTools
as the only project (no hadoop-annotations, hadoop-assemblies,
hadoop-common, etc.). Do these instructions work for anyone else?

Thanks,
Eli

Re: getting started building Mavenized hadoop common

Posted by Ted Dunning <td...@maprtech.com>.
That is a nice answer.

On Thu, Aug 4, 2011 at 6:33 AM, Robert Evans <ev...@yahoo-inc.com> wrote:

> Can we make it a separate maven project.  Not a separate tar but something
> closer to the hadoop-annotations.  That way if nothing has changed or the
> developer does not have the tools to rebuild protocol buffers then maven can
> download the jar/source from the maven repo.  If the developer does change
> it then they can rebuild and install it as needed.
>
> --Bobby Evans
>
> On 8/4/11 6:38 AM, "Steve Loughran" <st...@apache.org> wrote:
>
> On 03/08/11 02:41, Ted Dunning wrote:
> > (the following discusses religious practices ... please don't break into
> > flames)
> >
> > In the past, the simplest approach I have seen for dealing with this is
> to
> > simply put the generated code under the normal source dir and check it
> in.
> >   This is particularly handy with Thrift since it is common for users of
> the
> > code to not have a working version of the Thrift compiler.  I then have
> an
> > optional profile that does the code generation.  In my cases, I made that
> > profile conditional on a thrift compiler being found, but there are other
> > reasonable strategies.  I did the code generation by generating into a
> temp
> > dir and then copying the code into the source tree so that if the
> generation
> > failed, no code was changed.
> >
> > The nice side effect is that IDE's see the generated code as first class
> > code.
> >
> > Many consider various aspects of this style to be bad practice.  Some
> > condemn checking in generated code as akin to checking in jars.   I kind
> of
> > agree, but lack of thrift or javacc is common enough that it really has
> to
> > be dealt with by checking these in somewhere.  Only if your code
> generator
> > really is ubiquitous is it feasible not to check in generated code.
>
> The problem with this approach is that SVN will often say "it's changed"
> when it hasn't. You can do some tricks with Ant using the <copy>
> operation and only copy if they really are different, though once the
> generator adds a timestamp to the header you are in trouble, and you
> have to look at the diffs to see if anything really has changed. I've
> had this problem in the past with Hibernate generated stuff.
>
>
> > Others consider the commingling of generated an "real" code in the same
> > directory tree to be a mortal sin.  I agree, but in a lesser form.  I
> > strongly condemn the use of a single directory for generated and
> > non-generated code, but if all directories avoid such miscegenation, then
> I
> > don't see this as much of a problem.  Most people recognize that a
> package
> > with a name "generated" will contain generated code.
> >
>
> I'd prefer to generate the stuff in the same tree, in a subdir, with
> .svnignore set up to never commit the source. That way it's all in the
> same tree, but you can't check it in. This keeps the source there even
> when you rm -rf build, but keep it out of SCM
>
>

Re: getting started building Mavenized hadoop common

Posted by Scott Carey <sc...@richrelevance.com>.
I believe Eclipse will pick these up with the m2Eclipse plugin if you
right click on the project and "Maven > Update Project Configuration".

For whatever reason, it does not add extra source directories that are
generated by plugins when the project is imported.






Re: getting started building Mavenized hadoop common

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
Are the concerns about generated code not addressed already with the latest
trunk?

The generated code is created under target/generated-sources/ and
target/generated-test-sources/

IntelliJ and Netbeans pick up those directories automatically at project
import time (if they exist, i.e. you ran 'mvn test -DskipTests' before
importing)

Eclipse seems to have a bug and does not add these directories automatically
(you have to add them as source roots manually).

Thanks.

Alejandro


Re: getting started building Mavenized hadoop common

Posted by Andrew Bayer <an...@gmail.com>.
+1, for what it's worth - that seems like the right way to handle this sort
of thing to me.

A.


RE: getting started building Mavenized hadoop common

Posted by "Rottinghuis, Joep" <jr...@ebay.com>.
Have been resisting the temptation to jump in on this, but cannot help myself now.

Downloading the source from Maven if source cannot be generated sounds like a better approach to me than committing the source itself and trying to generate on top of that.
One can even commit a tarball with the sources and expand it when the proper setup is not present.

It seems we'd like the following:
1) Developers with full setup can generate source from scratch
2) Developers with partial setup can still see source in their IDE.
3) Keep it easy to prevent generated source from getting checked in.

If we do end up committing source, then at least keep it in a separate directory clearly marked as such ("something-generated").
That will not only help the human from even trying to modify the source, but also make cleanup a simpler and cleaner operation.
Mixing typed and generated source into one directory tree (even with svn:ignore and .gitignore) is not a good idea in my experience.

If we produce generated java source files without actually generating them (whether directly committed or pulled from elsewhere), would we still compile those sources and use them?
In other words, what happens if developers do end up making code changes to the generated files? Will those changes get used, or get ignored?

In this respect it would be better to have a jar with sources and let the developer browse through source code that way.

Cheers,

Joep


Re: getting started building Mavenized hadoop common

Posted by Robert Evans <ev...@yahoo-inc.com>.
Can we make it a separate maven project?  Not a separate tar, but something closer to the hadoop-annotations.  That way, if nothing has changed or the developer does not have the tools to rebuild protocol buffers, maven can download the jar/source from the maven repo.  If the developer does change it, then they can rebuild and install it as needed.

--Bobby Evans



Re: getting started building Mavenized hadoop common

Posted by Steve Loughran <st...@apache.org>.
On 03/08/11 02:41, Ted Dunning wrote:
> (the following discusses religious practices ... please don't break into
> flames)
>
> In the past, the simplest approach I have seen for dealing with this is to
> simply put the generated code under the normal source dir and check it in.
>   This is particularly handy with Thrift since it is common for users of the
> code to not have a working version of the Thrift compiler.  I then have an
> optional profile that does the code generation.  In my cases, I made that
> profile conditional on a thrift compiler being found, but there are other
> reasonable strategies.  I did the code generation by generating into a temp
> dir and then copying the code into the source tree so that if the generation
> failed, no code was changed.
>
> The nice side effect is that IDE's see the generated code as first class
> code.
>
> Many consider various aspects of this style to be bad practice.  Some
> condemn checking in generated code as akin to checking in jars.   I kind of
> agree, but lack of thrift or javacc is common enough that it really has to
> be dealt with by checking these in somewhere.  Only if your code generator
> really is ubiquitous is it feasible not to check in generated code.

The problem with this approach is that SVN will often say "it's changed" 
when it hasn't. You can do some tricks with Ant using the <copy> 
operation and only copy if they really are different, though once the 
generator adds a timestamp to the header you are in trouble, and you 
have to look at the diffs to see if anything really has changed. I've 
had this problem in the past with Hibernate generated stuff.
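
A shell stand-in for that Ant trick, sketched with hypothetical paths
(rsync's --checksum only rewrites files whose content actually differs, so
SVN sees no spurious changes):

  rsync -r --checksum target/gen-tmp/ src/main/generated/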


> Others consider the commingling of generated and "real" code in the same
> directory tree to be a mortal sin.  I agree, but in a lesser form.  I
> strongly condemn the use of a single directory for generated and
> non-generated code, but if all directories avoid such miscegenation, then I
> don't see this as much of a problem.  Most people recognize that a package
> with a name "generated" will contain generated code.
>

I'd prefer to generate the stuff in the same tree, in a subdir, with
svn:ignore set up to never commit the source. That way it's all in the
same tree, but you can't check it in. This keeps the source there even
when you rm -rf build, but keeps it out of SCM.
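
For reference, a sketch of that setup (directory names are hypothetical):

  svn propset svn:ignore 'generated' src/main    # keeps src/main/generated/ out of commits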

Re: getting started building Mavenized hadoop common

Posted by Scott Carey <sc...@richrelevance.com>.
You can put the generated code in its own maven module, and push
snapshots.  Then the project that relies on it does not have to build it,
and only those that need to change how things are generated need to be
able to build it.

There are drawbacks to that approach of course.



Re: getting started building Mavenized hadoop common

Posted by Mi...@emc.com.
I had this exact same argument with Arun Murthy at the hadoop dev meeting
about checking in protobuf generated code in the MR-279 branch. That same
evening, he had to download entire xcode and install it on his new Mac to
build the MR-279 branch :-)

Hadoop recordio also follows the same approach. It checks in the generated
code, and if javacc is found (i.e. javacc.home != ""), it regenerates the parser.

Arun's concern was that someone might accidentally check in changes to
generated code (e.g. because they used a different version of protobuf).
But isn't there some way to flag these changes?
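
One low-tech way to flag them, for example (path hypothetical):

  svn status src/generated | grep '^M'    # lists generated files with local modifications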

- Milind

---
Milind Bhandarkar
Greenplum Labs, EMC
((Disclaimer: Opinions expressed in this email are those of the author,
and do
not necessarily represent the views of any organization, past or present,
the author might be affiliated with.)






Re: getting started building Mavenized hadoop common

Posted by Ted Dunning <td...@maprtech.com>.
(the following discusses religious practices ... please don't break into
flames)

In the past, the simplest approach I have seen for dealing with this is to
simply put the generated code under the normal source dir and check it in.
 This is particularly handy with Thrift since it is common for users of the
code to not have a working version of the Thrift compiler.  I then have an
optional profile that does the code generation.  In my cases, I made that
profile conditional on a thrift compiler being found, but there are other
reasonable strategies.  I did the code generation by generating into a temp
dir and then copying the code into the source tree so that if the generation
failed, no code was changed.

The nice side effect is that IDE's see the generated code as first class
code.

Many consider various aspects of this style to be bad practice.  Some
condemn checking in generated code as akin to checking in jars.   I kind of
agree, but lack of thrift or javacc is common enough that it really has to
be dealt with by checking these in somewhere.  Only if your code generator
really is ubiquitous is it feasible not to check in generated code.

Others consider the commingling of generated and "real" code in the same
directory tree to be a mortal sin.  I agree, but in a lesser form.  I
strongly condemn the use of a single directory for generated and
non-generated code, but if all directories avoid such miscegenation, then I
don't see this as much of a problem.  Most people recognize that a package
with a name "generated" will contain generated code.


Re: getting started building Mavenized hadoop common

Posted by Tom White <to...@cloudera.com>.
On Tue, Aug 2, 2011 at 3:47 PM, Jeffrey Naisbitt <jn...@yahoo-inc.com> wrote:
> On 8/2/11 5:21 PM, "Alejandro Abdelnur" <tu...@cloudera.com> wrote:
>> Regarding adding the 'target/generated-src/test/java' dir to the build path.
>> You are correct, you have to add it manually to your IDE (I use IntelliJ and
>> it is the same story). But unless you need to debug through the generated
>> code you don't need to do so (doing a 'mvn test -DskipTests' will
>> generate/compile the class and the .class file will be in the IDE project
>> classpath).
>
> I like to debug through the code :)  It would be nice if there were an
> automated way to handle that folder, but in the meantime, it would probably
> be useful to document that along with the eclipse instructions.

I had to do this step too. I've added it to the instructions on
http://wiki.apache.org/hadoop/EclipseEnvironment, but I agree it would
be nice to automate this if anyone knows the relevant setting.

>
>
>> Regarding MAVEN_HOME, I don't have it in my environment and the build works.
> I was referring to running test-patch.sh.  (test-patch.sh now requires the
> MAVEN_HOME setting).  I'm ok if this is a requirement to run it now (like
> ANT_HOME was required before), but it should probably be mentioned somewhere
> since I didn't think about it :)

I've updated http://wiki.apache.org/hadoop/HowToContribute to mention
MAVEN_HOME.

Thanks,
Tom


Re: getting started building Mavenized hadoop common

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
Luke,

Let me try that, thanks for the tip.

Alejandro


Re: getting started building Mavenized hadoop common

Posted by Luke Lu <ll...@vicaya.com>.
On Tue, Aug 2, 2011 at 3:21 PM, Alejandro Abdelnur <tu...@cloudera.com> wrote:
> Jeffrey,
>
> Thanks.
>
> Regarding adding the 'target/generated-src/test/java' dir to the build path.
> You are correct, you have to add it manually to your IDE (I use IntelliJ and
> it is the same story). But unless you need to debug through the generated
> code you don't need to do so (doing a 'mvn test -DskipTests' will
> generate/compile the class and the .class file will be in the IDE project
> classpath).

Well, the canonical maven generated sources directory is
"generated-sources" instead of "generated-src". Can you try the patch
on HADOOP-7502 (which works for me) to see if it works for you?

__Luke

Re: getting started building Mavenized hadoop common

Posted by Jeffrey Naisbitt <jn...@yahoo-inc.com>.
On 8/2/11 5:21 PM, "Alejandro Abdelnur" <tu...@cloudera.com> wrote:
> Regarding adding the 'target/generated-src/test/java' dir to the build path.
> You are correct, you have to add it manually to your IDE (I use IntelliJ and
> it is the same story). But unless you need to debug through the generated
> code you don't need to do so (doing a 'mvn test -DskipTests' will
> generate/compile the class and the .class file will be in the IDE project
> classpath).

I like to debug through the code :)  It would be nice if there were an
automated way to handle that folder, but in the meantime, it would probably
be useful to document that along with the eclipse instructions.


> Regarding MAVEN_HOME, I don't have it in my environment and the build works.
I was referring to running test-patch.sh.  (test-patch.sh now requires the
MAVEN_HOME setting).  I'm ok if this is a requirement to run it now (like
ANT_HOME was required before), but it should probably be mentioned somewhere
since I didn't think about it :)
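
For anyone else who hits this, something like the following before invoking
the script should do (the install path is just a placeholder for wherever
your Maven 3 lives):

  export MAVEN_HOME=/usr/local/apache-maven-3.0.3
  export PATH=$MAVEN_HOME/bin:$PATH
  ./dev-support/test-patch.sh   # plus whatever arguments you normally pass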

> Regarding the test-patch.sh issues, was it working for you prior to
> HADOOP-6671? Because we didn't change that line. Also, keep in mind that the
> injection fault tests are not wired yet.

I don't remember seeing those before, but yeah, it doesn't look like your
patch touched anything to do with them.  They seem to be ignored anyway.

Thanks again...great work :)
-Jeff




> On Tue, Aug 2, 2011 at 2:55 PM, Jeffrey Naisbitt
> <jn...@yahoo-inc.com>wrote:
> 
>> Thanks for all your work and updates on this, Alejandro!  It's much
>> better/easier to work with :)
>> 
>> I did have a few issues/questions:
>> First, I still had to manually add 'target/generated-src/test/java' to my
>> build path sources in Eclipse.  I don't know if this is due to something I
>> did wrong, but I would think this should be automatic.
>> 
>> Also, I ran into a few issues with the test-patch.sh script:
>> First, it will fail if MAVEN_HOME is not set, and I didn't see anything
>> about that in documentation.
>> Also, it gives a couple of non-critical errors:
>> ./dev-support/test-patch.sh: line 578: auxwww: command not found
>> ./dev-support/test-patch.sh: line 578: /usr/bin/nawk: No such file or
>> directory
>> The first is because $PS is not set (and was previously passed in for the
>> HUDSON version), and the second is just because my box doesn't have nawk on
>> it.
>> 
>> Thanks again.
>> -Jeff
>> 
>> 
>> On 8/2/11 4:13 PM, "Alejandro Abdelnur" <tu...@cloudera.com> wrote:
>> 
>>> With the HADOOP-6671 commit the way of building hadoop common has changed
>>> significantly.
>>> 
>>> While the wiki explains these changes, and there is a BUILDING.txt
>>> file, still I guess things will hit many of you.
>>> 
>>> Because of this I've put together some brief notes.
>>> 
>>> Thanks.
>>> 
>>> Alejandro
>>> 
>>> ----------------------
>>> NEW LAYOUT
>>> 
>>> After updating the trunk you'll see the following directory changes at
>>> top level
>>> 
>>>  Removed: common/
>>>  New: hadoop-common/, hadoop-project/, hadoop-annotations/,
>>> hadoop-assemblies/
>>> 
>>> * hadoop-common/ is the new common/ and its sub-dirs are organized
>> following
>>> Maven standard project layout.
>>> * hadoop-project/ contains Hadoop project root POM, all dependency
>> versions
>>> are defined there
>>> * hadoop-annotations/ contains the Hadoop public/private annotation
>> classes
>>> * hadoop-assemblies/ contains the assembly files that create the
>>> distribution directories layout
>>> 
>>> ----------------------
>>> BUILDING REQUIREMENTS
>>> 
>>> The only new build requirement is Maven 3 (it must be at least Maven 3).
>>> 
>>> The environment var FORREST_HOME must be set if building the
>> documentation.
>>> 
>>> ----------------------
>>> FIRST MAVEN BUILD
>>> 
>>> It must be run from the trunk/ directory.
>>> 
>>> Run: 'mvn install -DskipTests'
>>> 
>>> This will install the different submodules
>>> (project/annotations/assemblies/common) into the local Maven cache
>>> (~/.m2/repository).
>>> 
>>> After this is done, you can build from the hadoop-common directory.
>>> 
>>> NOTE: this will not be required once the SNAPSHOTS Maven repo has the
>>> snapshots published.
>>> ----------------------
>>> TARGET/ IS THE NEW BUILD/
>>> 
>>> The new build directory is target/
>>> 
>>> ----------------------
>>> USING AN IDE
>>> 
>>> Eclipse and IntelliJ will recognize and open the project from the POM
>> file.
>>> 
>>> Make sure you run a 'mvn test -DskipTests' every time you have a clean
>>> target/ directory as Maven generates code required for testing and sets
>> some
>>> directories under target/
>>> 
>>> ----------------------
>>> BUILDING
>>> 
>>> Run 'mvn compile'
>>> 
>>> To compile native code add '-Pnative'
>>> 
>>> ----------------------
>>> RUNNING TESTCASES
>>> 
>>> Run 'mvn test -Dtest=TESTCASECLASS'
>>> 
>>> To run multiple testcases, separate the testcase names with commas
>>> 
>>> To run all testcases, don't specify '-Dtest=...'
>>> 
>>> NOTE: TESTCASECLASS is just the testcase classname, no package name, no
>>> extension.
>>> 
>>> ----------------------
>>> CREATING THE TAR
>>> 
>>> Run 'mvn package -Pbintar -DskipTests'
>>> 
>>> NOTE: The '-Ptar' profile will create the legacy layout, but the Hadoop
>>> scripts will not work with the legacy layout (this was already the case
>>> before HADOOP-6671)
>>> 
>>> ----------------------
>>> RUNNING THE HADOOP SCRIPTS IN DEVELOPMENT
>>> 
>>> Run 'mvn package -Pbintar -DskipTests'
>>> 
>>> The Hadoop scripts can be executed from the
>>> hadoop-common/target/hadoop-common-0.23.0-SNAPSHOT-bin/bin/ directory.
>>> 
>>> ----------------------
>> 
>> 


Re: getting started building Mavenized hadoop common

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
Jeffrey,

Thanks.

Regarding adding the 'target/generated-src/test/java' dir to the build path.
You are correct, you have to add it manually to your IDE (I use IntelliJ and
it is the same story). But unless you need to debug through the generated
code you don't need to do so (doing a 'mvn test -DskipTests' will
generate/compile the class and the .class file will be in the IDE project
classpath).
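
Concretely, something like this from the hadoop-common/ directory is enough
to get the generated classes onto the IDE classpath:

  mvn test -DskipTests                # generates the sources and compiles them
  ls target/generated-src/test/java   # the generated test sources land here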

Regarding MAVEN_HOME, I don't have it in my environment and the build works.

Regarding the test-patch.sh issues, was it working for you prior to
HADOOP-6671? Because we didn't change that line. Also, keep in mind that the
injection fault tests are not wired yet.

Thanks.

Alejandro

On Tue, Aug 2, 2011 at 2:55 PM, Jeffrey Naisbitt <jn...@yahoo-inc.com>wrote:

> Thanks for all your work and updates on this, Alejandro!  It's much
> better/easier to work with :)
>
> I did have a few issues/questions:
> First, I still had to manually add 'target/generated-src/test/java' to my
> build path sources in Eclipse.  I don't know if this is due to something I
> did wrong, but I would think this should be automatic.
>
> Also, I ran into a few issues with the test-patch.sh script:
> First, it will fail if MAVEN_HOME is not set, and I didn't see anything
> about that in documentation.
> Also, it gives a couple of non-critical errors:
> ./dev-support/test-patch.sh: line 578: auxwww: command not found
> ./dev-support/test-patch.sh: line 578: /usr/bin/nawk: No such file or
> directory
> The first is because $PS is not set (and was previously passed in for the
> HUDSON version), and the second is just because my box doesn't have nawk on
> it.
>
> Thanks again.
> -Jeff
>
>
> On 8/2/11 4:13 PM, "Alejandro Abdelnur" <tu...@cloudera.com> wrote:
>
> > With the HADOOP-6671 commit the way of building hadoop common has changed
> > significantly.
> >
> > While the wiki explains these changes, and there is a BUILDING.txt
> > file, still I guess things will hit many of you.
> >
> > Because of this I've put together some brief notes.
> >
> > Thanks.
> >
> > Alejandro
> >
> > ----------------------
> > NEW LAYOUT
> >
> > After updating the trunk you'll see the following directory changes at
> > top level
> >
> >  Removed: common/
> >  New: hadoop-common/, hadoop-project/, hadoop-annotations/,
> > hadoop-assemblies/
> >
> > * hadoop-common/ is the new common/ and its sub-dirs are organized
> following
> > Maven standard project layout.
> > * hadoop-project/ contains Hadoop project root POM, all dependency
> versions
> > are defined there
> > * hadoop-annotations/ contains the Hadoop public/private annotation
> classes
> > * hadoop-assemblies/ contains the assembly files that create the
> > distribution directories layout
> >
> > ----------------------
> > BUILDING REQUIREMENTS
> >
> > The only new build requirement is Maven 3 (it must be at least Maven 3).
> >
> > The environment var FORREST_HOME must be set if building the
> documentation.
> >
> > ----------------------
> > FIRST MAVEN BUILD
> >
> > It must be run from the trunk/ directory.
> >
> > Run: 'mvn install -DskipTests'
> >
> > This will install the different submodules
> > (project/annotations/assemblies/common) into the local Maven cache
> > (~/.m2/repository).
> >
> > After this is done, you can build from the hadoop-common directory.
> >
> > NOTE: this will not be required once the SNAPSHOTS Maven repo has the
> > snapshots published.
> > ----------------------
> > TARGET/ IS THE NEW BUILD/
> >
> > The new build directory is target/
> >
> > ----------------------
> > USING AN IDE
> >
> > Eclipse and IntelliJ will recognize and open the project from the POM
> file.
> >
> > Make sure you run a 'mvn test -DskipTests' every time you have a clean
> > target/ directory as Maven generates code required for testing and sets
> some
> > directories under target/
> >
> > ----------------------
> > BUILDING
> >
> > Run 'mvn compile'
> >
> > To compile native code add '-Pnative'
> >
> > ----------------------
> > RUNNING TESTCASES
> >
> > Run 'mvn test -Dtest=TESTCASECLASS'
> >
> > To run multiple testcases, separate the testcase names with commas
> >
> > To run all testcases, don't specify '-Dtest=...'
> >
> > NOTE: TESTCASECLASS is just the testcase classname, no package name, no
> > extension.
> >
> > ----------------------
> > CREATING THE TAR
> >
> > Run 'mvn package -Pbintar -DskipTests'
> >
> > NOTE: The '-Ptar' profile will create the legacy layout, but the Hadoop
> > scripts will not work with the legacy layout (this was already the case
> > before HADOOP-6671)
> >
> > ----------------------
> > RUNNING THE HADOOP SCRIPTS IN DEVELOPMENT
> >
> > Run 'mvn package -Pbintar -DskipTests'
> >
> > The Hadoop scripts can be executed from the
> > hadoop-common/target/hadoop-common-0.23.0-SNAPSHOT-bin/bin/ directory.
> >
> > ----------------------
>
>

Re: getting started building Mavenized hadoop common

Posted by Jeffrey Naisbitt <jn...@yahoo-inc.com>.
Thanks for all your work and updates on this, Alejandro!  It's much
better/easier to work with :)

I did have a few issues/questions:
First, I still had to manually add 'target/generated-src/test/java' to my
build path sources in Eclipse.  I don't know if this is due to something I
did wrong, but I would think this should be automatic.

Also, I ran into a few issues with the test-patch.sh script:
First, it will fail if MAVEN_HOME is not set, and I didn't see anything
about that in documentation.
Also, it gives a couple of non-critical errors:
./dev-support/test-patch.sh: line 578: auxwww: command not found
./dev-support/test-patch.sh: line 578: /usr/bin/nawk: No such file or
directory
The first is because $PS is not set (and was previously passed in for the
HUDSON version), and the second is just because my box doesn't have nawk on
it.
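
In case anyone else trips over these, the (untested) workarounds I'd try are
below; both guesses are read straight off the error messages above:

  export PS=ps   # guessing line 578 runs "$PS auxwww" and $PS is unset locally
  sudo ln -s "$(command -v awk)" /usr/bin/nawk   # or install nawk/gawk instead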

Thanks again.
-Jeff


On 8/2/11 4:13 PM, "Alejandro Abdelnur" <tu...@cloudera.com> wrote:

> With the HADOOP-6671 commit the way of building hadoop common has changed
> significantly.
> 
> While the wiki explains these changes, and there is a BUILDING.txt
> file, still I guess things will hit many of you.
> 
> Because of this I've put together some brief notes.
> 
> Thanks.
> 
> Alejandro
> 
> ----------------------
> NEW LAYOUT
> 
> After updating the trunk you'll see the following directory changes at top
> level
> 
>  Removed: common/
>  New: hadoop-common/, hadoop-project/, hadoop-annotations/,
> hadoop-assemblies/
> 
> * hadoop-common/ is the new common/ and its sub-dirs are organized following
> Maven standard project layout.
> * hadoop-project/ contains Hadoop project root POM, all dependency versions
> are defined there
> * hadoop-annotations/ contains the Hadoop public/private annotation classes
> * hadoop-assemblies/ contains the assembly files that create the
> distribution directories layout
> 
> ----------------------
> BUILDING REQUIREMENTS
> 
> The only new build requirement is Maven 3 (it must be at least Maven 3).
> 
> The environment var FORREST_HOME must be set if building the documentation.
> 
> ----------------------
> FIRST MAVEN BUILD
> 
> It must be run from the trunk/ directory.
> 
> Run: 'mvn install -DskipTests'
> 
> This will install the different submodules
> (project/annotations/assemblies/common) into the local Maven cache
> (~/.m2/repository).
> 
> After this is done, you can build from the hadoop-common directory.
> 
> NOTE: this will not be required once the SNAPSHOTS Maven repo has the
> snapshots published.
> ----------------------
> TARGET/ IS THE NEW BUILD/
> 
> The new build directory is target/
> 
> ----------------------
> USING AN IDE
> 
> Eclipse and IntelliJ will recognize and open the project from the POM file.
> 
> Make sure you run a 'mvn test -DskipTests' every time you have a clean
> target/ directory as Maven generates code required for testing and sets some
> directories under target/
> 
> ----------------------
> BUILDING
> 
> Run 'mvn compile'
> 
> To compile native code add '-Pnative'
> 
> ----------------------
> RUNNING TESTCASES
> 
> Run 'mvn test -Dtest=TESTCASECLASS'
> 
> To run multiple testcases, separate the testcase names with commas
> 
> To run all testcases, don't specify '-Dtest=...'
> 
> NOTE: TESTCASECLASS is just the testcase classname, no package name, no
> extension.
> 
> ----------------------
> CREATING THE TAR
> 
> Run 'mvn package -Pbintar -DskipTests'
> 
> NOTE: The '-Ptar' profile will create the legacy layout, but the Hadoop
> scripts will not work with the legacy layout (this was already the case
> before HADOOP-6671)
> 
> ----------------------
> RUNNING THE HADOOP SCRIPTS IN DEVELOPMENT
> 
> Run 'mvn package -Pbintar -DskipTests'
> 
> The Hadoop scripts can be executed from the
> hadoop-common/target/hadoop-common-0.23.0-SNAPSHOT-bin/bin/ directory.
> 
> ----------------------