Posted to common-issues@hadoop.apache.org by "Giridharan Kesavan (JIRA)" <ji...@apache.org> on 2010/04/01 13:10:27 UTC

[jira] Created: (HADOOP-6671) To use maven for hadoop common builds

To use maven for hadoop common builds
-------------------------------------

                 Key: HADOOP-6671
                 URL: https://issues.apache.org/jira/browse/HADOOP-6671
             Project: Hadoop Common
          Issue Type: Improvement
          Components: build
    Affects Versions: 0.22.0
            Reporter: Giridharan Kesavan


We are now able to publish hadoop artifacts to the maven repo successfully [HADOOP-6382].
Drawbacks with the current approach:
* We use ivy for dependency management with ivy.xml
* We use maven-ant-task for artifact publishing to the maven repository
* pom files are not generated dynamically

To address this I propose we use maven to build hadoop-common, which would help us manage dependencies and publish artifacts with a single XML file (the POM) covering both.

I would like to have a branch created to work on mavenizing hadoop common.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-6671) To use maven for hadoop common builds

Posted by "E. Sammer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852939#action_12852939 ] 

E. Sammer commented on HADOOP-6671:
-----------------------------------

So much of the build system today "acts as" maven that it might as well be maven. Having some consistency in the way the individual projects are built would be extremely nice. Maven is as painful as every other build tool we've seen, but at least its pain comes in a consistent, known quantity. The tooling support tends to be better due to the predictable nature of the build process. The surrounding infrastructure (such as Hudson) is already mvn aware. The main problem with mvn tends to be errant dependencies in poms, but this can be avoided (see the SpringSource Ivy / Maven repositories for an example of *very* well maintained metadata - http://www.springsource.com/repository/app/). +1



[jira] Commented: (HADOOP-6671) To use maven for hadoop common builds

Posted by "Lars Francke (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853146#action_12853146 ] 

Lars Francke commented on HADOOP-6671:
--------------------------------------

I don't have a lot of insight into the Hadoop Common project itself but I've done a lot of the work on the recent HBase transition together with Paul Smith and Kay Kay and would like to offer my help if needed.

There are basically two ways to go: either shuffle around a lot of directories to conform to the standard Maven layout, or override the Maven defaults and keep the current layout. Both ways are actually very painful but I prefer the first one. While that means a lot of work and a lot of swearing, it also means that you'll get a layout consistent with most other Maven projects, and the configuration is a lot easier. Some tools tend(ed) not to work properly with a non-standard layout (strictly speaking these are bugs in the tools/plugins). You basically have to do a lot of work up front, but after that it shouldn't be too hard to maintain.

The directory moving/renaming unfortunately tends not to work too well with Subversion and branches (or I was doing it wrong) so I don't know how big the benefit would be to start a new one for doing this.

What Paul Smith has done in HBASE-2099 is to provide a script that contained all the necessary commands (svn mv...) to finish the move (there are a couple more changes in other tickets) and a patch containing the .pom files. This has been much improved since and we've learned a lot from it. We are in fact still tweaking it and will change the pom structure once again to rely on the common Apache parent pom in addition to a few smaller fixes that are still outstanding.

I'd propose a structure similar to what HBase now has: a pom in the main directory with common information (packaging "pom") and then two modules ("core" and "contrib"). So the src/contrib directory would move to the top level, as would the docs directory (btw: I've never used Forrest with Maven, but if Ant can do it Maven should be able to do it too). The final tarball as currently (trunk) created by ant tar should be easily reproducible by Maven (I couldn't find where most of the contribs ended up; are those currently missing from the .tar?)
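As a rough illustration of that layout, the Python sketch below generates a minimal top-level pom with packaging "pom" and the two modules. The groupId, artifactId and version passed in are placeholders chosen for the example, not values taken from Hadoop or HBase, and a real parent pom would carry much more (dependency management, plugin configuration and so on).

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET


def parent_pom(group_id, artifact_id, version, modules):
    """Build a minimal aggregator POM (packaging "pom") listing the given modules."""
    project = ET.Element("project", {"xmlns": "http://maven.apache.org/POM/4.0.0"})
    for tag, text in (
        ("modelVersion", "4.0.0"),
        ("groupId", group_id),
        ("artifactId", artifact_id),
        ("version", version),
        ("packaging", "pom"),  # the top-level pom builds nothing itself
    ):
        ET.SubElement(project, tag).text = text
    modules_el = ET.SubElement(project, "modules")
    for name in modules:
        # each module is a subdirectory containing its own pom.xml
        ET.SubElement(modules_el, "module").text = name
    raw = ET.tostring(project, encoding="unicode")
    return xml.dom.minidom.parseString(raw).toprettyxml(indent="  ")


if __name__ == "__main__":
    # Placeholder coordinates, for illustration only.
    print(parent_pom("org.example", "example-parent", "0.1-SNAPSHOT", ["core", "contrib"]))
```

Each name under modules is expected to be a directory next to the top-level pom.xml with a pom.xml of its own.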

Paul Smith has indicated interest in this and I'd be interested and willing to help too if needed. But if you've already got someone to do the work I'm not going to complain :) It'd just be a shame to duplicate work here as the moving around of stuff in SVN is painful work if that's the way you decide to go.



[jira] Commented: (HADOOP-6671) To use maven for hadoop common builds

Posted by "Paul Smith (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854315#action_12854315 ] 

Paul Smith commented on HADOOP-6671:
------------------------------------

Lars beat me to the jira comment, so I'll just say "Yeah, what Lars said".

Happy to put my hand up to help. Rather than a branch, I'd say a simple script like the one in HBASE-2099 is easy for a reviewer to work with: it outlines the migration steps needed, rather than being some hideous patch to review.

Regarding Allen's comments on syncing things around: that is still possible. Rather than syncing the .ivy2 directory, it's just a matter of ensuring the ~/.m2/repository directories are in sync, and then working in Maven's offline mode if that server doesn't have internet connectivity.
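A minimal sketch of that workflow, written as a Python helper that only assembles the commands rather than running them. The host name "buildhost", the "package" goal and the use of rsync are assumptions made for the example; scp -r would serve equally well for the copy step.

```python
import shlex


def offline_build_commands(build_host, goals=("package",)):
    """Return the shell commands for filling the local Maven repository on a
    connected machine, copying it to an offline host, and building there."""
    goal_str = " ".join(shlex.quote(g) for g in goals)
    remote = shlex.quote(build_host + ":.m2/repository/")
    return [
        # 1. On the connected machine: a normal build fills ~/.m2/repository.
        "mvn " + goal_str,
        # 2. Copy the local repository to the build host (path relative to $HOME).
        #    The local path is left unquoted so the shell expands "~".
        "rsync -a ~/.m2/repository/ " + remote,
        # 3. On the build host: -o (--offline) tells Maven not to touch the network.
        "mvn -o " + goal_str,
    ]


if __name__ == "__main__":
    for cmd in offline_build_commands("buildhost"):
        print(cmd)
```

The offline build only succeeds if every dependency and plugin it needs was already downloaded in step 1, so the same goals should be run in both places.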

Yes, one has to get the POM right, and that does come with experience, so perhaps if Lars and I can help get it off in the right direction, that can ease any potential pain. Maven support in IntelliJ and Eclipse is now really first-class, Ivy support less so. The Maven migration will also pay off for future modularization: splitting code out into nice modular chunks takes much less work when it comes to keeping the build system in sync.

Anyway, happy to help out here too.



[jira] Commented: (HADOOP-6671) To use maven for hadoop common builds

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852419#action_12852419 ] 

Allen Wittenauer commented on HADOOP-6671:
------------------------------------------

They don't have access to the Internet.  They do have internal network access. And, no, I'm not willing to set up a service to build code.  I'm sure Maven is a great idea if you have a ton of resources (mainly time and effort).  I don't.



[jira] Commented: (HADOOP-6671) To use maven for hadoop common builds

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852399#action_12852399 ] 

Allen Wittenauer commented on HADOOP-6671:
------------------------------------------

My initial thought is that every time someone tinkers with either ivy or maven or the build process in general, it always seems to get worse and worse for those of us who build on non-Internet connected machines. This sounds like another movement towards more pain.



[jira] Commented: (HADOOP-6671) To use maven for hadoop common builds

Posted by "Patrick Angeles (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852409#action_12852409 ] 

Patrick Angeles commented on HADOOP-6671:
-----------------------------------------

Giri,

I like the idea, and I'll pitch in however I can.

@Allen

You can very easily set up an internal proxy for Maven. I'm assuming your build machines have some kind of network access to get to the source code...



[jira] Commented: (HADOOP-6671) To use maven for hadoop common builds

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852413#action_12852413 ] 

Allen Wittenauer commented on HADOOP-6671:
------------------------------------------

No, they don't.  

I build on my desktop mac, let ivy download all of its stuff, then scp .ivy2 and the other stuff that ivy pulls in to my real build machine.  So your assumption is false.



[jira] Commented: (HADOOP-6671) To use maven for hadoop common builds

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852423#action_12852423 ] 

Steve Loughran commented on HADOOP-6671:
----------------------------------------

I've had bad experiences with M2 in the past; these colour my opinions. I don't know how much maven2 has improved since then. 

What I do have to deal with on a regular basis, even today, is people writing POM files that get the dependencies correct for their own build and test but screw up everyone else downstream. Recent logging JARs are an example. Accordingly, I view a POM file as an artifact for downstream users that you have to get right, not just some internal thing, which is how we can treat the ivy and ant files today.

This means saying "we should move to maven to eliminate having ivy.xml and POM files" isn't a good enough reason for me. If it improves testing or build times, or even just reduces ivy and ant xml maintenance costs, then yes, but not just "because you can".



[jira] Commented: (HADOOP-6671) To use maven for hadoop common builds

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852569#action_12852569 ] 

Allen Wittenauer commented on HADOOP-6671:
------------------------------------------

Owen and Lee have told me that offline builds will have basically the same amount of pain that they do now.  I'd be happier with less pain, but same is acceptable.



[jira] Commented: (HADOOP-6671) To use maven for hadoop common builds

Posted by "Lee Tucker (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852490#action_12852490 ] 

Lee Tucker commented on HADOOP-6671:
------------------------------------

What it does do is provide a plugin architecture for a whole range of tools instead of having to roll each one independently into the build.xml.   It provides project structure and actually improves inter-project communication by managing transitive build dependencies.    Given that we already rely on net access (and maven repositories) to pull things through Ivy for dependencies, I'm not exactly sure how Maven makes that any worse.   You had to fill a cache somehow already.



[jira] Commented: (HADOOP-6671) To use maven for hadoop common builds

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854577#action_12854577 ] 

Doug Cutting commented on HADOOP-6671:
--------------------------------------

> The directory moving/renaming unfortunately tends not to work too well with Subversion and branches (or I was doing it wrong) so I don't know how big the benefit would be to start a new one for doing this.

Yes, if the branch is at all long-lived it will require lots of merges to keep it up to date, and such merges will not be easy.

In my experience tree reorganizations are easier to develop as:
 - a shell script that contains a sequence of 'svn mv' commands
 - a patch file to be applied after the script has run

For example, AVRO-163 was developed this way.
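A small sketch of the first half of that approach: a Python helper that renders the shell script of 'svn mv' commands from a list of moves. The paths in MOVES are illustrative guesses at a possible layout, not the mapping used in AVRO-163 or HBASE-2099; the patch file would then be applied with the usual tools after the generated script has run.

```python
import shlex

# Hypothetical old -> new path mapping, for illustration only.
MOVES = [
    ("src/java", "core/src/main/java"),
    ("src/test/core", "core/src/test/java"),
    ("src/contrib", "contrib"),
]


def svn_mv_script(moves):
    """Render a shell script that runs one 'svn mv' per (old, new) pair."""
    lines = ["#!/bin/sh", "set -e"]  # stop at the first failing move
    for old, new in moves:
        # --parents asks svn to create missing intermediate directories.
        lines.append("svn mv --parents %s %s" % (shlex.quote(old), shlex.quote(new)))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(svn_mv_script(MOVES), end="")
```

Keeping the moves in a reviewable list means the script can be regenerated against a fresh checkout of trunk whenever the patch needs rebasing.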




[jira] Commented: (HADOOP-6671) To use maven for hadoop common builds

Posted by "Patrick Angeles (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852416#action_12852416 ] 

Patrick Angeles commented on HADOOP-6671:
-----------------------------------------

How do you scp without network access?



[jira] Commented: (HADOOP-6671) To use maven for hadoop common builds

Posted by "Giridharan Kesavan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852432#action_12852432 ] 

Giridharan Kesavan commented on HADOOP-6671:
--------------------------------------------

@Allen
You can use mvn to do offline builds by passing the -o argument.

@Steve
Apart from no longer having to maintain both ivy.xml and pom.xml, having the pom file reduces build script maintenance, since the pom follows standard conventions for most build tasks and is easy to maintain.

