Posted to mapreduce-issues@hadoop.apache.org by "Tom White (Created) (JIRA)" <ji...@apache.org> on 2011/11/08 21:39:51 UTC

[jira] [Created] (MAPREDUCE-3378) Create a single 'hadoop-mapreduce' Maven artifact

Create a single 'hadoop-mapreduce' Maven artifact
-------------------------------------------------

                 Key: MAPREDUCE-3378
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3378
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: build
    Affects Versions: 0.23.0
            Reporter: Tom White


In 0.23.0 there are multiple artifacts (hadoop-mapreduce-client-app, hadoop-mapreduce-client-common, hadoop-mapreduce-client-core, etc.). It would be simpler for users to declare a dependency on a single hadoop-mapreduce artifact (much as there are hadoop-common and hadoop-hdfs). (This would also be a step towards MAPREDUCE-2600.)
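
For illustration, a user's POM today has to list each client module separately, roughly as sketched below. The groupId org.apache.hadoop and the 0.23.0 version are assumptions, and only the three artifacts named above are shown.

{noformat}
<!-- Sketch only: each MapReduce client module declared one by one.
     groupId and version are assumed, not taken from a published POM. -->
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-app</artifactId>
    <version>0.23.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-common</artifactId>
    <version>0.23.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>0.23.0</version>
  </dependency>
</dependencies>
{noformat}

With a single hadoop-mapreduce artifact, the three entries would collapse into one.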

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3378) Create a single 'hadoop-mapreduce' Maven artifact

Posted by "Vinod Kumar Vavilapalli (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172815#comment-13172815 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-3378:
----------------------------------------------------

IIUC, this is only for convenience. As long as the single artifact only makes things simpler for those who need everything, we should be fine. We still need the fine-grained artifacts for those who want to include modules selectively, for example yarn plus DistributedShell as an app outside of mapreduce.
                

[jira] [Commented] (MAPREDUCE-3378) Create a single 'hadoop-mapreduce' Maven artifact

Posted by "Alejandro Abdelnur (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146633#comment-13146633 ] 

Alejandro Abdelnur commented on MAPREDUCE-3378:
-----------------------------------------------

Another issue I'm facing with the multiple JARs is that the test JARs, when included for testing, do not pull in their test-scope dependencies, so I have to include all of them one by one.

Maybe a solution would be to have a hadoop-mapreduce-test artifact that declares, in compile scope, all the dependencies needed to run test cases.
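
As a rough sketch of the problem: Maven does not resolve a dependency's test-scope dependencies transitively, so a test JAR declared as below arrives without whatever its own tests needed. That hadoop-mapreduce-client-jobclient is the module whose test JAR you want, and the 0.23.0 version, are assumptions made for illustration.

{noformat}
<!-- Sketch only: pulling in a module's test JAR for integration tests.
     Artifact choice and version are assumptions. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
  <version>0.23.0</version>
  <type>test-jar</type>
  <scope>test</scope>
</dependency>
<!-- Every library that the test classes themselves rely on must then be
     added by hand as a further test-scope dependency. -->
{noformat}

A hadoop-mapreduce-test artifact that lists those libraries in compile scope would let a single test-scope declaration bring them all in.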
                

[jira] [Commented] (MAPREDUCE-3378) Create a single 'hadoop-mapreduce' Maven artifact

Posted by "Alejandro Abdelnur (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172833#comment-13172833 ] 

Alejandro Abdelnur commented on MAPREDUCE-3378:
-----------------------------------------------

Regarding the second bullet item in my previous comment, it seems this is possible via a classifier ( http://maven.apache.org/plugins/maven-shade-plugin/examples/attached-artifact.html ). Still, this is somewhat unusual for commonly used artifacts.
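
A minimal sketch of that approach, using the shade plugin's shadedArtifactAttached and shadedClassifierName parameters. The classifier name "all" is a hypothetical choice, and the plugin version is left to the parent POM.

{noformat}
<!-- Sketch only: attach the aggregate JAR as a secondary artifact
     instead of replacing the module's main JAR. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <shadedArtifactAttached>true</shadedArtifactAttached>
        <!-- hypothetical classifier name -->
        <shadedClassifierName>all</shadedClassifierName>
      </configuration>
    </execution>
  </executions>
</plugin>
{noformat}

The main artifact keeps its normal POM and dependencies, and the aggregate JAR is only fetched by users who ask for the classifier explicitly.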
                

[jira] [Commented] (MAPREDUCE-3378) Create a single 'hadoop-mapreduce' Maven artifact

Posted by "Tom White (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173747#comment-13173747 ] 

Tom White commented on MAPREDUCE-3378:
--------------------------------------

Scott, thanks for the great feedback! To summarize, we should do the following (highest priority first):

1. Make sure that the transitive dependencies are correct for the artifacts that we publish.
2. Publish client API JARs and document how to use them.
3. Possibly publish a 'fat jar' for all of Hadoop, but it should have a different classifier.

Currently in 0.23, the situation for item 2 is that hadoop-mapreduce-client-core contains the API, hadoop-mapreduce-client-common contains the local job runner, and hadoop-mapreduce-client-jobclient contains the YARN client, but they all pull in too many dependencies (item 1). I've just noticed that hadoop-mapreduce-client-core depends on HDFS classes, which isn't right.
                

[jira] [Commented] (MAPREDUCE-3378) Create a single 'hadoop-mapreduce' Maven artifact

Posted by "Vinod Kumar Vavilapalli (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172817#comment-13172817 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-3378:
----------------------------------------------------

Also, because the targeted artifact is a hadoop-all jar, can this be moved to common? Thanks!
                

[jira] [Commented] (MAPREDUCE-3378) Create a single 'hadoop-mapreduce' Maven artifact

Posted by "Allen Wittenauer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172793#comment-13172793 ] 

Allen Wittenauer commented on MAPREDUCE-3378:
---------------------------------------------

So we broke them apart so that we can merge them all again?
                

[jira] [Resolved] (MAPREDUCE-3378) Create a single 'hadoop-mapreduce' Maven artifact

Posted by "Tom White (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White resolved MAPREDUCE-3378.
----------------------------------

    Resolution: Won't Fix

I've opened HADOOP-8278 to track item 1 from my summary above (correct transitive dependencies), and HADOOP-8009 addressed item 2 (client API JARs), so I'm closing this JIRA now.


                

[jira] [Commented] (MAPREDUCE-3378) Create a single 'hadoop-mapreduce' Maven artifact

Posted by "Alejandro Abdelnur (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172822#comment-13172822 ] 

Alejandro Abdelnur commented on MAPREDUCE-3378:
-----------------------------------------------

About repackaging multiple JARs into a single one, I see the following issues:

* Developers unknowingly adding other versions of the grouped JARs to the classpath.
* AFAIK there is no POM for the aggregate JAR that lists all the correct dependencies.

IMO the root issue is that we are not using dependencies correctly.

There should be a hadoop-client that allows me to code and run HDFS/MR client apps (with the exact set of transitive dependencies, i.e. you don't need the jetty stuff there).

There should be a hadoop-test that allows me to run an HDFS/MR minicluster for integration testing.

The fact that under the hood these 'hadoop-client' and 'hadoop-test' components pull in 1 or 100 hadoop JARs is irrelevant (although I think we have too many JARs).
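
From the user's side, the idea could look like the sketch below. Both artifactIds are the names proposed in this comment, not artifacts known to exist at this point in the thread, and ${hadoop.version} is a placeholder property.

{noformat}
<!-- Sketch only: one declaration per use case.
     hadoop-client and hadoop-test are proposed, hypothetical names here. -->
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-test</artifactId>
    <version>${hadoop.version}</version>
    <scope>test</scope>
  </dependency>
</dependencies>
{noformat}

Whatever JARs sit behind each entry would be resolved transitively from its POM.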






                

[jira] [Commented] (MAPREDUCE-3378) Create a single 'hadoop-mapreduce' Maven artifact

Posted by "Scott Carey (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173437#comment-13173437 ] 

Scott Carey commented on MAPREDUCE-3378:
----------------------------------------

{quote}IMO the root issue is that we are not using dependencies correctly. {quote}

Absolutely. Hadoop's dependency setup is atrocious in 0.20.205 and 0.22. I haven't looked at 0.23 in enough detail yet, but I would love to see the situation fixed.

I have a project that needs to read and write from HDFS. Declaring a dependency on hadoop pulls in all of Jetty, the Tomcat compiler, and a dozen other jars that I have to exclude manually.
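
For illustration, the manual exclusions look something like the sketch below. The hadoop-core 0.20.205.0 coordinates and the excluded groupId/artifactId pairs are assumptions; check them against the output of mvn dependency:tree for the release you actually use.

{noformat}
<!-- Sketch only: excluding server-side baggage from an HDFS client project.
     All coordinates here are assumed and should be verified. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>0.20.205.0</version>
  <exclusions>
    <exclusion>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>jetty</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>jetty-util</artifactId>
    </exclusion>
    <exclusion>
      <groupId>tomcat</groupId>
      <artifactId>jasper-compiler</artifactId>
    </exclusion>
  </exclusions>
</dependency>
{noformat}

Every downstream project has to repeat this list, and it silently goes stale when the upstream POM changes.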

The above needs to be avoided for mapreduce.

Building larger jars that package dependencies in them is OK for some use cases, but absolutely worthless for any real application that has any chance of a dependency conflict. Things like Jetty should be marked as provided scope rather than compile scope (or perhaps as optional).
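
On the Hadoop side, that could be as small as the sketch below. The Jetty coordinates and the ${jetty.version} property are assumptions. Neither provided-scope nor optional dependencies are passed on to projects that depend on the artifact.

{noformat}
<!-- Sketch only: in the Hadoop module's own POM (coordinates assumed). -->
<dependency>
  <groupId>org.mortbay.jetty</groupId>
  <artifactId>jetty</artifactId>
  <version>${jetty.version}</version>
  <!-- needed to run the daemons, not by clients -->
  <scope>provided</scope>
</dependency>

<!-- Alternative: keep compile scope but mark the dependency optional. -->
<dependency>
  <groupId>org.mortbay.jetty</groupId>
  <artifactId>jetty</artifactId>
  <version>${jetty.version}</version>
  <optional>true</optional>
</dependency>
{noformat}

Either way, client projects no longer need exclusions, and whoever runs the framework supplies Jetty on the classpath.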

{quote}There should be a hadoop-client that allows me to code and run HDFS/MR client apps (with the exact set of transitive dependencies, ie you don't need jetty stuff there).{quote}

:-D  YES!
IMO, we need an hdfs-api.jar and a mapreduce-api.jar that pull in only what is needed to build an application that uses those APIs as a client. A user should be able to declare those in their project and have only the transitive dependencies needed for those use cases pulled in, and nothing extra. One could even go to the extreme of having a mapred-api.jar and a mapreduce-api.jar with the old and new APIs separated (and a mapreduce-common-api.jar they both depend on) if that was a bigger use case. More modularization will be a great benefit to users when combined with using dependencies properly in hadoop itself.

{quote}
The fact that under the hood these 'hadoop-client' & 'hadoop-test' component pull 1 or 100 hadoop JARs is irrelevant (although IMO I think we have too many JARs).
{quote}

Yes, if the artifacts are configured properly with the right dependencies in the correct scope (e.g. jetty in provided scope, since only someone trying to run the framework needs it, not clients), then there is only one artifact to declare for each use. It is not the total number of jars that matters, it is the total _size_ of the jars. Finer-grained control of dependencies by users is a good thing. As a user I want to declare what I need as simply as possible ("I need to launch a mini-mr during test, so I need hadoop-mr-test.jar"; "I need to submit a job to a cluster, so I need mr-client.jar"). What that means behind the scenes in total jar count of transitive dependencies is a different issue entirely, as long as this pulls in only what is needed and not useless baggage (jetty, tomcat's compiler, etc.).

There is no need to package 'fat jars' unless you wish to have a single artifact for uses where tooling does not build the classpath for you.

{quote}
Regarding my prev second bullet item, it seems via a classifier this is possible ( http://maven.apache.org/plugins/maven-shade-plugin/examples/attached-artifact.html ), still this is kind of uncommon for commonly used artifacts.
{quote}

I support using an attached artifact with a classifier for any jars containing dependencies. However, it is an anti-pattern to put a jar with dependencies into a maven repo as the primary artifact (unless you move those dependencies into a private scope to avoid conflicts).
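
One way to read "private scope" is the shade plugin's class relocation, sketched below as a fragment of the plugin's configuration. The shadedPattern prefix is a hypothetical name, and relocating Jetty is only an example.

{noformat}
<!-- Sketch only: goes inside the maven-shade-plugin <configuration>.
     The bundled classes are rewritten into a private package so they
     cannot clash with a copy on the user's classpath. -->
<relocations>
  <relocation>
    <pattern>org.mortbay</pattern>
    <!-- hypothetical private package prefix -->
    <shadedPattern>org.apache.hadoop.shaded.org.mortbay</shadedPattern>
  </relocation>
</relocations>
{noformat}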


                

[jira] [Updated] (MAPREDUCE-3378) Create a single 'hadoop-mapreduce' Maven artifact

Posted by "Tom White (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated MAPREDUCE-3378:
---------------------------------

    Attachment: MAPREDUCE-3378.patch

Here's a patch that does something slightly more general: it creates a hadoop-all JAR that's convenient for users to consume. E.g. I tested with the following Maven dependency in another project:

{noformat}
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-all</artifactId>
  <version>0.24.0-SNAPSHOT</version>
</dependency>
{noformat}

                