Posted to common-user@hadoop.apache.org by "W.P. McNeill" <bi...@gmail.com> on 2011/08/12 23:20:16 UTC

How do I add Hadoop dependency to a Maven project?

I'm building a Hadoop project using Maven. I want to add
Maven dependencies to my project. What do I do?

I think the answer is I add a <dependency></dependency> section to my .POM
file, but I'm not sure what the contents of this section (groupId,
artifactId etc.) should be. Googling does not turn up a clear answer. Is
there a canonical Hadoop Maven dependency specification?

Re: How do I add Hadoop dependency to a Maven project?

Posted by "W.P. McNeill" <bi...@gmail.com>.
More experimenting:

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>0.20.203.0</version>
            <type>POM</type>
        </dependency>

Works, but (as you indicate) gives the old Hadoop API.

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapred</artifactId>
            <version>...</version>
            <type>POM</type>
        </dependency>

Doesn't work. I can't find a hadoop-mapred artifact when I search for one on
Maven Central <http://search.maven.org/>.
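
McNeill's later follow-up in this thread traced his failure to the `<type>POM</type>` element, which kept Maven from downloading the Hadoop JAR. When `<type>` is omitted, Maven uses its default type, `jar`. Below is a minimal sketch of the first dependency above with that element dropped. The coordinates are the ones quoted in this thread; whether 0.20.203.0 matches your cluster is an assumption you should check.

```xml
<!-- Same coordinates as the first snippet above, minus <type>POM</type>.
     With no <type>, Maven falls back to the default type "jar" and downloads the JAR. -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>0.20.203.0</version>
</dependency>
```

Per Joey Echeverria's reply in this thread, the org.apache.hadoop.mapreduce.* API was introduced in 0.20.0, so this artifact should already include the new API.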


On Fri, Aug 12, 2011 at 2:47 PM, Luke Lu <ll...@apache.org> wrote:

> Pre-0.21 (sustaining releases, large-scale tested)  hadoop:
> <dependency>
>  <groupId>org.apache.hadoop</groupId>
>  <artifactId>hadoop-core</artifactId>
>  <version>0.20.203.0</version>
> </dependency>
>
> Pre-0.23 (small scale tested) hadoop:
> <dependency>
>  <groupId>org.apache.hadoop</groupId>
>  <artifactId>hadoop-mapred</artifactId>
>  <version>...</version>
> </dependency>
>
> Trunk (currently targeting 0.23.0, large-scale tested) hadoop WILL be:
> <dependency>
>  <groupId>org.apache.hadoop</groupId>
>  <artifactId>hadoop-mapreduce</artifactId>
>  <version>...</version>
> </dependency>
>
> On Fri, Aug 12, 2011 at 2:20 PM, W.P. McNeill <bi...@gmail.com> wrote:
> > I'm building a Hadoop project using Maven. I want to add
> > Maven dependencies to my project. What do I do?
> >
> > I think the answer is I add a <dependency></dependency> section to my .POM
> > file, but I'm not sure what the contents of this section (groupId,
> > artifactId etc.) should be. Googling does not turn up a clear answer. Is
> > there a canonical Hadoop Maven dependency specification?
> >
>

Re: How do I add Hadoop dependency to a Maven project?

Posted by "W.P. McNeill" <bi...@gmail.com>.
The versioning issue was a red herring. My problem turned out to be the
fact that I had a <type>POM</type> in my Hadoop dependencies section, which
was causing the JAR files not to be downloaded. I now have this project
building.

Other people trying to set up a simple Hadoop project in Maven can use Word
Count Test Adapter <https://github.com/wpm/WordCountTestAdapter> as an
example.

Thanks everybody for your help.

On Tue, Aug 16, 2011 at 10:29 AM, Joey Echeverria <jo...@cloudera.com> wrote:

> If you're talking about the org.apache.hadoop.mapreduce.* API, that
> was introduced in 0.20.0. There should be no need to use the 0.21
> version.
>
> -Joey
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
>

Re: How do I add Hadoop dependency to a Maven project?

Posted by Joey Echeverria <jo...@cloudera.com>.
If you're talking about the org.apache.hadoop.mapreduce.* API, that
was introduced in 0.20.0. There should be no need to use the 0.21
version.

-Joey

On Tue, Aug 16, 2011 at 1:22 PM, W.P. McNeill <bi...@gmail.com> wrote:
> Here is my specific problem:
>
> I have a sample word count Hadoop program up on GitHub
> (https://github.com/wpm/WordCountTestAdapter) that illustrates unit testing
> techniques for Hadoop. This code uses the new API. (On my development
> machine I'm using version 0.20.2.) I want to use Maven as its build
> framework because that seems to be the way the Java world is going. Currently
> the pom.xml for this project makes no mention of Hadoop. If you try to do a
> "mvn install" you get the errors I described earlier. I want to change this
> project so that "mvn install" builds it.
>
> I can find the pre-0.21 (old API) hadoop-core JARs on
> http://mvnrepository.com, but I can't find the post-0.21 (new API)
> hadoop-mapred there. Do I need to add another Maven repository server to get
> the new API JARs?
>

Re: How do I add Hadoop dependency to a Maven project?

Posted by "W.P. McNeill" <bi...@gmail.com>.
Here is my specific problem:

I have a sample word count Hadoop program up on GitHub
(https://github.com/wpm/WordCountTestAdapter) that illustrates unit testing
techniques for Hadoop. This code uses the new API. (On my development
machine I'm using version 0.20.2.) I want to use Maven as its build
framework because that seems to be the way the Java world is going. Currently
the pom.xml for this project makes no mention of Hadoop. If you try to do a
"mvn install" you get the errors I described earlier. I want to change this
project so that "mvn install" builds it.

I can find the pre-0.21 (old API) hadoop-core JARs on
http://mvnrepository.com, but I can't find the post-0.21 (new API)
hadoop-mapred there. Do I need to add another Maven repository server to get
the new API JARs?

Re: How do I add Hadoop dependency to a Maven project?

Posted by "W.P. McNeill" <bi...@gmail.com>.
I'm trying to build a simple Hadoop word count application. I have the
following pom.xml file:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>wpmcn</groupId>
    <artifactId>WordCountTestAdapter</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>
    <name>WordCountTestAdapter</name>
    <url>http://maven.apache.org</url>
    <properties>
      <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop</artifactId>
            <version>0.22.0</version>
            <type>POM</type>
        </dependency>
    </dependencies>
</project>

When I run "mvn install" I see the following error:

[ERROR] Failed to execute goal on project WordCountTestAdapter: Could not resolve dependencies for project wpmcn:WordCountTestAdapter:jar:1.0-SNAPSHOT: Could not find artifact org.apache.hadoop:hadoop:POM:0.22.0 in central (http://repo1.maven.org/maven2) -> [Help 1]

I've tried various different things in the hadoop entry to no avail.

This is a vanilla Maven 3 install which works fine for building simple
non-Hadoop Hello World applications, and I'm a Maven newbie so I may be
missing something obvious. Can someone tell me what I'm doing wrong or
direct me to a pom.xml that builds a simple Hadoop application?
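
For comparison, here is a sketch of the same pom.xml with the Hadoop dependency swapped for the `hadoop-core` coordinates from Luke Lu's reply in this thread, and with `<type>POM</type>` removed (McNeill later reported that element as the cause of his build failure). The error above shows that Maven could not find any `org.apache.hadoop:hadoop:0.22.0` artifact in central. Version 0.20.203.0 is taken from the thread; treat it as an assumption to adjust for your cluster.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>wpmcn</groupId>
    <artifactId>WordCountTestAdapter</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>
    <name>WordCountTestAdapter</name>
    <url>http://maven.apache.org</url>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
        <!-- hadoop-core from Maven Central; no <type>, so the JAR itself is resolved -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>0.20.203.0</version>
        </dependency>
    </dependencies>
</project>
```

With this POM, "mvn install" should resolve hadoop-core and its transitive dependencies from central instead of failing as above.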

Re: How do I add Hadoop dependency to a Maven project?

Posted by Steve Loughran <st...@apache.org>.
On 16/08/11 16:56, W.P. McNeill wrote:
> Just to make sure I understand, the drop of smartfrog.svn.sourceforge.net is
> just a build of the latest Hadoop JARs, right? I can't use it as a Maven
> repository (because it's POM-less).
>
It's an example of what to do.

I use it in Ivy, where its POM-lessness doesn't matter, since I can
set up the dependencies downstream (http://bit.ly/n5hbuB).

They aren't private builds; they're the official releases, though by
stripping out the log4j files I have diverged slightly.

You can do something similar for your own project in the absence of a
0.21 release.
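
Steve's setup uses Ivy. In a Maven build, one way to do something similar with a POM-less JAR is a `system`-scoped dependency that points at a file inside the project. This is a hedged sketch: the `lib/` directory, the artifactId, the version, and the JAR file name are hypothetical placeholders, not coordinates from the thread. Maven never looks a system-scoped artifact up in a repository, so it has no POM for it and will not pull in that JAR's own dependencies. You have to declare each of those by hand, much as Steve sets them up downstream.

```xml
<!-- Hypothetical example: a Hadoop JAR copied into the project's lib/ directory.
     artifactId, version, and file name are placeholders, not published coordinates. -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapred</artifactId>
    <version>0.21.0</version>
    <!-- system scope: Maven takes the JAR from systemPath and never looks it up in a repository -->
    <scope>system</scope>
    <systemPath>${project.basedir}/lib/hadoop-mapred-0.21.0.jar</systemPath>
</dependency>
```

Newer Maven documentation discourages the system scope. Installing the JAR into a local or team repository is the usual alternative.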

Re: How do I add Hadoop dependency to a Maven project?

Posted by "W.P. McNeill" <bi...@gmail.com>.
Just to make sure I understand, the drop of smartfrog.svn.sourceforge.net is
just a build of the latest Hadoop JARs, right? I can't use it as a Maven
repository (because it's POM-less).

Re: How do I add Hadoop dependency to a Maven project?

Posted by Steve Loughran <st...@apache.org>.
On 13/08/11 00:08, W.P. McNeill wrote:
> I want the latest version of Hadoop (with the new API). I guess that's the
> trunk version, but I don't see the hadoop-mapreduce artifact listed on
> https://repository.apache.org/index.html#nexus-search;quick~hadoop
>

I have a setup elsewhere, POM-less:

http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/antbuild/repository/org.apache.hadoop/

Some are tagged .nolog to indicate that the log4j.properties file has been
stripped out.

Re: How do I add Hadoop dependency to a Maven project?

Posted by Luke Lu <ll...@vicaya.com>.
There is a reason I capitalized WILL (SHALL) :) The current trunk
MapReduce code is in flux. Once MR2 (MAPREDUCE-279) is merged into
trunk (soon!), we'll be producing hadoop-mapreduce-0.23.0-SNAPSHOT,
which depends on hadoop-hdfs-0.23.0-SNAPSHOT, which in turn depends on
hadoop-common-0.23.0-SNAPSHOT.

If you just want to play with the "new" API, you can use the
0.22.0-SNAPSHOT artifacts. 0.23.0 is supposedly source compatible with
previous hadoop versions including 0.20.x (for legacy API).
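
To try the 0.22.0-SNAPSHOT artifacts from Maven, you need a repository that serves snapshots. This also answers McNeill's question about whether he needs to add another repository server. The following is a sketch under stated assumptions. The URL is Apache's snapshot repository on repository.apache.org (the Nexus instance mentioned in this thread). The `apache-snapshots` id is an arbitrary label. Pairing `hadoop-mapred` (the pre-0.23 artifactId from Luke's earlier reply) with 0.22.0-SNAPSHOT is an assumption. Snapshot builds from 2011 may no longer be published.

```xml
<!-- Goes inside <project>. Lets Maven resolve -SNAPSHOT versions from Apache's snapshot repository. -->
<repositories>
    <repository>
        <id>apache-snapshots</id>
        <url>https://repository.apache.org/content/repositories/snapshots/</url>
        <releases>
            <enabled>false</enabled>
        </releases>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
    </repository>
</repositories>

<dependencies>
    <!-- Assumed coordinates: artifactId from the pre-0.23 example, version from the SNAPSHOT suggestion -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapred</artifactId>
        <version>0.22.0-SNAPSHOT</version>
    </dependency>
</dependencies>
```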

On Fri, Aug 12, 2011 at 4:08 PM, W.P. McNeill <bi...@gmail.com> wrote:
> I want the latest version of Hadoop (with the new API). I guess that's the
> trunk version, but I don't see the hadoop-mapreduce artifact listed on
> https://repository.apache.org/index.html#nexus-search;quick~hadoop

Re: How do I add Hadoop dependency to a Maven project?

Posted by "W.P. McNeill" <bi...@gmail.com>.
I want the latest version of Hadoop (with the new API). I guess that's the
trunk version, but I don't see the hadoop-mapreduce artifact listed on
https://repository.apache.org/index.html#nexus-search;quick~hadoop

Re: How do I add Hadoop dependency to a Maven project?

Posted by Luke Lu <ll...@apache.org>.
Pre-0.21 (sustaining releases, large-scale tested)  hadoop:
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>0.20.203.0</version>
</dependency>

Pre-0.23 (small scale tested) hadoop:
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapred</artifactId>
  <version>...</version>
</dependency>

Trunk (currently targeting 0.23.0, large-scale tested) hadoop WILL be:
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce</artifactId>
  <version>...</version>
</dependency>

Re: hdfs format command issue

Posted by Harsh J <ha...@cloudera.com>.
Generally, though, the coreutils command 'yes' lets you accomplish these kinds of
tasks, where you need to repeatedly output a string in order to get
through an interactive prompt.

On Tue, Aug 16, 2011 at 5:32 AM, Dhodapkar, Chinmay
<ch...@qualcomm.com> wrote:
> Perfect :)
>
> -----Original Message-----
> From: Giridharan Kesavan [mailto:gkesavan@hortonworks.com]
> Sent: Sunday, August 14, 2011 5:42 PM
> To: common-user@hadoop.apache.org
> Subject: Re: hdfs format command issue
>
> this should help.
>
> echo Y | ${hadoophdfshome}/bin/hdfs namenode -format
>
> -giri
>
> On Sat, Aug 13, 2011 at 8:41 AM, Dhodapkar, Chinmay
> <ch...@qualcomm.com> wrote:
>
>> I am trying to automate the installation/bringup of a complete hadoop/hbase
>> cluster from a single script. I have run into a very small issue...
>> Before bringing up the namenode, I have to format it with the usual "hadoop
>> namenode -format"
>>
>> Executing the above command prompts the user for Y/N. Is there an option
>> that can be passed to force the format without prompting?
>> The aim is for the script to complete without any human intervention...
>



-- 
Harsh J

RE: hdfs format command issue

Posted by "Dhodapkar, Chinmay" <ch...@qualcomm.com>.
Perfect :)

Re: hdfs format command issue

Posted by Giridharan Kesavan <gk...@hortonworks.com>.
this should help.

echo Y | ${hadoophdfshome}/bin/hdfs namenode -format

-giri

hdfs format command issue

Posted by "Dhodapkar, Chinmay" <ch...@qualcomm.com>.
I am trying to automate the installation/bringup of a complete hadoop/hbase cluster from a single script. I have run into a very small issue...
Before bringing up the namenode, I have to format it with the usual "hadoop namenode -format"

Executing the above command prompts the user for Y/N. Is there an option that can be passed to force the format without prompting?
The aim is for the script to complete without any human intervention...