Posted to user@mahout.apache.org by Diego Ceccarelli <di...@gmail.com> on 2012/10/29 19:06:25 UTC

Using Mahout with Hadoop 0.23

Dear all,

I'm trying to run Mahout on a cluster with Hadoop 0.23.
I set all the environment variables, but when I run the job
I get this error:

Exception in thread "main" java.lang.NoSuchMethodError:
org.apache.hadoop.util.ProgramDriver.driver([Ljava/lang/String;)V
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

I tried both Mahout 0.7 and 0.8-SNAPSHOT, with the same result.
Is it normal that MAHOUT-JOB points to mahout-examples-0.8-SNAPSHOT-job.jar?

Thanks,
Diego

Re: Using Mahout with Hadoop 0.23

Posted by Diego Ceccarelli <di...@gmail.com>.
Dear Sean,

In the end, I solved it yesterday: I removed the hadoop-core dependency
from the main pom. The problem was that the examples module also
depends on classes in hadoop-core/hadoop-common, but
hadoop-common was not declared in examples/pom.xml.
I was able to compile after adding that dependency to examples/pom.xml
(along with hadoop-mapreduce-client-core).
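
For reference, the added dependencies look roughly like this (a sketch only; the exact ${hadoop.version} value is whatever your build sets, not something I'm asserting here):

```xml
<!-- Sketch: dependencies added to examples/pom.xml so the examples
     module sees the Hadoop classes it compiles against. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>${hadoop.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>${hadoop.version}</version>
</dependency>
```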

Anyway, that was not the real problem;
the fix was simpler :) When I called cvb:

bin/mahout cvb -i /user/diegolo/twitter/tweets-rowid -o
/user/diegolo/twitter/text_lda -k 100 -dict
/user/diegolo/twitter/dictionary.file-0 --maxIter 20

I had passed the rowid output directory as input, while cvb was expecting
the matrix inside that output (/user/diegolo/twitter/tweets-rowid/matrix). Running:

bin/mahout cvb -i /user/diegolo/twitter/tweets-rowid/matrix -o
/user/diegolo/twitter/text_lda -k 100 -dict
/user/diegolo/twitter/dictionary.file-0 --maxIter 20

made Hadoop happy :)

Now I have my output and I'm trying to understand it.
I have some problems with vectordump. It seems that:

./bin/mahout vectordump -i lda/part-m-00000 -o prob --dictionary
vector/dictionary.file-0 -dt sequencefile

creates a file where, for each topic, I get the probability of each
term belonging to that topic. I would like to see only the most
probable terms per topic:

./bin/mahout vectordump -i ~/twitter-lda/lda/part-m-00000 -o
~/twitter-lda/prob -d ~/twitter-lda/vector/dictionary.file-0 -dt
sequencefile -sort true -vs 20

but I always get this error (even with really small vector sizes):

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
12/11/01 00:57:08 INFO common.AbstractJob: Command line arguments:
{--dictionary=[/Users/diego/twitter-lda/vector/dictionary.file-0],
--dictionaryType=[sequencefile], --endPhase=[2147483647],
--input=[/Users/diego/twitter-lda/lda/part-m-00000],
--output=[/Users/diego/twitter-lda/prob], --sortVectors=[true],
--startPhase=[0], --tempDir=[temp], --vectorSize=[20]}
2012-11-01 00:57:08.827 java[10552:1203] Unable to load realm info
from SCDynamicStore
12/11/01 00:57:09 INFO vectors.VectorDumper: Sort? true
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at org.apache.lucene.util.PriorityQueue.initialize(PriorityQueue.java:108)
	at org.apache.mahout.utils.vectors.VectorHelper$TDoublePQ.<init>(VectorHelper.java:221)
	at org.apache.mahout.utils.vectors.VectorHelper$TDoublePQ.<init>(VectorHelper.java:218)
	at org.apache.mahout.utils.vectors.VectorHelper.topEntries(VectorHelper.java:84)

Do you know this issue? Also, I don't understand how to see
the topics for a given tweet.
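
(For context, the "most probable terms per topic" I'm after is just a bounded top-k selection over each topic's term-probability vector, which is essentially what VectorHelper.topEntries does with its priority queue. A tiny self-contained sketch, with made-up terms and probabilities:)

```python
import heapq

# Toy topic distribution: term -> probability. The terms and values
# are invented for illustration; a real row comes from vectordump.
topic = {"rain": 0.04, "sun": 0.31, "cloud": 0.12, "wind": 0.02, "storm": 0.25}

def top_terms(topic, k):
    """Return the k most probable terms, highest probability first."""
    return heapq.nlargest(k, topic.items(), key=lambda kv: kv[1])

print(top_terms(topic, 3))  # [('sun', 0.31), ('storm', 0.25), ('cloud', 0.12)]
```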

Thanks,
Diego



On Tue, Oct 30, 2012 at 12:44 PM, Sean Owen <sr...@gmail.com> wrote:
> If you want to use Hadoop 0.23, there is no point in specifying 0.22 (a
> mostly abandoned branch), or 0.20 (an old version of the stable branch, but
> something I thought you didn't want to use for some reason). So I would
> simply stop bothering with any of that. Don't use SNAPSHOTs of anything.
>
> examples / integration depend on core, but if core works, they should work.
> You have to 'mvn install' your core artifact locally to make it use it.
> Your error may be caused by that.
>
> Why do you want to use 0.23 in the first place? 1.1.x and 2.0.x are the
> best stable / experimental branches now.
>
> On Tue, Oct 30, 2012 at 11:27 AM, Diego Ceccarelli <
> diego.ceccarelli@gmail.com> wrote:
>
>> Thanks Sean,
>>
>> So I first tried commenting the hadoop-core dependency but it did not work,
>> then I added a different version for hadoop-core (0.22.0-SNAPSHOT)
>> and I was able to compile the mahout core ( mvn -P hadoop-0.23 install
>> -DskipTests)
>> I had errors with the integration and examples modules (and it
>> seems that I need to compile also them to run mahout). (integration
>> [1]), (examples errors: [2])
>>
>> So I set hadoop-core version to 0.20.2, and I was able to compile
>> everything except
>> the integration module (which I excluded from the reactor).
>> When I run mahout anyway I got the same initial error.
>> So I used hadoop-core 0.22.0-SNAPSHOT and I compiled
>> separately mahout examples with the 0.20.2 version
>>
>> Then I tried to run lda on my twitter dataset:
>>
>> bin/mahout cvb -i /user/diegolo/twitter/tweets-rowid -o
>> /user/diegolo/twitter/text_lda -k 100 -dict
>> /user/diegolo/twitter/dictionary.file-0 --maxIter 20
>>
>> The job started but I got this error:
>>
>>
>> 12/10/30 11:19:44 INFO mapreduce.Job: Running job: job_1351559192903_4948
>> 12/10/30 11:19:55 INFO mapreduce.Job: Job job_1351559192903_4948
>> running in uber mode : false
>> 12/10/30 11:19:55 INFO mapreduce.Job:  map 0% reduce 0%
>> 12/10/30 11:20:07 INFO mapreduce.Job: Task Id :
>> attempt_1351559192903_4948_m_000001_0, Status : FAILED
>> Error: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot
>> be cast to org.apache.mahout.math.VectorWritable
>>         at
>> org.apache.mahout.clustering.lda.cvb.CachingCVB0Mapper.map(CachingCVB0Mapper.java:55)
>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
>>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:396)
>>         at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
>>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
>>
>> do you think is due  to the dirty mix I did? why bin/mahout needs the
>> folder examples?
>>
>> Thanks,
>> Diego
>>
>>
>> [1] http://pastebin.com/q6VsSAFB
>> [2] http://pastebin.com/YvcegjBZ
>>
>> On Mon, Oct 29, 2012 at 11:20 PM, Sean Owen <sr...@gmail.com> wrote:
>> > I haven't tried it, I don't know if it works. From reading the pom.xml it
>> > looks like it should not consider hadoop-core a dependency if you select
>> > the other profile. If not, I don't know why. You could always just delete
>> > all the hadoop-core bits and do away with the alternate profile, that
>> would
>> > work.
>> >
>> > On Mon, Oct 29, 2012 at 10:07 PM, Diego Ceccarelli <
>> > diego.ceccarelli@gmail.com> wrote:
>> >
>> >> > But, most of all note that you are not looking for hadoop-core but
>> >> > hadoop-common
>> >>
>> >> Sorry, but it's 11 pm here and I'm bit tired ;) I don't understand the
>> >> above sentence:
>> >> in the main pom.xml hadoop-core and hadoop-common are imported with the
>> >> same
>> >> placeholder $hadoop.version, and the problem that I have is that i
>> >> can't compile
>> >> because maven does not find the version 0.23.3/4 of hadoop-core.
>> >> You are telling me that I have to exclude hadoop core? or to use an
>> >> older version
>> >> for the core?
>> >> Sorry again :(
>> >>
>> >> cheers
>> >> Diego
>> >>
>> >>
>>
>>
>>
>> --
>> Computers are useless. They can only give you answers.
>> (Pablo Picasso)
>> _______________
>> Diego Ceccarelli
>> High Performance Computing Laboratory
>> Information Science and Technologies Institute (ISTI)
>> Italian National Research Council (CNR)
>> Via Moruzzi, 1
>> 56124 - Pisa - Italy
>>
>> Phone: +39 050 315 3055
>> Fax: +39 050 315 2040
>> ________________________________________
>>



-- 
Computers are useless. They can only give you answers.
(Pablo Picasso)
_______________
Diego Ceccarelli
High Performance Computing Laboratory
Information Science and Technologies Institute (ISTI)
Italian National Research Council (CNR)
Via Moruzzi, 1
56124 - Pisa - Italy

Phone: +39 050 315 3055
Fax: +39 050 315 2040
________________________________________

Re: Using Mahout with Hadoop 0.23

Posted by Sean Owen <sr...@gmail.com>.
If you want to use Hadoop 0.23, there is no point in specifying 0.22 (a
mostly abandoned branch), or 0.20 (an old version of the stable branch, but
something I thought you didn't want to use for some reason). So I would
simply stop bothering with any of that. Don't use SNAPSHOTs of anything.

examples / integration depend on core, but if core works, they should work.
You have to 'mvn install' your core artifact locally to make it use it.
Your error may be caused by that.

Why do you want to use 0.23 in the first place? 1.1.x and 2.0.x are the
best stable / experimental branches now.

On Tue, Oct 30, 2012 at 11:27 AM, Diego Ceccarelli <
diego.ceccarelli@gmail.com> wrote:

> Thanks Sean,
>
> So I first tried commenting the hadoop-core dependency but it did not work,
> then I added a different version for hadoop-core (0.22.0-SNAPSHOT)
> and I was able to compile the mahout core ( mvn -P hadoop-0.23 install
> -DskipTests)
> I had errors with the integration and examples modules (and it
> seems that I need to compile also them to run mahout). (integration
> [1]), (examples errors: [2])
>
> So I set hadoop-core version to 0.20.2, and I was able to compile
> everything except
> the integration module (which I excluded from the reactor).
> When I run mahout anyway I got the same initial error.
> So I used hadoop-core 0.22.0-SNAPSHOT and I compiled
> separately mahout examples with the 0.20.2 version
>
> Then I tried to run lda on my twitter dataset:
>
> bin/mahout cvb -i /user/diegolo/twitter/tweets-rowid -o
> /user/diegolo/twitter/text_lda -k 100 -dict
> /user/diegolo/twitter/dictionary.file-0 --maxIter 20
>
> The job started but I got this error:
>
>
> 12/10/30 11:19:44 INFO mapreduce.Job: Running job: job_1351559192903_4948
> 12/10/30 11:19:55 INFO mapreduce.Job: Job job_1351559192903_4948
> running in uber mode : false
> 12/10/30 11:19:55 INFO mapreduce.Job:  map 0% reduce 0%
> 12/10/30 11:20:07 INFO mapreduce.Job: Task Id :
> attempt_1351559192903_4948_m_000001_0, Status : FAILED
> Error: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot
> be cast to org.apache.mahout.math.VectorWritable
>         at
> org.apache.mahout.clustering.lda.cvb.CachingCVB0Mapper.map(CachingCVB0Mapper.java:55)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
>
> do you think is due  to the dirty mix I did? why bin/mahout needs the
> folder examples?
>
> Thanks,
> Diego
>
>
> [1] http://pastebin.com/q6VsSAFB
> [2] http://pastebin.com/YvcegjBZ
>
> On Mon, Oct 29, 2012 at 11:20 PM, Sean Owen <sr...@gmail.com> wrote:
> > I haven't tried it, I don't know if it works. From reading the pom.xml it
> > looks like it should not consider hadoop-core a dependency if you select
> > the other profile. If not, I don't know why. You could always just delete
> > all the hadoop-core bits and do away with the alternate profile, that
> would
> > work.
> >
> > On Mon, Oct 29, 2012 at 10:07 PM, Diego Ceccarelli <
> > diego.ceccarelli@gmail.com> wrote:
> >
> >> > But, most of all note that you are not looking for hadoop-core but
> >> > hadoop-common
> >>
> >> Sorry, but it's 11 pm here and I'm bit tired ;) I don't understand the
> >> above sentence:
> >> in the main pom.xml hadoop-core and hadoop-common are imported with the
> >> same
> >> placeholder $hadoop.version, and the problem that I have is that i
> >> can't compile
> >> because maven does not find the version 0.23.3/4 of hadoop-core.
> >> You are telling me that I have to exclude hadoop core? or to use an
> >> older version
> >> for the core?
> >> Sorry again :(
> >>
> >> cheers
> >> Diego
> >>
> >>
>
>
>
> --
> Computers are useless. They can only give you answers.
> (Pablo Picasso)
> _______________
> Diego Ceccarelli
> High Performance Computing Laboratory
> Information Science and Technologies Institute (ISTI)
> Italian National Research Council (CNR)
> Via Moruzzi, 1
> 56124 - Pisa - Italy
>
> Phone: +39 050 315 3055
> Fax: +39 050 315 2040
> ________________________________________
>

Re: Using Mahout with Hadoop 0.23

Posted by Diego Ceccarelli <di...@gmail.com>.
Thanks Sean,

So I first tried commenting out the hadoop-core dependency, but it did
not work; then I set a different version for hadoop-core
(0.22.0-SNAPSHOT) and was able to compile the Mahout core
(mvn -P hadoop-0.23 install -DskipTests).
I had errors with the integration and examples modules (and it
seems I need to compile them as well to run Mahout) (integration
errors: [1], examples errors: [2]).

So I set the hadoop-core version to 0.20.2 and was able to compile
everything except the integration module (which I excluded from the
reactor). When I ran Mahout, however, I got the same initial error.
So I used hadoop-core 0.22.0-SNAPSHOT and compiled
mahout-examples separately against version 0.20.2.

Then I tried to run lda on my twitter dataset:

bin/mahout cvb -i /user/diegolo/twitter/tweets-rowid -o
/user/diegolo/twitter/text_lda -k 100 -dict
/user/diegolo/twitter/dictionary.file-0 --maxIter 20

The job started but I got this error:


12/10/30 11:19:44 INFO mapreduce.Job: Running job: job_1351559192903_4948
12/10/30 11:19:55 INFO mapreduce.Job: Job job_1351559192903_4948
running in uber mode : false
12/10/30 11:19:55 INFO mapreduce.Job:  map 0% reduce 0%
12/10/30 11:20:07 INFO mapreduce.Job: Task Id :
attempt_1351559192903_4948_m_000001_0, Status : FAILED
Error: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot
be cast to org.apache.mahout.math.VectorWritable
        at org.apache.mahout.clustering.lda.cvb.CachingCVB0Mapper.map(CachingCVB0Mapper.java:55)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)

Do you think this is due to the dirty mix I made? Why does bin/mahout
need the examples folder?

Thanks,
Diego


[1] http://pastebin.com/q6VsSAFB
[2] http://pastebin.com/YvcegjBZ

On Mon, Oct 29, 2012 at 11:20 PM, Sean Owen <sr...@gmail.com> wrote:
> I haven't tried it, I don't know if it works. From reading the pom.xml it
> looks like it should not consider hadoop-core a dependency if you select
> the other profile. If not, I don't know why. You could always just delete
> all the hadoop-core bits and do away with the alternate profile, that would
> work.
>
> On Mon, Oct 29, 2012 at 10:07 PM, Diego Ceccarelli <
> diego.ceccarelli@gmail.com> wrote:
>
>> > But, most of all note that you are not looking for hadoop-core but
>> > hadoop-common
>>
>> Sorry, but it's 11 pm here and I'm bit tired ;) I don't understand the
>> above sentence:
>> in the main pom.xml hadoop-core and hadoop-common are imported with the
>> same
>> placeholder $hadoop.version, and the problem that I have is that i
>> can't compile
>> because maven does not find the version 0.23.3/4 of hadoop-core.
>> You are telling me that I have to exclude hadoop core? or to use an
>> older version
>> for the core?
>> Sorry again :(
>>
>> cheers
>> Diego
>>
>>




Re: Using Mahout with Hadoop 0.23

Posted by Sean Owen <sr...@gmail.com>.
I haven't tried it, I don't know if it works. From reading the pom.xml it
looks like it should not consider hadoop-core a dependency if you select
the other profile. If not, I don't know why. You could always just delete
all the hadoop-core bits and do away with the alternate profile, that would
work.

On Mon, Oct 29, 2012 at 10:07 PM, Diego Ceccarelli <
diego.ceccarelli@gmail.com> wrote:

> > But, most of all note that you are not looking for hadoop-core but
> > hadoop-common
>
> Sorry, but it's 11 pm here and I'm bit tired ;) I don't understand the
> above sentence:
> in the main pom.xml hadoop-core and hadoop-common are imported with the
> same
> placeholder $hadoop.version, and the problem that I have is that i
> can't compile
> because maven does not find the version 0.23.3/4 of hadoop-core.
> You are telling me that I have to exclude hadoop core? or to use an
> older version
> for the core?
> Sorry again :(
>
> cheers
> Diego
>
>

Re: Using Mahout with Hadoop 0.23

Posted by Diego Ceccarelli <di...@gmail.com>.
> But, most of all note that you are not looking for hadoop-core but
> hadoop-common

Sorry, but it's 11 pm here and I'm a bit tired ;) I don't understand
the sentence above:
in the main pom.xml, hadoop-core and hadoop-common are imported with
the same placeholder, ${hadoop.version}, and my problem is that I
can't compile because Maven does not find version 0.23.3/0.23.4 of
hadoop-core.
Are you telling me that I have to exclude hadoop-core, or use an
older version for the core?
Sorry again :(
Sorry again :(

cheers
Diego


> You're really on your own though, this is not supported or recommended by
> anyone. If you don't need to use 0.23.x you shouldn't.
>
> On Mon, Oct 29, 2012 at 9:11 PM, Diego Ceccarelli <
> diego.ceccarelli@gmail.com> wrote:
>
>> On Mon, Oct 29, 2012 at 9:38 PM, Sean Owen <sr...@gmail.com> wrote:
>> > Yes, you must also specify hadoop version or else you would be using
>> 0.23.x
>> > artifacts with version 1.0.4 where they don't exist. The most recent
>> 0.23.x
>> > build is not "0.23" but "0.23.4". (Use 0.23.3, if that hasn't made it to
>> > the repo yet.)
>> >
>> > 0.20.2 is quite old, older than 1.0.4. I thought you wanted the 0.23.x /
>> > 2.0.x line?
>>
>> The profile in core/pom.xml is called hadoop-0.23, and does not
>> seem to work.
>> I tried to replace <hadoop.version> 1.0.3 </hadoop-version> in the
>> main pom, with 0.23.4 but it seems that the central repository does
>> not contain this jar,
>> and I'm not able to find it on the Web :P
>> I tried the version 0.22.0-SNAPSHOT from asf repository [1],
>> but I got compilation errors  [2].
>>
>> > I wouldn't recommend you use 0.23.x but that's how you would try to.
>> Unfortunately I can't downgrade/upgrade the hadoop version on the cluster
>> :)
>> Thanks for your help
>>
>> [1]
>> https://repository.apache.org/content/repositories/public/org/apache/hadoop/hadoop-core/
>> [2]  http://pastebin.com/8vVJDVQW
>>
>> > On Mon, Oct 29, 2012 at 7:06 PM, Diego Ceccarelli <
>> > diego.ceccarelli@gmail.com> wrote:
>>
>> >> Hi Sean,
>> >> I took a look to the profiles and in core there is this 'hadoop-0.23'
>> but
>> >> if I
>> >> try to compile:
>> >>
>> >> mvn -P hadoop-0.23 install -DskipTests
>> >>
>> >> Maven returns with this weird error:
>> >>
>> >> [ERROR] Failed to execute goal on project mahout-core: Could not
>> >> resolve dependencies for project
>> >> org.apache.mahout:mahout-core:jar:0.8-SNAPSHOT: The following
>> >> artifacts could not be resolved:
>> >> org.apache.hadoop:hadoop-common:jar:1.0.4,
>> >> org.apache.hadoop:hadoop-mapreduce-client-common:jar:1.0.4,
>> >> org.apache.hadoop:hadoop-mapreduce-client-core:jar:1.0.4: Failure to
>> >> find org.apache.hadoop:hadoop-common:jar:1.0.4 in
>> >> http://repo1.maven.org/maven2 was cached in the local repository,
>> >> resolution will not be reattempted until the update interval of
>> >> central has elapsed or updates are forced -> [Help 1]
>> >>
>> >> (seems that it is looking for the wrong version of hadoop..)
>> >> I also tried to replace the property hadoop.version in the main pom,
>> >> but it seems that there is not
>> >> hadoop 0.23* in maven central, and I'm not able to find a the a jar to
>> >> install it in my local repo :((
>> >> At the end I tried with 0.20.2, and I was able to compile but I got
>> >> the same previous error
>> >> when I launched mahout :P
>> >>
>> >>
>>
>>
>>
>> --
>> Computers are useless. They can only give you answers.
>> (Pablo Picasso)
>> _______________
>> Diego Ceccarelli
>> High Performance Computing Laboratory
>> Information Science and Technologies Institute (ISTI)
>> Italian National Research Council (CNR)
>> Via Moruzzi, 1
>> 56124 - Pisa - Italy
>>
>> Phone: +39 050 315 3055
>> Fax: +39 050 315 2040
>> ________________________________________
>>




Re: Using Mahout with Hadoop 0.23

Posted by Sean Owen <sr...@gmail.com>.
Yes, did you read my reply? 0.23.4 is the latest 0.23.x release, but, for
whatever reason it takes a while for them to get around to pushing it to
the central repo. 0.23.3 is definitely out there, at least:

http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common/0.23.3

But if you use the ASF repo you should be able to get 0.23.4:

https://repository.apache.org/content/repositories/public/org/apache/hadoop/hadoop-common/0.23.4/

But, most of all note that you are not looking for hadoop-core but
hadoop-common

You're really on your own though, this is not supported or recommended by
anyone. If you don't need to use 0.23.x you shouldn't.

On Mon, Oct 29, 2012 at 9:11 PM, Diego Ceccarelli <
diego.ceccarelli@gmail.com> wrote:

> On Mon, Oct 29, 2012 at 9:38 PM, Sean Owen <sr...@gmail.com> wrote:
> > Yes, you must also specify hadoop version or else you would be using
> 0.23.x
> > artifacts with version 1.0.4 where they don't exist. The most recent
> 0.23.x
> > build is not "0.23" but "0.23.4". (Use 0.23.3, if that hasn't made it to
> > the repo yet.)
> >
> > 0.20.2 is quite old, older than 1.0.4. I thought you wanted the 0.23.x /
> > 2.0.x line?
>
> The profile in core/pom.xml is called hadoop-0.23, and does not
> seem to work.
> I tried to replace <hadoop.version> 1.0.3 </hadoop-version> in the
> main pom, with 0.23.4 but it seems that the central repository does
> not contain this jar,
> and I'm not able to find it on the Web :P
> I tried the version 0.22.0-SNAPSHOT from asf repository [1],
> but I got compilation errors  [2].
>
> > I wouldn't recommend you use 0.23.x but that's how you would try to.
> Unfortunately I can't downgrade/upgrade the hadoop version on the cluster
> :)
> Thanks for your help
>
> [1]
> https://repository.apache.org/content/repositories/public/org/apache/hadoop/hadoop-core/
> [2]  http://pastebin.com/8vVJDVQW
>
> > On Mon, Oct 29, 2012 at 7:06 PM, Diego Ceccarelli <
> > diego.ceccarelli@gmail.com> wrote:
>
> >> Hi Sean,
> >> I took a look to the profiles and in core there is this 'hadoop-0.23'
> but
> >> if I
> >> try to compile:
> >>
> >> mvn -P hadoop-0.23 install -DskipTests
> >>
> >> Maven returns with this weird error:
> >>
> >> [ERROR] Failed to execute goal on project mahout-core: Could not
> >> resolve dependencies for project
> >> org.apache.mahout:mahout-core:jar:0.8-SNAPSHOT: The following
> >> artifacts could not be resolved:
> >> org.apache.hadoop:hadoop-common:jar:1.0.4,
> >> org.apache.hadoop:hadoop-mapreduce-client-common:jar:1.0.4,
> >> org.apache.hadoop:hadoop-mapreduce-client-core:jar:1.0.4: Failure to
> >> find org.apache.hadoop:hadoop-common:jar:1.0.4 in
> >> http://repo1.maven.org/maven2 was cached in the local repository,
> >> resolution will not be reattempted until the update interval of
> >> central has elapsed or updates are forced -> [Help 1]
> >>
> >> (seems that it is looking for the wrong version of hadoop..)
> >> I also tried to replace the property hadoop.version in the main pom,
> >> but it seems that there is not
> >> hadoop 0.23* in maven central, and I'm not able to find a the a jar to
> >> install it in my local repo :((
> >> At the end I tried with 0.20.2, and I was able to compile but I got
> >> the same previous error
> >> when I launched mahout :P
> >>
> >>
>
>
>
> --
> Computers are useless. They can only give you answers.
> (Pablo Picasso)
> _______________
> Diego Ceccarelli
> High Performance Computing Laboratory
> Information Science and Technologies Institute (ISTI)
> Italian National Research Council (CNR)
> Via Moruzzi, 1
> 56124 - Pisa - Italy
>
> Phone: +39 050 315 3055
> Fax: +39 050 315 2040
> ________________________________________
>

Re: Using Mahout with Hadoop 0.23

Posted by Diego Ceccarelli <di...@gmail.com>.
On Mon, Oct 29, 2012 at 9:38 PM, Sean Owen <sr...@gmail.com> wrote:
> Yes, you must also specify hadoop version or else you would be using 0.23.x
> artifacts with version 1.0.4 where they don't exist. The most recent 0.23.x
> build is not "0.23" but "0.23.4". (Use 0.23.3, if that hasn't made it to
> the repo yet.)
>
> 0.20.2 is quite old, older than 1.0.4. I thought you wanted the 0.23.x /
> 2.0.x line?

The profile in core/pom.xml is called hadoop-0.23, and it does not
seem to work.
I tried to replace <hadoop.version>1.0.3</hadoop.version> in the
main pom with 0.23.4, but it seems that the central repository does
not contain this jar, and I'm not able to find it on the Web :P
I tried version 0.22.0-SNAPSHOT from the ASF repository [1],
but I got compilation errors [2].

> I wouldn't recommend you use 0.23.x but that's how you would try to.
Unfortunately I can't downgrade/upgrade the hadoop version on the cluster :)
Thanks for your help

[1]  https://repository.apache.org/content/repositories/public/org/apache/hadoop/hadoop-core/
[2]  http://pastebin.com/8vVJDVQW

> On Mon, Oct 29, 2012 at 7:06 PM, Diego Ceccarelli <
> diego.ceccarelli@gmail.com> wrote:

>> Hi Sean,
>> I took a look to the profiles and in core there is this 'hadoop-0.23' but
>> if I
>> try to compile:
>>
>> mvn -P hadoop-0.23 install -DskipTests
>>
>> Maven returns with this weird error:
>>
>> [ERROR] Failed to execute goal on project mahout-core: Could not
>> resolve dependencies for project
>> org.apache.mahout:mahout-core:jar:0.8-SNAPSHOT: The following
>> artifacts could not be resolved:
>> org.apache.hadoop:hadoop-common:jar:1.0.4,
>> org.apache.hadoop:hadoop-mapreduce-client-common:jar:1.0.4,
>> org.apache.hadoop:hadoop-mapreduce-client-core:jar:1.0.4: Failure to
>> find org.apache.hadoop:hadoop-common:jar:1.0.4 in
>> http://repo1.maven.org/maven2 was cached in the local repository,
>> resolution will not be reattempted until the update interval of
>> central has elapsed or updates are forced -> [Help 1]
>>
>> (seems that it is looking for the wrong version of hadoop..)
>> I also tried to replace the property hadoop.version in the main pom,
>> but it seems that there is not
>> hadoop 0.23* in maven central, and I'm not able to find a the a jar to
>> install it in my local repo :((
>> At the end I tried with 0.20.2, and I was able to compile but I got
>> the same previous error
>> when I launched mahout :P
>>
>>




Re: Using Mahout with Hadoop 0.23

Posted by Sean Owen <sr...@gmail.com>.
Yes, you must also specify hadoop version or else you would be using 0.23.x
artifacts with version 1.0.4 where they don't exist. The most recent 0.23.x
build is not "0.23" but "0.23.4". (Use 0.23.3, if that hasn't made it to
the repo yet.)

0.20.2 is quite old, older than 1.0.4. I thought you wanted the 0.23.x /
2.0.x line?

I wouldn't recommend you use 0.23.x but that's how you would try to.

On Mon, Oct 29, 2012 at 7:06 PM, Diego Ceccarelli <
diego.ceccarelli@gmail.com> wrote:

> Hi Sean,
> I took a look to the profiles and in core there is this 'hadoop-0.23' but
> if I
> try to compile:
>
> mvn -P hadoop-0.23 install -DskipTests
>
> Maven returns with this weird error:
>
> [ERROR] Failed to execute goal on project mahout-core: Could not
> resolve dependencies for project
> org.apache.mahout:mahout-core:jar:0.8-SNAPSHOT: The following
> artifacts could not be resolved:
> org.apache.hadoop:hadoop-common:jar:1.0.4,
> org.apache.hadoop:hadoop-mapreduce-client-common:jar:1.0.4,
> org.apache.hadoop:hadoop-mapreduce-client-core:jar:1.0.4: Failure to
> find org.apache.hadoop:hadoop-common:jar:1.0.4 in
> http://repo1.maven.org/maven2 was cached in the local repository,
> resolution will not be reattempted until the update interval of
> central has elapsed or updates are forced -> [Help 1]
>
> (seems that it is looking for the wrong version of hadoop..)
> I also tried to replace the property hadoop.version in the main pom,
> but it seems that there is not
> hadoop 0.23* in maven central, and I'm not able to find a the a jar to
> install it in my local repo :((
> At the end I tried with 0.20.2, and I was able to compile but I got
> the same previous error
> when I launched mahout :P
>
>

Re: Using Mahout with Hadoop 0.23

Posted by Diego Ceccarelli <di...@gmail.com>.
Hi Sean,
I took a look to the profiles and in core there is this 'hadoop-0.23' but if I
try to compile:

mvn -P hadoop-0.23 install -DskipTests

Maven returns with this weird error:

[ERROR] Failed to execute goal on project mahout-core: Could not
resolve dependencies for project
org.apache.mahout:mahout-core:jar:0.8-SNAPSHOT: The following
artifacts could not be resolved:
org.apache.hadoop:hadoop-common:jar:1.0.4,
org.apache.hadoop:hadoop-mapreduce-client-common:jar:1.0.4,
org.apache.hadoop:hadoop-mapreduce-client-core:jar:1.0.4: Failure to
find org.apache.hadoop:hadoop-common:jar:1.0.4 in
http://repo1.maven.org/maven2 was cached in the local repository,
resolution will not be reattempted until the update interval of
central has elapsed or updates are forced -> [Help 1]

(It seems that it is looking for the wrong version of Hadoop.)
I also tried to replace the hadoop.version property in the main pom,
but it seems that there is no Hadoop 0.23* in Maven Central, and I'm
not able to find a jar to install in my local repo :((
In the end I tried 0.20.2, and I was able to compile, but I got the
same error as before when I launched Mahout :P



On Mon, Oct 29, 2012 at 7:14 PM, Sean Owen <sr...@gmail.com> wrote:
> I don't think anyone has tried 0.23 in force because it (and the 2.0
> branch) are still alpha. But I don't know a strong reason it wouldn't work
> other than you have to compile against 0.23 not the current supported
> version (1.0.4). And I think someone even added a Maven profile to select
> the 0.23 version as the dependency. It is not just a matter of changing the
> version in pom.xml since the artifacts split and changed names. Dig around
> in pom.xml  to see what I mean.
> On Oct 29, 2012 6:07 PM, "Diego Ceccarelli" <di...@gmail.com>
> wrote:
>
>> Dear all,
>>
>> I'm trying to run mahout on a cluster with hadoop 0.23
>> I set all the environment variables but when I run the job
>> i got this error:
>>
>> Exception in thread "main" java.lang.NoSuchMethodError:
>> org.apache.hadoop.util.ProgramDriver.driver([Ljava/lang/String;)V
>>         at
>> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>>
>> I tried both mahout 0.0.7 and 0.0.8-SNAPSHOT, with the same result
>> It is normal that MAHOUT-JOB takes mahout-examples-0.8-SNAPSHOT-job.jar?
>>
>> Thanks,
>> Diego
>>




Re: Using Mahout with Hadoop 0.23

Posted by Sean Owen <sr...@gmail.com>.
I don't think anyone has tried 0.23 in force because it (and the 2.0
branch) are still alpha. But I don't know a strong reason it wouldn't work
other than you have to compile against 0.23 not the current supported
version (1.0.4). And I think someone even added a Maven profile to select
the 0.23 version as the dependency. It is not just a matter of changing the
version in pom.xml since the artifacts split and changed names. Dig around
in pom.xml  to see what I mean.
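
Roughly, such a profile swaps the single hadoop-core artifact for the split 0.23.x artifacts — something like this (artifact list and versions are illustrative, not the actual Mahout pom):

```xml
<!-- Illustrative sketch of a hadoop-0.23 profile; check the real
     core/pom.xml for the actual artifact list and versions. -->
<profile>
  <id>hadoop-0.23</id>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>0.23.3</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
      <version>0.23.3</version>
    </dependency>
  </dependencies>
</profile>
```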
On Oct 29, 2012 6:07 PM, "Diego Ceccarelli" <di...@gmail.com>
wrote:

> Dear all,
>
> I'm trying to run mahout on a cluster with hadoop 0.23
> I set all the environment variables but when I run the job
> i got this error:
>
> Exception in thread "main" java.lang.NoSuchMethodError:
> org.apache.hadoop.util.ProgramDriver.driver([Ljava/lang/String;)V
>         at
> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>
> I tried both mahout 0.0.7 and 0.0.8-SNAPSHOT, with the same result
> It is normal that MAHOUT-JOB takes mahout-examples-0.8-SNAPSHOT-job.jar?
>
> Thanks,
> Diego
>