Posted to dev@mahout.apache.org by Sebastian Schelter <ss...@googlemail.com> on 2010/08/05 14:51:08 UTC

Mahout on Elastic MapReduce

Hi,

I'm currently evaluating ItemSimilarityJob and RecommenderJob on Elastic
MapReduce, and it seems we have some small problems with S3, mostly
because we need to use FileSystem.get(path.toUri(), conf) instead of
FileSystem.get(conf) in the code. I will create a patch for that in the
next few days.
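To show the difference without Hadoop on the classpath, here is a hypothetical model built only on java.net.URI. FileSystem.get(conf) always returns the filesystem configured as the default (fs.default.name), while FileSystem.get(path.toUri(), conf) picks the filesystem from the path's own scheme and authority. The class and method names and the hdfs://namenode:9000 default are illustrative assumptions, not Hadoop API; in real code the fix is just FileSystem.get(path.toUri(), conf), or equivalently path.getFileSystem(conf).

```java
import java.net.URI;

/**
 * Hypothetical model (plain java.net.URI, no Hadoop classes) of how the two
 * FileSystem.get variants choose a filesystem.
 */
public class FsSelection {

    /** Models FileSystem.get(conf): always the configured default filesystem. */
    static String fsForDefault(URI defaultFs) {
        return defaultFs.getScheme() + "://" + defaultFs.getAuthority();
    }

    /**
     * Models FileSystem.get(path.toUri(), conf): a qualified path selects its own
     * scheme and authority; a scheme-less path falls back to the default.
     */
    static String fsForPath(URI path, URI defaultFs) {
        if (path.getScheme() == null) {
            return fsForDefault(defaultFs);
        }
        String authority = path.getAuthority();
        return path.getScheme() + "://" + (authority == null ? "" : authority);
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://namenode:9000/"); // assumed cluster default
        URI s3Path = URI.create("s3://testingbucket-12345/tmp/prePartialMultiply2");

        // Using the default filesystem for an S3 path points at the wrong store...
        System.out.println(fsForDefault(defaultFs));      // hdfs://namenode:9000
        // ...while resolving from the path itself selects the bucket.
        System.out.println(fsForPath(s3Path, defaultFs)); // s3://testingbucket-12345
    }
}
```

The model deliberately ignores caching and the finer rules Hadoop applies when a path has a scheme but no authority; it only captures which store each call would target.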

I'm writing this mail because I've run into another problem that I
currently can't solve. RecommenderJob emulates MultipleInputs (which,
AFAIK, is currently missing from the new mapreduce API in Hadoop 0.20)
by reading data from a combined path built like this:

    new Path(prePartialMultiplyPath1 + "," + prePartialMultiplyPath2)

My job always fails with this exception:

    java.lang.IllegalArgumentException: Invalid hostname in URI s3:/testingbucket-12345/tmp/prePartialMultiply2

Any ideas how to fix this?

Thanks,
Sebastian

(Sorry, I used the wrong email address in my last mail.)

###

Stack trace (line numbers may not match the latest version in HEAD):


Exception in thread "main" java.lang.IllegalArgumentException: Invalid hostname in URI s3:/testingbucket-12345/tmp/prePartialMultiply2
    at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:41)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:53)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at org.apache.hadoop.fs.s3native.$Proxy2.initialize(Unknown Source)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:278)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1418)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1443)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1431)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:203)
    at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:908)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:802)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:241)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:286)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
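A plausible explanation for the lost slash, under two assumptions about Hadoop 0.20: Path's string constructor splits off the scheme and "//authority" and then collapses every "//" in the remaining path part to "/", and the combined string reaches FileInputFormat unescaped, so it is split on commas when the inputs are listed (the stack trace does show FileInputFormat.listStatus calling Path.getFileSystem). Under those assumptions only the first path keeps its authority and the second loses a slash, which matches the URI in the exception. The stdlib-only model below is a simplification, not a copy of Path; the class and method names and the first path (prePartialMultiply1, chosen by analogy) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical stdlib-only model of
 *   new Path(prePartialMultiplyPath1 + "," + prePartialMultiplyPath2)
 * Assumptions: (1) like Hadoop 0.20's Path, the string is split into scheme,
 * "//authority" and path, and "//" in the path part is collapsed to "/";
 * (2) the result is later split on commas to get the input paths.
 */
public class CombinedPathModel {

    /** Simplified imitation of Path's parsing and slash normalization. */
    static String normalizeLikePath(String s) {
        int start = 0;
        int colon = s.indexOf(':');
        int slash = s.indexOf('/');
        if (colon != -1 && (slash == -1 || colon < slash)) {
            start = colon + 1;                          // skip "scheme:"
        }
        if (s.startsWith("//", start) && s.length() - start > 2) {
            int next = s.indexOf('/', start + 2);
            start = (next == -1) ? s.length() : next;   // skip "//authority"
        }
        String prefix = s.substring(0, start);
        String path = s.substring(start).replace("//", "/"); // collapse double slashes
        return prefix + path;
    }

    /** Splits the comma-separated input string as the job's input listing would. */
    static List<String> splitInputs(String joined) {
        return Arrays.asList(joined.split(","));
    }

    public static void main(String[] args) {
        String p1 = "s3://testingbucket-12345/tmp/prePartialMultiply1"; // assumed name
        String p2 = "s3://testingbucket-12345/tmp/prePartialMultiply2";
        for (String input : splitInputs(normalizeLikePath(p1 + "," + p2))) {
            System.out.println(input);
        }
        // s3://testingbucket-12345/tmp/prePartialMultiply1
        // s3:/testingbucket-12345/tmp/prePartialMultiply2
    }
}
```

If this model is right, writing the second path with two slashes by hand would not survive the concatenation, because the combined string is normalized again; registering each path separately (for example one FileInputFormat.addInputPath call per path) would likely avoid the problem.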



Re: Mahout on Elastic MapReduce

Posted by Andrew Hitchcock <ad...@gmail.com>.
I think you are missing a slash in the URI:

s3:/testingbucket-12345/tmp/prePartialMultiply2

What happens if you try:

s3://testingbucket-12345/tmp/prePartialMultiply2

Andrew
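Andrew's point can be checked with plain java.net.URI, the type Hadoop's Path uses to hold its location. With a single slash there is no authority, so the bucket name becomes the first path segment and the host is null; with two slashes the bucket is parsed as the host. Judging by the exception thrown from S3Credentials.initialize, the S3 filesystem needs that host. The S3UriCheck class and its helpers are illustrative only.

```java
import java.net.URI;

/** Illustrative check with java.net.URI; S3UriCheck is a hypothetical helper class. */
public class S3UriCheck {

    /** Returns the host part of the URI, or null when there is no authority. */
    static String hostOf(String uri) {
        return URI.create(uri).getHost();
    }

    /** Returns the path part of the URI. */
    static String pathOf(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        String broken = "s3:/testingbucket-12345/tmp/prePartialMultiply2";
        String fixed = "s3://testingbucket-12345/tmp/prePartialMultiply2";

        // One slash: no authority, so the bucket name ends up in the path.
        System.out.println(hostOf(broken)); // null
        System.out.println(pathOf(broken)); // /testingbucket-12345/tmp/prePartialMultiply2

        // Two slashes: the bucket name is parsed as the host.
        System.out.println(hostOf(fixed));  // testingbucket-12345
        System.out.println(pathOf(fixed));  // /tmp/prePartialMultiply2
    }
}
```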
