Posted to user@mahout.apache.org by Konstantin Slisenko <ks...@gmail.com> on 2014/03/16 12:07:53 UTC

Problem with K-Means clustering on Amazon EMR

Hello!

I am running text-document clustering on a Hadoop cluster in Amazon Elastic
MapReduce. For input and output I use the Amazon S3 file system, and I specify
all paths as "s3://bucket-name/folder-name".

SparseVectorsFromSequenceFiles works correctly with S3,
but when I start the K-Means clustering job, I get this error:

Exception in thread "main" java.lang.IllegalArgumentException: This
file system object (hdfs://172.31.41.65:9000) does not support access
to the request path
's3://by.kslisenko.bigdata/stackovweflow-small/out_new/sparse/tfidf-vectors'
You possibly called FileSystem.get(conf) when you should have called
FileSystem.get(uri, conf) to obtain a file system supporting your
path.

	at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:375)
	at org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:106)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:162)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:530)
	at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:76)
	at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:93)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at bbuzz2011.stackoverflow.runner.RunnerWithInParams.cluster(RunnerWithInParams.java:121)
	at bbuzz2011.stackoverflow.runner.RunnerWithInParams.run(RunnerWithInParams.java:52)
	at bbuzz2011.stackoverflow.runner.RunnerWithInParams.main(RunnerWithInParams.java:41)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)


I checked RandomSeedGenerator.buildRandom
(http://grepcode.com/file/repo1.maven.org/maven2/org.apache.mahout/mahout-core/0.8/org/apache/mahout/clustering/kmeans/EigenSeedGenerator.java?av=f)
and I assume it has the correct code:

FileSystem fs = FileSystem.get(output.toUri(), conf);


I cannot run clustering because of this error. Do you have any
ideas how to fix this?
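
For context, a minimal sketch of the difference the error message describes;
the class name and paths here are placeholders, not code from the actual job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsSchemeDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path s3Path = new Path("s3://bucket-name/folder-name");

    // FileSystem.get(conf) returns the default filesystem (fs.default.name),
    // which on EMR is HDFS; handing it an s3:// path later fails in checkPath.
    FileSystem defaultFs = FileSystem.get(conf);

    // FileSystem.get(uri, conf) resolves the filesystem from the path's own
    // scheme, so the s3:// path is served by an S3 filesystem, not HDFS.
    FileSystem s3Fs = FileSystem.get(s3Path.toUri(), conf);

    System.out.println(defaultFs.getUri() + " vs " + s3Fs.getUri());
  }
}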

Re: Problem with K-Means clustering on Amazon EMR

Posted by Konstantin Slisenko <ks...@gmail.com>.
Hi,

I created MAHOUT-1487. I also want to submit this patch. I can do it next
weekend or later.



Re: Problem with K-Means clustering on Amazon EMR

Posted by Sebastian Schelter <ss...@apache.org>.
Hi Konstantin,

Great to see that you located the error. Could you open a jira issue and 
submit a patch that contains an updated error message?

Thank you,
Sebastian



Re: Problem with K-Means clustering on Amazon EMR

Posted by Konstantin Slisenko <ks...@gmail.com>.
Hi!

I investigated the situation. RandomSeedGenerator (
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.mahout/mahout-core/0.8/org/apache/mahout/clustering/kmeans/RandomSeedGenerator.java?av=f)
has the following code:

FileSystem fs = FileSystem.get(output.toUri(), conf);

...

fs.getFileStatus(input).isDir()

The FileSystem object was created from the output path, which I had not
specified correctly (I didn't use the "s3://" prefix for it). getFileStatus
was then called with the input path, which was correct. This made the error
message misleading.

To prevent this confusion, I propose improving the error message by adding the
following details:
1. The filesystem type in use (DistributedFileSystem, NativeS3FileSystem,
etc., via fs.getClass().getName()).
2. The path that could not be processed.

This could be done by a validation utility applied in many places in Mahout.
When using Mahout we have to specify many paths, and we can use many types of
file systems: local for debugging, HDFS on a Hadoop cluster, and S3 on Amazon.
Better error messages can save a lot of time here. I think a larger
refactoring is not needed for this case.
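
A rough sketch of what such a validation utility might look like; the class
and method names are hypothetical, not existing Mahout code:

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Hypothetical helper; wraps a path check with a more informative message. */
public final class PathValidator {

  private PathValidator() {}

  public static void validate(FileSystem fs, Path path) throws IOException {
    try {
      fs.getFileStatus(path);
    } catch (IllegalArgumentException e) {
      // Report (1) the filesystem type and (2) the offending path.
      throw new IllegalArgumentException("Filesystem "
          + fs.getClass().getName() + " (" + fs.getUri()
          + ") can not process path '" + path + "'", e);
    }
  }
}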


Re: Problem with K-Means clustering on Amazon EMR

Posted by Jay Vyas <ja...@gmail.com>.
I agree it's best to be explicit when creating filesystem instances by using the two-argument get(...). It's time to update to the FileSystem 2.0 APIs. Can you file a JIRA for this? If not I will :)


Re: Problem with K-Means clustering on Amazon EMR

Posted by Sebastian Schelter <ss...@apache.org>.
I've also encountered a similar error once. It's really just the
FileSystem.get call that needs to be modified. I think it's a good idea
to walk through the codebase and refactor this where necessary.

--sebastian



Re: Problem with K-Means clustering on Amazon EMR

Posted by Andrew Musselman <an...@gmail.com>.
Another wild guess: I've had issues trying to use the 's3' protocol from Hadoop and got things working by using the 's3n' protocol instead.
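
If it helps, a minimal sketch of that switch. The credential property names
are the classic ones for the native S3 filesystem from the Hadoop wiki; on
EMR they are usually preconfigured, so treat them as something to verify:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class S3nCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Credentials are only needed outside EMR; placeholders, not real values.
    conf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY");
    conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY");
    // Same bucket layout as before, only the URI scheme changes.
    FileSystem fs = FileSystem.get(URI.create("s3n://bucket-name/folder-name"), conf);
    System.out.println(fs.getClass().getName());
  }
}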


Re: Problem with K-Means clustering on Amazon EMR

Posted by Jay Vyas <ja...@gmail.com>.
I have specifically fixed MapReduce jobs by doing what the error message suggests.

But maybe (hopefully) there is another workaround that is configuration-driven.

Just a hunch, but maybe Mahout needs to be refactored to create fs objects using the get(uri, conf) calls?

As Hadoop evolves to support different flavors of HCFS, using API calls that are more flexible (like the FileSystem.get(uri, conf) one) will probably be a good thing to keep in mind.


Re: Problem with K-Means clustering on Amazon EMR

Posted by Frank Scholten <fr...@frankscholten.nl>.
Hi Konstantin,

Good to hear from you.

The link you mentioned points to EigenSeedGenerator, not
RandomSeedGenerator. The problem seems to be with the call to

fs.getFileStatus(input).isDir()


It's been a while and I don't remember exactly, but perhaps you have to set
additional Hadoop fs properties to use S3; see
https://wiki.apache.org/hadoop/AmazonS3. Perhaps you can isolate the cause of
this by creating a small Java main app with that line of code and running it
in the debugger.
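
Something like this minimal sketch; the class name is made up, and it just
replays the two calls from RandomSeedGenerator against your real paths:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SeedGeneratorRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path input = new Path(args[0]);   // e.g. the s3:// tfidf-vectors path
    Path output = new Path(args[1]);  // the output path passed to the driver
    // Same pattern as RandomSeedGenerator: the filesystem is derived from
    // the output path...
    FileSystem fs = FileSystem.get(output.toUri(), conf);
    System.out.println("Resolved " + fs.getClass().getName() + " for " + fs.getUri());
    // ...but queried with the input path; a scheme mismatch fails right here.
    System.out.println("Input is a directory: " + fs.getFileStatus(input).isDir());
  }
}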

Cheers,

Frank


