Posted to user@mahout.apache.org by Paul Ingles <pa...@oobaloo.co.uk> on 2009/07/14 16:01:46 UTC

Re: Error with KMeans example in trunk (793894)

Hi,

The latest: I've updated to Subversion revision 793894 for trunk; the code compiles and passes all of its tests (mvn install inside the project root/checkout dir).

If I then run the kmeans example:

$ hadoop jar ./examples/target/mahout-examples-0.2-SNAPSHOT.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

It finishes Iteration 0 but then reports the following error:

09/07/14 14:42:16 INFO mapred.JobClient:     Reduce input records=449
09/07/14 14:42:16 WARN kmeans.KMeansDriver: java.io.IOException: Cannot open filename /user/pair/output/clusters-0/_logs
java.io.IOException: Cannot open filename /user/pair/output/clusters-0/_logs
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1444)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1435)
	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:347)
	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
	at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
	at org.apache.mahout.clustering.kmeans.KMeansDriver.isConverged(KMeansDriver.java:304)
	at org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:241)
	at org.apache.mahout.clustering.kmeans.KMeansDriver.runJob(KMeansDriver.java:194)
	at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.runJob(Job.java:100)
	at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:56)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
09/07/14 14:42:16 INFO kmeans.KMeansDriver: Clustering 

It then moves on to the Clustering phase and reports the following:

09/07/14 14:42:16 INFO kmeans.KMeansDriver: Clustering 
09/07/14 14:42:16 INFO kmeans.KMeansDriver: Running Clustering
09/07/14 14:42:16 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/clusters-0 Out: output/points Distance: org.apache.mahout.utils.EuclideanDistanceMeasure
09/07/14 14:42:16 INFO kmeans.KMeansDriver: convergence: 0.5 Input Vectors: org.apache.mahout.matrix.SparseVector
09/07/14 14:42:16 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
09/07/14 14:42:16 INFO mapred.FileInputFormat: Total input paths to process : 271
09/07/14 14:42:16 INFO mapred.JobClient: Running job: job_200907141434_0004
09/07/14 14:42:17 INFO mapred.JobClient:  map 0% reduce 0%
09/07/14 14:42:28 INFO mapred.JobClient: Task Id : attempt_200907141434_0004_m_000000_0, Status : FAILED
java.lang.NoClassDefFoundError: com/google/gson/reflect/TypeToken
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:675)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
	at java.net.URLClassLoader.access$000(URLClassLoader.java:56)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:316)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:288)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
	at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:374)
	at org.apache.mahout.matrix.AbstractVector.asFormatString(AbstractVector.java:374)
	at org.apache.mahout.clustering.kmeans.Cluster.outputPointWithClusterInfo(Cluster.java:198)
	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:39)
	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:32)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: com.google.gson.reflect.TypeToken
	at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:316)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:288)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
	at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:374)
	... 20 more

Again, I'm not sure why it's unable to load the gson jar file: it's definitely in the dependencies folder and is included in the lib folder of the built mahout-*.job.
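
One way to narrow down a NoClassDefFoundError like the one above is to ask the JVM directly whether the class resolves and, if so, where it was loaded from. The following is only a minimal sketch using standard JDK calls; the class name ClassOriginCheck and the method describe are made up for illustration and are not part of Mahout or Hadoop. To be meaningful for this problem it would have to run with the same classpath as the failing map task, which is an assumption about how you would deploy it.

```java
import java.security.CodeSource;

class ClassOriginCheck {
    // Returns a one-line description of where a class was loaded from,
    // or a NOT FOUND marker if this class loader cannot resolve it.
    static String describe(String className) {
        try {
            // initialize=false: resolve the class without running its static initializers
            Class<?> c = Class.forName(className, false, ClassOriginCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes that ship with the JDK usually have no code source
            String origin = (src == null || src.getLocation() == null)
                    ? "JDK/bootstrap"
                    : src.getLocation().toString();
            return className + " -> " + origin;
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> NOT FOUND (" + e.getClass().getSimpleName() + ")";
        }
    }

    public static void main(String[] args) {
        // The class named in the stack trace, plus a JDK class as a sanity check
        System.out.println(describe("com.google.gson.reflect.TypeToken"));
        System.out.println(describe("java.lang.String"));
    }
}
```

If the first line reports NOT FOUND while the jar is present in the job file, the jar is probably not on the classpath that the task JVM actually uses.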



On Tue Jul 14 13:31:53 UTC 2009, Paul Ingles <pa...@oobaloo.co.uk> wrote:
> I'm not sure I'm afraid, they were whilst I was building at home.
> 
> I've just updated trunk here and the current revision (793894) builds  
> successfully. I'm going to switch the cluster over to 0.20.0 and see  
> whether I can get the KMeans example to run without the GSon problem I  
> was having before.
> 
> Thanks again,
> Paul
> 
> 
> On 14 Jul 2009, at 14:04, Grant Ingersoll wrote:
> 
> >
> > On Jul 13, 2009, at 7:02 PM, Paul Ingles wrote:
> >
> >> Hi,
> >>
> >> I've been going over the kmeans stuff the last few days to try and  
> >> understand how it works, and how I might extend it to work with the  
> >> data I'm looking to process. It's taken me a while to get a basic  
> >> understanding of things, and really appreciate having lists like  
> >> this around for support.
> >>
> >> I need to be able to label the vectors: each vector holds (for a  
> >> document) a set of similarity scores across a number of attributes.  
> >> I did some searching around payloads (after coming across the term  
> >> in some comments) but couldn't see how I add a payload to the  
> >> Vector. I then stumbled on MAHOUT-65 (https://issues.apache.org/jira/browse/MAHOUT-65)
> >> that mentions the addition of the setName method to Vector. I've  
> >> tried building trunk, and although there were a few test failures  
> >> for other (seemingly unrelated) examples I continued and managed to  
> >> get the mahout-examples jar/job files built to give it a whirl.
> >
> > What were the errors?

Re: Error with KMeans example in trunk (793894)

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

r793974 adds another check to the file filter used by isConverged(). 
It will skip any _logs entries that mysteriously get added to the 
clusters directories. Now only files whose names begin with "part" and 
do not end with ".crc" are processed.
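
The rule described above can be sketched in plain Java as follows. This is an illustration of the naming rule only, not the code committed in r793974: the class ClusterPartFilterSketch and its methods are hypothetical, and it works on file names as strings so that it runs without Hadoop on the classpath. In a Hadoop job the same test would normally live in an org.apache.hadoop.fs.PathFilter and be applied to path.getName().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ClusterPartFilterSketch {
    // Accept only reducer output files: the name starts with "part"
    // and does not end with ".crc" (checksum side files).
    static boolean isClusterPartFile(String fileName) {
        return fileName.startsWith("part") && !fileName.endsWith(".crc");
    }

    // Keep the names that pass the filter, preserving their order.
    static List<String> selectPartFiles(List<String> names) {
        List<String> accepted = new ArrayList<String>();
        for (String name : names) {
            if (isClusterPartFile(name)) {
                accepted.add(name);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<String> listing = Arrays.asList(
                "_logs", "part-00000", "part-00000.crc", ".part-00001.crc", "part-00001");
        // prints [part-00000, part-00001]
        System.out.println(selectPartFiles(listing));
    }
}
```

With a filter like this, a _logs directory in clusters-0 is never handed to a SequenceFile reader, which is where the IOException in the trace was raised.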



Jeff Eastman wrote:
> Why are log files being written to the clusters directories? That is 
> not happening in my trunk checkout and putting any other files into 
> the clusters directories will break the isConverged() method and 
> probably also the mapper & reducer configure() methods.
>
>
> Grant Ingersoll wrote:
>> Are you running in standalone, pseudo-distributed or fully 
>> distributed mode in Hadoop?
>>
>> It looks like a permission error in Hadoop, but maybe we need to make 
>> sure we have appropriate access.  I'm not that familiar with the 
>> Hadoop permission capabilities.
>>
>> On Jul 14, 2009, at 10:46 AM, Paul Ingles wrote:
>>
>>> I'm definitely scratching my head now, although I think it's most 
>>> likely some kind of dodgy configuration/setup on the cluster I'm 
>>> using- if I run some of the other examples I get class loading 
>>> errors for the example classes!
>>>
>>> I downloaded a fresh and unconfigured release of Hadoop 0.20 and a 
>>> new checkout of Mahout trunk, and it compiled, tested, and ran 
>>> through the kmeans example without trouble.
>>>
>>> If I find out what causes the problem I'll let the list know.
>>>
>>> Thanks,
>>> Paul
>>>
>>> On 14 Jul 2009, at 15:01, Paul Ingles wrote:
>>>
>>>> Hi,
>>>>
>>>> The latest: I've updated to Subversion revision 793894 for trunk, 
>>>> the code compiles and runs all of its tests successfully (mvn 
>>>> install inside the project root/checkout dir).
>>>>
>>>> If I then run the kmeans example:
>>>>
>>>> $ hadoop jar ./examples/target/mahout-examples-0.2-SNAPSHOT.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>>>
>>>> It finishes the Iteration 0 but then errors with the following:
>>>>
>>>> 09/07/14 14:42:16 INFO mapred.JobClient:     Reduce input records=449
>>>> 09/07/14 14:42:16 WARN kmeans.KMeansDriver: java.io.IOException: Cannot open filename /user/pair/output/clusters-0/_logs
>>>> java.io.IOException: Cannot open filename /user/pair/output/clusters-0/_logs
>>>>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1444)
>>>>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1435)
>>>>     at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:347)
>>>>     at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
>>>>     at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)
>>>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>>>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>>>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>>>>     at org.apache.mahout.clustering.kmeans.KMeansDriver.isConverged(KMeansDriver.java:304)
>>>>     at org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:241)
>>>>     at org.apache.mahout.clustering.kmeans.KMeansDriver.runJob(KMeansDriver.java:194)
>>>>     at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.runJob(Job.java:100)
>>>>     at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:56)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: Clustering
>>>>
>>>> It then moves onto the Clustering phase and reports the following:
>>>>
>>>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: Clustering
>>>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: Running Clustering
>>>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/clusters-0 Out: output/points Distance: org.apache.mahout.utils.EuclideanDistanceMeasure
>>>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: convergence: 0.5 Input Vectors: org.apache.mahout.matrix.SparseVector
>>>> 09/07/14 14:42:16 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 09/07/14 14:42:16 INFO mapred.FileInputFormat: Total input paths to process : 271
>>>> 09/07/14 14:42:16 INFO mapred.JobClient: Running job: job_200907141434_0004
>>>> 09/07/14 14:42:17 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 09/07/14 14:42:28 INFO mapred.JobClient: Task Id : attempt_200907141434_0004_m_000000_0, Status : FAILED
>>>> java.lang.NoClassDefFoundError: com/google/gson/reflect/TypeToken
>>>>     at java.lang.ClassLoader.defineClass1(Native Method)
>>>>     at java.lang.ClassLoader.defineClass(ClassLoader.java:675)
>>>>     at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
>>>>     at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
>>>>     at java.net.URLClassLoader.access$000(URLClassLoader.java:56)
>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:316)
>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:288)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
>>>>     at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:374)
>>>>     at org.apache.mahout.matrix.AbstractVector.asFormatString(AbstractVector.java:374)
>>>>     at org.apache.mahout.clustering.kmeans.Cluster.outputPointWithClusterInfo(Cluster.java:198)
>>>>     at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:39)
>>>>     at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:32)
>>>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>> Caused by: java.lang.ClassNotFoundException: com.google.gson.reflect.TypeToken
>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:316)
>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:288)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
>>>>     at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:374)
>>>>     ... 20 more
>>>>
>>>> Again, not sure why it's not able to load the gson jar file, it's 
>>>> definitely in the dependencies folder and is included in the built 
>>>> mahout-*.job inside the lib folder.
>>>>
>>>>
>>>>
>>>> On Tue Jul 14 13:31:53 UTC 2009, Paul Ingles <pa...@oobaloo.co.uk> 
>>>> wrote:
>>>>> I'm not sure I'm afraid, they were whilst I was building at home.
>>>>>
>>>>> I've just updated trunk here and the current revision (793894) builds
>>>>> successfully. I'm going to switch the cluster over to 0.20.0 and see
>>>>> whether I can get the KMeans example to run without the GSon 
>>>>> problem I
>>>>> was having before.
>>>>>
>>>>> Thanks again,
>>>>> Paul
>>>>>
>>>>>
>>>>> On 14 Jul 2009, at 14:04, Grant Ingersoll wrote:
>>>>>
>>>>>>
>>>>>> On Jul 13, 2009, at 7:02 PM, Paul Ingles wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've been going over the kmeans stuff the last few days to try and
>>>>>>> understand how it works, and how I might extend it to work with the
>>>>>>> data I'm looking to process. It's taken me a while to get a basic
>>>>>>> understanding of things, and really appreciate having lists like
>>>>>>> this around for support.
>>>>>>>
>>>>>>> I need to be able to label the vectors: each vector holds (for a
>>>>>>> document) a set of similarity scores across a number of attributes.
>>>>>>> I did some searching around payloads (after coming across the term
>>>>>>> in some comments) but couldn't see how I add a payload to the
>>>>>>> Vector. I then stumbled on MAHOUT-65
>>>>>>> (https://issues.apache.org/jira/browse/MAHOUT-65)
>>>>>>> that mentions the addition of the setName method to Vector. I've
>>>>>>> tried building trunk, and although there were a few test failures
>>>>>>> for other (seemingly unrelated) examples I continued and managed to
>>>>>>> get the mahout-examples jar/job files built to give it a whirl.
>>>>>>
>>>>>> What were the errors?
>>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) 
>> using Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>>
>>
>
>
>

Re: Error with KMeans example in trunk (793894)

Posted by Grant Ingersoll <gs...@apache.org>.

Are you running in standalone, pseudo-distributed or fully distributed  
mode in Hadoop?

It looks like a permission error in Hadoop, but maybe we need to make  
sure we have appropriate access.  I'm not that familiar with the  
Hadoop permission capabilities.

On Jul 14, 2009, at 10:46 AM, Paul Ingles wrote:

> I'm definitely scratching my head now, although I think it's most  
> likely some kind of dodgy configuration/setup on the cluster I'm  
> using- if I run some of the other examples I get class loading  
> errors for the example classes!
>
> I downloaded a fresh and unconfigured release of Hadoop 0.20 and a  
> new checkout of Mahout trunk, and it compiled, tested, and ran  
> through the kmeans example without trouble.
>
> If I find out what causes the problem I'll let the list know.
>
> Thanks,
> Paul
>
> On 14 Jul 2009, at 15:01, Paul Ingles wrote:
>
>> Hi,
>>
>> The latest: I've updated to Subversion revision 793894 for trunk,  
>> the code compiles and runs all of its tests successfully (mvn  
>> install inside the project root/checkout dir).
>>
>> If I then run the kmeans example:
>>
>> $ hadoop jar ./examples/target/mahout-examples-0.2-SNAPSHOT.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>
>> It finishes the Iteration 0 but then errors with the following:
>>
>> 09/07/14 14:42:16 INFO mapred.JobClient:     Reduce input records=449
>> 09/07/14 14:42:16 WARN kmeans.KMeansDriver: java.io.IOException: Cannot open filename /user/pair/output/clusters-0/_logs
>> java.io.IOException: Cannot open filename /user/pair/output/clusters-0/_logs
>> 	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1444)
>> 	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1435)
>> 	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:347)
>> 	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
>> 	at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)
>> 	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>> 	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>> 	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>> 	at org.apache.mahout.clustering.kmeans.KMeansDriver.isConverged(KMeansDriver.java:304)
>> 	at org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:241)
>> 	at org.apache.mahout.clustering.kmeans.KMeansDriver.runJob(KMeansDriver.java:194)
>> 	at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.runJob(Job.java:100)
>> 	at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:56)
>> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> 	at java.lang.reflect.Method.invoke(Method.java:597)
>> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: Clustering
>>
>> It then moves onto the Clustering phase and reports the following:
>>
>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: Clustering
>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: Running Clustering
>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/clusters-0 Out: output/points Distance: org.apache.mahout.utils.EuclideanDistanceMeasure
>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: convergence: 0.5 Input Vectors: org.apache.mahout.matrix.SparseVector
>> 09/07/14 14:42:16 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>> 09/07/14 14:42:16 INFO mapred.FileInputFormat: Total input paths to process : 271
>> 09/07/14 14:42:16 INFO mapred.JobClient: Running job: job_200907141434_0004
>> 09/07/14 14:42:17 INFO mapred.JobClient:  map 0% reduce 0%
>> 09/07/14 14:42:28 INFO mapred.JobClient: Task Id : attempt_200907141434_0004_m_000000_0, Status : FAILED
>> java.lang.NoClassDefFoundError: com/google/gson/reflect/TypeToken
>> 	at java.lang.ClassLoader.defineClass1(Native Method)
>> 	at java.lang.ClassLoader.defineClass(ClassLoader.java:675)
>> 	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
>> 	at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
>> 	at java.net.URLClassLoader.access$000(URLClassLoader.java:56)
>> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
>> 	at java.security.AccessController.doPrivileged(Native Method)
>> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:316)
>> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:288)
>> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
>> 	at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:374)
>> 	at org.apache.mahout.matrix.AbstractVector.asFormatString(AbstractVector.java:374)
>> 	at org.apache.mahout.clustering.kmeans.Cluster.outputPointWithClusterInfo(Cluster.java:198)
>> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:39)
>> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:32)
>> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> Caused by: java.lang.ClassNotFoundException: com.google.gson.reflect.TypeToken
>> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>> 	at java.security.AccessController.doPrivileged(Native Method)
>> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:316)
>> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:288)
>> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
>> 	at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:374)
>> 	... 20 more
>>
>> Again, not sure why it's not able to load the gson jar file, it's  
>> definitely in the dependencies folder and is included in the built  
>> mahout-*.job inside the lib folder.
>>
>>
>>
>> On Tue Jul 14 13:31:53 UTC 2009, Paul Ingles <pa...@oobaloo.co.uk>  
>> wrote:
>>> I'm not sure I'm afraid, they were whilst I was building at home.
>>>
>>> I've just updated trunk here and the current revision (793894)  
>>> builds
>>> successfully. I'm going to switch the cluster over to 0.20.0 and see
>>> whether I can get the KMeans example to run without the GSon  
>>> problem I
>>> was having before.
>>>
>>> Thanks again,
>>> Paul
>>>
>>>
>>> On 14 Jul 2009, at 14:04, Grant Ingersoll wrote:
>>>
>>>>
>>>> On Jul 13, 2009, at 7:02 PM, Paul Ingles wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I've been going over the kmeans stuff the last few days to try and
>>>>> understand how it works, and how I might extend it to work with  
>>>>> the
>>>>> data I'm looking to process. It's taken me a while to get a basic
>>>>> understanding of things, and really appreciate having lists like
>>>>> this around for support.
>>>>>
>>>>> I need to be able to label the vectors: each vector holds (for a
>>>>> document) a set of similarity scores across a number of  
>>>>> attributes.
>>>>> I did some searching around payloads (after coming across the term
>>>>> in some comments) but couldn't see how I add a payload to the
>>>>> Vector. I then stumbled on MAHOUT-65 (https://issues.apache.org/jira/browse/MAHOUT-65)
>>>>> that mentions the addition of the setName method to Vector. I've
>>>>> tried building trunk, and although there were a few test failures
>>>>> for other (seemingly unrelated) examples I continued and managed  
>>>>> to
>>>>> get the mahout-examples jar/job files built to give it a whirl.
>>>>
>>>> What were the errors?
>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search

Re: Error with KMeans example in trunk (793894)

Posted by Paul Ingles <pa...@oobaloo.co.uk>.

That was running fully distributed (albeit on a 5-node Mac Pro
cluster). I'm now running standalone and it works fine. When I first
looked, the file was available and accessible to the user that was
submitting the job. I need to set up a more permanent cluster on 0.20
and will try again with that.

On 14 Jul 2009, at 16:38, Grant Ingersoll wrote:

> Are you running in standalone, pseudo-distributed or fully  
> distributed mode in Hadoop?
>
> It looks like a permission error in Hadoop, but maybe we need to  
> make sure we have appropriate access.  I'm not that familiar with  
> the Hadoop permission capabilities.

Re: Error with KMeans example in trunk (793894)

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

Why are log files being written to the clusters directories? That is not
happening in my trunk checkout, and putting any other files into the
clusters directories will break the isConverged() method and probably
also the mapper & reducer configure() methods.
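
To illustrate the kind of guard that would avoid this, here is a minimal
sketch that lists only the data files in a clusters directory and skips
bookkeeping entries such as _logs. It is not how KMeansDriver is written:
the class and method names (ClusterPartFiles, isDataFile, listDataFiles)
are hypothetical, and it uses java.nio.file on a local directory as a
stand-in for the Hadoop FileSystem API so that it runs on its own. The
rule of skipping names that start with "_" or "." is an assumption
borrowed from the usual Hadoop convention for hidden files.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ClusterPartFiles {

    // Hypothetical helper: names starting with '_' or '.' are treated as
    // bookkeeping entries (for example _logs) rather than cluster data.
    static boolean isDataFile(String name) {
        return !name.isEmpty() && !name.startsWith("_") && !name.startsWith(".");
    }

    // Returns the sorted names of the regular files in clustersDir that
    // pass isDataFile; subdirectories are skipped as well.
    static List<String> listDataFiles(Path clustersDir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(clustersDir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (Files.isRegularFile(entry) && isDataFile(name)) {
                    names.add(name);
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway directory that mimics output/clusters-0.
        Path dir = Files.createTempDirectory("clusters-0");
        Files.createFile(dir.resolve("part-00000"));
        Files.createDirectory(dir.resolve("_logs"));
        System.out.println(listDataFiles(dir));
    }
}
```

With a filter like this in front of the reader loop, a stray _logs
directory would be ignored instead of being opened as a sequence file.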


Grant Ingersoll wrote:
> Are you running in standalone, pseudo-distributed or fully distributed 
> mode in Hadoop?
>
> It looks like a permission error in Hadoop, but maybe we need to make 
> sure we have appropriate access.  I'm not that familiar with the 
> Hadoop permission capabilities.