Posted to user@mahout.apache.org by Niko Gamulin <ni...@gmail.com> on 2014/10/26 22:19:21 UTC

Problems with K-Means Spectral Clustering on EMR

Hi,

I tried to run the spectral clustering example from the Mahout website on EMR.

I uploaded the following files to the bucket:
affinity.txt (affinity matrix)
mahout-core-0.9-job.jar
mahout-core-0.9.jar
update-lucene.sh
lucene-4.3.0.tgz

update-lucene.sh contains the following:

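#!/bin/bash
# EMR bootstrap action: replace the Lucene jars in /home/hadoop/lib with Lucene 4.3.0.
cd /home/hadoop
# Download and unpack the Lucene 4.3.0 distribution.
wget https://s3.amazonaws.com/hellomahout/lucene-4.3.0.tgz
tar -xzf lucene-4.3.0.tgz
# Remove the existing Lucene jars from lib.
rm lib/lucene-*.jar
# Copy every Lucene 4.3.0 jar from the unpacked distribution into lib.
cd lucene-4.3.0
find . | grep lucene- | grep 'jar$' | xargs -I {} cp {} ../lib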

The cluster configuration is as follows:

Hadoop Distribution: Amazon, AMI version: 3.2.1

EC2 instance types:
Master: m1.large, 1
Core: m1.large, 1
Task: None (m1.medium,1)

Bootstrap Actions:
Custom action, S3 location: s3://hellomahout/update-lucene.sh

Steps:

Custom JAR, JAR location: s3://hellomahout/mahout-core-0.9-job.jar,
Arguments: org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver
--input s3://hellomahout/testdata/affinity.txt --output
s3://hellomahout/testdata/results -d 3 -k 2 -x 10
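
For reference, the step above should correspond roughly to the following `hadoop jar` call on the master node. This is a sketch, not a verified reproduction. It assumes the job jar has been copied to the working directory, and the `RUN` switch and variable names are only illustrative.

```shell
#!/bin/bash
# Values copied from the EMR step definition above.
jar='mahout-core-0.9-job.jar'   # assumption: copied locally from s3://hellomahout/
main='org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver'
args=(--input s3://hellomahout/testdata/affinity.txt
      --output s3://hellomahout/testdata/results
      -d 3 -k 2 -x 10)

# Assemble the full command as an array so each argument stays a separate word.
cmd=(hadoop jar "$jar" "$main" "${args[@]}")

# Dry run by default: only print the command. Set RUN=1 to actually execute it.
if [ "${RUN:-0}" = "1" ]; then
  "${cmd[@]}"
else
  echo "${cmd[*]}"
fi
```

Running it without `RUN=1` only prints the assembled command, so the arguments can be checked before submitting anything to the cluster.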

When I try to run it, I get the following exception:

Exception in thread "main" java.io.FileNotFoundException: No such file
or directory 'hdfs://172.31.1.27:9000/user/hadoop/temp/calculations/unitvectors'
    at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:759)
    at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:507)
    at org.apache.mahout.clustering.kmeans.EigenSeedGenerator.buildFromEigens(EigenSeedGenerator.java:67)
    at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:243)
    at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:127)
    at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:118)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.main(SpectralKMeansDriver.java:70)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
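
One thing I notice in the trace is that the missing path has an hdfs:// scheme, while the frame that raised the exception belongs to S3NativeFileSystem. I do not know whether that is the cause. The sketch below pulls out those two facts and, if the `hadoop` CLI is available, lists the parent directory on HDFS. The variable names are my own.

```shell
#!/bin/bash
# Path copied verbatim from the exception message.
missing='hdfs://172.31.1.27:9000/user/hadoop/temp/calculations/unitvectors'
# Top frame copied verbatim from the stack trace.
top_frame='com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus'

# URI scheme of the missing path: everything before "://".
scheme="${missing%%://*}"
# Parent directory of the missing path: strip the last path component.
parent="${missing%/*}"

echo "path scheme:  $scheme"
echo "lookup frame: $top_frame"

# On the master node, list the parent directory to see what the job wrote.
# Skipped when the hadoop CLI is not on PATH.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -ls "$parent" || echo "could not list $parent"
fi
```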



Does anyone know what causes the exception?
Could anyone provide any suggestions about how to run spectral clustering
on EMR?

Thank you.

Niko