Posted to user@mahout.apache.org by Rob Podolski <ro...@yahoo.co.uk> on 2014/01/29 23:07:36 UTC
CanopyClusterer makes output folder OK, then crashes and tells me "Mkdirs Failed To Create (same) Output Folder"
Hi
I am trying out the canopy-clustering driver from Java using Mahout-0.8 and am getting a very odd error.
java.io.IOException: Mkdirs failed to create /test_clustering_output/clusters-0-final
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:378)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:364)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:564)
    at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:896)
    at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:884)
    at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:876)
    at org.apache.mahout.clustering.classify.ClusterClassifier.writePolicy(ClusterClassifier.java:234)
    at org.apache.mahout.clustering.canopy.CanopyDriver.clusterData(CanopyDriver.java:373)
    at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:157)
    at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:168)
    at service.clustering.algorithms.CanopyClusterer.cluster(Unknown Source)
    at service.clustering.ClusterRunner.doClustering(Unknown Source)
    at test.service.NonJunitClustererTest.testClustering(Unknown Source)
    at test.service.NonJunitClustererTest.main(Unknown Source)
Clustering failed: Mkdirs failed to create /test_clustering_output/clusters-0-final
Contrary to the message, the output folder /test_clustering_output/clusters-0-final HAS BEEN CREATED. If I do...
"hadoop fs -ls /test_clustering_output/clusters-0-final" I get...
Warning: $HADOOP_HOME is deprecated.
Found 3 items
-rw-r--r-- 1 rob supergroup 0 2014-01-29 21:33 /test_clustering_output/clusters-0-final/_SUCCESS
drwxr-xr-x - rob supergroup 0 2014-01-29 21:32 /test_clustering_output/clusters-0-final/_logs
-rw-r--r-- 1 rob supergroup 106 2014-01-29 21:33 /test_clustering_output/clusters-0-final/part-r-00000
---
I am running on a single-node Hadoop cluster on AWS/Ubuntu, and I'm trying to run the driver from Java...
Configuration hfsConf = new Configuration();
hfsConf.addResource(new Path(PROPS.getHadoopHome() + "/conf/core-site.xml"));
hfsConf.addResource(new Path(PROPS.getHadoopHome() + "/conf/hdfs-site.xml"));
hfsConf.addResource(new Path(PROPS.getHadoopHome() + "/conf/mapred-site.xml"));
try {
    CanopyDriver.run(
        hfsConf,                           // Hadoop configuration
        new Path(hadoopInputSequenceFile), // input sequence file of geovectors
        new Path(hadoopOutputFile),        // output directory
        dm,                                // distance measure
        t1,                                // canopy T1 radius
        t2,                                // canopy T2 radius
        true,                              // true to cluster the input vectors
        0.0,                               // vectors with a pdf below this value (between 0 and 1) are not clustered
        false);                            // execute sequentially if true; false runs as MapReduce
    return true;
} catch (Exception e) {
    e.printStackTrace();
}
Any help would be most appreciated. I have tried almost everything I can think of, including switching off permissions in the Hadoop config and making sure my hadoop.tmp.dir folder has open permissions. My only remaining hunches are: (a) perhaps the Configuration object does not have enough information; (b) I am adding one or two separate jars to the HADOOP_CLASSPATH instead of adding everything to the mahout-job jar.
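One way to probe hunch (a): the ChecksumFileSystem frames in the trace look like local-filesystem resolution, and a Path with no scheme (like /test_clustering_output) inherits whatever default filesystem the Configuration loaded — if core-site.xml was not actually picked up, that default is file:///. This is only a sketch of the scheme-resolution behaviour using plain java.net.URI (no Hadoop on the classpath), and hdfs://localhost:9000 is a made-up example authority, not my actual setup:

```java
import java.net.URI;

public class SchemeCheck {
    public static void main(String[] args) {
        // No scheme: Hadoop would resolve this against fs.default.name,
        // which falls back to file:/// if core-site.xml wasn't loaded.
        URI unqualified = URI.create("/test_clustering_output");
        System.out.println(unqualified.getScheme()); // null

        // Fully qualified: always resolves to HDFS regardless of the
        // default filesystem in the Configuration.
        URI qualified = URI.create("hdfs://localhost:9000/test_clustering_output");
        System.out.println(qualified.getScheme()); // hdfs
    }
}
```

If that is what is happening, it might explain the contradiction: the MapReduce side of the job wrote to HDFS fine, while the driver-side writePolicy call tried Mkdirs on the local disk at the root, where it has no permission. Passing a fully qualified hdfs://... output path would rule this in or out.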
Rob