Posted to user@mahout.apache.org by Rob Podolski <ro...@yahoo.co.uk> on 2014/01/29 23:07:36 UTC

CanopyClusterer makes output folder OK, then crashes and tells me "Mkdirs Failed To Create (same) Output Folder"

Hi

I am trying out the canopy-clustering driver from Java using Mahout 0.8 and am getting a very odd error.

java.io.IOException: Mkdirs failed to create /test_clustering_output/clusters-0-final
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:378)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:364)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:564)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:896)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:884)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:876)
        at org.apache.mahout.clustering.classify.ClusterClassifier.writePolicy(ClusterClassifier.java:234)
        at org.apache.mahout.clustering.canopy.CanopyDriver.clusterData(CanopyDriver.java:373)
        at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:157)
        at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:168)
        at service.clustering.algorithms.CanopyClusterer.cluster(Unknown Source)
        at service.clustering.ClusterRunner.doClustering(Unknown Source)
        at test.service.NonJunitClustererTest.testClustering(Unknown Source)
        at test.service.NonJunitClustererTest.main(Unknown Source)
Clustering failed: Mkdirs failed to create /test_clustering_output/clusters-0-final

Contrary to the message, the output folder /test_clustering_output/clusters-0-final HAS BEEN CREATED. If I do...

"hadoop fs -ls /test_clustering_output/clusters-0-final" I get...

Warning: $HADOOP_HOME is deprecated.
Found 3 items
-rw-r--r--   1 rob supergroup          0 2014-01-29 21:33 /test_clustering_output/clusters-0-final/_SUCCESS
drwxr-xr-x   - rob supergroup          0 2014-01-29 21:32 /test_clustering_output/clusters-0-final/_logs
-rw-r--r--   1 rob supergroup        106 2014-01-29 21:33 /test_clustering_output/clusters-0-final/part-r-00000

---
I am running on a single node hadoop cluster on AWS/Ubuntu and I'm trying to run the driver from Java...

Configuration hfsConf = new Configuration();
hfsConf.addResource(new Path(PROPS.getHadoopHome() + "/conf/core-site.xml"));
hfsConf.addResource(new Path(PROPS.getHadoopHome() + "/conf/hdfs-site.xml"));
hfsConf.addResource(new Path(PROPS.getHadoopHome() + "/conf/mapred-site.xml"));
try {
    CanopyDriver.run(
        hfsConf,                             // HDFS file system configuration
        new Path(hadoopInputSequenceFile),   // input sequence file of geovectors
        new Path(hadoopOutputFile),          // output directory
        dm,                                  // distance measure
        t1,                                  // canopy T1 radius
        t2,                                  // canopy T2 radius
        true,                                // true to cluster the input vectors
        0.0,                                 // vectors with a pdf below this value (0..1) are not clustered
        false);                              // execute sequentially if true
    return true;
} catch (Exception e) {
    e.printStackTrace();
}
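One detail from the stack trace that may be relevant: the failure comes from ChecksumFileSystem, which in Hadoop is the base class of the local filesystem; writes that actually go to HDFS go through DistributedFileSystem instead. If that reading is right, the driver may be resolving /test_clustering_output against the local disk rather than HDFS, e.g. because the Configuration never picked up fs.default.name. A minimal sketch (not a confirmed diagnosis; the config paths below are placeholders for your own install) to check which filesystem the Configuration actually resolves:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same resources as in the snippet above; adjust paths to your install.
        conf.addResource(new Path("/usr/local/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/usr/local/hadoop/conf/hdfs-site.xml"));

        // If this prints null / file:/// instead of hdfs://<namenode>:<port>,
        // the resources were not loaded and writes will hit the local disk,
        // which would explain a Mkdirs failure on a path that exists in HDFS.
        System.out.println("fs.default.name = " + conf.get("fs.default.name"));
        System.out.println("resolved filesystem = " + FileSystem.get(conf).getUri());
    }
}
```

(fs.default.name is the Hadoop 1.x key; on Hadoop 2+ it is fs.defaultFS.)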

Any help would be most appreciated. I have tried almost everything I can think of, including switching off permissions in the Hadoop config and ensuring that my hadoop.tmp.dir folder has open permissions. My only remaining hunches are (a) perhaps the Configuration object does not have enough information, or (b) I am adding one or two separate jars to the HADOOP_CLASSPATH instead of adding everything to the mahout-job jar.
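On hunch (a), one thing worth trying (an assumption on my part, not a confirmed fix) is to set fs.default.name on the Configuration explicitly before calling CanopyDriver.run, so the driver cannot silently fall back to the local filesystem if the XML resources fail to load. The namenode URI below is a placeholder:

```java
// Hypothetical namenode URI; replace with the fs.default.name
// value from your cluster's core-site.xml.
hfsConf.set("fs.default.name", "hdfs://localhost:9000");
```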

Rob