Posted to dev@mahout.apache.org by "Andrey Davydov (JIRA)" <ji...@apache.org> on 2012/12/17 16:02:12 UTC

[jira] [Updated] (MAHOUT-1128) MAHOUT-999 issue still actual

     [ https://issues.apache.org/jira/browse/MAHOUT-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrey Davydov updated MAHOUT-1128:
-----------------------------------

    Environment: 
I work on Hadoop 1.0.3 cluster deployed on Amazon EC2 virtual computers with Ubuntu 11 and mahout-core.jar 0.7 from maven-central.
I run my application from separated "client" machine and it submits tasks to cluster.



  was:
I work on Hadoop 1.0.3 cluster deployed on Amazon EC2 virtual computers with Ubuntu 11 and mahout-core.jar 0.7 from maven-central.
I run my application from separated "clien" machine and it submit tasks to cluster.



    
>  MAHOUT-999 issue still actual
> ------------------------------
>
>                 Key: MAHOUT-1128
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1128
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.7
>         Environment: I work on Hadoop 1.0.3 cluster deployed on Amazon EC2 virtual computers with Ubuntu 11 and mahout-core.jar 0.7 from maven-central.
> I run my application from separated "client" machine and it submits tasks to cluster.
>            Reporter: Andrey Davydov
>
> I'm sorry, my English is not good and I'm a newbie with Mahout, but it seems the MAHOUT-999 issue is still present.
> I use mahout-core 0.7 loaded from Maven Central and hit the same failure.
> I investigated the sources and found the following in the org.apache.mahout.clustering.classify.ClusterClassifier class:
>   public void writeToSeqFiles(Path path) throws IOException {
>     writePolicy(policy, path);
>     Configuration config = new Configuration();
>     FileSystem fs = FileSystem.get(path.toUri(), config);
>     SequenceFile.Writer writer = null;
>     ClusterWritable cw = new ClusterWritable();
>     for (int i = 0; i < models.size(); i++) {
> ...
>       } finally {
>         Closeables.closeQuietly(writer);
>       }
>     }
>   }
>   
>   public void readFromSeqFiles(Configuration conf, Path path) throws IOException {
>     Configuration config = new Configuration();
>     List<Cluster> clusters = Lists.newArrayList();
>     for (ClusterWritable cw : new SequenceFileDirValueIterable<ClusterWritable>(path, PathType.LIST,
>         PathFilters.logsCRCFilter(), config)) {
> ...
>     }
>     this.models = clusters;
>     modelClass = models.get(0).getClass().getName();
>     this.policy = readPolicy(path);
>   }
> Both methods create a new default Configuration, so they end up working against the local file system: KMeansDriver writes the initial clusters to the local file system of the "client" machine, while CIMapper tries to read them from the local file system of a cluster node.
> It seems the current implementation can only work on a pseudo-distributed Hadoop setup. I think ClusterClassifier should store intermediate results in HDFS, using the Configuration passed in by the user through the API.
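A minimal sketch of the change the report suggests, assuming the caller's Configuration is threaded through instead of constructing a fresh default one (class and method names follow the 0.7 sources quoted above; the bodies are abbreviated, not the actual Mahout implementation):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: shows the caller-supplied Configuration being used for
// FileSystem resolution, so paths resolve against the cluster's
// fs.defaultFS (e.g. HDFS) rather than the local file system.
public class ClusterClassifierSketch {

  public void writeToSeqFiles(Configuration conf, Path path) throws IOException {
    // Use the Configuration passed in by the user, not "new Configuration()",
    // so FileSystem.get() picks up the job's fs.defaultFS.
    FileSystem fs = FileSystem.get(path.toUri(), conf);
    // ... write each ClusterWritable via a SequenceFile.Writer backed by fs ...
  }

  public void readFromSeqFiles(Configuration conf, Path path) throws IOException {
    // Likewise, hand the same conf to SequenceFileDirValueIterable here
    // instead of creating another default Configuration.
    // ... iterate the sequence files and rebuild the models list ...
  }
}
```

With this shape, a driver running on a separate client machine and a mapper running on a cluster node both resolve the intermediate cluster files against the same HDFS, which is the behavior the report asks for.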

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira