Posted to user@mahout.apache.org by Rachana <ra...@gmail.com> on 2011/11/25 07:36:43 UTC

Re: ClusteredPoints

Hi all,

I am new to Mahout.
I have successfully run kmeans in Mahout using the Synthetic Control Data.
I wish to see the mapping information present in the clusteredPoints
directory.
Is there any way to extract the data present in the clusteredPoints
directory to a text file
(as we do for the clusters directory using the clusterdump tool)?

Any help is appreciated.

Thank you,
Rachana


Re: ClusteredPoints

Posted by Paritosh Ranjan <pr...@xebia.com>.
Run this code after the kmeans clustering has finished.

I have arranged the code so that you can simply call the process method,
supplying it the path of the clusteredPoints directory inside the
clustering output path, the Hadoop FileSystem and the Configuration.

   //use clusterId and vector here to write to a local file.

At this line you get the clusterId and the vector; use them to write to
the file.


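import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.util.ReflectionUtils;
import org.apache.mahout.common.iterator.sequencefile.PathFilters;

// The class name is illustrative; only the two methods matter.
public class ClusteredPointsProcessor {

  // clusteredPoints: the clusteredPoints directory inside the clustering output path.
  public void process(Path clusteredPoints, FileSystem fileSystem,
      Configuration conf) throws IOException {
    FileStatus[] partFiles = getAllClusteredPointPartFiles(clusteredPoints, fileSystem);
    for (FileStatus partFile : partFiles) {
      SequenceFile.Reader clusteredPointsReader =
          new SequenceFile.Reader(fileSystem, partFile.getPath(), conf);
      try {
        // Create empty key/value objects of the classes recorded in the file header.
        WritableComparable<?> clusterIdAsKey = (WritableComparable<?>)
            ReflectionUtils.newInstance(clusteredPointsReader.getKeyClass(), conf);
        Writable vector = (Writable)
            ReflectionUtils.newInstance(clusteredPointsReader.getValueClass(), conf);
        // next() refills the same two objects with each record until the file ends.
        while (clusteredPointsReader.next(clusterIdAsKey, vector)) {
          //use clusterId and vector here to write to a local file.

        }
      } finally {
        clusteredPointsReader.close();
      }
    }
  }

  // Lists the part-* files directly inside the clusteredPoints directory.
  private FileStatus[] getAllClusteredPointPartFiles(Path clusteredPoints,
      FileSystem fileSystem) throws IOException {
    return fileSystem.listStatus(clusteredPoints, PathFilters.partFilter());
  }
}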
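A minimal sketch of the "write to a local file" step, using only the JDK. The class ClusteredPointsTextWriter is hypothetical (it is not part of Mahout or Hadoop), and the tab-separated "clusterId<TAB>vector" line layout is an assumption; it relies on the toString() of each key and value giving a readable form. It uses java.nio.file.Path, which clashes with org.apache.hadoop.fs.Path, so one of the two must be fully qualified if both end up in the same source file.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes one "clusterId<TAB>vector" line per clustered point
// to a file on the local disk. It only sees strings, so the caller converts the
// Hadoop key and value with toString() before passing them in.
class ClusteredPointsTextWriter implements AutoCloseable {
  private final BufferedWriter out;

  ClusteredPointsTextWriter(Path localFile) throws IOException {
    // Creates the file, or truncates it if it already exists.
    this.out = Files.newBufferedWriter(localFile, StandardCharsets.UTF_8);
  }

  // Appends a single tab-separated record followed by a line separator.
  void write(String clusterId, String vector) throws IOException {
    out.write(clusterId);
    out.write('\t');
    out.write(vector);
    out.newLine();
  }

  // Flushes buffered output and closes the file.
  @Override
  public void close() throws IOException {
    out.close();
  }
}
```

One way to use it: open a single writer before looping over the part files, call textWriter.write(clusterIdAsKey.toString(), vector.toString()) inside the while loop, and close it after the last part file. In Mahout releases of this period the key is typically an IntWritable cluster id and the value a WeightedVectorWritable, but treat that as an assumption and check getKeyClass() and getValueClass() for your version.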

Paritosh


On 25-11-2011 12:27, Rachana wrote:
> Hi Ranjan,
>
> Thank you for your response, but as I am a newbie I am a bit confused!
> Where should I include this code?
> Or should I run this as a separate program?
>
>
> Rachana.


Re: ClusteredPoints

Posted by Rachana <ra...@gmail.com>.
Hi Ranjan,

Thank you for your response, but as I am a newbie I am a bit confused!
Where should I include this code?
Or should I run this as a separate program?


Rachana.




Re: ClusteredPoints

Posted by Paritosh Ranjan <pr...@xebia.com>.
It's pretty easy: I just read the file and write its contents to disk.

  FileStatus[] partFiles = getAllClusteredPointPartFiles(clusteredPoints, fileSystem);
  for (FileStatus partFile : partFiles) {
    SequenceFile.Reader clusteredPointsReader =
        new SequenceFile.Reader(fileSystem, partFile.getPath(), conf);
    WritableComparable clusterIdAsKey =
        (WritableComparable) clusteredPointsReader.getKeyClass().newInstance();
    Writable vector = (Writable) clusteredPointsReader.getValueClass().newInstance();
    while (clusteredPointsReader.next(clusterIdAsKey, vector)) {
      //use clusterId and vector here

    }

    clusteredPointsReader.close();
    // closeWriters() is a helper defined elsewhere in this code (not shown);
    // presumably it closes the writers for the output files.
    closeWriters();
  }
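closeWriters() is not shown in the post. One possible shape for it, assuming (hypothetically) one local text file per cluster id, is sketched below with JDK classes only; PerClusterWriters and the cluster-<id>.txt file naming are invented for illustration and are not the author's code. Files are opened with CREATE and APPEND so that calling closeWriters() after each part file, as the snippet above does, does not truncate lines already written for the same cluster from an earlier part file.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: one output file per cluster id ("cluster-<id>.txt" in a
// local directory). A writer is opened the first time a cluster id is seen, and
// closeWriters() closes all of them together.
class PerClusterWriters {
  private final Path outputDir;
  private final Map<String, BufferedWriter> writers = new HashMap<>();

  PerClusterWriters(Path outputDir) throws IOException {
    // Creates the directory (and parents) if missing; returns the same path.
    this.outputDir = Files.createDirectories(outputDir);
  }

  // Appends the vector's text form as one line of its cluster's file.
  void write(String clusterId, String vector) throws IOException {
    BufferedWriter out = writers.get(clusterId);
    if (out == null) {
      // APPEND keeps lines from earlier part files after closeWriters() was called.
      out = Files.newBufferedWriter(outputDir.resolve("cluster-" + clusterId + ".txt"),
          StandardCharsets.UTF_8, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
      writers.put(clusterId, out);
    }
    out.write(vector);
    out.newLine();
  }

  // Flushes and closes every open writer, then forgets them.
  void closeWriters() throws IOException {
    for (BufferedWriter out : writers.values()) {
      out.close();
    }
    writers.clear();
  }
}
```

Because the files are appended to, the output directory should be emptied before running the export again; otherwise points from the previous run will still be in the files.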


Paritosh


On 25-11-2011 12:06, Rachana wrote:
> Hi all,
>
> I am new to Mahout.
> I have successfully run kmeans in Mahout using the Synthetic Control Data.
> I wish to see the mapping information present in the clusteredPoints
> directory.
> Is there any way to extract the data present in the clusteredPoints
> directory to a text file
> (as we do for the clusters directory using the clusterdump tool)?
>
> Any help is appreciated.
>
> Thank you,
> Rachana