Posted to user@mahout.apache.org by Yue Guan <pi...@gmail.com> on 2011/12/31 20:23:30 UTC

KMeansDriver output?

Hi, all

I'm learning Mahout, so please bear with me for some simple questions.

I can use KMeansDriver, but it has a void return type. Does this indicate that:
1. The designers prefer to control the process flow from the command line,
for example by running clusterdumper afterwards?
2. If we want to read the result in Java, what we need
to do is find a subfolder whose name ends with "final"?

Best

--Yue

Re: KMeansDriver output?

Posted by Yue Guan <pi...@gmail.com>.
Thank you so much. That's what I need.

On Sat, Dec 31, 2011 at 11:37 PM, Paritosh Ranjan <pr...@xebia.com> wrote:
> KMeansDriver writes its output to the output directory passed to the run method.
>
> The output directory stores the final clusters (formed in the last iteration) in the clusters-*-final directory. This directory holds the centroids of the clusters found.
>
> If the runClustering argument is set to true, the clustered output is written to the clusteredPoints directory (inside the output directory). This holds the list of vectors and the clusters they belong to.
>
> If you want the information grouped, i.e. with the vectors belonging to each cluster collected together, you will have to use ClusterOutputPostProcessorDriver. Give it the output path that was given to KMeansDriver. ClusterOutputPostProcessor will create several directories named after the clusters (in the output path provided to its run method) and write the vectors belonging to each cluster inside that cluster's directory.
>
> You can read all this data to analyze the results.

RE: KMeansDriver output?

Posted by Paritosh Ranjan <pr...@xebia.com>.
KMeansDriver writes its output to the output directory passed to the run method.

The output directory stores the final clusters (formed in the last iteration) in the clusters-*-final directory. This directory holds the centroids of the clusters found.
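To locate that directory from Java rather than from the command line, a minimal sketch follows. It assumes the output directory is on the local filesystem and uses only java.nio; the class name FinalClustersLocator and the method findFinalClustersDir are hypothetical names for illustration, not part of Mahout. On HDFS the same lookup would have to go through Hadoop's FileSystem API instead.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

public class FinalClustersLocator {

    // Returns the first subdirectory of outputDir whose name ends in "-final", if any.
    // Assumption: outputDir is a local path, not an HDFS path.
    static Optional<Path> findFinalClustersDir(Path outputDir) throws IOException {
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(outputDir, "*-final")) {
            for (Path candidate : dirs) {
                if (Files.isDirectory(candidate)) {
                    return Optional.of(candidate);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway directory layout that mimics a k-means output directory.
        Path outputDir = Files.createTempDirectory("kmeans-output");
        Files.createDirectory(outputDir.resolve("clusters-0"));
        Files.createDirectory(outputDir.resolve("clusters-3-final"));
        Files.createDirectory(outputDir.resolve("clusteredPoints"));

        String name = findFinalClustersDir(outputDir)
                .map(p -> p.getFileName().toString())
                .orElse("none");
        System.out.println(name);
    }
}
```

The lookup matches only on the "-final" suffix, so it does not need to know how many iterations were run.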

If the runClustering argument is set to true, the clustered output is written to the clusteredPoints directory (inside the output directory). This holds the list of vectors and the clusters they belong to.

If you want the information grouped, i.e. with the vectors belonging to each cluster collected together, you will have to use ClusterOutputPostProcessorDriver. Give it the output path that was given to KMeansDriver. ClusterOutputPostProcessor will create several directories named after the clusters (in the output path provided to its run method) and write the vectors belonging to each cluster inside that cluster's directory.
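As a rough illustration of what "grouped" means here, the sketch below collects (cluster id, vector) pairs into one list per cluster using plain Java collections. It does not call Mahout: the Assignment class and the group method are hypothetical stand-ins for records read from clusteredPoints, and the in-memory map stands in for the per-cluster directories.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupByCluster {

    // Hypothetical stand-in for one clusteredPoints record: a vector and its cluster id.
    static final class Assignment {
        final int clusterId;
        final double[] vector;

        Assignment(int clusterId, double[] vector) {
            this.clusterId = clusterId;
            this.vector = vector;
        }
    }

    // Collects the vectors of each cluster into one list, keyed by cluster id.
    static Map<Integer, List<double[]>> group(List<Assignment> assignments) {
        Map<Integer, List<double[]>> byCluster = new TreeMap<>();
        for (Assignment a : assignments) {
            byCluster.computeIfAbsent(a.clusterId, id -> new ArrayList<>()).add(a.vector);
        }
        return byCluster;
    }

    public static void main(String[] args) {
        List<Assignment> assignments = new ArrayList<>();
        assignments.add(new Assignment(1, new double[] {1.0, 1.0}));
        assignments.add(new Assignment(0, new double[] {9.0, 8.0}));
        assignments.add(new Assignment(1, new double[] {1.5, 0.5}));

        for (Map.Entry<Integer, List<double[]>> e : group(assignments).entrySet()) {
            System.out.println("cluster " + e.getKey() + ": " + e.getValue().size() + " vectors");
        }
    }
}
```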

You can read all this data to analyze the results.