Posted to user@mahout.apache.org by Terry Blankers <te...@amritanet.com> on 2014/03/19 04:25:10 UTC

clusterdump samplePoints parameter

Hi all,

Can someone please answer a quick question about the --samplePoints 
parameter in the clusterdump utility? I understand it specifies the 
number of points returned per cluster. But are the points per cluster 
ordered or ranked in any way before this truncation occurs?

Thanks,

Terry

Re: clusterdump samplePoints parameter

Posted by Terry Blankers <te...@amritanet.com>.
I understand that part. What I'm unclear on is whether there is any ranking 
or ordering of the points in each cluster before they are limited. In 
other words, are the points in each cluster randomly ordered? Or ordered 
alphabetically by document ID or filename? Or ordered by some 
measure of how much each point contributed mathematically to the formation 
of the cluster?

Thanks,

Terry



On 3/18/14, 9:41 PM, Suneel Marthi wrote:
> It's the maximum number of points to include from each cluster in the clusterdump output. If not specified, all points are included.


Re: clusterdump samplePoints parameter

Posted by Terry Blankers <te...@amritanet.com>.
Can you please clarify whether the points are ordered in some way when 
the number of points is specified? In other words, suppose I set max 
points = 100 and there are 1000 points in a cluster. Which 100 of the 
1000 points are returned? Is it an alphanumeric sort by point ID, or something else?



On 3/18/14, 11:41 PM, Suneel Marthi wrote:
> It's the maximum number of points to include from each cluster in the clusterdump output. If not specified, all points are included.


Re: clusterdump samplePoints parameter

Posted by Suneel Marthi <su...@yahoo.com>.
It's the maximum number of points to include from each cluster in the clusterdump output. If not specified, all points are included.
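
As a rough illustration of that cap (a Python sketch, not Mahout's actual code): the function name, the (cluster_id, point_id) input layout, and the choice to keep points in the order they are encountered are all assumptions made for this example, and say nothing definitive about which points clusterdump itself keeps.

```python
from collections import defaultdict


def cap_points_per_cluster(assignments, max_points=None):
    """Group (cluster_id, point_id) pairs, keeping at most max_points per cluster.

    max_points=None stands in for leaving the limit unspecified: every point
    is kept. Points are kept in the order they are encountered; that order is
    an assumption of this sketch, not documented clusterdump behavior.
    """
    clusters = defaultdict(list)
    for cluster_id, point_id in assignments:
        kept = clusters[cluster_id]
        if max_points is None or len(kept) < max_points:
            kept.append(point_id)
    return dict(clusters)


# Hypothetical cluster assignments, purely for illustration.
assignments = [(0, "doc-a"), (1, "doc-b"), (0, "doc-c"), (0, "doc-d"), (1, "doc-e")]

capped = cap_points_per_cluster(assignments, max_points=2)
everything = cap_points_per_cluster(assignments)
```

With a limit of 2, cluster 0 in this sketch drops its third point, while leaving the limit unset keeps all five points.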




