Posted to user@mahout.apache.org by Terry Blankers <te...@amritanet.com> on 2014/03/19 04:25:10 UTC
clusterdump samplePoints parameter
Hi all,
Can someone please answer a quick question about the --samplePoints
parameter in the clusterdump utility? I understand it specifies the
number of points returned per cluster. But are the points per cluster
ordered or ranked in any way before this truncation occurs?
Thanks,
Terry
Re: clusterdump samplePoints parameter
Posted by Terry Blankers <te...@amritanet.com>.
I understand that part. What I'm unclear on is whether the points in each
cluster are ranked or ordered before they are limited. In other words, are
the points in each cluster in random order? Ordered alphabetically by
document ID or filename? Or ordered by some measure of how much each
point contributed mathematically to the formation of the cluster?
Thanks,
Terry
On 3/18/14, 9:41 PM, Suneel Marthi wrote:
> It's the maximum number of points to include from each cluster in the clusterdump output. If it is not specified, all points are included.
>
> On Tuesday, March 18, 2014 11:25 PM, Terry Blankers <te...@amritanet.com> wrote:
>
> Hi all,
>
> Can someone please answer a quick question about the --samplePoints
> parameter in the clusterdump utility? I understand it specifies the
> number of points returned per cluster. But are the points per cluster
> ordered or ranked in any way before this truncation occurs?
>
> Thanks,
>
> Terry
Re: clusterdump samplePoints parameter
Posted by Terry Blankers <te...@amritanet.com>.
Can you please clarify whether the points are ordered in some way when
the number of points is specified? In other words, suppose I set max
points = 100 and there are 1000 points in a cluster. Which 100 of the
1000 points are returned? Are they chosen by an alphanumeric sort of
point ID, or by something else?
On 3/18/14, 11:41 PM, Suneel Marthi wrote:
> It's the maximum number of points to include from each cluster in the clusterdump output. If it is not specified, all points are included.
>
> On Tuesday, March 18, 2014 11:25 PM, Terry Blankers <te...@amritanet.com> wrote:
>
> Hi all,
>
> Can someone please answer a quick question about the --samplePoints
> parameter in the clusterdump utility? I understand it specifies the
> number of points returned per cluster. But are the points per cluster
> ordered or ranked in any way before this truncation occurs?
>
> Thanks,
>
> Terry
Re: clusterdump samplePoints parameter
Posted by Suneel Marthi <su...@yahoo.com>.
It's the maximum number of points to include from each cluster in the clusterdump output. If it is not specified, all points are included.
On Tuesday, March 18, 2014 11:25 PM, Terry Blankers <te...@amritanet.com> wrote:
Hi all,
Can someone please answer a quick question about the --samplePoints
parameter in the clusterdump utility? I understand it specifies the
number of points returned per cluster. But are the points per cluster
ordered or ranked in any way before this truncation occurs?
Thanks,
Terry
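For illustration only, here is a minimal Python sketch of the cap as Suneel describes it: at most N points are kept per cluster, and all points are kept when no limit is given. It assumes points are kept in the order they are encountered, which this thread does not confirm for clusterdump. The function `sample_points` and its arguments are hypothetical names, not part of Mahout.

```python
from collections import defaultdict


def sample_points(assignments, max_points=None):
    """Keep at most `max_points` points per cluster.

    `assignments` is an iterable of (cluster_id, point) pairs.
    ASSUMPTION: points are kept in encounter order (first seen, first kept);
    the thread does not say how clusterdump actually picks them.
    If `max_points` is None, every point is kept.
    """
    kept = defaultdict(list)
    for cluster_id, point in assignments:
        # Append only while the cluster is still under its cap.
        if max_points is None or len(kept[cluster_id]) < max_points:
            kept[cluster_id].append(point)
    return dict(kept)


if __name__ == "__main__":
    # Hypothetical (cluster_id, document_id) pairs.
    pairs = [(0, "doc-a"), (1, "doc-b"), (0, "doc-c"), (0, "doc-d")]
    print(sample_points(pairs, max_points=2))
    print(sample_points(pairs))
```

Under this assumption, which points survive depends entirely on the order in which the input is read, so sorting or ranking would have to happen before the cap is applied.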